Review article
First published online February 2, 2012

Recent advances in the utility and use of the General Practice Research Database as an example of a UK Primary Care Data resource

Abstract

Since its inception in the mid-1980s, the General Practice Research Database (GPRD) has undergone many changes but remains the largest validated and most utilised primary care database in the UK. Its use in pharmacoepidemiology stretches back many years, and it has now contributed to over 800 original research papers. The database has been administered by the Medicines and Healthcare products Regulatory Agency since 2001, and over the last 5 years its processing system has been rebuilt to enhance access to the data, alongside a concomitant push towards broadening the applications of the database. New methodologies, including real-world harm–benefit assessment, pharmacogenetic studies and pragmatic randomised controlled trials within the database, are being implemented. A substantive and unique linkage programme (using a trusted third party) has enabled access to secondary care data and disease-specific registry data, as well as socio-economic data and death registration data. The utility of anonymised free text, accessed in a safe and appropriate manner, is being explored using both simple techniques and more complex ones such as natural language processing.

Introduction

The history of the General Practice Research Database (GPRD) has been described elsewhere [Parkinson et al. 2007; Wood and Coulson, 2001], and the various versions of the database can be tracked through this literature. A number of reviews throughout its history have described GPRD research in the areas of pharmacoepidemiology, epidemiology, public health and clinical outcomes research [Lawson et al. 1998; Wood and Martinez, 2004]. Few of these have addressed the organisation and maintenance of such a data set, or the role of the organisation with which responsibility for managing such a resource lies. These issues are, however, critical to providing a vehicle for clinical research, and they apply to all primary care data sets available in the UK. This review looks at some of these issues and considers current research using GPRD. It also describes some areas of novel research being undertaken within, or in collaboration with, the GPRD organisation.
The GPRD evolved from an early general practice information system. Free computing equipment was supplied in return for standardised data recording and the provision of patient-level data to a central database to be used for health research [Whalley and Mantgani, 1997]. Over 30 years later this concept has developed into the largest and most utilised verified primary care database in the UK, and arguably the world. As of March 2011, it contains records from over 12 million patients, contributing 64 million person-years of prospectively recorded, high-quality primary healthcare data. Over its lifetime it has evolved in terms of its data, its technology and its interface with researchers.
The UK healthcare system is a two-level service with a gatekeeper. Primary care is the front line for healthcare. Secondary care (predominantly hospital-based care) operates within this framework, either at the request of primary care, or directly but with full disclosure to the primary care physician. Nearly all individuals in the UK are registered with a primary care physician (general practitioner [GP]) who oversees that patient’s healthcare and acts as the gatekeeper to the National Health Service (NHS). Prospective follow-up of individuals is therefore possible via the GP’s healthcare records, and it is for this reason that primary care databases offer such opportunities for research [Lawrenson et al. 1999; de Lusignan and van Weel, 2006].

Primary care data

Source data for primary care research databases is generated by the GP systems themselves and collected from practice servers by download software. All GP systems use a combination of coded and free text data (including letters and emails) to record healthcare data, although the balance between coded and free text data can differ between systems. Systems are flexible in the ways in which data can be recorded and, in the UK, use Read terms to code healthcare information. A migration to SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) has been planned but has not yet occurred. Recorded data is both clinical and administrative, with entire Read term chapters devoted to administrative and procedural details of patient care.
The GPRD data is collected from the Vision GP system, supplied to GPs by In Practice Systems Ltd (INPS). Vision is a relatively highly coded system based around clinical ‘entities’. An entity is a specific piece of information ranging from a medical history note or a diagnosis to a mental state questionnaire score. Coded data is available on clinical history, diagnoses, signs and symptoms, prescribing of drugs and devices, test results, referrals to secondary care and non-practice-based primary care services, immunisations and lifestyle factors. There is also a range of additional data where more detailed information can be entered for a particular predefined event type via a specifically designed structured data area. For example, a smoking habit of 10–19 cigarettes per day may be recorded using a Read code (‘1374.’) or using a specific structured data area for smoking status where individual data items can be recorded for smoking status, quantity smoked and dates of starting and stopping smoking.
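To make the two recording routes concrete, here is a minimal Python sketch, assuming a simplified, hypothetical entity shape (`Entity`, with a `structured` dictionary) rather than the actual Vision or GPRD schema. The only code mapping it uses is the one quoted above (‘1374.’ for 10–19 cigarettes per day); the structured field name `cigarettes_per_day` is a placeholder.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Entity:
    """Hypothetical, simplified clinical entity: a dated record that may carry
    a Read code and/or items from a structured data area."""
    event_date: date
    read_code: Optional[str] = None
    structured: dict = field(default_factory=dict)


def smokes_10_to_19(entity: Entity) -> bool:
    """True if either recording route indicates 10-19 cigarettes per day."""
    # Route 1: the Read code quoted in the text for this smoking level.
    if entity.read_code == "1374.":
        return True
    # Route 2: a quantity in the structured smoking area
    # ("cigarettes_per_day" is a placeholder key, not a real field name).
    qty = entity.structured.get("cigarettes_per_day")
    return qty is not None and 10 <= qty <= 19


coded = Entity(date(2010, 3, 1), read_code="1374.")
structured = Entity(date(2010, 3, 1), structured={"cigarettes_per_day": 15})
print(smokes_10_to_19(coded), smokes_10_to_19(structured))  # True True
```

A study that searched only one of the two routes would undercount the exposure, which is why both are queried here.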
Free text is a common feature of all GP systems and is utilised to varying degrees. Within Vision, it is always linked to specific coded data as part of the record. The balance between coded data and free text depends largely on the coding style of the user, although system design also affects it. Free text can be classified into two broad types: short notes and annotations, and healthcare provider communications. The former are often annotations to the Read term and may only have specific meaning in the context of the coded term. They frequently contain standard and nonstandard abbreviations, are entered manually and can contain misspellings and typographical errors. Healthcare provider communications are letters and emails sent to and from the GP, including referral letters, hospital discharge letters and other correspondence. Where these communications are electronic, the text of the letter is retained in full in the free text data; where they are on paper, they are either scanned in using text recognition software or scanned in as images linked to consultations. Key information such as diagnoses is often abstracted from these documents and entered as coded data. These longer forms of free text are generally more grammatically constructed and can be understood in isolation from the coded data to which they are linked.
The major GP systems in the UK include EMIS (Egton Medical Information System), SystmOne and Vision. All of these systems use essentially similar data models for primary care data, with individual healthcare ‘events’ stored as date-stamped coded records with a potential free text component. There are some differences in how this information is organised, and system design encourages slightly different recording styles; however, the baseline data is similar. The implications of these differences for research are likely to be minimal, as well-designed methodologies avoid making assumptions about recording strategies and focus on the raw data while attempting to mitigate the effects of potential recording anomalies.

Research using primary care databases

Primary care databases have been used extensively for research, and remain a major focus of observational research where prospective healthcare records are required. The range of data sources is very wide, from groups of co-operating practices at a smaller geographical scale to larger networks of practices participating in project-based data extractions on a broader geographical (but not national) scale [de Lusignan et al. 2006]. These groups are often focused on a particular clinical area and can, to a certain extent, modify recording practice in that area. Aside from GPRD, there are a handful of national or UK-wide databases providing data for research and, in some cases, market research. These include non-commercial databases such as QResearch [Hippisley-Cox et al. 2004] and DIN-Link [Carey et al. 2004], and commercial databases such as UK IMS Disease Analyzer and The Health Improvement Network (THIN) [Bourke et al. 2004]. GPRD is the most established and has the greatest track record in research and validation. The GPRD bibliography [GPRD, 2011] references over 850 peer-reviewed publications relating to GPRD, over 800 of which are original research papers. Such a track record has positive implications for data quality, for experience of use in observational research and for appropriate management of the resource. GPRD data is available to both the commercial and academic sectors, but not for market research.
As an organisation, GPRD has an operational division, which creates and manages access to the data resource, and a research arm. The research arm is an expert unit on the data resource as it relates to research. Its remit encompasses an understanding of the information the data holds, its provenance and how it relates to recording in general practice, as well as an appreciation of the data’s temporal complexity and how that relates to proposed research. The GPRD research group provides a high level of expertise in observational data analysis and epidemiology in general, especially regarding the strengths, limitations and idiosyncrasies of the data. This expertise is made available to clients through workshops, individual study advice and contracted services.
The GPRD operates on a not-for-profit basis within the terms of the MHRA’s trading fund status [Wood and Coulson, 2001]. Previously, the cost of access had proved a barrier to extensive use of the data within academia. To address this, and to broaden use of the data, a joint venture with the UK Medical Research Council (MRC) was launched in November 2005. This gave UK academic groups working on noncommercial projects access to the data at no cost. During the term of this MRC scheme the number of academic protocols increased by 103.5%, while the total number of submitted protocols increased by 99%. In the years 2002 to 2004, 54.2% and 35.9% of submitted protocols came from academic and industry-based organisations, respectively; from 2005 to 2010 these figures were 60.3% and 23.5%. After the MRC scheme ended, academic access to the data was continued through a risk-sharing licence giving access to the entire data set at a reduced cost under certain conditions.
GPRD aims to broker and undertake high-quality research, the control of which is mediated via an Independent Scientific Advisory Committee (ISAC) that assesses submitted protocols for their ethical and scientific merit and their feasibility. ISAC is set up under NHS Appointments Commission terms and includes scientists with statistical, epidemiological and specific GPRD-related expertise, as well as lay members. The GPRD Group has obtained ethical approval from a Research Ethics Committee (REC) for all purely observational research using GPRD data, namely studies that do not involve patients directly. As part of its assessment, ISAC may recommend that study-specific REC approval be sought if ethical issues arise in relation to an individual study. Separate REC approval is required for any study that includes any form of direct patient involvement.

Data quality

GPRD has historically undertaken a set of internal data quality measurements to ensure high-quality data within its subset of UK practices. This assessment is made at both the practice level and the patient level: practice-level quality is expressed by the practice ‘up-to-standard’ (UTS) date, and patient-level quality by a patient acceptability flag.
Patients are labelled as ‘acceptable’ for research by a process that identifies and excludes patients with noncontiguous follow-up, or with poor data recording that raises suspicion about the validity of their record. Patient data is checked for the issues listed in Table 1. If any of these values are found, the patient is labelled unacceptable and is not recommended for use in research; the data nevertheless remains in the database. The process is broadly unchanged since the advent of the GPRD Gold system (see the next section). No specific validation work has been conducted on this method, as much of it is based on logical inconsistencies in the registration data. A breakdown of acceptability status shows that 11.89% of patients are labelled unacceptable: 10.44% of all patients because they are temporary patients, and only 1.45% because of ‘inconsistent’ registration data.
Table 1. Data items showing unacceptable values that question the validity of a patient’s record.
Data item | Unacceptable value
First registration date | Empty; invalid date; prior to year of birth
Year of birth | Missing
Transferred out date | Present with no reason; prior to first registration date; prior to current registration date
Transferred out reason | Present with no date
Current registration date | Prior to first registration date; prior to year of birth
Gender | Other than male, female or indeterminate
Age | Over 115 years at end of follow up
Registration status | Other than applied or permanent
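The rules in Table 1 are simple enough to express directly. The following is a minimal sketch, not GPRD’s implementation: the `Registration` field names are hypothetical, dates that are empty or failed to parse are assumed to arrive as `None`, end of follow-up is assumed to be the transfer-out date or else a supplied data end date, and age is approximated from year of birth because only the year is available.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

VALID_GENDERS = {"male", "female", "indeterminate"}
VALID_STATUSES = {"applied", "permanent"}


@dataclass
class Registration:
    """Hypothetical registration record; None means empty or unparseable."""
    year_of_birth: Optional[int]
    first_reg_date: Optional[date]
    current_reg_date: Optional[date]
    transfer_out_date: Optional[date]
    transfer_out_reason: Optional[str]
    gender: str
    status: str


def acceptability_failures(p: Registration, data_end: date) -> List[str]:
    """Return the Table 1 rules a record breaks; an empty list means acceptable."""
    fails: List[str] = []
    yob = p.year_of_birth
    if yob is None:
        fails.append("year of birth missing")
    if p.first_reg_date is None:
        fails.append("first registration date empty or invalid")
    elif yob is not None and p.first_reg_date.year < yob:
        fails.append("first registration before year of birth")
    if p.transfer_out_date is not None:
        if not p.transfer_out_reason:
            fails.append("transfer-out date with no reason")
        if p.first_reg_date is not None and p.transfer_out_date < p.first_reg_date:
            fails.append("transfer-out before first registration")
        if p.current_reg_date is not None and p.transfer_out_date < p.current_reg_date:
            fails.append("transfer-out before current registration")
    elif p.transfer_out_reason:
        fails.append("transfer-out reason with no date")
    if p.current_reg_date is not None:
        if p.first_reg_date is not None and p.current_reg_date < p.first_reg_date:
            fails.append("current registration before first registration")
        if yob is not None and p.current_reg_date.year < yob:
            fails.append("current registration before year of birth")
    if p.gender not in VALID_GENDERS:
        fails.append("invalid gender")
    # Assumed end of follow-up: transfer-out date, otherwise the data end date.
    end = p.transfer_out_date or data_end
    if yob is not None and end.year - yob > 115:
        fails.append("age over 115 at end of follow-up")
    if p.status not in VALID_STATUSES:
        fails.append("registration status not applied or permanent")
    return fails
```

Returning the list of failed rules, rather than a single flag, makes it easy to reproduce breakdowns such as temporary patients versus inconsistent registration data.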
The UTS date was originally developed to measure adherence to the recording guidelines provided with the VM system and, subsequently, with the Vision system when it was introduced after 1995. The UTS process was developed in the data set derived from the older VAMP system and involved ten data quality parameters. Each parameter generated the earliest date from which it was acceptable, and the UTS date was set to the earliest date at which nine of the ten were acceptable, provided that one of these was the mortality parameter, which was mandatory. In the new Vision-based data set, these processes were ported to the new data as closely as possible given its new, more complex data structure. Having applied the UTS parameters to Vision data for a number of years, we were able to identify the key parameters that were instrumental in determining the actual UTS date. In its current form the UTS date is based on two central concepts: assurance of continuity in data recording, and a mortality rate within an expected range. Monitoring the mortality rate allows us to identify the point at which previously registered patients have been deleted from the system, since such deletions depress the recorded mortality rate.
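The legacy rule described above can be sketched as follows, assuming that each parameter, once acceptable, stays acceptable from its earliest acceptable date onwards. The parameter names and dictionary layout are illustrative only, and this does not model the current two-concept form of the UTS date.

```python
from datetime import date
from typing import Dict, Optional


def legacy_uts_date(param_dates: Dict[str, Optional[date]],
                    mandatory: str = "mortality",
                    required: int = 9) -> Optional[date]:
    """Earliest date at which `required` parameters are acceptable, one of which
    must be the mandatory parameter. None means the rule is never satisfied."""
    mandatory_date = param_dates.get(mandatory)
    if mandatory_date is None:
        return None  # mortality parameter never acceptable
    # Dates from which each of the other parameters becomes acceptable.
    others = sorted(d for name, d in param_dates.items()
                    if name != mandatory and d is not None)
    needed = required - 1  # the mandatory parameter counts as one
    if needed <= 0:
        return mandatory_date
    if len(others) < needed:
        return None
    # The needed-th earliest other parameter fixes when enough are acceptable;
    # the UTS date cannot precede the mandatory parameter's date.
    return max(mandatory_date, others[needed - 1])


params = {"mortality": date(1995, 1, 1)}
params.update({f"param_{i}": date(1990 + i, 1, 1) for i in range(9)})
print(legacy_uts_date(params))  # 1997-01-01
```

Sorting the non-mandatory dates turns ‘nine of ten, including mortality’ into a single order statistic, so no date-by-date scan is needed.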
Recording practices and GP systems in the post-2000 era differ vastly from those of the preceding era of early GP systems such as VM. Today’s accredited systems are more complex but simpler to use, and enable capture of a much richer data set than previous systems. Recording practices have changed enormously: nearly all healthcare episodes are now recorded electronically, in certain clinical areas often according to the incentivised rules of the Quality and Outcomes Framework (QOF) [Vamos et al. 2011]. This increased complexity raises issues in using the data appropriately for research; for example, such changes render the original UTS parameters partly redundant. There is a need for a scientific, research-based assessment of primary care data sources in general, specifically to characterise more accurately the strengths and weaknesses of the available data. A collaboration involving GPRD has recently been undertaken to develop such data quality parameters, and initial pilot-phase results exploring baseline parameters across the data have been reported [Tate et al. 2011].
GPRD has a long history of validation studies, which has recently been reviewed [Herrett et al. 2010]. This review covered 212 publications involving 357 validations, classifying them as either internal (manual or algorithmic review of database records, or sensitivity analysis) or external (questionnaires or patient record requests to GPs, and comparison with external data sources). Generally a high proportion of cases were confirmed, but for the majority of studies only positive predictive values (PPVs) are obtainable, because validation typically starts from patients who already carry the relevant code. The authors also note that detailed case definitions and the code lists used were seldom provided, making it hard to assess whether poor performance in case identification is due to poor code selection. The importance of code selection has been explored in a GPRD stroke study, which stresses the need for transparency and for an understanding of the context of code selection [Gulliford et al. 2009]. A smaller review of GPRD validation studies [Khan et al. 2010] came to similar conclusions, reporting high PPVs, citing the Morbidity Statistics from General Practice 1991–1992 (MSGP4) as a frequent external comparator, and noting the use of both Read codes and OXMIS codes (Oxford Medical Information System codes: an early clinical dictionary used by the VM system) as a complication of coding. Current GPRD data contains only Read coding. Validation studies in other primary care data sources are limited and, in the case of THIN, mostly reassess validity in clinical areas already assessed using GPRD.
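As a small illustration of why PPVs dominate, the sketch below computes a PPV from a validation sample of code-positive patients, with a Wilson score interval (a standard binomial interval chosen here for illustration, not one the reviews prescribe). Sensitivity would additionally require counting true cases that lack the code, which a code-positive sample cannot supply.

```python
import math
from typing import Tuple


def ppv_with_wilson_ci(confirmed: int, validated: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """PPV = confirmed / validated code-positive cases, with a Wilson score
    interval (z = 1.96 gives roughly 95% coverage)."""
    if validated <= 0 or not 0 <= confirmed <= validated:
        raise ValueError("need validated > 0 and 0 <= confirmed <= validated")
    n = validated
    p = confirmed / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half_width, centre + half_width


ppv, low, high = ppv_with_wilson_ci(confirmed=90, validated=100)
print(f"PPV {ppv:.2f} (95% CI {low:.3f}-{high:.3f})")
```

The Wilson interval is used rather than the simple normal approximation because validation samples are often small and PPVs close to 1.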
Recent innovations in linking data sets within GPRD have provided further opportunities to compare data across data sets, not only in terms of overall prevalence rates but also in terms of individual concordance at the patient level. Often, neither of the linked data sets can be considered a gold standard, and there will be genuine healthcare reasons for discordance between the two. Their role in assessing data quality is thus complex; however, the benefit of linkage is already being seen in research [Boggon et al. 2011].
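Patient-level concordance between two linked sources can be summarised without treating either as the reference. The sketch below uses observed agreement and Cohen’s kappa, a common chance-corrected agreement statistic chosen here for illustration; the source labels (GP record, hospital episode data) and set-based inputs are assumptions, not the method of the cited study.

```python
from typing import Set, Tuple


def concordance(population: Set[int], in_gp: Set[int],
                in_hes: Set[int]) -> Tuple[float, float]:
    """Observed agreement and Cohen's kappa for a yes/no diagnosis recorded
    in two linked sources, over the same set of patient identifiers."""
    n = len(population)
    gp, hes = in_gp & population, in_hes & population
    both = len(gp & hes)
    gp_only = len(gp - hes)
    hes_only = len(hes - gp)
    neither = n - both - gp_only - hes_only
    p_observed = (both + neither) / n
    p_gp, p_hes = (both + gp_only) / n, (both + hes_only) / n
    # Agreement expected by chance if the two sources recorded independently.
    p_expected = p_gp * p_hes + (1 - p_gp) * (1 - p_hes)
    if p_expected == 1:
        return p_observed, float("nan")  # kappa undefined: no variation
    return p_observed, (p_observed - p_expected) / (1 - p_expected)


patients = set(range(100))
agreement, kappa = concordance(patients, set(range(0, 20)), set(range(5, 25)))
print(f"agreement {agreement:.2f}, kappa {kappa:.3f}")
```

Kappa is symmetric in the two sources, which fits the situation described above where neither data set can be treated as the gold standard.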

Database redevelopment

In its initial form, the GPRD was derived from a system called VM, a DOS-based system that produced downloads of five files with a simple structure. This version of the database was used until the early 2000s, but ceased to be updated from 2001. The Vision system superseded VM in 1995, and from then until 2000 the vast majority of practices upgraded. The data collected from the Vision system was richer and more complex and could not be housed in the existing VM data repositories. In 1999 the Medicines and Healthcare products Regulatory Agency (MHRA: an executive agency of the Department of Health, then named the Medicines Control Agency) invested in the development of the Vision-based data that had been accumulating since 1995 in some practices [Wood and Coulson, 2001]. Built by external contractors to a specification supplied by GPRD staff, the new system was known as Full Feature GPRD (FF-GPRD); its main features were a centralised, online-access database interfaced by a suite of data cutting and data reporting tools. Underlying these tools was a relational data warehouse, with a proto-database in the form of an operational data store. By 2007 the FF-GPRD system was approaching the end of its natural lifetime, and a replacement system was developed and implemented within the GPRD group. This system was designed to concentrate on the basic requirements of an online system, which were identified by a number of means:
An audit of study population requirements from a sample of ISAC (formerly SEAG) protocols revealed that the vast majority of study populations were identified by the presence or absence of either specific Read terms, or drugs, and a simple set of age, gender, study period and temporal ordering constraints.
An analysis of the relative strengths and weaknesses of the FF-GPRD system enabled us to identify candidate functionality for retention as well as those functions that were not of value and could be discarded.
The experience of GPRD researchers working with the data to provide services (ranging from data extraction to full scientific research projects) enabled us to identify inefficiencies readily, generate benchmarks and detail the specific weaknesses of the existing system as candidates for improvement.
Client feedback was critical given GPRD’s funding model, and any new system had to address areas of dissatisfaction with FF-GPRD.
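The study-population pattern identified in the audit (presence or absence of codes or drugs, plus age, gender, study period and temporal ordering) can be sketched in a few lines. This is a hypothetical illustration, not the GPRD Gold extraction tools: the `Event` and `Patient` shapes and the placeholder codes (`DRUG_A`, `DX_PRIOR`) are invented for the example, and age is approximated from year of birth.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Event:
    patient_id: int
    event_date: date
    code: str  # Read code or drug code (placeholders in the example below)


@dataclass
class Patient:
    patient_id: int
    gender: str
    year_of_birth: int


def select_cohort(patients: Dict[int, Patient], events: Iterable[Event],
                  index_codes: Set[str], prior_exclusion_codes: Set[str],
                  start: date, end: date, min_age: int, max_age: int,
                  gender: Optional[str] = None) -> Dict[int, date]:
    """Return {patient_id: index_date}. The index date is the earliest index
    code within [start, end]; patients must meet the age/gender limits and
    have no exclusion code on or before the index date (temporal ordering)."""
    by_patient: Dict[int, List[Event]] = {}
    for ev in events:
        by_patient.setdefault(ev.patient_id, []).append(ev)

    cohort: Dict[int, date] = {}
    for pid, evs in by_patient.items():
        pt = patients.get(pid)
        if pt is None or (gender is not None and pt.gender != gender):
            continue
        hits = [e.event_date for e in evs
                if e.code in index_codes and start <= e.event_date <= end]
        if not hits:
            continue
        index_date = min(hits)
        age = index_date.year - pt.year_of_birth  # approximate age at index
        if not min_age <= age <= max_age:
            continue
        if any(e.code in prior_exclusion_codes and e.event_date <= index_date
               for e in evs):
            continue
        cohort[pid] = index_date
    return cohort


patients = {1: Patient(1, "female", 1950), 2: Patient(2, "male", 1960),
            3: Patient(3, "female", 1990)}
events = [
    Event(1, date(2005, 6, 1), "DRUG_A"), Event(1, date(2007, 1, 1), "DRUG_A"),
    Event(2, date(2004, 1, 1), "DX_PRIOR"), Event(2, date(2006, 1, 1), "DRUG_A"),
    Event(3, date(2006, 1, 1), "DRUG_A"),
]
cohort = select_cohort(patients, events, {"DRUG_A"}, {"DX_PRIOR"},
                       date(2005, 1, 1), date(2009, 12, 31), 40, 80)
print(cohort)  # {1: datetime.date(2005, 6, 1)}
```

Patient 2 is dropped by the temporal-ordering rule and patient 3 by the age limit, which mirrors how a small set of generic constraints can cover most study definitions.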
A replacement system was developed over a period of 2 years. The new system, known as GPRD Gold, was launched in March 2009 and is the current version of the database. It processes data practice by practice, one collection at a time. Gains in performance have been significant and the process is fully scalable. These improvements, together with electronic delivery of incremental data collections, have reduced the data lag period (the difference between the date of the data in the database and the date of access). Currently, nearly all contributing practices have a lag period of less than 6 weeks in a monthly static database; this is limited only by the frequency of collections, not by processing capacity. It is now possible to rebuild the database from scratch in a 2–3 week window, which was not possible previously. This enables substantive changes to processing to be implemented in a realistic time frame, making the database itself potentially responsive to changes in baseline systems or recording practices.
In GPRD Gold a static version of the database is produced every month, and it is this version that researchers access. All previous monthly databases are retained, and at any one time six previous monthly versions are available on the online system. This ensures access to the most up-to-date data and also the ability to requery the data on which a previous analysis was based: because the monthly versions are static, the same data will be available. The principle of using a central database via an online system is sound, as it removes the need for extensive IT expertise at the client site and helps to standardise the ‘forms’ of GPRD available for use. The term ‘GPRD’ is frequently used to describe a data source in publications, but it is rarely specified in more detail. Exactly replicating an analysis requires analysing the same version of the database, and a centralised database with versioned static releases goes some way towards providing this clarity. However, for an online system to be a viable option, users need to be able to extract the data they require in a realistic time frame. A set of simple, fit-for-purpose and efficient data identification and extraction tools was developed that meets the requirements of more than 90% of all study populations. These tools also enable users to run simple feasibility counts, often the first step in a potential project. In benchmarking comparisons with the prior system, the simplified approach runs even queries involving large numbers of patients hundreds of times faster. While very large data sets are not extracted in real time, cohorts of several hundred thousand patients can realistically be built within a 24-hour period, and these jobs can run asynchronously without user input.
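Because exact replication depends on using the same static release, it helps to record the release an analysis was run on. The sketch below is a hypothetical helper, not part of the GPRD online system; it assumes releases are identified by year and month and that the online window is the current release plus the six previous ones, which is one reading of the description above.

```python
from dataclasses import dataclass
from typing import List

PREVIOUS_ONLINE = 6  # "six previous monthly versions", per the text


@dataclass(frozen=True, order=True)
class Release:
    year: int
    month: int

    def label(self) -> str:
        return f"{self.year:04d}-{self.month:02d}"


def online_releases(all_releases: List[Release],
                    previous: int = PREVIOUS_ONLINE) -> List[Release]:
    """Current release plus `previous` earlier monthly releases (assumed window)."""
    return sorted(all_releases)[-(previous + 1):]


def can_requery(pinned: Release, all_releases: List[Release]) -> bool:
    """True if the release an analysis was pinned to is still online."""
    return pinned in online_releases(all_releases)


releases = [Release(2010, m) for m in range(1, 13)] + [Release(2011, m) for m in range(1, 4)]
analysis_release = Release(2010, 9)  # record this label alongside the results
print(analysis_release.label(), can_requery(analysis_release, releases))  # 2010-09 True
```

Citing a release label such as this in a publication is what turns ‘GPRD’ from a loosely specified source into a reproducible one.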
GPRD is the only primary care database that can be accessed online. Access to data from other providers is either via data extraction at the supplier organisation and subsequent supply or, in the case of QResearch, via studies run within the supplier organisation and reported to the client, with no data supplied directly. GPRD also runs these models, which can be efficient. However, with the number of skilled primary care database researchers growing in the UK and elsewhere, demand for raw data is likely to grow. Allowing individuals to access the data directly gives researchers more confidence in the provenance of the data on which they are working.

GPRD linkage programme

For a number of years, a joined-up health data network for research has been a lofty goal within the UK. Part of the NHS National Programme for Information Technology (NPfIT) was the creation of the Secondary Uses Service: a system conceived to provide pseudonymised patient-based data for purposes including healthcare planning, clinical audit, performance monitoring and research [NHS Connecting for Health Implementation Guidance team, 2007]. More recently, the UK Research Capability Programme has explored the concept of federating data sources in the UK.
Within the field of primary care databases, linkage to non-primary-care data has long been seen as desirable. Thus far, GPRD remains the only national database to have linked on a permanent and ongoing basis to external data sets. The linkage process uses a common methodology that enables linkage to several disparate data sets. Practices must consent to having their data linked; those that consent send identifiers, including patient NHS number, postcode, date of birth and gender, to a trusted third party (TTP). The methodology involves a first-pass match that identifies patients by NHS number; where this fails, a second pass matches probabilistically on gender, date of birth and postcode. Overall, 91.73% of patients are identified by NHS number. The TTP undertakes the linkage and makes available pairs of GPRD identifiers and external data set identifiers that can be used to integrate linked data with the primary care record. The data governance of this process is based on a separation of function and clear, unambiguous rules that prevent any one party from possessing data sets with potentially recognisable data signatures that could be used to identify patients. Linkages are subject to approval and regulation by the National Information Governance Board. Currently, linkage is restricted to English practices, as all approvals are required at the national level; future plans include extending the linkage to Scotland, Wales and possibly Northern Ireland. Of 465 contributing English practices, 302 (approximately 65%) have consented to linkage, covering roughly 5% of the population of England.
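The two-pass design can be sketched as below. This is an illustration under stated assumptions, not the TTP’s actual algorithm: the agreement and disagreement weights, the acceptance threshold and the postcode normalisation are hypothetical, whereas a production probabilistic linkage (for example, a Fellegi–Sunter model) would estimate weights from the data. The count returned alongside the pairs corresponds to the proportion matched on NHS number quoted above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Identifiers:
    record_id: str
    nhs_number: Optional[str]
    dob: date
    gender: str
    postcode: str


# Hypothetical (agree, disagree) log-weights and acceptance threshold.
WEIGHTS = {"dob": (6.0, -3.0), "gender": (1.0, -4.0), "postcode": (5.0, -2.0)}
THRESHOLD = 9.0


def _normalise(field_name: str, value):
    # Compare postcodes ignoring spaces and letter case.
    return value.replace(" ", "").upper() if field_name == "postcode" else value


def _score(a: Identifiers, b: Identifiers) -> float:
    total = 0.0
    for name, (agree, disagree) in WEIGHTS.items():
        same = _normalise(name, getattr(a, name)) == _normalise(name, getattr(b, name))
        total += agree if same else disagree
    return total


def link(gprd: List[Identifiers],
         external: List[Identifiers]) -> Tuple[Dict[str, str], int]:
    """Return (GPRD id -> external id pairs, number matched on NHS number)."""
    by_nhs = {e.nhs_number: e for e in external if e.nhs_number}
    pairs: Dict[str, str] = {}
    nhs_matched = 0
    for g in gprd:
        # Pass 1: deterministic match on NHS number.
        if g.nhs_number and g.nhs_number in by_nhs:
            pairs[g.record_id] = by_nhs[g.nhs_number].record_id
            nhs_matched += 1
            continue
        # Pass 2: probabilistic match on date of birth, gender and postcode.
        best = max(external, key=lambda e: _score(g, e), default=None)
        if best is not None and _score(g, best) >= THRESHOLD:
            pairs[g.record_id] = best.record_id
    return pairs, nhs_matched


gprd = [Identifiers("g1", "9990000001", date(1950, 1, 1), "F", "AB1 2CD"),
        Identifiers("g2", None, date(1960, 5, 5), "M", "ZZ9 9ZZ"),
        Identifiers("g3", None, date(1970, 1, 1), "F", "XY1 1XY")]
external = [Identifiers("h1", "9990000001", date(1950, 1, 1), "F", "AB1 2CD"),
            Identifiers("h2", None, date(1960, 5, 5), "M", "zz99zz"),
            Identifiers("h3", None, date(1971, 1, 1), "F", "QQ1 1QQ")]
pairs, on_nhs = link(gprd, external)
print(pairs)  # {'g1': 'h1', 'g2': 'h2'}
```

In the real process only the TTP would ever hold both identifier sets; the output that leaves it is just the identifier pairs.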
Data sets that have been integrated include Hospital Episode Statistics for secondary care, Office for National Statistics (ONS) death certification data, socioeconomic classification data (aggregated at the lower super output area level) and disease registry data, including the National Cancer Intelligence Network (NCIN) and the Myocardial Ischaemia National Audit Project (MINAP) register of myocardial infarctions. These linkages are ongoing and are updated quarterly. Additional linkages have been implemented for particular studies, including pollution-level data and other cohorts such as the Avon Longitudinal Study of Parents and Children (ALSPAC). Linkage to further data sets is pending.

Methodological advances

The last few years have seen an increase in the volume of research undertaken by the GPRD research department, and one consequence has been the exploration of new areas of application and methodology, aided by access to linked data sets. Much work has been required simply to understand the parameters of a new data set comprising data streams from multiple sources. For example, the composite of primary care data, hospital episode data and death certification data comprises three data streams, each with its own left and right censoring points and its own characteristics in terms of purpose (clinical care or administrative), function and recording. In this situation, the simple task of matching a diagnosis or a fatal event across two or all three data streams becomes complex. The best methods of utilising these data need to be investigated, and a number of analyses have been initiated in GPRD to begin to develop these parameters [Eaton et al. 2010].
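One way to handle separate censoring points when seeking the same event in several streams is to distinguish ‘absent’ from ‘not assessable’. The sketch below assumes each stream exposes a coverage window and a list of event dates for one patient; the 30-day date tolerance and the stream names are hypothetical choices, not parameters from the cited analyses.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class Stream:
    name: str
    coverage_start: date  # left censoring point of this source for the patient
    coverage_end: date    # right censoring point of this source for the patient
    event_dates: List[date] = field(default_factory=list)


def corroboration(event: date, streams: List[Stream],
                  tolerance_days: int = 30) -> Dict[str, Optional[bool]]:
    """Per stream: True if the event is recorded within the tolerance, False if
    the date lies inside the stream's coverage but no record is found, and
    None if the date lies outside coverage (absence is then uninformative)."""
    tolerance = timedelta(days=tolerance_days)
    result: Dict[str, Optional[bool]] = {}
    for s in streams:
        if not s.coverage_start <= event <= s.coverage_end:
            result[s.name] = None
        else:
            result[s.name] = any(abs(d - event) <= tolerance for d in s.event_dates)
    return result


streams = [
    Stream("primary care", date(1995, 1, 1), date(2011, 3, 31), [date(2008, 3, 20)]),
    Stream("hospital episodes", date(1997, 4, 1), date(2011, 3, 31)),
    Stream("disease registry", date(2009, 1, 1), date(2011, 3, 31)),
]
print(corroboration(date(2008, 3, 10), streams))
```

Keeping ‘not assessable’ separate prevents an event that predates a source’s coverage from being counted as discordance.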
The development of novel methodologies is occurring in a number of areas and a brief description of some of these programmes follows.
Pharmacovigilance is an area of research seen as central to GPRD. Many studies have used GPRD to test hypotheses in response to safety signals generated by other data sources, such as adverse event reports; recent examples include studies by Douglas and colleagues and by van Staa and colleagues [Douglas et al. 2009; van Staa et al. 2008a]. Generating drug–event pairs by data mining within GPRD has been resisted, because data used to generate a signal cannot then be used to test that same signal. Despite the large size of GPRD, hypothesis testing for newly marketed drugs often lacks power owing to small numbers of users, and ring-fencing a proportion of the database for data mining would only exacerbate this problem. One study [van Staa et al. 2008b] presents an alternative, innovative methodology that uses primary care data from GPRD to provide real-world estimates of the harm–benefit balance of cyclo-oxygenase-2 (COX-2) inhibitors. The method takes the relative rates of various outcomes for COX-2 inhibitors (for example, from RCTs) and applies them to the observational real-world data of GPRD. This generates a real-life harm–benefit profile of COX-2 inhibitors across specific combinations of age, gender and risk factors. This work represents a departure from the standard relative rate estimation of potential adverse events within databases such as GPRD, and is potentially much more useful because it supports individualised risk–benefit decisions about whether or not to prescribe. It is anticipated that this model could serve as a pharmacovigilance safety monitoring tool for newly launched medicines, once sufficient data becomes available through the development of systems for extracting targeted data from large collections of GP systems on a cross-platform basis.
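The general idea of combining trial relative rates with observational baseline rates can be illustrated with simple arithmetic: excess or prevented events in a subgroup equal the subgroup’s baseline rate multiplied by (relative rate − 1). The sketch below is a deliberate simplification, not the published model: it weights all outcomes equally, and the outcome names, baseline rates and relative rates in the usage example are invented.

```python
from typing import Dict, Tuple


def harm_benefit(baseline_per_1000py: Dict[str, float],
                 relative_rate: Dict[str, float]) -> Tuple[float, float, float]:
    """Events caused and prevented per 1000 person-years in one subgroup, and
    their ratio; every outcome is weighted equally (a simplification)."""
    caused = prevented = 0.0
    for outcome, baseline in baseline_per_1000py.items():
        change = baseline * (relative_rate[outcome] - 1.0)
        if change > 0:
            caused += change      # outcome made more frequent by the drug
        else:
            prevented -= change   # outcome made less frequent by the drug
    ratio = caused / prevented if prevented > 0 else float("inf")
    return caused, prevented, ratio


# Invented illustrative inputs: the same relative rates applied to two subgroups.
rr = {"gi_complication": 0.5, "myocardial_infarction": 1.3}
younger = harm_benefit({"gi_complication": 4.0, "myocardial_infarction": 6.0}, rr)
older = harm_benefit({"gi_complication": 10.0, "myocardial_infarction": 20.0}, rr)
print(f"harm/benefit ratio: younger {younger[2]:.2f}, older {older[2]:.2f}")
# harm/benefit ratio: younger 0.90, older 1.20
```

The same trial relative rates can tip the balance either way depending on each subgroup’s baseline risks, which is the point of placing them in real-world data.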
Research based on observational data is susceptible to bias and confounding, and an understanding of these is critical to the interpretation of study findings. In the field of clinical trials, the double-blind randomised controlled trial (RCT) is seen as the gold standard and is often lauded as the only way to obtain a true measurement of exposure effects. However, such studies operate within an artificial environment that does not reflect real-life clinical practice [van Staa et al. 2009]. GPRD has embarked on a project to enable randomisation of patients into pragmatic RCTs directly from primary care clinical practice, with routine collection of their electronic records continuing in the usual way. The focus of such studies will be low-risk interventions, such as randomisation to one statin or another. Such studies can have very long-term follow-up from the standard primary care health record within GPRD, with cause of death available for nearly all patients. The model has been designed to minimise the burden on the GP of recruiting patients and administering interventions, in order to reduce obstacles to recruitment. Potential study patients can be identified in several ways: at their next visit, via an invitation to attend a clinic, or even in real time when a diagnosis is recorded. Notification of the GP and the recruitment process are mediated by an automated system integrated with the GP patient system, which alerts the GP and directs them to a recruitment web site for consent and randomisation [Tyson et al. 2011]. As an interventional study, the project is subject to full ethical approval and must comply with Good Clinical Practice regulations; systems are being developed to ensure that all aspects of these regulations are complied with.
This new methodology provides a vehicle for running pragmatic, real-world randomised trials at a fraction of the cost of full RCTs: randomisation will go a considerable way towards eliminating bias and confounding, and the real-world setting will allow the results of such studies to relate more directly to actual clinical practice. As part of the epidemiological toolkit, this approach will provide a useful adjunct to current methodologies and to the results and conclusions they produce.
Similarly, a current project in collaboration with GPRD involves interventions randomised at the practice level. Prompted by the findings of prior work [Dregan et al. 2011], it involves cluster randomisation and patient recruitment, and uses the same central methodology [Gulliford et al. 2011]. This model would also be useful to facilitate recruitment for an existing pharmacogenetic study, ‘Statin-Induced Muscle Toxicity: Exploration Using the UK General Practice Research Database (GPRD)’ [STAGE, 2011], which is being run collaboratively with the University of Liverpool and funded by the MRC and the Wellcome Trust. Consenting practices have been recruiting statin users, who provide either a saliva or a blood sample for genetic analysis. Again, the intervention is minimal, and follow-up is provided by standard GPRD data collection and the linked data sets. Such studies will make it possible to examine the relationship between recognised rare adverse events and genetic profiles or particular genetic markers. This study is already well into its recruitment phase; similar studies could be set up more efficiently in future by integrating them with the recruitment mechanism of the pragmatic RCTs.
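Cluster randomisation allocates whole practices rather than individual patients, so every patient inherits the arm of their practice. The sketch below uses pair-matching on practice list size, an assumed design choice rather than the eCRT protocol [Gulliford et al. 2011]; the practice identifiers, list sizes and function names are hypothetical.

```python
import random


def pair_matched_allocation(list_sizes: dict[str, int], seed: int = 0) -> dict[str, str]:
    """Sort practices by list size, pair neighbours, and randomise within each pair."""
    rng = random.Random(seed)
    ordered = sorted(list_sizes, key=lambda p: (list_sizes[p], p))
    allocation: dict[str, str] = {}
    for i in range(0, len(ordered) - 1, 2):
        first, second = ordered[i], ordered[i + 1]
        if rng.random() < 0.5:  # coin toss decides which member of the pair is treated
            first, second = second, first
        allocation[first] = "intervention"
        allocation[second] = "control"
    if len(ordered) % 2 == 1:  # an unpaired practice is allocated by a coin toss
        allocation[ordered[-1]] = rng.choice(["intervention", "control"])
    return allocation


def patient_arm(patient_id: str, practice_of: dict[str, str],
                allocation: dict[str, str]) -> str:
    """Patients are not randomised individually; they take their practice's arm."""
    return allocation[practice_of[patient_id]]


# Hypothetical practices and list sizes.
PRACTICES = {"P01": 3200, "P02": 11800, "P03": 6100,
             "P04": 3500, "P05": 12500, "P06": 5900}
```

Pairing similar-sized practices before randomising is one way to keep the arms comparable when only a modest number of clusters is available.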
Free text was described above as a component of primary care data. Its prevalence varies, partly in relation to the underlying clinical system used. The ability to use free text is potentially very valuable, as has been demonstrated for identifying cause of death [Shah and Martinez, 2004] and, more recently, in a project exploring the role of free text in identifying diagnoses, signs and symptoms in patients with ovarian cancer [Koeling et al. 2011]. Consequently, a further area of new research in GPRD concerns the use of free text collected from practices. As a critical component of any GP system, free text is used both to annotate coded data and, in the form of letters and emails, to communicate between disparate healthcare providers. Free text by its nature has no prescribed structure and may contain information that would be inappropriate for researchers to view directly, such as patient, doctor and hospital names and other identifiers. Individual free text records can be coded in such a way as to prevent them from being downloaded, and the volume and characteristics of the free text received in practice data downloads vary considerably. No studies have yet specifically tested the validity of free text; however, among the many studies that have used free text together with original patient paper records (provided by the GP and appropriately anonymised), none has reported major inconsistencies with the original source data. Cross-validation of free text against linked data sets will be possible in the future.
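A very simple form of identifier masking is sketched below, assuming the names to be removed are available as a lookup list (for example, from practice registration data). It is illustrative only: the patterns for NHS-number-like strings and dates are assumptions, the function name `redact` is hypothetical, and rule-based masking of this kind would miss many identifiers.

```python
import re

# Assumed patterns: ten digits optionally spaced 3-3-4 (NHS-number-like) and d/m/y dates.
NHS_NUMBER = re.compile(r"\b\d{3}\s?\d{3}\s?\d{4}\b")
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")


def redact(text: str, known_names: list[str]) -> str:
    """Replace listed names, NHS-number-like strings and dates with placeholders."""
    # Longest names first, so a longer name is masked before any shorter name inside it.
    for name in sorted(known_names, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    text = NHS_NUMBER.sub("[ID]", text)
    return DATE.sub("[DATE]", text)
```

For example, `redact("Seen by Dr Smith at St Mary's on 12/03/2010, NHS no 943 476 5919.", ["Smith", "St Mary's"])` yields `"Seen by Dr [NAME] at [NAME] on [DATE], NHS no [ID]."`.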
In the GPRD Gold system, free text is processed sight unseen and indexed so that it can be retrieved for anonymisation and provided to researchers if required. Free text tools were enhanced during the development of GPRD Gold. It is currently possible to extract and anonymise selected text for use in studies, and to undertake keyword searches within selected blocks of text or across the entire free text repository. For example, keyword searches for biological anticancer treatments (which are not prescribed in primary care at all) have been shown to identify up to 30,000 patients: a potential, if imperfect, method for identifying patients receiving treatments outside primary care.
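Keyword searching of this kind is commonly implemented with an inverted index that maps each word to the patients whose text contains it. The sketch below assumes free text arrives as (patient identifier, text) pairs; the records are toy examples, the function names are hypothetical, and the drug names serve only as illustrative search terms.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(records: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each lower-cased token to the set of patients whose free text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for patient_id, text in records:
        for token in TOKEN.findall(text.lower()):
            index[token].add(patient_id)
    return index


def patients_matching_any(index: dict[str, set[str]], keywords: list[str]) -> set[str]:
    """Union of patients mentioning any of the keywords."""
    result: set[str] = set()
    for keyword in keywords:
        result |= index.get(keyword.lower(), set())
    return result


# Toy free-text records (invented).
RECORDS = [
    ("p1", "Started trastuzumab at oncology clinic"),
    ("p2", "BP check, no change"),
    ("p3", "Letter: rituximab cycle 3 completed"),
    ("p1", "Herceptin (trastuzumab) continues"),
]
```

Such searches are imperfect in the way the text suggests: this sketch would also match negated or historical mentions (for example, "declined rituximab") and would miss misspellings or brand names absent from the keyword list.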
Free text is associated with all clinical event types, including diagnosis and history events, referral events, test result events, prescribing and immunisation. Different event types characteristically carry different types of free text: diagnosis and history events often contain annotation, whereas referral events are associated with a higher proportion of communications such as referral, outpatient or discharge letters. Test events often have machine-generated results in text format, which have a regular structure and form. Prescribing free text consists of dosage instructions, which are largely formalised and already handled within existing systems. GPRD is involved in a funded research programme, part of which is exploring free text using simple strategies as well as more complex natural language processing (NLP) approaches. Although the application of NLP to primary care free text is still in its infancy, it has shown that, when used as a basis for defining synonyms for terms, shorter unstructured annotations conform to different language models than the longer, more grammatically structured letters. This is mainly because grammatically sound text can be analysed by parsing algorithms, which provide highly informative grammatical relations between words, whereas the poor success rate of standard parsing techniques on noisy, unstructured annotations forces us to use much less informative proximity relations. However, there is evidence [McCarthy et al. 2007] that thesauruses built using proximity relations can be of similar quality to those built using grammatical relations when sufficient training material is available.
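The proximity relations mentioned above can be made concrete. The following is a minimal sketch of a proximity-based distributional thesaurus, assuming pre-tokenised annotations: each word is represented by counts of the words within a fixed window around it, and candidate synonyms are ranked by cosine similarity. The window size, function names and toy annotations are assumptions, and this is not the procedure used in the cited study.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences: list[list[str]], window: int = 2) -> dict[str, Counter]:
    """Count, for each word, the words appearing within `window` positions of it."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbours(vectors: dict[str, Counter], word: str, k: int = 3) -> list[tuple[str, float]]:
    """Rank other words by cosine similarity of their context vectors."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Toy, invented annotations: "breathless" and "sob" occur in identical contexts.
ANNOTATIONS = [
    ["pt", "breathless", "on", "exertion"],
    ["pt", "sob", "on", "exertion"],
    ["pt", "cough", "at", "night"],
]
VECTORS = cooccurrence_vectors(ANNOTATIONS)
```

On this toy input, "sob" is the nearest neighbour of "breathless" because the two share all their context words; on real data, stop-word filtering and association weighting would usually be needed before such neighbours become plausible synonyms.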
Building this type of free-text thesaurus has potential utility for codifying free text automatically. Research is planned on various forms of free text to develop a suite of free-text tools that maximise the utility of these data while conforming to the strict data governance rules that this form of data requires.
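One way a thesaurus could support automatic coding is to map free-text words through a synonym table to a canonical term and then to a code. The sketch assumes such a table already exists; the terms, the placeholder codes (which are not real Read codes), the function name `suggest_codes` and the crude negation window are all hypothetical.

```python
import re

# Hypothetical coded vocabulary: canonical term -> placeholder code (not real Read codes).
CODES = {"breathlessness": "CODE_DYSPNOEA", "cough": "CODE_COUGH"}
# Hypothetical thesaurus output: free-text variant -> canonical term.
THESAURUS = {"sob": "breathlessness", "breathless": "breathlessness",
             "dyspnoea": "breathlessness", "coughing": "cough"}
NEGATIONS = {"no", "denies", "nil"}


def suggest_codes(note: str, window: int = 3) -> list[str]:
    """Suggest codes for terms in a note, skipping terms preceded by a negation word."""
    tokens = re.findall(r"[a-z]+", note.lower())
    codes: list[str] = []
    for i, token in enumerate(tokens):
        term = THESAURUS.get(token, token)  # fall back to the word itself
        code = CODES.get(term)
        if code is None:
            continue
        if any(t in NEGATIONS for t in tokens[max(0, i - window):i]):
            continue  # crude negation check over the preceding few words
        if code not in codes:
            codes.append(code)
    return codes
```

For example, `suggest_codes("Pt SOB on exertion, no coughing")` returns `["CODE_DYSPNOEA"]`. The fixed negation window is deliberately crude: in "no fever, breathless" it would wrongly suppress the breathlessness code, so any real tool would need proper negation and context handling.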

Future direction

Over the last 5 years, since the latest reviews [Wood and Martinez, 2004; Parkinson et al. 2007], there have been many changes in the arena of primary care databases in the UK. Two new large-scale data sources have come into being, and access to primary care data has been broadened by schemes such as the GPRD–MRC academic funding agreement. The pool of expertise in the use of primary care data has grown correspondingly, and progress has been made in developing new methodologies: pragmatic randomised trials within the database, cluster RCTs and pharmacogenetic studies are all underway. To supplement primary care data, data from disparate NHS data sets, including Hospital Episode Statistics, death certification and disease registries, have now been linked at the patient level. In parallel, the Research Capability Programme, part of the NHS National Institute for Health Research (NIHR), has been piloting new linkages and new honest broker methodologies for undertaking them. In 2012 a new service will be launched as a partnership between the MHRA (the GPRD host organisation) and the NIHR. This new research service, called the Clinical Practice Research Datalink (CPRD), will provide a national linkage service covering England's population of 52 million, will have access to many NHS datasets, and will develop and enable an embedded electronic case report form for primary care clinical trials within primary care EHR IT systems, working in conjunction with the Primary Care Research Network (PCRN). The CPRD is a federated approach to the delivery of research data and other services, and will work collaboratively with all existing databases and services with the sole objective of increasing the volume of research undertaken in the UK: research that will improve both the health and the wealth of the nation.
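An honest broker, or trusted third party, linkage can be illustrated with keyed hashing: only the broker holds a secret key, each data source's identifiers are converted by the broker into pseudonyms, and records are joined on those pseudonyms so that researchers never see the NHS number. This is an assumed, simplified scheme, not the actual GPRD or Research Capability Programme linkage method; the class and function names are hypothetical and the NHS numbers are example values.

```python
import hashlib
import hmac
from collections import defaultdict


class TrustedThirdParty:
    """Hypothetical broker: the only party holding the linkage key."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key

    def pseudonym(self, nhs_number: str) -> str:
        """Keyed SHA-256 hash of the normalised NHS number."""
        normalised = nhs_number.replace(" ", "")
        return hmac.new(self._key, normalised.encode("ascii"), hashlib.sha256).hexdigest()

    def pseudonymise(self, records: list[tuple[str, str]]) -> dict[str, list[str]]:
        """Map (nhs_number, payload) pairs to {pseudonym: [payloads]}, dropping the NHS number."""
        out: dict[str, list[str]] = defaultdict(list)
        for nhs_number, payload in records:
            out[self.pseudonym(nhs_number)].append(payload)
        return dict(out)


def link(primary: dict[str, list[str]], secondary: dict[str, list[str]]) -> dict:
    """Join two pseudonymised data sets on shared pseudonyms only."""
    return {pid: (primary[pid], secondary[pid]) for pid in primary.keys() & secondary.keys()}


ttp = TrustedThirdParty(b"example-key-for-illustration-only")
GP = ttp.pseudonymise([("943 476 5919", "GP: statin started"),
                       ("123 456 7890", "GP: asthma review")])
HES = ttp.pseudonymise([("9434765919", "HES: admission for MI")])
LINKED = link(GP, HES)
```

Because the pseudonym depends on a key held only by the broker, a researcher who knows an NHS number cannot recompute its pseudonym, unlike a plain unkeyed hash.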
With such a broad range of activity and progress, further work will clearly be required to develop the systems needed to administer and run such research on a large scale. Data quality assessment, such as that being undertaken within GPRD, will need to be developed to explore how the various data sources interface with one another, and methodologies for using disparate data sets from different healthcare contexts will be required.

Acknowledgments

The views expressed in this paper are those of the authors and do not reflect the official policy or position of the Medicines and Healthcare products Regulatory Agency (MHRA). GPRD is owned by the UK Department of Health and operates within the MHRA. GPRD has received funding from the MHRA, Wellcome Trust, Medical Research Council, NIHR Health Technology Assessment programme, Innovative Medicines Initiative, UK Department of Health, Technology Strategy Board, EU Seventh Framework Programme, various universities, contract research organisations and pharmaceutical companies.

Competing Interests

The authors are employees within the GPRD group of MHRA.

Funding Information

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

Boggon R., van Staa T.P., Timmis A., Hemingway H., Ray K.K., Begg A., et al. (2011) Clopidogrel discontinuation after acute coronary syndromes: frequency, predictors and associations with death and myocardial infarction—a hospital registry-primary care linked cohort (MINAP-GPRD). Eur Heart J 32: 2376–2386.
Bourke A., Dattani H., Robinson M. (2004) Feasibility study and methodology to create a quality-evaluated database of primary care data. Inform Prim Care 12: 171–177.
Carey I.M., Cook D.G., De Wilde S., Bremner S.A., Richards N., Caine S., et al. (2004) Developing a large electronic primary care database (Doctors’ Independent Network) for research. Int J Med Inform 73: 443–453.
de Lusignan S., van Vlymen J., Hague N., Dhoul N. (2006) Using computers to identify non-compliant people at increased risk of osteoporotic fractures in general practice: a cross-sectional study. Osteoporosis Int 17: 1808–1814.
de Lusignan S., van Weel C. (2006) The use of routinely collected computer data for research in primary care: opportunities and challenges. Fam Pract 23: 253–263.
Dregan A., Toschke M.A., Wolfe C.D., Rudd A., Ashworth M., Gulliford M.C. (2011) Utility of electronic patient records in primary care for stroke secondary prevention trials. BMC Public Health 11 (1): 86.
Douglas I.J., Evans S.J., Pocock S., Smeeth L. (2009) The risk of fractures associated with thiazolidinediones: a self-controlled case-series study. PLoS Med 6 (9): e1000154.
Eaton S., Setakis E., Williams T., van Staa T.P. (2010) Linking Primary Care Data (UK GPRD) to Hospital Records (HES). Pharmacoepidemiol Drug Saf 19: S195.
Gulliford M.C., Charlton J., Ashworth M., Rudd A.G., Toschke M.A. (2009) Selection of medical diagnostic codes for analysis of electronic patient records. Application to stroke in a primary care database. PLoS ONE 4 (9): e7168.
Gulliford M.C., van Staa T., McDermott L., Dregan A., McCann G., Ashworth M., et al. for the electronic Cluster Randomised Trial Research Team (eCRT Research Team) (2011) Cluster randomised trial in the General Practice Research Database: 1. Electronic decision support to reduce antibiotic prescribing in primary care (eCRT study). Trials 12: 115.
Herrett E., Thomas S.L., Schoonen W.M., Smeeth L., Hall A.J. (2010) Validation and validity of diagnoses in the general practice research database: a systematic review. Br J Clin Pharmacol 69: 4–14.
Hippisley-Cox J., Stables D., Pringle M. (2004) QRESEARCH: a new general practice database for research. Inform Prim Care 12: 49–50.
Khan N.F., Harrison S.E., Rose P.W. (2010) Validity of diagnostic coding within the general practice research database: a systematic review. Br J Gen Pract 60 (572): e128–e136.
Koeling R., Tate A.R., Carroll J. (2011) Automatically estimating the incidence of symptoms recorded in GP free text notes. In Proceedings of the First International Workshop on Managing Interoperability and Complexity in Health Systems (MIXHS'11), Glasgow, UK.
Lawrenson R., Williams T., Farmer R. (1999) Clinical information for research; the use of general practice databases. J Public Health Med 21: 299–304.
Lawson D.H., Sherman V., Hollowell J. (1998) The general practice research database. Q J Med 91: 445–452.
McCarthy D., Koeling R., Weeds J., Carroll J. (2007) Unsupervised Acquisition of Predominant Word Senses. Computat Linguist 33: 553–590.
NHS Connecting for Health Implementation Guidance team (2007) The National Programme for IT Implementation Guide – version 5. Available at: http://www.connectingforhealth.nhs.uk/systemsandservices/implementation/docs/national_programme_implementation_guide.pdf
Parkinson J., Davis S., van Staa T. (2007) The general practice research database: now and the future. In Mann R.D., Andrews E.B. (eds), Pharmacovigilance, 2nd Ed. Chichester: John Wiley & Sons, Ltd.
Shah A.D., Martinez C. (2004) An algorithm to extract medical codes for diagnoses from unstructured free text recorded by general practitioners in the UK. Pharmacoepidemiol Drug Saf 13: S41–S42.
STAGE (2011) STAtin-induced muscle toxicity: exploration using the UK GEneral Practice Research Database (GPRD). http://www.gprd.co.uk/stage/home/.
Tate A.R., Beloff N., Puri S., Williams T., Van Staa T.P. (2011) Developing quality scores for electronic health records for clinical research: a study using the General Practice Research Database. In ACM Proceedings of MIXHS11, 28 October 2011, Glasgow, Scotland, UK. Glasgow: Sheridan.
Tyson G., Taweel A., Zschaler S., van Staa T., Delaney B. (2011) A model-driven approach to interoperability and integration in systems of systems. In Proceedings of the 7th European Conference on Modelling Foundations and Applications: Workshop on Model-Based Software and Data Integration (MBSDI), Birmingham, England.
Vamos E.P., Pape U.J., Bottle A., Hamilton F.L., Curcin V., Ng A., et al. (2011) Association of practice size and pay-for-performance incentives with the quality of diabetes management in primary care. CMAJ 183 (12): E809–E816.
van Staa T.P., Leufkens H.G., Zhang B., Smeeth L. (2009) A comparison of cost effectiveness using data from randomized trials or actual clinical practice: selective Cox-2 inhibitors as an example. PLoS Med 6 (12): e1000194.
van Staa T.P., Rietbrock S., Setakis E., Leufkens H.G. (2008a) Does the varied use of NSAIDs explain the differences in the risk of myocardial infarction? J Intern Med 264: 481–492.
van Staa T.P., Smeeth L., Persson I., Parkinson J., Leufkens H.G. (2008b) What is the harm-benefit ratio of Cox-2 inhibitors? Int J Epidemiol 37: 405–413.
Walley T., Mantgani A. (1997) The UK General Practice Research Database. Lancet 350: 1097–1099.
Wood L., Coulson R. (2001) Revitalizing the general practice research database: plans, challenges, and opportunities. Pharmacoepidemiol Drug Saf 10: 379–383.
Wood L., Martinez C. (2004) The general practice research database: role in pharmacovigilance. Drug Saf 27: 871–881.

Cite article

Cite article

Cite article

OR

Download to reference manager

If you have citation software installed, you can download article citation data to the citation manager of your choice

Share options

Share

Share this article

Share with email
EMAIL ARTICLE LINK
Share on social media

Share access to this article

Sharing links are not relevant where the article is open access and not available if you do not have a subscription.

For more information view the SAGE Journals article sharing page.

Information, rights and permissions

Information

Published In

Article first published online: February 2, 2012
Issue published: April 2012

Keywords

  1. database
  2. pharmacoepidemiology
  3. primary care

Rights and permissions

© The Author(s), 2012.

History

PubMed: 25083228

Authors


Tim Williams, BSc, PhD, MSc
Tjeerd van Staa, MD, PhD, MSc, MA
Shivani Puri, BEng, MSc
Susan Eaton, MSPH
Medicines and Healthcare products Regulatory Agency – General Practice Research Database, London, UK

This article was published in Therapeutic Advances in Drug Safety.

  141. Influenza-attributable burden in United Kingdom primary care
    Go to citation Crossref Google Scholar
  142. Adherence to Oral Glucose-Lowering Therapies and Associations With 1-Y...
    Go to citation Crossref Google Scholar
  143. Severity of obesity and management of hypertension, hypercholesterolae...
    Go to citation Crossref Google Scholar
  144. Weight change and healthcare resource use in English patients with typ...
    Go to citation Crossref Google Scholar
  145. Effect of the adjuvanted (AS03) A/H1N1 2009 pandemic influenza vaccine...
    Go to citation Crossref Google Scholar
  146. Changes in the incidence, prevalence and mortality of bronchiectasis i...
    Go to citation Crossref Google Scholar
  147. Pilot study linking primary care records to Census, cardiovascular hos...
    Go to citation Crossref Google Scholar
  148. The incidence of childhood and adolescent seizures in the UK from 1999...
    Go to citation Crossref Google Scholar
  149. Modelling estimates of the burden of Respiratory Syncytial virus infec...
    Go to citation Crossref Google Scholar
  150. Health and Employment after Fifty (HEAF): a new prospective cohort stu...
    Go to citation Crossref Google Scholar
  151. Modelling and extraction of variability in free-text medication prescr...
    Go to citation Crossref Google Scholar
  152. Delay in treatment intensification increases the risks of cardiovascul...
    Go to citation Crossref Google Scholar
  153. Half of UK patients with rheumatoid arthritis are prescribed oral gluc...
    Go to citation Crossref Google Scholar
  154. Determinants of saxagliptin use among patients with type 2 diabetes me...
    Go to citation Crossref Google Scholar
  155. Non-steroidal anti-inflammatory drugs and the risk of head and neck ca...
    Go to citation Crossref Google Scholar
  156. Did NICE guidelines and the Quality Outcomes Framework change GP antid...
    Go to citation Crossref Google Scholar
  157. Risk of spontaneous abortion and other pregnancy outcomes in 15–25 yea...
    Go to citation Crossref Google Scholar
  158. Development and Validation of the RxDx-Dementia Risk Index to Predict ...
    Go to citation Crossref Google Scholar
  159. Use of demographic and pharmacy data to identify patients included wit...
    Go to citation Crossref Google Scholar
  160. Risk patterns in drug safety study using relative times by accelerated...
    Go to citation Crossref Google Scholar
  161. The effect of sibutramine prescribing in routine clinical practice on ...
    Go to citation Crossref Google Scholar
  162. Probability of an Obese Person Attaining Normal Body Weight: Cohort St...
    Go to citation Crossref Google Scholar
  163. Der Nutzen großer Gesundheitsdatenbanken für die Arzneimittelrisikofor...
    Go to citation Crossref Google Scholar
  164. Determinants of oral anticoagulation control in new warfarin patients:...
    Go to citation Crossref Google Scholar
  165. Causes of death in people with coeliac disease in England compared wit...
    Go to citation Crossref Google Scholar
  166. The epidemiology of cardiovascular disease in the UK 2014
    Go to citation Crossref Google Scholar
  167. Glitazone Treatment and Incidence of Parkinson’s Disease among People ...
    Go to citation Crossref Google Scholar
  168. Nonsteroidal anti-inflammatory drugs and the risk of nonmelanoma skin ...
    Go to citation Crossref Google Scholar
  169. Changes in rates of recorded depression in English primary care 2003–2...
    Go to citation Crossref Google Scholar
  170. Big biomedical data and cardiovascular disease research: opportunities...
    Go to citation Crossref Google Scholar
  171. Why do we need observational studies of everyday patients in the real-...
    Go to citation Crossref Google Scholar
  172. Data Resource Profile: Clinical Practice Research Datalink (CPRD)
    Go to citation Crossref Google Scholar
  173. Estimating the yield of NHS Health Checks in England: a population-bas...
    Go to citation Crossref Google Scholar
  174. Diagnosis and treatment of chlamydia and gonorrhoea in general practic...
    Go to citation Crossref Google Scholar
  175. Missing laboratory test data in electronic general practice records: a...
    Go to citation Crossref Google Scholar
  176. Impact of Switching From High-Efficacy Lipid-Lowering Therapies to Gen...
    Go to citation Crossref Google Scholar
  177. Impact of bariatric surgery on clinical depression. Interrupted time s...
    Go to citation Crossref Google Scholar
  178. Childhood obesity trends from primary care electronic health records i...
    Go to citation Crossref Google Scholar
  179. Access to weight reduction interventions for overweight and obese pati...
    Go to citation Crossref Google Scholar
  180. Characterisation of Data Quality in Electronic Healthcare Records
    Go to citation Crossref Google Scholar
  181. Application of Privacy-Preserving Techniques in Operational Record Lin...
    Go to citation Crossref Google Scholar
  182. Long-term exposure to outdoor air pollution and the incidence of chron...
    Go to citation Crossref Google Scholar
  183. Incidence of type 2 diabetes after bariatric surgery: population-based...
    Go to citation Crossref Google Scholar
  184. Completeness and usability of ethnicity data in UK-based primary care ...
    Go to citation Crossref Google Scholar
  185. Risk factors for acute exacerbations of COPD in a primary care populat...
    Go to citation Crossref Google Scholar
  186. Metformin and the risk of head and neck cancer: a case-control analysi...
    Go to citation Crossref Google Scholar
  187. Changes in trends and pattern of strong opioid prescribing in primary ...
    Go to citation Crossref Google Scholar
  188. Prediction of Cardiovascular Risk Using Framingham, ASSIGN and QRISK2:...
    Go to citation Crossref Google Scholar
  189. Predictors and outcomes of increases in creatine phosphokinase concent...
    Go to citation Crossref Google Scholar
  190. Mortality of patients with multiple sclerosis: a cohort study in UK pr...
    Go to citation Crossref Google Scholar
  191. Validation of chronic obstructive pulmonary disease recording in the C...
    Go to citation Crossref Google Scholar
  192. Point-of-Care Cluster Randomized Trial in Stroke Secondary Prevention ...
    Go to citation Crossref Google Scholar
  193. Data quality in European primary care research databases. Report of a ...
    Go to citation Crossref Google Scholar
  194. Use of electronic healthcare records in large-scale simple randomized ...
    Go to citation Crossref Google Scholar
  195. Exploiting the potential of large databases of electronic health recor...
    Go to citation Crossref Google Scholar
  196. Commentary on Berlin et al.
    Go to citation Crossref Google ScholarPub Med
  197. Adherence to inhaled corticosteroids by asthmatic patients: measuremen...
    Go to citation Crossref Google Scholar
  198. Risk of Mortality (Including Sudden Cardiac Death) and Major Cardiovas...
    Go to citation Crossref Google Scholar
  199. SLCO1B1 Genetic Variant Associated With Statin-Induced Myopathy: A Pro...
    Go to citation Crossref Google Scholar
  200. A study of general practitioners’ perspectives on electronic medical r...
    Go to citation Crossref Google Scholar
  201. Variability of antibiotic prescribing in patients with chronic obstruc...
    Go to citation Crossref Google Scholar
  202. Declining Genital Warts in Young Women in England Associated With HPV ...
    Go to citation Crossref Google Scholar
  203. The efficiency of cardiovascular risk assessment: do the right patient...
    Go to citation Crossref Google Scholar
  204. Case–control analysis on metformin and cancer of the esophagus
    Go to citation Crossref Google Scholar
  205. Cancer recording and mortality in the General Practice Research Databa...
    Go to citation Crossref Google Scholar
  206. Preventing and lessening exacerbations of asthma in school-age childre...
    Go to citation Crossref Google Scholar
