Examining the prevalence and patterns of multimorbidity in Canadian primary healthcare: a methodologic protocol using a national electronic medical record database

In many developed countries, the burden of disease has shifted from acute to long-term or chronic diseases – producing new and broader challenges for patients, healthcare providers, and healthcare systems. Multimorbidity, the coexistence of two or more chronic diseases within an individual, is recognized as a significant public health and research priority. This protocol aims to examine the prevalence, characteristics, and changing burden of multimorbidity among adult primary healthcare (PHC) patients using electronic medical record (EMR) data. The objectives are two-fold: (1) to measure the point prevalence and clusters of multimorbidity among adult PHC patients; and (2) to examine the natural history and changing burden of multimorbidity over time among adult PHC patients. Data will be derived from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). The CPCSSN database contains longitudinal, point-of-care data from EMRs across Canada. To identify adult patients with multimorbidity, a list of 20 chronic disease categories (and corresponding ICD-9 codes) will be used. A computational cluster analysis will be conducted using a customized computer program written in JAVA. A Cox proportional hazards analysis will be used to model time-to-event data, while simultaneously adjusting for provider- and patient-level predictors. All analyses will be conducted using STATA SE 13.1. This research is the first of its kind using a pan-Canadian EMR database, which will provide an opportunity to contribute to the international evidence base. Future work should systematically compare international research using similar robust methodologies to determine international and geographical variations in the epidemiology of multimorbidity.


Introduction
In many developed countries, populations are experiencing a transition in which the burden of illness is shifting from acute to long-term or chronic diseases, producing new and broader challenges for patients, healthcare providers, and healthcare systems [1,2]. The progressive aging of individuals, improved medical services, and advancing health technologies have led to increased survival among patients with chronic disease. While this is a success of modern medicine, this increased survival has resulted in growing numbers of patients living with multiple chronic diseases and experiencing greater healthcare needs [3-10]. Multimorbidity, the coexistence of multiple chronic diseases within an individual, is now recognized as a significant health system cost and a major public health and research priority [6,9,11-15].
Although the prevalence of multimorbidity increases substantially with age, this phenomenon is increasingly being seen in younger populations, as recent studies have found larger absolute numbers of primary healthcare (PHC) populations under the age of 65 years living with multimorbidity [4,8]. Generally, a PHC population consists of patients seeking integrated and accessible care from a practitioner who: (1) is the first level of contact with the healthcare system; (2) addresses the large majority of personal healthcare needs; and (3) develops a sustained partnership with patients in the context of family and community [16,17]. Multimorbidity is recognized as the norm, rather than the exception, in PHC populations [10,18]. In fact, the focus of PHC in many developed countries, including Canada, is principally centered on the treatment and management of chronic diseases, which are often occurring in multiples. Deemed an "endless struggle" by PHC providers, patients experiencing multimorbidity require an integrated healthcare system that adequately responds to their complex and changing needs [19,20]. These patients represent unique clinical profiles, suffering from distinct combinations of chronic diseases, which can escalate the challenge for providers [21][22][23]. Clinical and epidemiologic research has yet to provide robust data and evidence on multimorbidity, comparable to information that is readily available for single chronic diseases [24]. Enhanced understanding of multimorbidity prevalence, characteristics, determinants, and prognosis over time is still needed.
Multimorbidity has been conceptualized in many different ways in previous literature, and to date, no "gold standard" measure of multimorbidity has been established. Diederichs et al. [25] conducted a systematic review that identified 39 different multimorbidity measures. Some measures are based on simple counts of chronic diseases (with considerable variation in the "list" of diseases used), while other measures differentially weight diseases to account for burden of illness or number of body systems affected [25,26]. Many commonly used measures of multimorbidity were originally developed and validated among elderly patient populations or hospital-based populations [27]. The marked variation in study methodologies has produced differing prevalence estimates, even among similar PHC populations. In a recent comparison of three studies examining the prevalence of multimorbidity, prevalence levels reported among PHC patients ranged from 34% to 95%, a spread of as much as 61 percentage points between estimates [24]. Not only does this persistent heterogeneity in methodology produce research findings that cannot be compared, it also hinders the ability to make informed health system and health policy decisions [11,24].
To contribute to the growing international evidence base, a national study examining the prevalence and patterns of multimorbidity from the Canadian PHC perspective will be conducted. Although principally used for clinical purposes, electronic medical records (EMRs) can provide rich insight for academic researchers. These clinical data contain longitudinal, patient-level information that presents a unique opportunity to examine both the onset and changing burden of multimorbidity over time [3,6]. The protocol described herein aims to capitalize on this opportunity. This research will examine the burden of multimorbidity among adult PHC patients in Canada, through the use of EMR data.

Objectives
The objectives of this research are two-fold. Both objectives will contribute to the understanding of multimorbidity in PHC, using a national EMR database. The first objective is to measure the point prevalence and clusters of multimorbidity among adult PHC patients. This objective will aim to understand the overall burden of multimorbidity among adult PHC patients, as well as the most frequently occurring permutations and combinations of chronic disease diagnoses. The second objective is to examine the natural history and changing burden of multimorbidity over time among adult PHC patients. This objective will examine the time-to-event patterns of multiple chronic disease diagnoses, accounting for both provider- and patient-level baseline predictors.

Study design
The key methodologic considerations that should be explicitly described in cross-sectional and retrospective cohort studies examining multimorbidity are defined as the "Methods Crystals for Multimorbidity" by Stewart et al. [24]. These elements have been notably absent in previously published multimorbidity literature, yet are important to ensure comparable and transparent findings. Following the "Methods Crystals for Multimorbidity" structure, the main study design elements for this research protocol are described more fully in Table 1 [28-30]. While clinical events and encounters with patients are recorded in the EMR prospectively by PHC providers, this research will utilize a retrospective or historic cohort design using existing EMR data. To be included in both objectives, individuals must have at least one in-office encounter date recorded in the EMR and be identified as "adult" patients (at least 18 years of age) as of their first encounter date. Patients who are under the age of 18 years at their first encounter date or who do not have a detectable in-office encounter recorded in the EMR will be excluded. Patients who have opted out of contributing their data to the EMR database will also be excluded from analyses. Ethical approval has been obtained from the Research Ethics Board at Western University (Approval Notice #104705).
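The inclusion and exclusion criteria above can be expressed as a simple filter. The following Python sketch is illustrative only: the `PatientRecord` structure and its field names are hypothetical and do not reflect the CPCSSN schema, and age is approximated from birth year alone because that is the only birth information listed among the data elements.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one de-identified patient record."""
    patient_id: str
    birth_year: int
    encounter_dates: List[date] = field(default_factory=list)
    opted_out: bool = False


def first_encounter_date(patient: PatientRecord) -> Optional[date]:
    """Return the earliest in-office encounter date, or None if there is none."""
    return min(patient.encounter_dates) if patient.encounter_dates else None


def is_eligible(patient: PatientRecord, minimum_age: int = 18) -> bool:
    """Apply the protocol's criteria: not opted out, at least one encounter,
    and at least `minimum_age` years old at the first encounter."""
    if patient.opted_out:
        return False
    first = first_encounter_date(patient)
    if first is None:
        return False
    # Assumption: age is the difference in calendar years, since only the
    # birth year is available; this can be off by up to one year.
    return first.year - patient.birth_year >= minimum_age
```

Under this approximation, a patient born in 1980 whose earliest encounter is in 2001 is retained, while a patient born in 1990 whose earliest encounter is in 2005 is excluded.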

Data source
For both objectives, data will be derived from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). The CPCSSN database contains longitudinal, point-of-care data from EMRs, which are extracted on a quarterly basis by CPCSSN data managers from participating PHC practices [31,32]. These data are then de-identified, cleaned, coded, and transformed into a common data format for compilation into the secure CPCSSN database. As of the data extraction period for this research (September 30, 2013), a total of 600,265 de-identified electronic patient records were collected from 475 PHC providers, referred to as "sentinels" by CPCSSN, in 10 regional networks across Canada. The CPCSSN data elements that will be used contain information on practice characteristics (e.g. geographical location); provider characteristics (e.g. provider birth year, provider sex); patient characteristics (e.g. patient birth year, patient sex, first three letters of residential postal code); and in-office encounters (e.g. encounter date, billing diagnosis codes, encounter diagnosis codes). The majority (approximately 95%) of diagnostic codes within the CPCSSN database are recorded using the International Classification of Diseases, 9th Revision (ICD-9) system. As such, these codes will be used to identify chronic disease diagnoses.

Identifying chronic disease diagnoses
Within the CPCSSN EMR data, there are two potential sources of diagnostic codes that are accessible for research purposes. These two sources are the Billing Diagnosis Codes and the Encounter Diagnosis Codes. Both sets of diagnosis codes are recorded using the ICD-9 system, by administrative staff or PHC providers (e.g. nurses, nurse practitioners, medical residents, family physicians), to reflect the patient's ongoing health status. Each diagnostic code is documented with an associated date (day, month, and year) on which the diagnosis occurred. Initial data exploration indicated that the source in which the majority of diagnosis codes were recorded varied. For example, some practice sites and/or providers primarily use the Billing Diagnosis Codes to record information, while others use the Encounter Diagnosis Codes to do so. Consequently, to capture the maximum amount of data from the patient record, the average number of Billing Diagnosis Codes (total number of billing diagnosis codes divided by the total number of patient encounters) and the average number of Encounter Diagnosis Codes (total number of encounter diagnosis codes divided by the total number of patient encounters) will be calculated on a patient-by-patient basis. The source (Billing Codes or Encounter Codes) with the larger average number of diagnostic codes will be selected for each patient. In addition to using the maximum amount of diagnostic information and avoiding duplicate diagnoses, this approach will also address the variability in diagnostic recording at the patient, provider and practice levels.
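The per-patient source selection described above amounts to comparing two averages. A minimal Python sketch follows; the function name and arguments are hypothetical, and the tie-breaking rule (preferring Encounter Codes when the averages are equal) is an assumption, since the protocol does not state how ties are handled.

```python
def select_diagnosis_source(
    n_billing_codes: int,
    n_encounter_codes: int,
    n_encounters: int,
) -> str:
    """Choose the diagnosis source with the larger average codes per encounter.

    Returns "billing" or "encounter" for a single patient.
    """
    if n_encounters <= 0:
        raise ValueError("a patient needs at least one encounter")
    billing_average = n_billing_codes / n_encounters
    encounter_average = n_encounter_codes / n_encounters
    # Assumption: ties default to the encounter codes (not specified in the
    # protocol).
    return "billing" if billing_average > encounter_average else "encounter"
```

Because both averages share the same per-patient denominator, the comparison gives the same answer as comparing the two code totals directly; the averages are kept here only to mirror the wording of the protocol.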

Identifying patients with multimorbidity
To identify adult patients with multimorbidity, we will use a list of 20 chronic disease categories (and corresponding ICD-9 codes) created by a nationally funded (Canadian Institutes of Health Research) research project examining Patient-Centred Innovations for Persons with Multimorbidity (PACE in MM). This community-based primary healthcare (CBPHC) project aims to improve the delivery of appropriate, high-quality, and patient-centered interventions to those with multimorbidity [33,34]. The list was created based on the international literature that examined the burden of multimorbidity among PHC patients, particularly using comprehensive national EMRs [6,8,25,35-40]. The 20 chronic disease diagnoses in the list are particularly relevant in clinical and general populations in Canada. In a separate study, this list will also be validated to ensure it fully captures the complex concept of multimorbidity. The complete list of chronic disease categories, together with the corresponding ICD-9 disease codes, is presented in Table 2. In some categories, overlapping ICD-9 codes are presented to ensure that all relevant codes are captured. For example, in the disease category "Thyroid problem", a range of disease codes, as well as the individual codes, are presented and can be included. The comparison with previously used lists in multimorbidity research is presented in Table 3 [4,8,25,35-48].
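One way to apply a category list that mixes code ranges with individual codes is sketched below in Python. The rules shown are placeholders chosen for illustration and are not the content of Table 2: only the category name "Thyroid problem" comes from the protocol, while the other category names, the ranges, and the individual code are assumptions. Collecting categories in a set means that a diagnosis matched by both a range and an individual code is counted once.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Placeholder rules, NOT the protocol's Table 2. Each category has a list of
# inclusive ranges over the three-digit ICD-9 category plus individual codes.
CATEGORY_RULES: Dict[str, Tuple[List[Tuple[int, int]], Set[str]]] = {
    "Thyroid problem": ([(240, 246)], {"245"}),
    "Diabetes": ([(250, 250)], set()),
    "Hypertension": ([(401, 405)], set()),
}


def icd9_three_digit(code: str) -> Optional[int]:
    """Return the numeric three-digit category of an ICD-9 code, or None
    for non-numeric codes such as V or E codes."""
    head = code.strip().split(".")[0]
    return int(head) if head.isdigit() else None


def categories_for_code(code: str) -> Set[str]:
    """Return every chronic disease category matched by one diagnosis code."""
    number = icd9_three_digit(code)
    matched = set()
    for category, (ranges, individual_codes) in CATEGORY_RULES.items():
        in_range = number is not None and any(
            low <= number <= high for low, high in ranges
        )
        if in_range or code.strip() in individual_codes:
            matched.add(category)
    return matched


def categories_for_patient(codes: Iterable[str]) -> Set[str]:
    """Union of categories over all of a patient's codes (no double counting)."""
    found: Set[str] = set()
    for code in codes:
        found |= categories_for_code(code)
    return found
```

A patient would then be flagged as having multimorbidity when `len(categories_for_patient(codes)) >= 2`.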

Data analyses
The first objective will examine the overall burden of multimorbidity in terms of its point prevalence and the clusters of multiple chronic disease diagnoses that tend to occur together. For this objective, patients will be followed over time and each chronic disease diagnosis (from the list of 20) received by each patient will be identified. Patient characteristics (e.g. patient age, patient sex, and residential location) will be compared with the broader CPCSSN PHC population, as well as with the general adult Canadian population. Prevalence estimates will be calculated using mutually exclusive count numerators (e.g. patients with 2, 3, 4, and 5 or more chronic diseases) and for each calculation, the denominator will be all eligible adult PHC patients (N=367,743). Prevalence estimates, and corresponding 95% confidence intervals, will be calculated using the proportion procedure in STATA SE 13.1 [49]. These estimates will be stratified by patient age and sex categories, as well as provider age and sex categories, to investigate distinct patterns of multimorbidity. Additionally, prevalence estimates will be stratified by the patient's residential location, which will be determined using the patient's forward sortation area. More specifically, the second character of the patient's postal code will determine their residence in a rural (second character is a zero) or urban (second character is a value from one to nine) setting as defined by Canada Post. Among patients with multimorbidity, the frequency of ordered and unordered clusters of chronic disease types will be computed using a customized computer program written in JAVA. The most commonly occurring combinations and permutations of chronic diseases will be presented.
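The calculations in this paragraph can be illustrated with a short Python sketch. It is not the customized JAVA program referred to above, and all function names are hypothetical. Two assumptions are made: a "combination" is taken to be a patient's complete unordered set of chronic disease categories, and a "permutation" is the same set in order of first diagnosis; the protocol does not state whether smaller sub-clusters are also counted. Confidence intervals are omitted.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def residence_type(fsa: str) -> Optional[str]:
    """Classify a forward sortation area by its second character:
    '0' is rural, '1' to '9' is urban, anything else is unclassified."""
    fsa = fsa.strip()
    if len(fsa) < 2 or fsa[1] not in "0123456789":
        return None
    return "rural" if fsa[1] == "0" else "urban"


def prevalence_by_count(
    disease_counts: Iterable[int], denominator: int
) -> Dict[str, float]:
    """Prevalence with mutually exclusive numerators (exactly 2, 3, 4, and
    5 or more diseases) over all eligible adult patients."""
    tallies: Counter = Counter()
    for n in disease_counts:
        if n >= 2:
            tallies["5+" if n >= 5 else str(n)] += 1
    return {key: tallies[key] / denominator for key in ("2", "3", "4", "5+")}


def cluster_frequencies(
    histories: Iterable[Sequence[str]],
) -> Tuple[Counter, Counter]:
    """Count unordered clusters (combinations) and ordered clusters
    (permutations). Each history lists one patient's disease categories in
    order of first diagnosis; patients with fewer than two are skipped."""
    unordered: Counter = Counter()
    ordered: Counter = Counter()
    for history in histories:
        if len(history) < 2:
            continue
        unordered[frozenset(history)] += 1
        ordered[tuple(history)] += 1
    return unordered, ordered
```

For example, two patients diagnosed with the same two diseases in opposite order contribute two counts to one unordered cluster but one count each to two different ordered clusters. The most frequent clusters can then be read off with `Counter.most_common`.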
The second objective will examine the time-to-event patterns of multimorbidity by observing the time elapsing between subsequent chronic disease diagnoses. For this objective, patients with at least one chronic disease diagnosis will be included and four patient groups will be created: (1) patients with one or more chronic disease diagnoses by the end of the observation period; (2) patients with two or more chronic disease diagnoses by the end of the observation period; (3) patients with three or more chronic disease diagnoses by the end of the observation period; and (4) patients with four or more chronic disease diagnoses by the end of the observation period. The details of these patient groups are depicted in Figure 1. The event of interest will be the next chronic disease diagnosis (regardless of diagnosis type). Survival analysis techniques allow for staggered entry dates of patients into the study, as well as right censoring if a patient does not experience the event of interest by the end of the observation period. This will maximize the amount of information contributed by each patient.
A Cox proportional hazards analysis will be used to model time-to-event data, while simultaneously adjusting for provider- and patient-level predictors, and accounting for issues such as patient attrition or delayed entry into observation [50,51]. The Cox proportional hazards analysis will be conducted using the stcox procedure in STATA SE 13.1 [49], and the effects of clustering will be accounted for using a robust variance estimator. Each Cox proportional hazards model will then be built with the provider- and patient-level covariates that report p-values of <0.2 in univariate analyses. Interactions among the included covariates will be explored, and relevant interaction terms (at a significance level of 0.05) will be included in the final Cox proportional hazards model.
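The time-to-event records that feed such a model can be sketched in Python as follows. This is one possible reading of the design, stated as an assumption because Figure 1 is not reproduced here: for patient group k, the clock starts at the patient's k-th chronic disease diagnosis, the event is the (k+1)-th diagnosis, and patients without a further diagnosis are right-censored at the end of the observation period. The function name is hypothetical.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def time_to_next_diagnosis(
    diagnosis_dates: Sequence[date],
    k: int,
    end_of_observation: date,
) -> Optional[Tuple[int, bool]]:
    """Return (days_at_risk, event_observed) for patient group k.

    `diagnosis_dates` holds the first diagnosis date of each distinct chronic
    disease category for one patient. Returns None if the patient does not
    belong to group k (fewer than k diagnoses within the observation period).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    dates = sorted(d for d in diagnosis_dates if d <= end_of_observation)
    if len(dates) < k:
        return None
    start = dates[k - 1]  # entry into observation for group k (staggered)
    if len(dates) > k:
        # The next diagnosis occurred within the observation period.
        return (dates[k] - start).days, True
    # No further diagnosis observed: right-censored at the end of observation.
    return (end_of_observation - start).days, False
```

A patient diagnosed on 1 January 2010 and 31 January 2010, observed until 31 December 2010, would contribute an event at 30 days to group 1 and a censored time of 334 days to group 2, and would not appear in group 3.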
The proportional hazards assumption that is inherent in Cox models will be assessed by including time-dependent covariates in the model, using the tvc and texp options in the stcox procedure. Time-dependent covariates capture the interaction between a covariate and time. If this interaction is non-significant, the proportional hazards assumption holds for that covariate. Schoenfeld residuals will also be explored using the stphtest procedure (available as estat phtest in STATA 9 and later), in which the proportionality of the model as a whole and the proportionality for each predictor will be assessed. Once again, non-significant tests indicate no violation of the proportionality assumption.

Anticipated challenges
There are three anticipated challenges of this research: (1) degree of completeness, correctness, and comprehensiveness of the EMR data; (2) limited availability of socioeconomic variables in the EMR data; and (3) the limited generalizability of research findings to the general Canadian population. The first challenge has been well recognized in work that has examined the benefits and limitations of EMR data, particularly for clinical and epidemiologic research [52]. Incomplete or missing data are often a limitation of using EMRs for research, primarily because EMRs are designed to support clinical care delivery and are not structured in a way that easily facilitates use in research [53][54][55]. Incomplete or free-text data entry by providers may underestimate the prevalence of chronic diseases within the CPCSSN database as these data entries are not included in data extraction or final analysis. This may be particularly true for those diseases with less clear diagnostic features, such as asthma or depression [56,57]. Before being entered into the final statistical analyses, variables will be assessed for missingness and outliers that may indicate inaccurate data recording.
The second challenge is the lack of availability of sociodemographic variables (e.g. patient ethnicity, education level, employment status, income level) within the Canadian EMRs. When recorded, these variables often contain incomplete data that cannot be used reliably in statistical analyses. This represents an important limitation as previous literature has highlighted the impact of social deprivation (e.g. low income level, low education level, unemployment, barriers to housing) on the development of multimorbidity, particularly at younger ages [3,4,8]. Although each patient's age, sex, and residential location will serve as patient-level predictors of multimorbidity, these variables will not completely account for the socioeconomic factors impacting health. This is indeed an area that requires further attention from providers using EMRs for clinical care.
The third anticipated challenge is that the CPCSSN database does not contain comprehensive data for the entire Canadian population and, therefore, does not represent the burden of multimorbidity for the general adult population in Canada. The CPCSSN database is made up of a selected sample of PHC providers who use EMRs, as well as the patients of these providers. A recent study compared the characteristics of the CPCSSN providers with those of the respondents to the 2010 National Physician Survey and found that a higher proportion of CPCSSN PHC providers were women and that they were slightly younger, while the geographic distribution of the providers was similar to the national distribution [58]. Likewise, the representativeness of the CPCSSN patient population was assessed. This study will compare the characteristics of the adult PHC patients with those of the broader adult population in order to determine the degree of generalizability and representativeness of the CPCSSN data; nevertheless, the eventual findings will specifically present the burden of multimorbidity in the PHC setting.

Anticipated strengths
This research is the first of its kind using a national EMR database, which will provide needed insight and an opportunity to contribute to the international evidence base. Although this clinical information is not principally recorded for research purposes, the CPCSSN database has recently become more accessible to academic researchers for use in innovative projects relevant to CPCSSN's mission and vision. These data represent the only pan-Canadian EMR database and are recognized as a rich source of PHC information. The previously described approach of identifying chronic disease diagnoses on a patient-by-patient basis will maximize the amount of clinical information derived from each patient's electronic record, providing insight into PHC beyond what is typically gained from population surveys, administrative databases, and billing information. Furthermore, the computational techniques to determine the most frequently occurring combinations and permutations of multiple chronic diseases will be made accessible to other multimorbidity researchers, with the potential for similar international work.

Anticipated research outcomes
The first objective will allow for comparisons with international prevalence estimates of multimorbidity and its associated burden, while the second objective will address an important and noted gap in understanding the prognosis of multimorbidity using longitudinal clinical data. The list of 20 chronic diseases for our multimorbidity definition is in accordance with a recent systematic review, which recommended that investigators "should consider the number of diagnoses to be assessed (with at least twelve frequent diagnoses of chronic diseases appearing ideal) and should attempt to report results for differing definitions of multimorbidity (both at least three disease and the classic at least two diseases)" [11]. Finally, this protocol responds to the call for publication of protocols in multimorbidity research and aims to support the transparency, reproducibility, and replication of this research methodology [59]. This could facilitate the creation of comparable estimates of multimorbidity across patient populations, both in Canada and abroad.

Anticipated clinical- and policy-level impact
This research will have both clinical and policy relevance. The complexities of multimorbidity create heterogeneity in the experiences of patients as they cope with and receive clinical management for their multiple chronic diseases. This is further complicated by the heterogeneity in the clinical profile, or disease combination, each patient experiences. Combined with the current lack of evidence-based clinical practice guidelines that facilitate patient-centered and coordinated care, these complex clinical pathways and clinical profiles have significant implications for health-related outcomes and use of healthcare resources. As such, national multimorbidity estimates will help to inform where the redevelopment of clinical practice guidelines must focus to have the greatest clinical impact. From a public health or health policy perspective, the growing burden of multimorbidity consumes considerable societal and economic resources, and negatively impacts satisfaction with care delivery, quality of life, and productivity of patients and their caregivers. Examining the most frequently occurring clusters of chronic disease, and patients who are most at risk of subsequent chronic disease diagnoses, can help inform the development of clinical- or population-level interventions to relieve this tsunami of health demands and to provide robust support needed by all stakeholders [12,18,24,60].

Conclusion
This protocol aims to examine the prevalence and changing burden of multimorbidity among adult PHC patients using EMR data. As electronic records are increasingly being used for academic research and health system planning, these data must be managed and analyzed properly. The findings of this research will be disseminated through publication and presentation to academic researchers, decision-makers, and healthcare professionals. Future work should systematically compare international research using similar methodologies (e.g. definitions of multimorbidity, data sources, populations of interest) to explore international and geographical variations in the epidemiology of multimorbidity. Finally, a concerted and multifaceted effort must be made to establish effective and patient-centered interventions that help to alleviate the burden of multimorbidity for patients, caregivers, and healthcare providers into the future.