Frequency and types of clusters of major chronic diseases in 0.5 million adults in urban and rural China

Background Little is known about the frequency and types of disease clusters involving major chronic diseases that contribute to multimorbidity in China. We examined the frequency of disease clusters involving major chronic diseases and their relationship with age and socioeconomic status in 0.5 million Chinese adults. Methods Multimorbidity was defined as the presence of at least two or more of five major chronic diseases: stroke, ischaemic heart disease (IHD), diabetes, chronic obstructive pulmonary disease (COPD) and cancer. Multimorbid disease clusters were estimated using both self-reported doctor-diagnosed diseases at enrolment and incident cases during 10-year follow-up. Frequency of multimorbidity was assessed overall and by age, sex, region, education and income. Association rule mining (ARM) and latent class analysis (LCA) were used to assess clusters of the five major diseases. Results Overall, 11% of Chinese adults had two or more major chronic diseases, and the frequency increased with age (11%, 24% and 33% at age 50–59, 60–69 and 70–79 years, respectively). Multimorbidity was more common in men than women (12% vs 11%) and in those living in urban than in rural areas (12% vs 10%), and was inversely related to levels of education. Stroke and IHD were the most frequent combinations, followed by diabetes and stroke. The patterns of self-reported disease clusters at baseline were similar to those that were recorded during the first 10 years of follow-up. Conclusions Cardiometabolic and cardiorespiratory diseases were most common disease clusters. Understanding the nature of such clusters could have implications for future prevention strategies.


Introduction
Multimorbidity, involving the occurrence of two or more chronic diseases (where none is considered an index condition), 1 is a growing public health problem worldwide. Improvements in access to health care and socioeconomic circumstances, in addition to epidemiological and demographic transitions, have resulted in significant improvements in life expectancy in low-and middle-income countries (LMICs), including China. 2 However, these improvements in life expectancy have been accompanied by an increased prevalence of major chronic noncommunicable diseases and multimorbidity. 3 Multimorbidity is associated with a reduced quality of life, 4 and results in substantial economic burden for healthcare systems. 5 Studies in high-income countries (HICs) have reported that the both frequency of multimorbidity and the proportion of the population affected by multimorbidity have increased in recent years. 6 However, the extent to which multimorbidity varies by age, sex and socioeconomic circumstances in LMICs, such as China, is uncertain. 7,8 The leading causes of years of life lost (YLLs) in China in 2017 included stroke, ischaemic heart disease (IHD), lung cancer, chronic obstructive pulmonary disease (COPD) and liver cancer. 3 The prevalence of cardiometabolic multimorbidity has doubled in recent years in China, 9 possibly reflecting the high prevalence of established risk factors for cardiovascular disease (CVD) including hypertension, smoking in men and type 2 diabetes (T2D). 10 Previous studies in Chinese adults have reported on the distribution of self-reported multimorbidity, but such studies have been constrained by reliance on self-reported diseases, small sample sizes 11 or use of cross-sectional study designs. 4,10,12,13 A systematic review of chronic disease multimorbidity in China reported that there was marked methodological heterogeneity among prevalence studies. 14 A review of methodology found that clustering methods were the most widely used analytical approaches to identifying disease patterns in order to group together similar diseases that are found in the same individuals. 15 For example, a recent paper used hierarchical agglomerative cluster analysis to assess multimorbidity patterns among a middle-aged population in Hong Kong. 16 Association Rule Mining (ARM) is an alternative descriptive method that can reveal strong and frequent associations between diseases based on measures of 'interestingness' related to the effect sizes associated with particular patterns. 17 As such, ARM can provide a different perspective to traditional clustering methods and has been utilized previously to assess multimorbidity in a Chinese population. 18 We used data from the large nationwide China Kadoorie Biobank (CKB) prospective study to: (i) investigate the frequency of multimorbidity involving five major chronic diseases (stroke, IHD, COPD, T2D and cancer) overall and by age, sex, region and socioeconomic status; and (ii) assess how these five major diseases clustered in this population using a novel 'association rules' method in order to rank important disease associations which occur more often than would be expected by chance.

Study population
Details of the CKB study design and methods of follow-up have been previously described. 19,20 Briefly, the baseline survey was conducted between June 2004 and July 2008 in 10 diverse regions (five rural and five urban), chosen from China's nationally representative Disease Surveillance Points (DSP) system. Regions were selected to maximize diversity in exposure and disease patterns, population stability and access to reliable evidence on death and disease incidence and local capacity. About 1.8 million people without major disability and aged 35-74 years were invited to participate in the study. Approximately 30% responded, and 512,726 individuals were enrolled, including 13,287 slightly outside the target age range of 30-79 years. Ethics approval was obtained from local, national and international ethics committees, and all participants provided written informed consent both for enrolment and for the follow-up study.

Data collection
Trained local health workers collected information on sociodemographic factors, lifestyle (e.g. alcohol consumption, smoking, diet, physical activity) and prior medical history (including doctor-diagnosed IHD, stroke or transient ischaemic attack [TIA], cancer, COPD and diabetes) using interviewer-administered laptop-based questionnaires. Current use of medications for treatment of CVD was recorded for individuals who reported a prior history of IHD, stroke/TIA, hypertension or diabetes. Physical measurements, including blood pressure, lung function, height, weight and other anthropometric measures, were recorded using calibrated instruments and standard methods in all study regions. BMI was calculated as weight in kilograms divided by the square of the height in meters. A non-fasting venous blood sample was collected, for onsite random plasma glucose (RPG) testing using the Surestep plus meter (LifeScan, Milpitas, CA, USA), and remaining plasma was stored in liquid nitrogen. Data were collected on the time since the last meal. Individuals without self-reported diabetes and with an RPG level of 7.8-11.0 mmol/L were invited back the following day for a fasting plasma glucose (FPG) test. Prevalent diabetes was defined as a combination of self-reported and plasma glucose measurements above pre-specified cutoffs. Specifically, individuals with prevalent diabetes were defined as those with random plasma glucose (RPG) ≥7.0 mmol/L and time since last meal ≥ 8 h, or RPG ≥11.1 mmol/L and time since last meal <8 h or FPG ≥7.0 mmol/L on subsequent testing at baseline or selfreported diabetes at baseline

Definition of multimorbidity
Major diseases for the present analyses were defined as the five major non-communicable diseases that accounted for most deaths in China, and included stroke, IHD, cancer, COPD and T2D. 21 We included both self-reported diseases at enrolment (prevalent cases) and new-onset diseases during the follow-up period (incident cases), and the numbers for prevalent and incident diseases are shown in Webfigure 1. Multimorbidity was defined as presence of two or more of these five major diseases at baseline and during the first 10 years of follow-up. A recent systematic review of multimorbidity reported that these five diseases were among the top 20 causes of high disability adjusted life years or high years of life lost, of which four (cancer, coronary heart disease, stroke and diabetes) were among the top five causes. 22

Follow-up for morbidity and mortality
The status of study participants was monitored every 6 months by linkage to death registries, checked annually against local residential and health insurance records and through active confirmation by street committees and village administrators. The causes of death were coded by trained public health workers, blinded to baseline information, using the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10). Data on hospital admissions were collected by electronic linkage, via each participant's unique personal identification number, with established study-specific registries (for IHD, stroke, diabetes and cancer) and with the national health insurance reimbursement system for hospitalizations which provided almost universal coverage (∼97%) of study participants. Incident stroke, cancer, IHD, COPD and diabetes events were identified from the national health insurance system, disease registries and death certificates during follow-up until 31 December 2017. The ICD-10 codes are reported in Webtable 1. By 31 December 2017, 49,459 (9.6%) participants had died and 5302 (1.0%) were lost to follow-up.

Statistical analysis
The mean values or proportions of baseline characteristics were estimated by occurrence of the five major diseases, standardized to the CKB population by 10-year age groups, sex and region, where appropriate. The frequencies of different combinations of these diseases were estimated for 10-year age groups, after adjustment for sex and region. Association Rule Mining (ARM), that uses a rule-based machine learning method, was used to examine the associations between individual diseases, 23 and to identify combinations of the five major diseases. 24 The derived association rules were visualized by graphical techniques using the R package arulesViz. 25 The annotated values represent the diseases, and the arrows indicate the rules. The association rule yielded the ratios of observed/expected frequencies for each disease (the proportion of the dataset observed with a given comorbidity divided by the product of the prevalence of the individual diseases) to identify comorbidities that occur more often than would be expected by chance alone. This observed/expected ratio is referred to as the lift of an association rule and support is an indication of the frequency of each cluster of the five major chronic diseases. If the lift is > 1, this indicates that the two diseases were dependent on each other, and such rules were likely to be informative for predicting the risk of subsequent diseases. If the lift was < 1, this indicates that each of the two diseases had a negative effect on the presence of other disease and vice versa. The support and lift measures for each rule were displayed by the size and colour of the nodes. The support size was scaled for each plot individually (see eMethods for further details).
Sensitivity analyses were conducted after: a) excluding individuals with a prior history of stroke, IHD or cancer at baseline; and b) excluding participants with a prior history of any of the five major chronic diseases at baseline.
Additional sensitivity analyses utilized the more established Latent Class Analysis (LCA) in order to assess the concordance of the findings. For the CKB analyses of five major disease outcomes (stroke, cancer, IHD, COPD and diabetes), we conducted an LCA using the R package poLCA. 26 LCA takes multiple diseases into account, and enables modelling of multiple comorbidities. Models were fitted without covariates with different numbers of prespecified latent classes (between 2 and 5), with 5000 iterations to assess convergence of the model and the model was repeated 10 times with different starting values in order to assess the robustness of the estimation. The best fitting model was assessed via assessment of the Bayesian Information Criteria with the lowest value selected as this has been shown to be a more robust indicator of class enumeration. 27,28 The likelihood ratio and Chi-square statistics were also utilized to assess model fit. Each latent class was labelled according to those chronic conditions whose prevalence exceeded the prevalence in the full cohort. 29 After selection of an optimal model, each participant was assigned to the class for which they had the highest computed membership probability. An average posterior probability greater than 70% indicates an optimal fit. 30 The characteristics of participants in different latent classes were compared using the chi-squared test for sex and region and ANOVA for age as a continuous variable. The statistical analyses were performed using SAS version 9.4 and R version 4.0.5.

Population characteristics
Among the 512,726 participants, the overall mean (SD) age at baseline was 52 (10.7) years, 59% were women and 44% lived in urban areas. The mean (SD) levels of systolic blood pressure (SBP) and body mass index (BMI) were 131 (21) mmHg and 23.7 (3.4) kg/m 2 , respectively. At baseline, 74% of men and 3% of women reported smoking regularly, and 33% of men (n = 69 899) and 2% of women (n = 6245), respectively, reported drinking alcohol at least once each week (Webtable 2).

Distribution of individual major chronic diseases
Stroke and IHD were the most frequent self-reported diseases at baseline and also accounted for the most frequent incident diseases occurring during follow-up. Overall, 13% (n = 65 025) of participants suffered a stroke, 12% (n = 63 164) had IHD, 10% (n = 50 570) had COPD, 10% (n = 49 194) had diabetes and 6% (n = 32 660) had cancer ( Table 1). The frequency of all five major diseases increased linearly with increasing age, particularly for stroke, IHD and COPD. Stroke, cancer and COPD were more common among men than in women, but there were no sex-differences for IHD or diabetes.
All diseases were more common in urban areas with the exception of COPD, which was more common in rural than urban areas. Most of the five diseases were also more common among those who had completed only primary education (<6 years). Except for diabetes and IHD, which were more common among people with an annual income greater than 20, 000 Yuan, all diseases were more common among participants with lower annual household income (Table 1). When the 10 regions were investigated separately, IHD (27%) and stroke (27%), were most frequent in Harbin, while COPD (22%) was most frequently diagnosed in Sichuan and diabetes (13%) and cancer (9%) were most frequent in Qingdao (Webfigure 2).

Distribution of clusters of major chronic diseases
Overall, 2% (12,385) of participants had three or more of the major chronic diseases studied, 9% (45,237) participants had two diseases, 26% (131,441) participants had one disease and 63% (323,663) participants had none of the five diseases (Table 2). At baseline, participants with three or more major chronic diseases were older than those with none (63.5 [8.1] vs 48.8 [9.6] years, p < .001, respectively). After standardization for sex and region, only a few participants aged <60 years had any of the five diseases. Among participants aged 60-69 years, 37% had one disease, 18% had two diseases and 6% had three or more diseases. The highest proportion of participants aged 70-79 years at baseline had one disease (41%), one in four (25%) had two diseases and approximately one in ten (9%) had three or more diseases (Table 2). There were no differences between men and women in the frequency of major chronic diseases by age (Figure 1). The frequency of participants two or more chronic diseases was higher in urban than rural areas (9% vs 3%) as was the frequency of three or more conditions (8% vs 2%) (Table 2, Figure 1). Individuals who had completed only primary education had a higher frequency of two diseases, compared to those educated to middle-school level or higher (9% vs 8%). Participants whose annual income was less than 20 000 Yuan tended to have a similar frequency of two diseases as those with higher annual income (9% vs 9%). The proportions with two or more major diseases varied across the 10 study regions, ranging from 7% in Suzhou to 22% in Harbin, as did the proportion with three or more diseases (1% in Gansu to 6% in Harbin) (Figure 2). When prevalent and incident diseases were investigated separately, the proportions of participants with one, two and three or more diseases also increased with increasing age. For incident diseases, the overall proportions of participants with at least one disease were higher than for the prevalent diseases (Webfigure 3). Among participants aged 70-79 years, the proportions for new onset versus existing diseases were: 5% vs <1% for three or more diseases, 17% vs 5% for two diseases and for one disease 43% vs 31% (Webfigure 3). The support values suggest that overall, 4% of participants had multimorbidity related to stroke and IHD, 2% had multimorbidity based on diabetes and stroke, 2% had multimorbidity due to diabetes and IHD and 2% had multimorbidity due to COPD and IHD.

Patterns and clustering of major chronic diseases
Among people from any age group, 2% had multimorbidity combinations of three diseases and <1% had multimorbidity combinations of four of the five major chronic diseases (Webtable 4). Patterns of disease clustering were similar among men and women (Webfigures 4 and 5). The frequency of multimorbidity was greater in older people. Among participants aged 60-69 years, the most frequent patterns of multimorbidity included the combinations of IHD and stroke (9%), diabetes and stroke (5%) and diabetes and IHD (5%), while among those aged

Sensitivity analyses
After excluding 25,514 individuals with self-reported diagnoses of cancer, IHD or stroke at baseline similar patterns were observed for frequency of disease clusters involving incident cases of major chronic diseases by age, sex and region (Webtable 5). Moreover, consistent with the overall population, stroke and IHD were the most common manifestations of multimorbidity (Webfigure 7). In further sensitivity analyses after excluding 84,289 individuals with any of the five major chronic diseases that were self-reported at baseline, the combinations of diseases and associations with age, sex and region were unaltered (Webtable 6, Webfigure 8).
The LCA model fits for all participants without exclusions for prior disease are summarized in Webtable 7. The model with 4 latent classes and the smallest BIC (1638937) was selected. These classes were labelled 'mostly respiratory', 'mostly cardiometabolic', 'mostly diabetes' and 'relatively healthy' based on the estimated probabilities of any particular chronic disease given membership of a latent class (Webtable 8).
The LCA model fits for all participants with exclusions for all prior diseases are summarized in Webtable 9. The model with 4 latent classes with the smallest BIC (1113601) was selected. These classes were labelled 'mainly cancer', 'mostly cardiorespiratory', 'mostly cardiometabolic' and 'relatively healthy' based on the estimated probabilities of any particular chronic disease given membership of a latent class (Webtable 10). For both analyses, the 'relatively healthy' group included those with a low prevalence of all of the five major chronic diseases; the 'mostly cardiometabolic' group was populated by those with stroke, IHD and diabetes; 'mostly cardiorespiratory' contained those participants with COPD, stroke and IHD.
The relationships with age, sex and region of study participants by latent class membership without and with exclusions are presented in Webtables 11 and 12, respectively. For all participants, the 'relatively healthy group' was youngest with an average age of 50.3 whilst the 'mostly cardiometabolic group' was oldest with an average age of 61.7 (Webtable 11). There were clear sex-differences with a higher prevalence of females for three of the four latent classes, which is consistent with the CKB population as a whole. The 'mostly respiratory group included a higher proportion of in rural residents than urban (67.5% vs 32.5%), whereas the 'mostly cardiometabolic' group includes a higher proportion of urban residents (60.8% vs 39.2%, Webtable 11). There were also comparable differences by age, sex and region after excluding those with any of the prior diseases (Webtable 12). Overall, the LCA findings were broadly consistent with those of the main ARM analyses.

Discussion
This large prospective study demonstrated that about 11% of Chinese adults aged 35-79 years had multimorbidity. The prevalence of multimorbidity varied from one quarter in individuals aged 60-69 years to one-third in participants aged 70-79 years. Multimorbidity was more common in those living in urban than in rural areas, and in individuals with lower levels of education. The highest frequency of multimorbidity was observed in residents of Harbin (an urban area in the north of China) and the lowest frequency was observed in Gansu (a rural area in the northwest of China). The simple descriptive analyses suggested that there were no sex-differences in the combinations of the five major chronic diseases considered in this report.
Comparisons with previous studies of the prevalence of multimorbidity have been complicated by differences in diagnostic criteria or definitions used to define multimorbidity (mainly reflecting differences in the number of diseases or the conditions investigated), the age-range of participants and populations studied. 31 The present analyses used both prevalent and incident cases of five major chronic diseases, but the patterns were unaltered when the analyses were restricted to either prevalent or incident cases alone. The findings are consistent with those of a recent review of studies from LMIC and HICs that reported that multimorbidity was more common in individuals with lower levels of education and lower income. 32 There were no differences between men and women in our simple descriptive analyses, but our ARM and LCA analyses found sex-differences similar to previous reports from China. 13 However, this requires further investigation, including more diseases than those considered in the present study. A recent report from the China Health and Retirement Longitudinal Study, that used similar graphical methods to ours, but included more self-reported diseases at baseline, reported a higher prevalence of most major chronic diseases in women than in men, which they attribute to the longer life expectancy in women than in men. 33 Previous population-based studies in China, investigating self-reported major chronic diseases, have reported a similar prevalence and pattern of multimorbidity with age and identified stroke and IHD as the most common disease combinations. 12 , 6,13 A previous CKB report that defined multimorbidity as the presence of two or more of 16 selfreported conditions at baseline found a higher prevalence among the elderly and those of lower socioeconomic status. 34 Another CKB report investigated the impact of lifestyle factors (e.g. smoking, alcohol, diet, physical activity and body shape), on the development of cardiometabolic multimorbidity (CMM) and found that lifestyle played a role in transition from first cardiometabolic disease to CMM and to death. 35 In the present report, the ARM analyses demonstrated that stroke and IHD were the most frequent combinations of multimorbidity. The analyses also identified diabetes and stroke, IHD and diabetes, COPD and IHD and COPD and stroke as the most frequent combinations in this population. These findings were supported by similar results from a latent class analysis. Other studies of multimorbidity (that included more diseases and used a different methodology) in HICs have also reported a high frequency of cardiorespiratory diseases, 36,37 consistent with the findings of the present study.
The present study had several strengths, including large sample size, and a focus on well-characterized diseases involving a shared aetiology. The use of association rules enabled the evaluation of the strength of the relationships between different combinations of five major chronic diseases. The use of ARM facilitates assessment of the ranking of major chronic diseases which occurred more often than would be expected by chance alone. The ARM findings were also supported by LCA which also enabled comparisons of the estimated latent classes by age, sex and region. We also used a combination of incident and prevalent disease in our analyses in contrast to most previous studies which have restricted the definition of multimorbidity to prevalent diseases only.
However, this study had several limitations. Firstly, we only considered five major chronic diseases that have a high burden of disability adjusted life years and years of life lost in China. However, our analytical approach could easily be adapted to include a wider range of chronic diseases, and the present study serves as a useful exemplar of this analytical approach. Secondly, we did not consider risk factors for multimorbidity in this report. Given the definition of multimorbidity used, incorporating both selfreported disease at enrolment and incident disease during follow-up, investigation of the associations of individual risk factors (e.g. smoking, adiposity or blood pressure) with risk of multimorbidity would be constrained by the potential for reverse causality bias (whereby the onset of disease may be associated with changes in the levels of such risk factors). However, the inclusion of both prevalent and incident diseases ensured a comprehensive understanding of disease patterns and clustering. Moreover, the use of association rules to identify combinations of diseases is largely independent of reverse causality. Finally, the CKB is not representative of the Chinese population, although the associations of multimorbidity with age, income and education observed in the simple descriptive analyses used in present study are consistent with another nationally representative study from China that investigated multimorbidity using 14 diseases. 38 Moreover, previous analyses have demonstrated the consistency between the results of the CKB and nationally representative surveys of the prevalence of certain chronic diseases. 39 Overall, the higher frequency of multimorbidity based on five major chronic diseases in this Chinese population was positively correlated with age, differed by sex and region and was inversely related to duration of education. Cardiometabolic and cardiorespiratory diseases were identified as the most frequent disease clusters. The findings of the present study suggest that prevention strategies should target clusters of diseases, rather than individual diseases. Moreover, understanding disease clusters could inform appropriate therapeutic approaches and minimize risks of iatrogenic multimorbidity, for example, through use of medications that benefits some diseases but has an adverse effect on others.