A multiple sclerosis disease progression measure based on cumulative disability

Background: Existing severity measurements in multiple sclerosis (MS) are often cross-sectional, making longitudinal comparisons of disease course between individuals difficult. Objective: The objective of this study is to create a severity metric that can reliably summarize a patient’s disease course. Methods: We developed the nARMSS – normalized ARMSS (age-related MS severity score) over follow-up, using the deviation of individual ARMSS scores from the expected value and integrated over the corresponding time period. The nARMSS scales from −5 to +5; a positive value indicates a more severe disease course for a patient when compared to other patients with similar disease timings. Results: Using Swedish MS registry data, the nARMSS was tested using data at 2 and 4 years of follow-up to predict the most severe quartile during the subsequent period up to 10 years total follow-up. The metric used was area under the curve of the receiver operating characteristic (AUC-ROC). This resulted in measurements of 0.929 and 0.941. In an external Canadian validation cohort, the equivalent AUC-ROCs were 0.901 and 0.908. Conclusion: The nARMSS provides a reliable, generalizable and easily measurable metric which makes longitudinal comparison of disease course between individuals feasible.


Introduction
Multiple sclerosis (MS) is a chronic, neurodegenerative disorder of the central nervous system which requires lifelong care. As MS has a heterogeneous disease course, it is important to determine a patient's current and potential future severity in order to understand if interventions affect the disease course and to mitigate progression of disease. The most commonly used disability measure in both MS clinical practice and clinical trials is the expanded disability status scale (EDSS). The EDSS is an ordinal scale ranging from 0 (no neurological deficits) to 10 (death due to MS) and includes an assessment of eight functional systems by a neurologist during a clinical examination. 1 Due to limitations of the cross-sectional nature of the EDSS score, the MS severity score (MSSS) 2 and age-related MS severity score (ARMSS) 3 were created, which enable a patient's EDSS to be ranked by years since disease onset or age, respectively. Compared to the MSSS, the ARMSS may be more versatile because it uses an individual's age instead of the date of MS symptom onset, which is often missing or imprecise. This may allow for inclusion of more patients within a given cohort. Also, use of the ARMSS could eliminate any bias or imprecision introduced by the need for data on disease duration. However, there is still a need for more accurate disease severity and progression measurements that capture a complete overview of a patient's disease, regardless of when they are measured.
There has been a notable lack of a severity score that allows direct comparison of patients' overall disease courses without undue sensitivity to follow-up time. The combinatorial weight-adjusted disability (COMBIwise) score was one notable effort to create a measure to predict future disability based on several criteria measured early in a patient's follow-up. 4 Although it made some progress towards a predictive score, it had limitations such as (1) the requirement of precise information for the calculation, which reduces patient inclusion, and (2) its comparatively low accuracy at predicting future disability. 5 In addition, an early effort to use serial EDSS measurements to construct a single metric based on area under the curve as an outcome for clinical trials did not attain widespread use. 6 Hence, we sought to use ARMSS to build a more comprehensive metric that would give a cross-sectional view of disease. The aim of this study was to investigate the performance of this new measure, normalized ARMSS (nARMSS), which we developed in order to create an instant overview of a patient's course as well as to provide a potentially enhanced ability to predict future disability. This score can be calculated solely from EDSS/ARMSS early in as well as throughout follow-up and shows strong correlation between early and future disability.

Materials and methods
This study included individuals with MS from two large clinical cohorts. The first, the Swedish MS registry (Swedish cohort), is a national registry of patients diagnosed with MS based on the McDonald criteria. 7 The registry is voluntary, and the vast majority of neurologists in Sweden electively participate, resulting in the inclusion of nearly 85% of MS patients in Sweden. 8 To be included in the study, individuals were required to have complete data on sex, date of birth, date of disease onset (first manifestation of MS) as well as >1 EDSS score (and the date of EDSS capture). These data were used to calculate the ARMSS against a published reference matrix of global values, 3 using the R package ms.sev version 1.0.4.
In this study, 20,025 individuals in the Swedish cohort with 121,616 clinical visits which included an EDSS measurement were eligible, of which 14,160 individuals were included based on the above criteria.
The second cohort for validation was from British Columbia, Canada (Canadian cohort) and has been previously described. 3,9,10 This cohort comprised 5989 eligible individuals.

Calculation of the nARMSS
The nARMSS was constructed based on serial EDSS scores for a patient and required at least two scores for calculation. Follow-up was defined as the time from the first recorded EDSS to the most recently recorded EDSS for all patients. First, an 'ARMSS integral' was calculated as the difference in the integrated area under the ARMSS scores from the expected median ARMSS of 5. The ARMSS integral can therefore be thought of as the area under the curve for all ARMSS measurements during follow-up in relation to the median value of 5. Positive values indicate an accumulation of disability greater than average for the patient's age(s) of disease. The equation for ARMSS integral is

\[ \text{ARMSS integral} = \int_{\text{age}_1}^{\text{age}_n} \text{ARMSS}_{\text{age}} \, d\,\text{age} \; - \; 5 \times (\text{age}_n - \text{age}_1) \]

where n is the total number of ARMSS scores, age_1 is the age at first EDSS measurement, age_n is the age at last EDSS measurement and ARMSS_age is the ARMSS score at a given age.
The ARMSS integral can be considered a relative measure for the total disability experienced by a patient over follow-up years. In order to compare patients when ages and follow-up time are different, the nARMSS is the ARMSS integral which has been normalized using follow-up time. Figure 1 illustrates a sample calculation based on a single patient's EDSS and ARMSS scores.
The formula for the nARMSS is

\[ \text{nARMSS} = \frac{\text{ARMSS integral}}{\text{age}_n - \text{age}_1} \]

As such, the score varies between +5 and −5, since the ARMSS is scaled between 0 and 10 and the ARMSS integral is determined based on the deviation from 5. The nARMSS is therefore the normalized ARMSS over follow-up relative to median, and each increase in one unit gives the average increase in the ARMSS (and thus decile) from the average patient with identical disease timings.
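As an illustration, the calculation described above (area between the ARMSS curve and the median of 5, divided by follow-up time) can be sketched in Python. This is a minimal sketch and not the authors' implementation: the function names are hypothetical, and the use of the trapezoidal rule between consecutive visits is an assumption, since the text does not state how the area between measurements is integrated.

```python
from typing import Sequence

MEDIAN_ARMSS = 5.0  # expected median ARMSS, as stated in the text


def armss_integral(ages: Sequence[float], armss: Sequence[float]) -> float:
    """Area between the ARMSS curve and the median line of 5.

    Assumption: linear interpolation between visits (trapezoidal rule).
    """
    if len(ages) != len(armss) or len(ages) < 2:
        raise ValueError("need at least two (age, ARMSS) pairs of equal length")
    visits = sorted(zip(ages, armss))  # order visits by age
    area = 0.0
    for (age0, score0), (age1, score1) in zip(visits, visits[1:]):
        mean_score = (score0 + score1) / 2.0
        area += (age1 - age0) * (mean_score - MEDIAN_ARMSS)
    return area


def narmss(ages: Sequence[float], armss: Sequence[float]) -> float:
    """ARMSS integral normalized by follow-up time (last age minus first age)."""
    follow_up = max(ages) - min(ages)
    if follow_up <= 0:
        raise ValueError("follow-up time must be positive")
    return armss_integral(ages, armss) / follow_up


# Hypothetical patient: three visits at ages 30, 32 and 34
example = narmss([30.0, 32.0, 34.0], [6.0, 7.0, 8.0])
print(example)  # 2.0, i.e. on average two ARMSS units above the median
```

Under these assumptions, a patient whose ARMSS stays at 5 throughout follow-up scores 0, and one whose ARMSS stays at 10 scores +5, matching the stated range of the metric.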
An R shiny app for calculating nARMSS directly from serial EDSS scores for an individual can be found online at https://aliman.shinyapps.io/nARMSS/.
We also determined the ranges of ARMSS and nARMSS for all individuals in the Swedish cohort, defined as the difference between the maximum and minimum values over follow-up. The two ranges were then compared to determine how much the nARMSS compresses the range relative to the ARMSS.
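The range comparison can be illustrated with a short Python sketch. It assumes that the nARMSS is recomputed at each visit from the second onward using all data up to that visit, that visits are already sorted by strictly increasing age, and that the area is integrated with the trapezoidal rule; the function names and the patient data are hypothetical.

```python
from typing import List, Sequence


def running_narmss(ages: Sequence[float], armss: Sequence[float],
                   median: float = 5.0) -> List[float]:
    """nARMSS recomputed at each visit from the second onward.

    Assumptions: ages are sorted and strictly increasing; trapezoidal rule.
    """
    values: List[float] = []
    area = 0.0
    for i in range(1, len(ages)):
        mean_score = (armss[i] + armss[i - 1]) / 2.0
        area += (ages[i] - ages[i - 1]) * (mean_score - median)
        values.append(area / (ages[i] - ages[0]))
    return values


def value_range(values: Sequence[float]) -> float:
    """Difference between the maximum and minimum value over follow-up."""
    return max(values) - min(values)


# Hypothetical patient with fluctuating ARMSS scores
ages = [30.0, 31.0, 32.0, 33.0]
armss = [4.0, 8.0, 5.0, 7.0]
print(value_range(armss))                        # ARMSS range: 4.0
print(value_range(running_narmss(ages, armss)))  # nARMSS range: 0.25
```

In this toy example the cumulative averaging damps visit-to-visit fluctuations, which is the compression effect the comparison is designed to quantify.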

nARMSS association to future disability
We sought to determine if nARMSS early in follow-up could predict nARMSS for a patient's subsequent years up to 10 years of follow-up, without overlapping data.
The following procedures were therefore conducted in both the Swedish and Canadian cohorts.
nARMSS at 2 and 4 years were then used to predict nARMSS during the next 8 and 6 years of follow-up, respectively. The outcomes were recoded as binary where the most severe quartile of patients was coded as 1, and patients with scores in the remaining quartiles were coded as 0. To maximize the number of patients included, at 2 and 4 years of follow-up, the closest chronological EDSS was used (ranging from 0.5 to 3 and 2 to 5 years of follow-up, respectively). We used area under the curve (AUC) of the receiver operating characteristic (ROC), a metric constructed using the curve of sensitivity and specificity, to determine how well the nARMSS was associated with future disability. This procedure was repeated for nARMSS at 8 years to predict the following 7 years of follow-up, that is, 15 total years, as a final check of the reliability of this method.
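The procedure above can be sketched in Python as follows. This is an illustrative sketch only: the function names and data are hypothetical, the quartile cutoff uses the "inclusive" method of `statistics.quantiles` as an assumption (the text does not specify how the quartile boundary was computed), and the AUC is computed with the pairwise rank formulation, which is equivalent to the area under the ROC curve.

```python
import statistics
from typing import List, Sequence


def top_quartile_labels(future: Sequence[float]) -> List[int]:
    """Code the most severe quartile of the later-period nARMSS as 1, others as 0."""
    q3 = statistics.quantiles(future, n=4, method="inclusive")[2]  # 75th percentile
    return [1 if value > q3 else 0 for value in future]


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive has a higher score than a random negative.

    Ties count as one half; this equals the area under the ROC curve.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical data: early nARMSS (predictor) and later-period nARMSS (outcome)
early = [-1.5, -1.0, 0.5, 0.0, 1.0, 4.5, 3.0, 4.0]
future = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

labels = top_quartile_labels(future)
print(labels)                  # [0, 0, 0, 0, 0, 0, 1, 1]
print(auc_roc(early, labels))  # 10 of 12 positive-negative pairs ranked correctly
```

The predictor and outcome are computed from non-overlapping periods of follow-up, so the early score never contains information from the period it is used to predict.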
Similar methods were conducted using both the MSSS and EDSS values at 2 and 4 years to predict the most severe nARMSS quartile over the next years to 10 years total follow-up, resulting in four separate AUC-ROCs per cohort. This was repeated using the average EDSS for all visits up to the timepoint used in the 2 and 4 year cutoffs to predict identical Q4 nARMSS binary outcomes. The average MSSS was also tested in an identical manner to average EDSS, in order to determine the Q4 nARMSS predictive ability.
Finally, the same approach was applied using secondary progressive MS (SPMS) as the outcome, coded as 1 for SPMS and 0 for those who had not converted to SPMS, as determined retrospectively by a neurologist. This included all previous tests using both the 2 and 4 year timepoints, with the outcome being SPMS status after 10 years of follow-up. Individuals who were SP at the 2 or 4 year timepoint were removed from the analysis.

nARMSS comparison to other MS outcomes
The various quartiles of nARMSS were compared to other measurements that are often used as severity outcomes in MS studies. The mean values of nARMSS, EDSS, first SDMT (Symbol Digit Modalities Test) ever completed at any point during follow-up time and first ever MSIS-29 (multiple sclerosis impact scale-29) physical and psychological scores were compared for patients in nARMSS quartiles, defined as the last nARMSS value. These were repeated for quartiles of EDSS and MSSS, defined as the average values over the entire follow-up. The SDMT provides a standardized measurement of cognitive ability. 11 Since a learning effect from repeated testing has been noted, only the first measurement was used. The MSIS-29 is a self-assessment consisting of 29 questions covering physical and psychological well-being. 12 Similarly, the first scores for physical and psychological symptoms were measured separately. The SDMT and MSIS-29 were only available for the Swedish cohort, as these have become part of the routine clinical assessment in Sweden since the introduction of the second generation of DMTs in 2006. 13

Data availability
Data from the Swedish MS registry used in this article can be requested from the Karolinska Institutet. This requires both a data transfer agreement and the required ethical permission, arranged between Karolinska Institutet and the institution requesting access to the data in accordance with the data protection legislation governing Europe, GDPR (General Data Protection Regulation). Researchers who are interested in obtaining data access should contact the corresponding author.

Results
Overall, we included 14,160 patients from the Swedish cohort and 5989 patients from the Canadian cohort. Sub-analyses used reduced sets of patients according to available data as indicated. Characteristics of the study population are presented in Table 1.

Swedish cohort data
The relationship between nARMSS from 2 to 10 years can be visualized in Figure 2. Complete data on the AUC-ROC for each test made in both cohorts are given in Supplemental Table 1.
Supplemental Figure 1 illustrates all additional AUC-ROC plots using both Q4 of nARMSS and SP at 10 years of follow-up.

Validation in the Canadian cohort
Similar AUC-ROC curves were calculated in the Canadian cohort for nARMSS at 2 and 4 years to predict the most severe quartile after the subsequent years to 10 years of follow-up. These values were very similar to those of the Swedish cohort, with an AUC-ROC of 0.901 (95% CI = 0.877-0.924 for 2 years to predict next 8 years, n = 948) and 0.908 (95% CI = 0.886-0.929 for 4 years to classify the next 6 years, n = 904). All other variables used to predict the most severe nARMSS quartile showed slight reductions from the AUC-ROC values obtained in the Swedish cohort (Supplemental Table 1). A sub-analysis of nARMSS for those with and without missing SDMT indicated that mean nARMSS was significantly different for all quartiles. Q4 showed the largest increase in nARMSS between non-missing and missing data (quartile: available, missing, p value -Q1:  This indicates that more severe disability in Q4 is likely under-reported and that interquartile differences might increase with more complete data. This tendency towards less favourable outcomes with increasing quartiles of nARMSS is presented in Table 2.

Discussion
The AUC-ROCs show that the nARMSS has a strong capacity to predict future disability, even when only 2 years of follow-up is available. This allows the nARMSS to be used when categorizing individuals in severity studies with an improvement in accuracy when compared to the use of cross-sectional metrics such as the first or the last available EDSS/ARMSS or MSSS. A potential use of the nARMSS is as an outcome measure in studies such as genome wide association studies (GWAS) for disease severity and progression. In this setting, where severity is not stable and therefore noisy, isolating the signal using the nARMSS may allow for more accurate and reproducible results. Patients can also be included in research studies regardless of their age at measurement and follow-up time, without affecting the results due to large fluctuations in these factors.
When comparing the ranges for the ARMSS and nARMSS for patients over the entire course of their follow-up, the nARMSS in effect compresses the variability of values. The median nARMSS variation is approximately a third of the variation in ARMSS measurements, denoting close to a two-thirds reduction in the instability of serial EDSS/ARMSS measurements. It is precisely this reduction in variation, despite the similar scales, which gives the nARMSS increased utility as an overall marker of disease progression even when determined early in follow-up.
The reasons for using the ARMSS to construct such a metric, instead of the MSSS are (1) the potential increased size of the available patient pool, since recorded patient information may lack onset date but not age and (2) the elimination of the risk of systematic bias due to retrospectively assigning the date of onset. While the nARMSS has some power to predict SPMS after 10 years, EDSS scores alone have increased AUC-ROCs, likely due to the fact that individuals with high EDSS after 2 or 4 years of follow-up are more likely to convert to SP within 10 years. Since we have removed individuals with SP at the measurement point, these patients who remained and had high EDSS are closer to phenotype conversion. Other more accurate methods of predicting SPMS exist, which exceed the accuracy demonstrated here. 14 Similarly, in a clinical setting, the nARMSS may be useful as an additional data point for neurologists to use in combination with other factors and might be used to determine if a patient is likely to have a milder or more severe disease course early in treatment. Given data showing the importance of early treatment, this could be useful in clinical practice when making treatment decisions for early diagnosed patients. 15 However, since there are always exceptions to strong trends, clinical application should be undertaken with caution.
The main limitation of the nARMSS is that it is constructed from EDSS measurements and thus biased towards mobility, especially at higher scores (>5) which consist solely of physical factors. Cognitive disability, for example, is less well represented. Additional outcomes might better represent all aspects of the disease. For example, income and sickness absence data are available in Sweden, and both are correlated with cognitive decline. 16 However, comparisons between the nARMSS quartiles and first SDMT scores show negative correlation (Table 2), which as expected implies that cognitive decline is associated with the nARMSS as both indicate disability and SDMT may have a predictive role on motor disability. Similarly, both MSIS-29 physical and psychological scores showed correlation with the nARMSS, providing further confirmation of utility beyond only physical disability. Nevertheless, composite scores with SDMT, MSIS-29 and other metrics might lead to greater accuracies. However, it should be noted that relying on EDSS has the benefit of including nearly all patients due to the large data availability of such a metric.
While the correlation between the nARMSS early in follow-up and after 10+ years of follow-up is high, it should be noted from Figure 1 that the nARMSS does vary over a patient's disease course, ultimately reaching a nearly steady-state level. Therefore, it can be inferred that the metric becomes more accurate with more follow-up time, likely due to more complete information. For greatest accuracy, severity studies should therefore calculate the nARMSS up to the most recent clinical measurement available for each enrolled individual.
This metric could aid in the search for factors which are correlated with MS disease severity, such as genetic markers which are associated with increased disease severity. Furthermore, early identification of the potential future severity of an individual's disease could inform the most appropriate treatment option(s) for that patient. Finally, any alterations in disease progression could be more accurately captured, so that interventions and factors which improve disease course could be identified.

Supplemental material
Supplemental material for this article is available online.