A new protocol for exercise testing in COPD; improved prediction algorithm for WMAX and validation of the endurance test in a placebo-controlled double bronchodilator study

Background: Two new protocols have been developed for bicycle exercise testing in chronic obstructive pulmonary disease (COPD): an individualized cardiopulmonary exercise test (ICPET) and a subsequent customized endurance test (CET), which generate less interindividual spread in endurance time than the standard endurance test. The main objectives of this study were to improve the prediction algorithm for WMAX used in the ICPET and to validate the CET by examining the effect of indacaterol/glycopyrronium (IND/GLY) versus placebo on exercise performance. Methods: COPD patients with a forced expiratory volume in 1 s (FEV1) of 40–80% predicted were recruited. Pooled baseline data from two previous studies (n = 38) were used to develop an improved WMAX prediction algorithm. Additional COPD patients (n = 14) were recruited and performed the ICPET using the new prediction formula at visit 1. Before the CET at visits 2 and 3, they received, in randomized order, a single dose of IND/GLY (110/50 µg) or placebo. Results: The improved multiple regression algorithm for WMAX includes diffusing capacity for carbon monoxide (DLCO), FEV1, sex, age and height, and its predictions correlated with measured WMAX (R² = 0.89, slope = 0.89). IND/GLY improved endurance time versus placebo by a mean of 113 s [95% confidence interval (CI): 6–220], p = 0.037, with a more prominent effect in patients with FEV1 < 70% predicted. Conclusion: The two new protocols for the ICPET (including the new improved algorithm) and the CET were retested with consistent results. In addition, the CET showed a significant and clinically relevant prolongation of endurance time for IND/GLY versus placebo in a small number of patients.


Introduction
Impaired exercise tolerance is a major problem in patients with chronic obstructive pulmonary disease (COPD) and usually has multifactorial causes, including airflow limitation, reduced gas exchange, peripheral muscle weakness and coronary heart disease. [1][2][3] It is associated with a poor prognosis and impaired quality of life. 4 A bicycle constant work rate exercise test (CWRET) at 75-80% of peak work rate (WMAX) is frequently used to assess the impact of COPD on functional capacity and the response to medical treatment or pulmonary rehabilitation. [5][6][7][8][9][10][11] Endurance time is a valuable outcome because it is related to multiple clinical aspects of disease severity in COPD, 12 and when measured with a 'standard' CWRET, it is considered most sensitive to interventions when the test lasts between 3 and 8 min. 7,13,14 According to the American Thoracic Society/European Respiratory Society Task Force, an improvement of 46-105 s can be considered clinically important. 15 A limitation of the standard CWRET is the variation in endurance time among subjects. Reducing this interindividual variability would allow smaller sample sizes and less costly clinical trials, whose results may also be easier to interpret. 7 A standard CWRET is preceded by an incremental bicycle test designed to measure the patient's peak work rate (WMAX). A standard maximum test, which frequently uses a protocol similar to a cardiopulmonary exercise test (CPET), 16 may either underestimate or overestimate WMAX, leading to an endurance time in the subsequent CWRET that is too long or too short, respectively.
We embarked on a development programme with the objective of improving the protocols for the WMAX test and the standard CWRET. In the first of our studies, 17 in patients with moderate to severe COPD, a protocol for a new, individualized WMAX test was developed. This test is hereafter denoted the individualized cardiopulmonary exercise test (ICPET). During the ICPET, patients started cycling for 3 min at a workload of 40% of a predicted WMAX, calculated for each patient using a random forest model based on multicentre industry data. This phase was followed by a linear increase in load, estimated to reach the predicted WMAX after an additional 8 min. Predicted values of WMAX correlated well with measured WMAX. However, the slope for predicted versus measured WMAX was 0.50, which indicated that WMAX values in the high range (high performers) were underestimated and those in the low range (low performers) were overestimated. In the study by Tufvesson and colleagues, 18 two new test protocols for measuring bicycle exercise endurance time were compared with a standard CWRET. Both protocols started with an initial period at 30-40% of WMAX. In the first new protocol, the endurance period started at 75% of WMAX, after which the workload increased stepwise until exhaustion. The second new protocol started at 70% of WMAX and then increased linearly until exhaustion. Both protocols reduced the standard deviation and the range of endurance time compared with the standard CWRET. Overall, the second protocol showed more favourable properties than the first when compared with the standard CWRET. 18 The second protocol served as the model for the endurance test used in this study, hereafter called the customized endurance test (CET).
The main objectives of this study were twofold. Part 1: to improve the prediction algorithm for WMAX used in the ICPET, using combined data from the study cohorts in our previous studies. 17,18 Part 2: to study the protocols for both the ICPET and the subsequent CET, the latter by examining the effect of a double bronchodilator [long-acting β2-agonist (LABA) plus long-acting muscarinic antagonist (LAMA), i.e. indacaterol and glycopyrronium (IND/GLY)] versus placebo on exercise performance.

Studies
The development programme for a new exercise test consisted of three elements, examined in three studies, A, B and C (this study):
• Prediction algorithm: A prediction algorithm was constructed from baseline data from a multicentre study 6 and was used in studies A and B. 17,18 A new prediction algorithm was constructed from baseline data from studies A and B and applied in study C (see below).
• ICPET: In study A, an ICPET with a linear increase in workload until exhaustion (WMAX) was used and compared with a standard, stepwise CPET. 17,18 In studies B 18 and C (see below), the same ICPET was used with minor changes.
• CET: In study A, the CET used a stepwise increase in workload until exhaustion. This increase was deemed too aggressive, so in study B the CET was changed to a linear and smaller increase in workload until exhaustion. Both studies A and B included a standard CWRET as control. 18 In this study, the same linear CET as in study B was used (with some minor changes, see below).

Patients
All participants in studies A to C had a diagnosis of COPD, a post-bronchodilator FEV1 of 40-80% of predicted normal (%pred) and a ratio of FEV1 to forced vital capacity (FVC) of ⩽0.7. No other lung function criteria were applied in selecting the study population. Patients with a COPD exacerbation within 6 months before the study, with cardiovascular disease, or with any other condition considered to put them at increased risk or to interfere with the study examinations were excluded.
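The objective part of these selection rules can be written as a simple screening check. The sketch below is illustrative only: the `Candidate` record and its field names are hypothetical, the 40-80 %pred and ⩽0.7 bounds are assumed to be inclusive, an exacerbation exactly 6 months earlier is assumed not to exclude, and the clinical-judgement exclusion ("any other condition") cannot be coded and is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    fev1_pct_pred: float                        # post-bronchodilator FEV1, % predicted
    fev1_fvc: float                             # post-bronchodilator FEV1/FVC ratio
    months_since_exacerbation: Optional[float]  # None = no recorded exacerbation
    cardiovascular_disease: bool


def passes_protocol_criteria(c: Candidate) -> bool:
    """Apply the lung function inclusion criteria and two explicit exclusions."""
    if not (40.0 <= c.fev1_pct_pred <= 80.0):   # FEV1 40-80 %pred (inclusive, assumed)
        return False
    if c.fev1_fvc > 0.7:                        # FEV1/FVC must be <= 0.7
        return False
    if c.months_since_exacerbation is not None and c.months_since_exacerbation < 6:
        return False                            # exacerbation within 6 months
    if c.cardiovascular_disease:
        return False
    return True
```

For example, `passes_protocol_criteria(Candidate(55.0, 0.60, None, False))` returns `True`, while raising FEV1 to 85 %pred makes it `False`.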

Ethics
The Regional Ethical Review Board in Lund, Sweden, approved the studies, which complied with the Declaration of Helsinki. Approval number: 2013/850. Written informed consent was obtained from all participants before any study-related procedure.

Part 1 Improvements of the prediction algorithm (studies A and B).
Pooled baseline data from studies A and B 17,18 were used to develop an improved prediction algorithm for WMAX using multiple regression, as described in the statistical section (for a flowchart of the patients, see Figure S1 in the Online Supplement).

Part 2 Bronchodilator effect on exercise performance (study C).
Study C was a single-centre, double-blind, randomized, three-visit crossover trial. Visit 1 started with a screening examination, including collection of demographic data, medical and smoking history, COPD medications and concomitant medications. Furthermore, the participants completed the Clinical COPD Questionnaire (CCQ). 19 For an overview of the study design, see Figure 1.
Participants were asked to switch from their LABA and/or LAMA regimens to a short-acting muscarinic antagonist (SAMA), taken 3-4 times a day, at least 48 h before visits 2 and 3. In addition, all participants received a short-acting β2-agonist (SABA) as reliever medication. Patients treated only with a SAMA and/or SABA, as well as patients on inhaled corticosteroids, continued their treatment as before. Patients were instructed not to use any SAMA or SABA before visits 2 and 3: a morning visit meant no morning dose, and before an afternoon visit, only an early-morning inhalation was allowed.
Exercise testing: Exercise duration was recorded for all exercise tests. Workload was registered at baseline, after 3-4 min and at the end of each test. Borg dyspnea scale and Borg leg discomfort scale scores 20 were recorded every 2 min. ECG and blood pressure were recorded before and after exercise. Ergospirometry was used for continuous measurement of VO2, VCO2, minute ventilation (VE) and respiratory rate (RR). Data were collected at 5-s intervals and processed as follows: each value of VO2, VCO2, VE and RR was replaced by the median of the five values encompassing it. After exercise, the patients entered a recovery phase in the same way as during the standard tests. 6
ICPET: After a few minutes sitting on the bicycle to allow the oxygen kinetics measurement equipment to stabilize, the patients had an approximately 1-min warm-up period of loadless pedalling. The patients then started cycling (time point 0) at a load of 40% of predicted WMAX for 4 min, followed by a linear increase in load calculated to reach predicted WMAX after an additional 7 min. The linear increase was continued until the patients reached their measured WMAX at the point of exhaustion. The patients then entered a recovery phase with loadless pedalling (Figure 1).
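The median-of-five processing of the 5-s ergospirometry samples can be sketched as a running median. This is a minimal sketch under two assumptions not stated in the text: the five-value window is centred on the current sample (two before, two after), and at the start and end of a recording the window is simply truncated to the samples that exist.

```python
from statistics import median
from typing import List, Sequence


def smooth_median5(samples: Sequence[float]) -> List[float]:
    """Replace each 5-s sample by the median of a centred 5-sample window.

    Assumption: near the edges the window is truncated (3 or 4 samples).
    """
    n = len(samples)
    smoothed = []
    for i in range(n):
        lo = max(0, i - 2)          # up to two samples before
        hi = min(n, i + 3)          # up to two samples after (slice end exclusive)
        smoothed.append(median(samples[lo:hi]))
    return smoothed
```

A running median suppresses single-breath spikes without shifting the trend, e.g. the isolated 5.0 in `smooth_median5([1.0, 1.1, 5.0, 1.2, 1.3])` is replaced by 1.2.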
CET: After a few minutes sitting on the bicycle to allow the oxygen kinetics measurement equipment to stabilize, the patients had an approximately 1-min warm-up period of loadless pedalling. The patients then cycled for 3 min at a load of 40% of the measured WMAX from visit 1, followed by an increase to 70% of the measured WMAX from visit 1. Thereafter, the load was increased linearly by 1.0% per min until the patients stopped due to exhaustion. The patients then entered a recovery phase with loadless pedalling (Figure 1).
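The ICPET and CET load profiles described above can be expressed as functions of time since the start of loaded cycling (time point 0; the loadless warm-up is excluded). This is a sketch, not the software used in the study: the function and parameter names are hypothetical, the 40% to 70% change in the CET is assumed to be immediate, and "1.0% per min" is assumed to mean one percentage point of measured WMAX per minute.

```python
def icpet_load(t_min: float, w_pred: float, start_frac: float = 0.40,
               start_min: float = 4.0, ramp_min: float = 7.0) -> float:
    """ICPET workload (W) at t_min minutes after time point 0.

    Defaults follow study C: 40% of predicted WMAX for 4 min, then a linear
    ramp that reaches predicted WMAX after another 7 min and continues beyond it.
    """
    w_start = start_frac * w_pred
    if t_min < start_min:
        return w_start
    slope = (w_pred - w_start) / ramp_min   # W per min
    return w_start + slope * (t_min - start_min)


def cet_load(t_min: float, w_max: float, warm_frac: float = 0.40,
             warm_min: float = 3.0, plateau_frac: float = 0.70,
             step_per_min: float = 0.01) -> float:
    """CET workload (W) at t_min minutes, based on measured WMAX from visit 1.

    Assumes an immediate step from 40% to 70% of WMAX after 3 min and an
    increase of 1 percentage point of WMAX per minute thereafter.
    """
    if t_min < warm_min:
        return warm_frac * w_max
    return (plateau_frac + step_per_min * (t_min - warm_min)) * w_max
```

With a predicted WMAX of 100 W, `icpet_load` returns 40 W until 4 min and 100 W at 11 min; with a measured WMAX of 100 W, `cet_load` returns 70 W at 3 min and about 80 W at 13 min.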

Statistical analyses
We designed the studies to include 15-25 patients to obtain enough data for our development programme, but no formal power calculations were performed for any of the studies.
In studies A and B, 17,18 a model based on random forest regression was used to predict WMAX.
In study C, this was changed to a model based on multiple regression. There were at least two reasons for changing the regression method: (1) random forest methods require considerably more data than standard regression methods, 22 and the previous model had been built on data from 261 patients in a multicentre study, 6 whereas our pooled dataset comprised only 38 patients; (2) random forest predictions may be poorly calibrated because, by construction, they are averages of training observations and are therefore pulled away from the extreme values in the training dataset. 23 This may explain why the predicted WMAX from the random forest algorithm underestimated the high-range performers and overestimated the low-range performers.
Multiple regression with backward deletion was performed by removing, one at a time, the variable with the highest p value until only three variables remained. The prediction algorithm was constructed from the data of two-thirds of the patients (selected by random assignment), and the data from the remaining one-third were then used to validate the resulting algorithm. SPSS v. 25 for Windows was used for the multiple regression.
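The backward deletion procedure can be sketched without a statistics package. This is a hypothetical re-implementation for illustration, not the SPSS procedure used by the authors. It relies on the fact that, within one fitted model, all coefficients share the same residual degrees of freedom, so the predictor with the highest two-sided p value is the one with the smallest absolute t statistic; the sketch therefore ranks by |t| and avoids needing a t-distribution function. Predictor names in the usage example are placeholders.

```python
import math
from typing import Dict, List, Sequence


def _invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols_t_stats(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """OLS with an intercept; returns t statistics of the non-intercept coefficients."""
    rows = [[1.0] + list(r) for r in X]
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    inv = _invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)     # residual variance
    return [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(1, p)]


def backward_deletion(data: Dict[str, Sequence[float]], y: Sequence[float],
                      keep: int = 3) -> List[str]:
    """Drop, one at a time, the predictor with the smallest |t| until `keep` remain."""
    names = list(data)
    while len(names) > keep:
        X = [[data[name][k] for name in names] for k in range(len(y))]
        t = ols_t_stats(X, y)
        worst = min(range(len(names)), key=lambda i: abs(t[i]))
        names.pop(worst)
    return names
```

In practice the procedure would be run on the randomly selected two-thirds of the patients and the retained model checked on the remaining one-third.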
In study C, demographics and patient data were expressed as mean ± standard deviation unless otherwise stated.

Results
Part 1 Improvement of the prediction algorithm for WMAX.
Patient data. Table 1 presents baseline demographic and lung function data for the pooled dataset from study A + study B (n = 38). Three patients in study B who had also participated in study A were removed from the pooled dataset. A flowchart of the patients included in the pooled dataset is presented in Figure S1.
New algorithm for the prediction of WMAX. As described in the 'Statistical analyses' section, multiple regression analyses were used to develop a new prediction algorithm. Two further reasons for abandoning the random forest prediction algorithm 17 based on the multicentre study were that (1) the regression slope for the algorithm constructed from multicentre data 6 was 0.43 (intercept > 0) when applied to the pooled dataset (Table 2), that is, predicted WMAX values in the high and low range were underestimated and overestimated, respectively, and (2) for one of the prime predictors, DLCO, the univariate R² versus measured WMAX varied from 0.01 to 0.83 17 between centres in the multicentre study, 6 whereas our pooled data showed an R² of 0.71 when baseline DLCO was plotted against measured WMAX (Table 2).
Therefore, we developed a new prediction algorithm based on the pooled dataset. For validation, the patients were randomly divided into two-thirds (n = 25) and one-third (n = 13). The larger dataset was subjected to multiple regression with backward deletion until only three variables remained. The resulting three-variable algorithm, which included DLCO, FEV1 … As expected, higher values of DLCO and FEV1, male sex and lower age gave higher predicted WMAX. Multiple regression indicated a negative association between height and WMAX, whereas, as expected, the univariate regression model showed a positive one. The negative coefficient for height in the multiple regression formula partly compensates for the strongly positive coefficient for FEV1 (because height is positively correlated with FEV1) and follows from the fact that, in multiple regression, the coefficients are estimated jointly to give the best overall fit. In the pooled dataset, this new prediction algorithm gave a correlation between predicted and measured WMAX with R² = 0.89 and slope = 0.89 (Figure 2(a)). A Bland-Altman plot (Figure 2(c)) showed the agreement between predicted and measured WMAX, with a mean difference of 0.0 W (SD: 12.8 W; limits of agreement from −25.0 to 25.0 W).
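The agreement statistics reported in this section (slope and R² of predicted versus measured WMAX, and Bland-Altman mean difference with limits of agreement) can be computed as sketched below. Two assumptions are made: the slope is that of predicted WMAX regressed on measured WMAX (so a slope below 1 means high performers are underestimated and low performers overestimated, as described above), and differences are taken as predicted minus measured. The limits of agreement are computed as mean ± 1.96 SD, which is consistent with the reported figures (e.g. −3.6 ± 1.96 × 13.6 gives about −30.3 to 23.1).

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def agreement(predicted: Sequence[float], measured: Sequence[float]) -> Dict[str, float]:
    """Slope/R² of predicted on measured, plus Bland-Altman statistics."""
    mp, mm = mean(predicted), mean(measured)
    sxy = sum((m - mm) * (p - mp) for p, m in zip(predicted, measured))
    sxx = sum((m - mm) ** 2 for m in measured)
    syy = sum((p - mp) ** 2 for p in predicted)
    slope = sxy / sxx                  # predicted regressed on measured (assumed)
    r2 = sxy * sxy / (sxx * syy)       # squared Pearson correlation
    diffs = [p - m for p, m in zip(predicted, measured)]  # predicted - measured (assumed)
    md, sd = mean(diffs), stdev(diffs)
    return {"slope": slope, "r2": r2, "mean_diff": md, "sd_diff": sd,
            "loa_low": md - 1.96 * sd, "loa_high": md + 1.96 * sd}
```

A compressed prediction such as `agreement([50, 60, 70, 80], [40, 60, 80, 100])` yields R² = 1 but slope = 0.5, which shows why the slope is reported alongside R²: a high correlation alone does not rule out systematic over- and underestimation at the extremes.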
The new prediction algorithm thereby showed better agreement between predicted and measured WMAX than the initial random forest algorithm (R² = 0.43, slope = 0.43; Figure S2A) when plotted in the same pooled dataset. The accompanying Bland-Altman plot (Figure S2B) showed a mean difference of 12.44 W (SD: 17.9 W; limits of agreement from −22.7 to 47.6 W). In study C, predicted versus measured WMAX is shown in Figure 2(b). The values of R² = 0.91 and slope = 0.82 were similar to those obtained for the pooled dataset (Figure 2(a)). A Bland-Altman plot (Figure 2(d)) showed the agreement between predicted and measured WMAX, with a mean difference of −3.6 W (SD: 13.6 W; limits of agreement from −30.3 to 23.1 W).
Plots of exercise time versus measured WMAX for the standard WMAX test in the multicentre study, 6 the ICPET in the pooled dataset and the ICPET in study C are shown in Figure 3. In the standard WMAX test from the multicentre study (Figure 3(a)), exercise time was proportional to the patient's work capacity, and exercise times ranged from 1 to 17 min. In the pooled dataset, where the ICPET load was based on a WMAX predicted by the random forest algorithm, the relationship between exercise time and work capacity was flatter (Figure 3(b)). In study C (Figure 3(c)), where the load was based on the new prediction algorithm, the plot became almost horizontal, and test durations fell within a narrower interval (8-13 min) than in the standard WMAX test.
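A short worked equation may help explain why the ICPET flattens this relationship. Under idealized assumptions (the loadless warm-up is ignored, and the patient is assumed to stop exactly when the ramp reaches their measured WMAX), and with the study C settings of 40% of predicted WMAX for 4 min followed by a ramp reaching predicted WMAX after another 7 min, the time to exhaustion depends only on the ratio r of measured to predicted WMAX:

```latex
t_{\mathrm{exh}} = 4 + 7\,\frac{W_{\mathrm{meas}} - 0.4\,W_{\mathrm{pred}}}{0.6\,W_{\mathrm{pred}}}
                 = 4 + \frac{7\,(r - 0.4)}{0.6}\ \text{min},
\qquad r = \frac{W_{\mathrm{meas}}}{W_{\mathrm{pred}}} \ge 0.4
```

A perfect prediction (r = 1) gives 11 min, while r = 0.8 and r = 1.2 give about 8.7 and 13.3 min. In a test with a fixed load increment per minute, by contrast, exercise time grows roughly in proportion to WMAX itself, so durations spread with work capacity.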
The ICPET results in study C were similar to those in the pooled dataset regarding WMAX reached, Borg scale scores and reasons for stopping exercise (Table 3). However, the range of ICPET duration was narrower in study C. Figure S3A plots workload versus time for each patient during the ICPET.
CET. Endurance times were normally distributed under both treatments, whereas work capacity was not normally distributed under either treatment. Figure 4 presents the results of the CET performed at visits 2 and 3. Using a multiplicative model, a statistically significant difference in endurance time was observed for IND/GLY versus placebo (difference 21%; 95% …
Subanalyses based on baseline functional residual capacity (FRC) and FEV1. Explorative subanalyses, which were not prespecified, were performed to compare CET results between patients with a baseline FRC below and above 120 %pred and between patients with a baseline FEV1 below and above 70 %pred.
The baseline mean FRC %pred was 120% in study C. The difference in mean endurance time at the CET between IND/GLY and placebo for patients with an FRC ⩾ 120 %pred (n = 8) and <120 %pred (n = 6) was 144 and 78 s, respectively (see Figure S4A for plots of individual treatment differences versus baseline FRC %pred).
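The two CET analyses above, a treatment effect on a multiplicative (log) scale and a split of per-patient treatment differences at a baseline cutoff, can be sketched as follows. This is a simplified illustration, not the authors' statistical model, which is not specified beyond "multiplicative": `ratio_effect` is a plain paired analysis of log ratios that ignores period and sequence effects of the crossover, and the caller supplies the two-sided t quantile (about 2.16 for 14 patients, i.e. 13 degrees of freedom, at the 95% level). Both function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def ratio_effect(active: Sequence[float], placebo: Sequence[float],
                 t_crit: float) -> Tuple[float, float, float]:
    """Percentage difference (active vs placebo) and CI from paired log ratios.

    Simplified paired analysis: period and sequence effects are ignored.
    """
    logs = [math.log(a / p) for a, p in zip(active, placebo)]
    m = mean(logs)
    se = stdev(logs) / math.sqrt(len(logs))

    def to_pct(x: float) -> float:
        return 100.0 * (math.exp(x) - 1.0)   # back-transform log ratio to % change

    return to_pct(m), to_pct(m - t_crit * se), to_pct(m + t_crit * se)


def subgroup_means(diffs: Sequence[float], baseline: Sequence[float],
                   cutoff: float) -> Dict[str, float]:
    """Mean treatment difference for baseline >= cutoff versus < cutoff."""
    high = [d for d, b in zip(diffs, baseline) if b >= cutoff]
    low = [d for d, b in zip(diffs, baseline) if b < cutoff]
    return {
        "n_high": len(high), "mean_high": mean(high) if high else math.nan,
        "n_low": len(low), "mean_low": mean(low) if low else math.nan,
    }
```

With hypothetical differences `[100, 200, 50, 30]` s and baseline FRC `[121, 140, 119, 100]` %pred, `subgroup_means(..., cutoff=120)` gives mean differences of 150 s and 40 s for the two groups of two patients; with groups this small, such splits are hypothesis-generating only.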

Discussion
Three studies have been performed in our development programme to improve the standard bicycle exercise test. We have compared our ICPET with the standard stepwise CPET in one study and our CET with the CWRET in two studies. These comparisons showed similar results, while the objective of a lower standard deviation of time to exhaustion was met for both the ICPET and the CET, thus showing construct validity. In addition, both the ICPET and the CET showed generalizability (external validity), as tested in three studies with different patients and the same inclusion criteria. Furthermore, in this study, our CET showed significant responsiveness of endurance time to IND/GLY, thereby also providing evidence of internal/criterion validity. Taken together, these results support the usefulness of the combined ICPET and CET for assessing bronchodilator treatment effects on exercise performance.
The new prediction algorithm used pooled data (n = 38) from our previous single-centre studies 17,18 and was based on multiple regression with backward deletion instead of the previously used random forest method. FEV1 and DLCO were found to be the best predictors and were used in the equation together with sex, age and height. A prediction algorithm based on results from a multicentre study may be more generalizable than one constructed from single-centre data. However, in the multicentre study by Maltais and colleagues, 6 the primary objective was to study endurance time, and the primary and secondary outcomes were probably more closely monitored than the predictors used in our algorithm. For example, a large intercentre variability was observed for DLCO (recorded only as a baseline parameter) as a function of measured WMAX. 6 In our single-centre studies, the procedures for the ICPET and CET, as well as the baseline recordings, were closely monitored with the intention of finding new predictors and an improved prediction algorithm for WMAX. Test results collected at a single qualified study centre can thus be expected to carry a lower risk of quality problems and abnormal spread than results from multicentre clinical trials. An important lesson, however, is that when our prediction algorithm is used in future settings, measurement of the predictors, for example, FEV1 and DLCO, must be standardized to the same quality as the exercise test procedures.
The new prediction algorithm based on the pooled dataset showed a high coefficient of determination and a slope close to 1 for predicted versus measured WMAX. Applying this algorithm in this study gave an R² of 0.91 and a slope of 0.82 for predicted versus measured WMAX. This provides strong support for the validity of this approach for predicting WMAX and for accurately determining the patients' measured WMAX with the ICPET.
A major advantage of the ICPET over a standard WMAX test is that it is individualized. In the ICPET, … In the subsequent CET, endurance performance may benefit from the more accurate WMAX value derived from the ICPET in study C (see above). Starting the endurance test at 70% of WMAX in studies B and C, compared with 75% in study A, may allow low performers to cycle longer. Similarly, the escalation of 1% per minute will bring high performers to exhaustion earlier than the standard test. Decreasing the coefficient of variation for endurance time has been shown to reduce the required sample size. 11,25 The CET showed a statistically significant prolongation of endurance time for IND/GLY versus placebo (113 s) in a crossover study in patients with moderate to severe COPD. This improvement exceeds the minimal clinically important difference of 46-105 s suggested for submaximal exercise endurance time on a cycle ergometer. 15
How do our results for the primary outcome, endurance time, relate to other published exercise studies of fixed-combination LABA/LAMA versus placebo using a standard CWRET? We found one single-dose study of tiotropium/olodaterol versus placebo reporting a significant increase in endurance time of 88 s in a highly selected patient population. 26 Two long-term studies of LABA/LAMA versus placebo reported increases in endurance time of 24 s 27 after the initial dose and 40 s 28 after the second dose on day 2, respectively. We also found five studies comparing endurance time between different LABA/LAMA medications and placebo over 3-12 weeks. [27][28][29][30][31] The differences in endurance time between LABA/LAMA treatment and placebo at the end of treatment ranged from 55 to 85 s.
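The link between lower variability and smaller trials can be made concrete with the standard normal-approximation sample size for a paired (crossover) comparison, n = ((z₁₋α/₂ + z₁₋β) σ_d / δ)², where σ_d is the SD of within-patient differences in endurance time and δ is the difference to detect. This is a generic textbook sketch with hypothetical inputs, not a power calculation from the study (none was performed); for small n, a t-based calculation would add one or two patients.

```python
import math
from statistics import NormalDist


def crossover_n(sd_diff: float, delta: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Patients needed for a paired comparison (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = z.inv_cdf(power)            # desired power
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)
```

Because n scales with σ_d², halving the SD of the differences cuts the required sample size roughly fourfold: `crossover_n(200, 100)` gives 32 patients, whereas `crossover_n(100, 100)` gives 8.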
Overall, the improvement in endurance time after single-dose treatment with IND/GLY in our CET appears numerically larger than in similar studies comparing LABA/LAMA treatment with placebo using the standard CWRET. 32 An important factor when comparing results between studies is differences in inclusion criteria. Additional criteria, such as signs of lung hyperinflation, often defined as FRC ⩾ 120 %pred, 5,6,26,33 have been used in studies of bronchodilator effects on exercise performance in patients with COPD. An FRC ⩾ 120 %pred was also used to select the study population in three of the aforementioned studies. 26 … 36 In addition, the bronchodilator response of several advanced lung function parameters, particularly volume parameters, is most pronounced in COPD patients with an FEV1 of ⩽65 %pred compared with >65 %pred. 37 A limitation of our study is that the results may apply only to COPD patients with an FEV1 between 40 and 80 %pred and may not be generalizable to other COPD populations, for example, patients with cardiovascular symptoms. Likewise, coexisting pulmonary hypertension may affect WMAX and endurance time but was not investigated in this study. However, the crossover design makes it unlikely that pulmonary hypertension had any significant impact on the results. The study was limited to 14 patients, so the results should be interpreted with some caution, particularly those from the subanalyses of FRC and FEV1. Such small subanalyses should be regarded only as hypothesis-generating, although their results appear convincing and are supported by published data.
While optimizing our methodology, we made small modifications to the study protocols for both the ICPET and the CET; for example, the initial ICPET workload was either 30% or 40% of predicted WMAX, maintained for either 3 min followed by an 8-min escalation period or 4 min followed by a 7-min escalation period, giving a total of 11 min. 17,18 Conclusions drawn from comparisons between studies should always be made with great caution. This is illustrated by the work of Puente-Maestu and colleagues, 14 in which a wide range of differences is presented for different bronchodilators.
The next step would be to repeat the study in more patients, at more centres and over longer treatment periods. Such a study will raise the question of selection criteria: should an FRC ⩾ 120 %pred be required, or should the upper FEV1 limit be lowered from ⩽80 to ⩽70 %pred? 36,37 Our small explorative subanalyses point to the latter. In future studies, it may also be possible to replace the ICPET with a baseline CET (using an initial load based on the predicted WMAX) to reduce the potential cardiac risk associated with maximal tests 9,10 and to decrease the risk of training effects during the blinded crossover periods.
The objective of this project was to improve the standard CWRET by reducing the variation in endurance time among patients, so that fewer patients are needed to detect differences between treatments. In three sequential studies, we have achieved a better prediction algorithm, an individualized linear maximal test with low variability in exercise time and an endurance test that by design avoids very short and very long durations. In addition, the CET results have been reproduced and showed a significant and clinically relevant prolongation of endurance time with IND/GLY, supported by significant increases in total work capacity. Together, these results support the conclusion that we have achieved the overall goal of our development programme: an improved exercise testing protocol (including an ICPET and a CET). Our results should be confirmed in a prospective study with a larger number of patients.