Abstract
Background:
Investigators conducting randomized clinical trials often explore treatment effect heterogeneity to assess whether treatment efficacy varies according to patient characteristics. Identifying heterogeneity is central to making informed personalized healthcare decisions. Treatment effect heterogeneity can be investigated using the subpopulation treatment effect pattern plot (STEPP), a non-parametric graphical approach that constructs overlapping patient subpopulations with varying values of a characteristic. Procedures for statistical testing using STEPP when the endpoint of interest is survival remain an area of active investigation.
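As a rough illustration of how overlapping subpopulations can be constructed, the following Python sketch builds sliding windows over patients ordered by covariate value. The function name and the `size` and `overlap` parameters are hypothetical, ties in the covariate are ignored, and this patient-count window is a simplification rather than the event-based construction introduced in this article.

```python
def sliding_window_subpopulations(values, size, overlap):
    """Return overlapping subpopulations as lists of patient indices.

    values  : covariate value for each patient (e.g. Ki-67 percentage)
    size    : number of patients per subpopulation (hypothetical parameter)
    overlap : number of patients shared by adjacent subpopulations
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    # Patient indices ordered by increasing covariate value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    step = size - overlap

    windows = []
    start = 0
    while True:
        end = min(start + size, len(order))
        windows.append(order[start:end])
        if end == len(order):
            # The last window may hold fewer than `size` patients.
            break
        start += step
    return windows


# Ten patients, four per subpopulation, two shared between neighbours.
ki67 = [5, 1, 9, 3, 7, 2, 8, 4, 6, 0]
windows = sliding_window_subpopulations(ki67, size=4, overlap=2)
```

Each window would then be analyzed separately, and the resulting treatment effect estimates plotted against a summary of the covariate within the window, such as its median.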
Methods:
A STEPP analysis was used to explore patterns of absolute and relative treatment effects for varying levels of a breast cancer biomarker, Ki-67, in the phase III Breast International Group 1-98 randomized clinical trial, which compared letrozole to tamoxifen as adjuvant therapy for postmenopausal women with hormone receptor–positive breast cancer. Absolute treatment effects were measured by differences in 4-year cumulative incidence of breast cancer recurrence, while relative effects were measured by the subdistribution hazard ratio in the presence of competing risks using O–E (observed-minus-expected) methodology, an intuitive non-parametric method. Estimation of the hazard ratio based on O–E methodology has been described previously, but a corresponding development for the subdistribution hazard ratio has not. Furthermore, we observed that the STEPP analysis may fail to produce results, even with 100 patients within each subpopulation. Further investigation through simulation studies revealed inflation of the type I error rate of the traditional test statistic and, in some cases, singular variance–covariance matrix estimates that can prevent results from being produced. This is due to an insufficient number of events within the subpopulations, which we refer to as instability of the STEPP analysis. We introduce methodology designed to improve the stability of the STEPP analysis and generalize O–E methodology to the competing risks setting. Simulation studies were designed to assess the type I error rate of the tests for a variety of treatment effect measures, including the subdistribution hazard ratio based on O–E estimation. This STEPP methodology and standard regression modeling were used to evaluate treatment effect heterogeneity across Ki-67 levels in the Breast International Group 1-98 randomized clinical trial.
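For readers unfamiliar with O–E estimation, the sketch below shows the classical one-step estimate for an ordinary hazard ratio without competing risks, exp((O − E) / V), where O − E and V are the log-rank observed-minus-expected total and its hypergeometric variance. It is not the competing risks generalization developed in this article; the function name and the input layout are assumptions made for illustration.

```python
import math


def oe_hazard_ratio(times, events, groups):
    """One-step O-E estimate of the hazard ratio (group 1 versus group 0).

    times  : follow-up time for each patient
    events : 1 if the event was observed, 0 if censored
    groups : 1 for the treatment arm, 0 for the control arm
    Assumes a single event type (no competing risks).
    """
    o_minus_e = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})

    for t in event_times:
        # Patients still at risk just before time t, overall and in group 1.
        n = sum(1 for s in times if s >= t)
        n1 = sum(1 for s, g in zip(times, groups) if s >= t and g == 1)
        # Events at time t, overall and in group 1.
        d = sum(1 for s, e in zip(times, events) if s == t and e == 1)
        d1 = sum(1 for s, e, g in zip(times, events, groups)
                 if s == t and e == 1 and g == 1)

        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the risk-set margins.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("variance is zero; too few events to estimate")
    return math.exp(o_minus_e / variance)


hr = oe_hazard_ratio(times=[1, 2, 3, 4],
                     events=[1, 1, 1, 1],
                     groups=[0, 1, 0, 1])
```

The zero-variance check mirrors, in miniature, the instability described above: with too few events in a subpopulation the estimate cannot be computed.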
Results:
The proposed methodology generalizes the O–E approach to the competing risks setting and improves stability of the STEPP analysis by pre-specifying the number of events across subpopulations while controlling the type I error rate. The STEPP analysis of the Breast International Group 1-98 randomized clinical trial showed that patients with high Ki-67 percentages may benefit most from letrozole, whereas standard regression modeling did not detect heterogeneity.
Conclusion:
The STEPP methodology can be used to study complex patterns of treatment effect heterogeneity, as illustrated in the Breast International Group 1-98 randomized clinical trial. For a STEPP analysis, we recommend a minimum of 20 events within each subpopulation.
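One way to apply this recommendation in practice is to count events per subpopulation before running the analysis. The helper below is a hypothetical sketch: it assumes each subpopulation is given as a list of patient indices and that `events` holds 1 for an observed event of interest and 0 otherwise.

```python
MIN_EVENTS = 20  # minimum number of events per subpopulation recommended above


def underpowered_subpopulations(windows, events, min_events=MIN_EVENTS):
    """Return the positions of subpopulations with fewer than min_events events.

    windows : list of subpopulations, each a list of patient indices
    events  : 1 if the patient had the event of interest, 0 otherwise
    """
    return [k for k, window in enumerate(windows)
            if sum(events[i] for i in window) < min_events]


# Toy data: 50 patients, the first 25 of whom had an event.
events = [1] * 25 + [0] * 25
windows = [list(range(0, 25)), list(range(10, 35)), list(range(25, 50))]
flagged = underpowered_subpopulations(windows, events)
```

Subpopulations flagged this way could be widened or merged with neighbours until each contains enough events.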
