When developing prediction models for use in clinical practice, health practitioners usually categorise clinical variables that are continuous in nature. Although categorisation is inadvisable from a statistical point of view because it loses information and power, it remains common practice in medical research. Providing researchers with a useful and valid categorisation method is therefore relevant when developing prediction models. Without recommending the categorisation of continuous predictors, we aim to propose a valid way to do it whenever clinical researchers consider it necessary. This paper focuses on categorising a continuous predictor within a logistic regression model so that the best discriminative ability is obtained, measured by the highest area under the receiver operating characteristic curve (AUC). The proposed methodology is validated in settings where the location of the optimal cut points is known in theory or in practice. In addition, the method is applied to a real dataset of patients with an exacerbation of chronic obstructive pulmonary disease, in the context of the IRYSS-COPD study, in which a clinical prediction rule for severe evolution was being developed. The clinical variable PCO2 was categorised in both a univariable and a multivariable setting.
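
The idea of choosing cut points that maximise the AUC can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the search algorithm, so an exhaustive search over observed values is used here as a stand-in, and the function names (`auc`, `category_auc`, `best_cut_points`) and toy data are hypothetical. The sketch also assumes the univariable case, where a logistic regression on a single categorical predictor reproduces the observed event rate of each category, so those rates can serve as fitted scores without fitting a model explicitly.

```python
from itertools import combinations


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; tied pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def category_auc(x, y, cuts):
    """AUC of a univariable logistic model on x categorised at `cuts`."""
    # Category index = number of cut points lying strictly below the value.
    cats = [sum(v > c for c in cuts) for v in x]
    totals, events = {}, {}
    for cat, label in zip(cats, y):
        totals[cat] = totals.get(cat, 0) + 1
        events[cat] = events.get(cat, 0) + label
    # Assumption: fitted probability of each category = its observed event rate.
    scores = [events[cat] / totals[cat] for cat in cats]
    return auc(scores, y)


def best_cut_points(x, y, k=1):
    """Exhaustive search for the k cut points with the highest apparent AUC."""
    # The largest observed value is excluded: cutting there leaves an empty category.
    candidates = sorted(set(x))[:-1]
    best = max(combinations(candidates, k), key=lambda cuts: category_auc(x, y, cuts))
    return list(best), category_auc(x, y, best)


# Hypothetical toy data, not taken from the IRYSS-COPD study.
x_toy = [1, 2, 3, 4, 5, 6, 7, 8]
y_toy = [0, 0, 0, 0, 1, 0, 1, 1]
print(best_cut_points(x_toy, y_toy, k=1))
```

The AUC returned here is the apparent (in-sample) value, which tends to be optimistic because the cut points are tuned on the same data. An exhaustive search also becomes expensive as `k` or the number of distinct values grows, so a real analysis would likely need a more efficient optimiser and some form of internal validation.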
