Although the C-statistic is widely used to evaluate the performance of diagnostic tests, its limitations for evaluating the predictive performance of biomarker panels have been extensively discussed. The increment in C obtained by adding a new biomarker to a predictive model has no direct interpretation, and the relevance of the C-statistic to risk stratification is not obvious. This paper proposes that the C-statistic be replaced by the expected information for discriminating between cases and non-cases (the expected weight of evidence, denoted Λ), and that the strength of evidence favouring one model over another be evaluated by cross-validation as the difference in test log-likelihoods. Contributions of independent variables to predictive performance are additive on the scale of Λ. Where the effective number of independent predictors is large, the value of Λ is sufficient to fully characterize how the predictor will stratify risk in a population with a given prior probability of disease, and the C-statistic can be interpreted as a mapping of Λ to the interval from 0.5 to 1. Even where this asymptotic relationship does not hold, there is a one-to-one mapping between the distributions, in cases and in non-cases, of the weight of evidence favouring case over non-case status, and the quantiles of these distributions can be used to calculate how the predictor will stratify risk. This approach to reporting predictive performance is demonstrated by an analysis of a dataset on the contribution of the microbiome profile to the diagnosis of colorectal cancer.
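A minimal Python sketch (standard library only) of the quantities named above follows. It rests on assumptions that the paper's own estimators, units or cross-validation scheme may not share. The weight of evidence W is taken in natural-log units (nats) as the model's log posterior odds minus the log prior odds. Λ is estimated as half the difference between the mean of W in cases and its mean in non-cases. For the asymptotic relationship, W is assumed to be Gaussian. If W is a log-likelihood ratio with variance σ², it must have mean -σ²/2 in non-cases and +σ²/2 in cases. Hence Λ = σ²/2, the two distributions are N(Λ, 2Λ) and N(-Λ, 2Λ), and C = Φ(√Λ). All function names are hypothetical and illustrative, not taken from the paper's code.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, used for Phi(.)


def logit(p):
    """Log-odds in natural-log units (nats)."""
    return math.log(p / (1.0 - p))


def weight_of_evidence(p_post, prior):
    """Assumed definition: W = log posterior odds minus log prior odds, per observation."""
    return [logit(p) - logit(prior) for p in p_post]


def expected_information(y, w):
    """Estimate Lambda as half the difference in mean W between cases (y=1) and non-cases (y=0)."""
    w_cases = [wi for wi, yi in zip(w, y) if yi == 1]
    w_noncases = [wi for wi, yi in zip(w, y) if yi == 0]
    return 0.5 * (fmean(w_cases) - fmean(w_noncases))


def c_from_lambda(lam):
    """Asymptotic C if W ~ N(+lam, 2*lam) in cases and N(-lam, 2*lam) in non-cases."""
    return STD_NORMAL.cdf(math.sqrt(lam))


def empirical_c(y, score):
    """Mann-Whitney estimate of C = P(score_case > score_noncase), ties counted as 1/2."""
    pairs = sorted(zip(score, y))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank over a block of tied scores
        i = j + 1
    n1 = sum(1 for _, yi in pairs if yi == 1)
    n0 = n - n1
    rank_sum = sum(r for r, (_, yi) in zip(ranks, pairs) if yi == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def log_lik_difference(y, p_a, p_b):
    """Test log-likelihood of model A minus that of model B (nats) on held-out observations."""
    def ll(p):
        return sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))
    return ll(p_a) - ll(p_b)


def fraction_above_risk(lam, prior, threshold):
    """Gaussian asymptote (lam > 0): fractions of cases and non-cases whose posterior risk exceeds threshold."""
    cut = logit(threshold) - logit(prior)  # W needed to move the prior up to the threshold
    sd = math.sqrt(2.0 * lam)
    cases = 1.0 - STD_NORMAL.cdf((cut - lam) / sd)
    noncases = 1.0 - STD_NORMAL.cdf((cut + lam) / sd)
    return cases, noncases


if __name__ == "__main__":
    # Simulate W directly from the asymptotic Gaussian model with Lambda = 1.
    random.seed(1)
    lam = 1.0
    sd = math.sqrt(2.0 * lam)
    w = [random.gauss(lam, sd) for _ in range(5000)] + [random.gauss(-lam, sd) for _ in range(5000)]
    y = [1] * 5000 + [0] * 5000
    print("Lambda estimate:", round(expected_information(y, w), 3))
    print("empirical C:", round(empirical_c(y, w), 3), "vs Phi(sqrt(Lambda)):", round(c_from_lambda(lam), 3))
    print("fractions above 10% risk at 1% prior:", fraction_above_risk(lam, 0.01, 0.10))
```

The simulation draws W directly from the Gaussian asymptote, so the empirical C should agree with Φ(√Λ) up to sampling error. With real cross-validated predictions, comparing the two indicates how far the asymptotic relationship holds. `fraction_above_risk` illustrates the claim that, in the asymptotic case, Λ and the prior together determine how the predictor stratifies risk. `log_lik_difference` is meant to be summed over held-out folds when two models are compared.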
