The classical and most commonly used approach to building prediction intervals is the parametric approach. Its main drawback, however, is that its validity and performance depend heavily on the assumed functional link between the covariates and the response. This research investigates new methods that improve the performance of prediction intervals with random forests. Two aspects are explored: the method used to build the forest and the method used to build the prediction interval. Four forest-building methods are investigated: three from the classification and regression tree (CART) paradigm and the transformation forest method. For CART forests, two alternative splitting criteria are investigated in addition to the default least-squares splitting rule. We also present and evaluate five flexible methods for constructing prediction intervals, yielding 20 distinct method variations. To reliably attain the desired confidence level, we include a calibration procedure performed on the out-of-bag information provided by the forest. The 20 method variations are thoroughly investigated and compared to five alternative methods through simulation studies and in real data settings. The results show that the proposed methods are very competitive: they outperform commonly used methods in both simulation settings and with real data.
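For a nominal level 1 − α, the target is an interval [L(x), U(x)] such that P(L(X) ≤ Y ≤ U(X)) ≥ 1 − α for a new case (X, Y). As a rough illustration of how out-of-bag (OOB) information can be used to calibrate such an interval, the Python sketch below fits a standard random forest and widens each point prediction by empirical quantiles of the OOB residuals. It is a minimal stand-in for the general idea, not the authors' exact procedure; the function name oob_prediction_interval and its parameter choices are assumptions made for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def oob_prediction_interval(X_train, y_train, X_new, alpha=0.05):
        # Fit a forest; oob_score=True makes scikit-learn retain
        # out-of-bag predictions for the training samples.
        rf = RandomForestRegressor(n_estimators=500, oob_score=True,
                                   random_state=0)
        rf.fit(X_train, y_train)
        # OOB residuals approximate out-of-sample prediction errors.
        resid = y_train - rf.oob_prediction_
        # Calibrate interval limits on the empirical residual quantiles.
        lo, hi = np.quantile(resid, [alpha / 2, 1 - alpha / 2])
        pred = rf.predict(X_new)
        return pred + lo, pred + hi

The calibration step described in the abstract likewise relies on the out-of-bag information supplied by the forest, adjusting the intervals so that the desired confidence level is reliably attained; the residual-quantile shift above is only one simple way to exploit that information.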
