Random forests are a state-of-the-art supervised machine learning method and achieve good performance in high-dimensional settings where p, the number of predictors, is much larger than n, the number of observations. Repeated measurements generally provide additional information, so they are worth accounting for, especially when analyzing high-dimensional data. Tree-based methods have already been adapted to clustered and longitudinal data by using a semi-parametric mixed effects model, in which the non-parametric part is estimated using regression trees or random forests. We propose a general random forest approach for high-dimensional longitudinal data. It includes a flexible stochastic model that allows the covariance structure to vary over time. Furthermore, we introduce a new method that takes intra-individual covariance into account when building random forests. Through simulation experiments, we then study the behavior of different estimation methods, especially in the context of high-dimensional data. Finally, the proposed method was applied to an HIV vaccine trial including 17 HIV-infected patients with 10 repeated measurements of 20,000 gene transcripts and of the blood concentration of human immunodeficiency virus RNA. The approach selected 21 gene transcripts whose association with HIV viral load was fully relevant and consistent with results observed during primary infection.
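
As a rough illustration of the kind of semi-parametric stochastic mixed effects model described above, the response of individual i at measurement j could be written as follows. The notation is an assumption made for this sketch and is not taken from this page: f is the non-parametric mean function (the part estimated by trees or forests), Z_{ij} b_i is the random-effects term, \omega_i is a zero-mean stochastic process that lets the covariance vary over time, and \varepsilon_{ij} is the residual error.

```latex
% Illustrative notation only (assumed, not quoted from the article):
%   i = 1, ..., n   individuals;   j = 1, ..., n_i   repeated measurements
\[
  Y_{ij} = f(X_{ij}) + Z_{ij}\, b_i + \omega_i(t_{ij}) + \varepsilon_{ij},
  \qquad
  b_i \sim \mathcal{N}(0, B), \quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\]
```

Under this reading, setting \omega_i to zero gives a semi-parametric mixed effects model of the kind previously used with regression trees and random forests, and the intra-individual covariance comes from the combined contribution of b_i, \omega_i, and \varepsilon_{ij}.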
