The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, which can then be tested for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, the PCEV seeks the linear combination of outcomes that maximises the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, namely its conceptual simplicity and freedom from tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.
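
To make the optimisation concrete, the following is a minimal sketch of the idea, not the authors' implementation. It assumes a linear model of the outcomes on the covariates and more observations than outcomes, so that the residual covariance is invertible; it does not cover the high-dimensional strategy or the testing procedures mentioned above. Under those assumptions, maximising the proportion of explained variance amounts to a generalised eigenvalue problem between the model and residual covariance matrices. The function names `variance_components` and `pcev` and the simulated data are illustrative only.

```python
import numpy as np


def variance_components(Y, X):
    """Split the outcome covariance into model and residual parts.

    Y is an (n, p) outcome matrix and X an (n, q) covariate matrix.
    """
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    # Least-squares fit of every outcome on the covariates.
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ coef
    resid = Yc - fitted
    return fitted.T @ fitted / n, resid.T @ resid / n


def pcev(Y, X):
    """Return the weight vector w and the proportion of variance explained.

    Solves V_model w = lam * V_resid w for the largest lam; the proportion
    of variance explained by the component Y @ w is lam / (1 + lam).
    Assumes V_resid is positive definite (n comfortably larger than p).
    """
    V_model, V_resid = variance_components(Y, X)
    # Whiten with the Cholesky factor of V_resid to get a symmetric problem.
    L = np.linalg.cholesky(V_resid)
    half = np.linalg.solve(L, V_model)   # L^{-1} V_model
    M = np.linalg.solve(L, half.T)       # L^{-1} V_model L^{-T}
    M = (M + M.T) / 2                    # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M) # eigenvalues in ascending order
    lam = eigvals[-1]
    w = np.linalg.solve(L.T, eigvecs[:, -1])
    return w, lam / (1.0 + lam)


# Simulated example (hypothetical data): one covariate, five outcomes,
# of which the first two depend on the covariate.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, 1))
Y = rng.normal(size=(n, p))
Y[:, :2] += 0.5 * X
weights, prop_explained = pcev(Y, X)
print(weights, prop_explained)
```

Because the component maximises the ratio of explained to total variance, `prop_explained` is at least as large as the proportion explained for any single outcome taken alone.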
