Abstract
Simple models are preferred over complex models, but overly simplistic models can lead to erroneous interpretations. The classical approach is to start with a simple model, whose shortcomings are assessed in residual-based model diagnostics. One then increases the complexity of this initial, overly simple model to obtain a better-fitting model. I illustrate how transformation analysis can be used as an alternative approach to model choice. Instead of adding complexity to simple models, step-wise complexity reduction is used to help identify simpler and more interpretable models. As an example, body mass index (BMI) distributions in Switzerland are modelled by means of transformation models to understand the impact of sex, age, smoking and other lifestyle factors on a person's BMI. In this process, I searched for a compromise between model fit and model interpretability. Special emphasis is given to understanding the connections between transformation models of increasing complexity. The models used in this analysis ranged from evergreens, such as the normal linear regression model with constant variance, to novel models with extremely flexible conditional distribution functions, such as transformation trees and transformation forests.
