Abstract
The control of confounding is an area of extensive epidemiological research, especially in the field of causal inference for observational studies. Matched cohort and case-control study designs are commonly implemented to control for confounding effects without specifying the functional form of the relationship between the outcome and confounders. This paper extends the commonly used regression models in matched designs for binary and survival outcomes (i.e. conditional logistic and stratified Cox proportional hazards) to studies of continuous outcomes through a novel interpretation and application of logit-based regression models from the econometrics and marketing research literature. We compare the performance of the maximum likelihood estimators using simulated data and propose a heuristic argument for obtaining the residuals for model diagnostics. We illustrate our proposed approach with two real data applications. Our simulation studies demonstrate that our stratification approach is robust to model misspecification and that the distribution of the estimated residuals provides a useful diagnostic when the strata are of moderate size. In our applications to real data, we demonstrate that parity and menopausal status are associated with percent mammographic density, and that the mean level and variability of inpatient blood glucose readings vary between medical and surgical wards within a national tertiary hospital. Our work highlights how the same class of regression models, available in most statistical software, can be used to adjust for confounding in the study of binary, time-to-event and continuous outcomes.
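The abstract does not state the likelihood itself, so the following is only a minimal sketch of how a logit-based model could be applied to a continuous outcome within matched strata. It assumes the model is a rank-ordered ("exploded") logit, which has the same form as a stratified Cox partial likelihood with no censoring and no ties. The function name `rank_logit_loglik`, the single scalar covariate, the descending ordering convention, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def rank_logit_loglik(beta, strata):
    """Log-likelihood of a rank-ordered logit, summed over matched strata.

    Each stratum is a list of (outcome, covariate) pairs. Assumption: within
    a stratum, subjects are ranked from largest to smallest outcome, so a
    positive beta means the covariate is associated with larger outcomes.
    Assumes no tied outcomes within a stratum.
    """
    total = 0.0
    for stratum in strata:
        ordered = sorted(stratum, key=lambda pair: pair[0], reverse=True)
        scores = [beta * x for _, x in ordered]
        # The last subject left in a stratum is "chosen" with probability 1,
        # so it contributes nothing to the log-likelihood.
        for j in range(len(scores) - 1):
            remaining = scores[j:]
            m = max(remaining)  # log-sum-exp shift for numerical stability
            log_denom = m + math.log(sum(math.exp(s - m) for s in remaining))
            total += scores[j] - log_denom
    return total

# Toy matched data (hypothetical): two strata with very different baselines.
strata = [
    [(2.1, 0.0), (3.4, 1.0), (5.0, 2.0)],
    [(10.2, 0.0), (11.9, 1.0)],
]

ll_null = rank_logit_loglik(0.0, strata)  # every ordering equally likely
ll_pos = rank_logit_loglik(1.0, strata)   # covariate tracks the outcome

# Shifting every outcome in a stratum leaves the within-stratum ranks, and
# hence the likelihood, unchanged: stratum-level effects drop out.
shifted = [strata[0], [(y + 100.0, x) for y, x in strata[1]]]
ll_shifted = rank_logit_loglik(1.0, shifted)
```

Because only the within-stratum ordering of the outcome enters the likelihood, any effect shared by all members of a stratum cancels. This matches the motivation given above of controlling for confounders without specifying how they relate to the outcome. In practice the same fit would come from stratified Cox or conditional logistic routines in standard software rather than from hand-written code.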

