Abstract
In the analysis of clustered data, inverse cluster size weighting has been shown to be resistant to the potentially biasing effects of informative cluster size, where the number of observations within a cluster is associated with the outcome variable of interest. Inverse cluster size reweighting has been used to establish clustered data analogues of common tests for independent data, but the method has yet to be extended to tests of categorical data. Many variance estimators have been implemented across established cluster-weighted tests, but the potential effects of differing methods on test performance have not previously been explored. Here, we develop cluster-weighted estimators of marginal proportions that remain unbiased under informativeness, and derive analogues of three popular tests for clustered categorical data: the one-sample proportion, goodness-of-fit, and independence chi-square tests. We construct these tests using several variance estimators and show substantial differences in the performance of cluster-weighted tests based on variance estimation technique, with variance estimators constructed under the null hypothesis maintaining size closest to nominal. We illustrate the proposed tests through an application to a data set of functional measures from patients with spinal cord injuries participating in a rehabilitation program.

