Abstract
The sequential goodness-of-fit (SGoF) multiple testing method has recently been proposed as an alternative to procedures that control the familywise error rate or the false discovery rate in high-dimensional problems. For discrete data, however, the SGoF method may be very conservative. In this paper, we introduce an alternative SGoF-type procedure that takes into account the discreteness of the test statistics. Like the original SGoF, the new method provides weak control of the false discovery rate and the familywise error rate, but it attains false discovery rate levels closer to the desired nominal level and is therefore more powerful. We evaluate the performance of the method in a simulation study and illustrate its application to a real pharmacovigilance data set.
