Abstract
Compositional count data are discrete vectors representing the numbers of outcomes falling into each of several mutually exclusive categories. Compositional techniques based on the log-ratio methodology are appropriate in those cases where the total sum of the vector elements is not of interest. Such data sets can contain zero values, which are often the result of insufficiently large samples. That is, the zeros refer to unobserved positive values that might have been observed with a larger number of trials or with a different sampling design. Because log-ratio transformations require strictly positive data, any statistical analysis of count compositions must be preceded by a proper replacement of the zeros. A Bayesian-multiplicative treatment has been proposed for addressing this count zero problem in several case studies. This treatment involves the Dirichlet prior distribution, as the conjugate distribution of the multinomial distribution, and a multiplicative modification of the non-zero values. Different parameterizations of the prior distribution provide different zero replacement results, whose coherence with the vector space structure of the simplex is stated. Their performance is evaluated from both the theoretical and the computational points of view.
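The treatment described above can be illustrated with a minimal sketch. The abstract does not give formulas, so the sketch assumes one common formulation: with a Dirichlet prior of strength `s` and prior expectation `t`, a zero count in part `j` is replaced by the posterior estimate `s * t[j] / (n + s)`, and the non-zero proportions are shrunk multiplicatively so the result still sums to one. The function name `bayes_mult_replace`, its parameters, and the default choice `s = D / 2` with a uniform `t` are assumptions made for illustration, not the authors' implementation.

```python
def bayes_mult_replace(counts, strength=None, prior=None):
    """Replace count zeros in one count vector (illustrative sketch).

    counts   : non-negative integer counts for D categories
    strength : assumed prior strength s (defaults to D / 2)
    prior    : assumed prior expectation t, summing to 1 (defaults to uniform)
    Returns a list of strictly positive proportions that sums to 1.
    """
    d = len(counts)
    n = sum(counts)
    if n <= 0:
        raise ValueError("the count vector must contain at least one observation")

    s = d / 2 if strength is None else strength
    t = [1.0 / d] * d if prior is None else list(prior)

    # Posterior estimate assigned to each zero part; 0.0 marks non-zero parts.
    zero_est = [s * t[j] / (n + s) if counts[j] == 0 else 0.0 for j in range(d)]
    zero_mass = sum(zero_est)

    # Multiplicative modification: non-zero proportions are rescaled by a
    # common factor, so the ratios between them are unchanged.
    return [
        zero_est[j] if counts[j] == 0 else (counts[j] / n) * (1.0 - zero_mass)
        for j in range(d)
    ]


replaced = bayes_mult_replace([0, 3, 7])
```

Because every non-zero part is multiplied by the same factor, the ratio between the second and third parts of `replaced` stays at 3/7, which is the property that keeps the replacement compatible with log-ratio analysis. Passing a different `strength` or `prior` corresponds to choosing a different parameterization of the prior.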
