In this article, the authors define a methodological framework for analyzing the relationship between state sequences and covariates. Inspired by the principles of analysis of variance, the approach examines how covariates explain the discrepancy of the sequences. The discrepancy is measured from the pairwise dissimilarities between sequences, which makes it possible to develop a series of analysis tools based on statistical significance. The authors introduce generalized simple and multifactor discrepancy-based methods to test for differences between groups, a pseudo-R² for measuring the strength of sequence-covariate associations, and a generalized Levene statistic for testing differences in within-group discrepancies. They also propose tools and plots for studying how the differences evolve along the time frame, and a regression tree method for discovering the most significant discriminant covariates and their interactions. In addition, the authors extend all methods to account for case weights. The scope of the proposed framework is illustrated using a real-world sequence data set.
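The central idea, measuring discrepancy from pairwise dissimilarities and decomposing it as in an analysis of variance, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. We assume that the sum of squares of a group of n sequences is (1/n) times the sum of the dissimilarities d_ij over all pairs i < j, so that the between-group part is SS_B = SS_T − SS_W, the pseudo-R² is SS_B / SS_T, and the pseudo-F is (SS_B / (m − 1)) / (SS_W / (n − m)) for m groups. Significance is assessed with a permutation test on the group labels. The Hamming distance, the function names (`hamming`, `sum_of_squares`, `discrepancy_anova`, `permutation_pvalue`), and the toy sequences are hypothetical illustrations.

```python
import random
from typing import List, Sequence, Tuple


def hamming(s: str, t: str) -> int:
    """Hypothetical stand-in dissimilarity: positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))


def sum_of_squares(d: Sequence[Sequence[float]], idx: Sequence[int]) -> float:
    """Assumed SS of a group: (1/n) * sum of d[i][j] over pairs i < j in the group."""
    n = len(idx)
    if n == 0:
        return 0.0
    total = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            total += d[idx[a]][idx[b]]
    return total / n


def discrepancy_anova(d: Sequence[Sequence[float]], groups: Sequence[str]) -> Tuple[float, float]:
    """Return (pseudo_r2, pseudo_f) for dissimilarity matrix d and one group label per case."""
    n = len(groups)
    labels = sorted(set(groups))
    m = len(labels)
    ss_t = sum_of_squares(d, list(range(n)))
    # Within-group SS: sum of each group's own SS.
    ss_w = sum(
        sum_of_squares(d, [i for i in range(n) if groups[i] == g]) for g in labels
    )
    ss_b = ss_t - ss_w
    pseudo_r2 = ss_b / ss_t
    pseudo_f = float("inf") if ss_w == 0 else (ss_b / (m - 1)) / (ss_w / (n - m))
    return pseudo_r2, pseudo_f


def permutation_pvalue(d, groups, n_perm: int = 999, seed: int = 0) -> float:
    """Share of label permutations whose pseudo-F reaches the observed one."""
    rng = random.Random(seed)
    _, f_obs = discrepancy_anova(d, groups)
    labels: List[str] = list(groups)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between sequences and groups
        if discrepancy_anova(d, labels)[1] >= f_obs:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical toy data: E = employed, U = unemployed, S = in school.
seqs = ["EEEEEE", "EEEEEU", "EEEEUU", "SSSEEE", "SSSSEE", "SSSSSE"]
groups = ["A", "A", "A", "B", "B", "B"]
d = [[hamming(s, t) for t in seqs] for s in seqs]

r2, f = discrepancy_anova(d, groups)
p = permutation_pvalue(d, groups)
print(f"pseudo-R2 = {r2:.3f}, pseudo-F = {f:.2f}, p = {p:.3f}")
```

With these toy sequences the two groups differ sharply: by hand, SS_T = 52/6 and SS_W = 8/3, giving a pseudo-R² of 18/26 (about 0.69) and a pseudo-F of 9. Because the decomposition only needs the pairwise matrix, any other dissimilarity (for example an optimal matching distance) could replace `hamming`. Case weights, which the authors also support, are not handled in this sketch.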
