Multivariate Functional Additive Mixed Models

Multivariate functional data can be intrinsically multivariate like movement trajectories in 2D or complementary like precipitation, temperature, and wind speeds over time at a given weather station. We propose a multivariate functional additive mixed model (multiFAMM) and show its application to both data situations using examples from sports science (movement trajectories of snooker players) and phonetic science (acoustic signals and articulation of consonants). The approach includes linear and nonlinear covariate effects and models the dependency structure between the dimensions of the responses using multivariate functional principal component analysis. Multivariate functional random intercepts capture both the auto-correlation within a given function and cross-correlations between the multivariate functional dimensions. They also allow us to model between-function correlations as induced by e.g. repeated measurements or crossed study designs. Modeling the dependency structure between the dimensions can generate additional insight into the properties of the multivariate functional process, improve the estimation of random effects, and yield corrected confidence bands for covariate effects. Extensive simulation studies indicate that a multivariate modeling approach is more parsimonious than fitting independent univariate models to the data while maintaining or improving model fit.


Introduction
With the technological advances seen in recent years, functional data sets are increasingly multivariate. They can be multivariate with respect to the domain of a function, its codomain, or both. Here, we focus on multivariate functions with a one-dimensional domain, f = (f^(1), ..., f^(D))' : I ⊂ R → R^D, with square-integrable components f^(d) ∈ L²(I), d = 1, ..., D. For this type of data, we can distinguish two subclasses: One has interpretable separate dimensions and can be seen as several complementary modes of a common phenomenon ("multimodal" data, cf. Uludag and Roebroeck, 2014), as in the analysis of acoustic signals and articulation processes in speech production in one of our data examples. The codomain is then simply the Cartesian product S = S^(1) × ... × S^(D) of interpretable univariate codomains S^(d) ⊂ R. The other subclass is more "intrinsically" multivariate insofar as univariate analyses would not yield meaningful results. Consider for example two-dimensional movement trajectories as in one of our motivating applications, where the function measures Cartesian coordinates over time: for fixed trajectories, rotation or translation of the essentially arbitrary coordinate system would change the results of univariate analyses. For intrinsically multivariate functional data, a multivariate approach is the natural and preferred mode of analysis, yielding interpretable results on the observation level. Even for multimodal functional data, a joint analysis may generate additional insight by incorporating the covariance structure between the dimensions. This motivates the development of statistical methods for multivariate functional data. We here propose multivariate functional additive mixed models to model potentially sparsely observed functions with flexible covariate effects and crossed or nested study designs.
Multivariate functional data have been of interest in different statistical fields such as clustering (Jacques and Preda, 2014; Park and Ahn), functional principal component analysis (Chiou et al., 2014; Happ and Greven, 2018; Backenroth et al., 2020), and registration (Carroll et al., 2021; Steyer et al., 2021). There is also ample literature on multivariate functional data regression such as graphical models, reduced rank regression (Liu et al., 2020), or varying coefficient models. Yet, so far, there are only few approaches that can handle multilevel regression when the functional response is multivariate. In particular, Goldsmith and Kitago (2016) propose a hierarchical Bayesian multivariate functional regression model that can include subject-level and residual random effect functions to account for correlation between and within functions. They work with bivariate functional data observed on a regular and dense grid and assume a priori independence between the different dimensions of the subject-specific random effects. Thus, they model the correlation between the dimensions only in the residual function. As our approach explicitly models the dependencies between dimensions for multiple functional random effects and also handles data observed on sparse and irregular grids on more than two dimensions, the model proposed by Goldsmith and Kitago (2016) can be seen as a special case of our more general model class.
Alternatively, Zhu et al. (2017) use a two-stage transformation with basis functions for the multivariate functional mixed model. This allows the estimation of scalar regression models for the resulting basis coefficients, which are argued to be approximately independent. The proposed model is part of the so-called functional mixed model (FMM) framework (Morris). While FMMs use basis transformations of functional responses (observed on equal grids) at the start of the analysis, we propose a multivariate model in the functional additive mixed model (FAMM) framework, which uses basis representations of all (effect) functions in the model (Scheipl et al., 2015). The differences between these two functional regression frameworks have been extensively discussed before (e.g. Morris).
The main advantages of our multivariate regression model, also compared to Goldsmith and Kitago (2016) and Zhu et al. (2017), are that it is readily available for sparse and irregular functional data and that it allows the inclusion of multiple nested or crossed random processes, both of which are required in our data examples. Another important contribution is that our approach directly models the multivariate covariance structure of all random effects included in the model using multivariate functional principal components (FPCs) and thus implicitly models the covariances between the dimensions. This makes the model representation more parsimonious, avoids assumptions difficult to verify, and allows further interpretation of the random effect processes, such as their relative importance and their dominating modes. As part of the FAMM framework, our model provides a vast toolkit of modeling options for covariate and random effects, of estimation and inference (Wood). The proposed multivariate functional additive mixed model (multiFAMM) extends the FAMM framework, combining ideas from multilevel modeling and multivariate functional data (Happ and Greven, 2018) to account for sparse and irregular functional data and different study designs. We illustrate the multiFAMM on two motivating examples. The first (intrinsically multivariate) data stem from a study on the effect of a training programme for snooker players with a nested study design (shots within sessions within players) (Enghofer, 2014). The movement trajectories of a player's elbow, hand, and shoulder during a snooker shot are recorded on camera, yielding six-dimensional multivariate functional data (see Figure 1). In the second data example, we analyse multimodal data from a speech production study with a crossed study design (speakers crossed with words) on so-called "assimilation" of consonants.
The two measured modes (acoustic and articulatory, see Figure 3) are expected to be closely related but joint analyses have not yet incorporated the functional nature of the data.
These two examples motivate the development of a regression model for sparse and irregularly sampled multivariate functional data that can incorporate crossed or nested functional random effects as required by the study design in addition to flexible covariate effects. The proposed approach is implemented in R (R Core Team, 2020) in package multifamm (Volkmann, 2021). The paper is structured as follows: Section 2 specifies the multiFAMM and Section 3 its estimation process. Section 4 presents the application of the multiFAMM to the data examples and Section 5 shows the estimation performance of our proposed approach in simulations. Section 6 closes with a discussion and outlook.

Model Specification

We assume that each multivariate functional response y*_i ∈ L²_D(I), i = 1, ..., N, is evaluated at (potentially sparse and irregular) points t_it ∈ I and that the evaluation is subject to white noise, i.e. y_it = y*_i(t_it) + ε_it. The residual term ε_it reflects additional uncorrelated white noise measurement error, following a D-dimensional multivariate normal distribution N_D with zero mean and diagonal covariance matrix Σ = diag(σ²_1, ..., σ²_D) with dimension-specific variances σ²_d. We construct a multivariate functional mixed model as

y*_i(t) = μ(x_i, t) + Σ_{j=1}^q U_j(t) z_ij + E_i(t),    (1)

where U_j(t) = (U_j1(t), ..., U_jV_Uj(t)), j = 1, ..., q, collects independent copies (ind. c.) of the jth functional random intercept.
We assume an additive predictor μ(x_i, ·) = Σ_{l=1}^p f_l(x_i, ·) of fixed effects, which consists of partial predictors f_l(x_i, ·) = (f_l^(1)(x_i, ·), ..., f_l^(D)(x_i, ·))' ∈ L²_D(I), l = 1, ..., p, that are multivariate functions depending on a subset of the vector of scalar covariates x_i. This allows us to include linear or smooth covariate effects as well as interaction effects between multiple covariates as in the univariate FAMM (Scheipl et al., 2015). Partial predictors may also depend on dimension-specific subsets of covariates.
For random effects U, we focus on model scenarios with q independent multivariate functional random intercepts for crossed and/or nested designs. For group level v = 1, ..., V_Uj within grouping layer j = 1, ..., q, these take the value U_jv ∈ L²_D(I). For each layer, the U_j1, ..., U_jV_Uj represent independent copies of a multivariate smooth zero-mean Gaussian random process. Analogously to scalar linear mixed models, the U_jv model correlations between different response functions y*_i within the same group as well as variation across groups. By arranging them in a (D × V_Uj) matrix U_j(t) per t, the jth random intercept can be expressed in the common mixed model notation in (1) using appropriate group indicators z_ij = (z_ij1, ..., z_ijV_Uj)' for the respective design.
Although technically a curve-specific functional random intercept, we distinguish the smooth residuals E_i ∈ L²_D(I) in the notation, as they model correlation within rather than between response functions. We write E_v ∈ L²_D(I), v = 1, ..., V_E with V_E = N. The E_i capture smooth deviations from the group-specific mean μ(x_i, ·) + Σ_{j=1}^q U_j(·)z_ij. For a more compact representation, we can arrange all U_j(t) and E_i(t) together in a (D × (Σ_{j=1}^q V_Uj + N)) matrix U(t) per t, and the group indicators for all layers in a corresponding vector z_i = (z_i1', ..., z_iq', e_i')' with e_i the ith unit vector. The resulting model term U(t)z_i then comprises all smooth random functions, accounting for all correlation between/within response functions y*_i given the covariates x_i as required by the respective experimental design. E_i and U_jv are independent copies (ind. c.) of random processes with multivariate D × D covariance kernels K_E and K_Uj, whose univariate covariance surfaces K_g^(d,e)(t, t') reflect the covariance between the process dimensions d and e at t and t'. We call these auto-covariances for d = e and cross-covariances otherwise. The multivariate Gaussian processes are uniquely defined by their multivariate mean function, here the null function 0, and the multivariate covariance kernels K_g, and we write MGP(0, K_g), g ∈ {U_1, ..., U_q, E}. Note that vectorizing the matrix U(t) allows us to formulate the joint distribution assumption vec(U(t)) ∼ MGP(0, K_U) with K_U(t, t') having a block-diagonal structure repeating each K_Uj(t, t') V_Uj times and K_E(t, t') N times.
We assume that the different sources of variation U_j(t), j = 1, ..., q, E_i(t), and ε_it are mutually uncorrelated random processes to ensure model identification. Assuming smoothness of the covariance kernel K_E further guarantees that the residual process E_i(t) can be separated from the white noise ε_it, removing the error variance from the diagonal of the smooth covariance kernel (e.g., Yao et al.).

FPC Representation of the Random Effects
Model (1) specifies a univariate functional linear mixed model (FLMM) as given in Cederbaum et al. (2016) for each dimension d. The main difference lies in the multivariate random processes that introduce dependencies between the dimensions. In order to avoid restrictive assumptions about the structure of these multivariate covariance operators, which would typically be very difficult to elicit a priori or verify ex post, we estimate them directly from the data. The main difficulty then becomes computationally efficient estimation, which is already costly in the univariate case. Especially for higher dimensional multivariate functional data, accounting for the cross-covariances can become a complex task, which we tackle with multivariate functional principal component analysis (MFPCA).
Given the covariance operators (see Section 3), we represent the multivariate random effects in model (1) using truncated multivariate Karhunen-Loève (KL) expansions

U_jv(t) ≈ Σ_{m=1}^{M_Uj} ρ_{Uj,vm} ψ_{Uj,m}(t),   E_v(t) ≈ Σ_{m=1}^{M_E} ρ_{E,vm} ψ_{E,m}(t),    (2)

where the orthonormal multivariate eigenfunctions ψ_gm = (ψ_gm^(1), ..., ψ_gm^(D))' ∈ L²_D(I), m = 1, ..., M_g, g ∈ {U_1, ..., U_q, E}, of the corresponding covariance operators with truncation order M_g are used as basis functions, and the random scores ρ_gvm ∼ N(0, ν_gm) are independent and identically distributed (i.i.d.) over the groups v, with ordered eigenvalues ν_gm of the corresponding covariance operator. Note that the assumption of Gaussianity for the random processes can be relaxed. For non-Gaussian random processes, the KL expansion still gives uncorrelated (but non-normal) scores and estimation based on a penalized least squares (PLS) criterion (see Section 3.2) remains reasonable.
Using KL expansions gives a parsimonious representation of the multivariate random processes that is an optimal approximation with respect to the integrated squared error (cf. Ramsay and Silverman), as well as interpretable basis functions capturing the most prominent modes of variation of the respective process. The distinct feature of this approach is that the multivariate FPCs directly account for the dependency structure of each random process across the dimensions. If, by contrast, e.g. splines were used in the basis representation of the random effects, it would be necessary to explicitly model the cross-covariances of each random process in the model. Multivariate eigenfunctions, however, are designed to incorporate the dependency structure between dimensions and allow the assumption of independent (univariate) basis coefficients ρ_gvm via the KL theorem (see e.g. Happ and Greven, 2018). This leads to a parsimonious multivariate basis for the random effects, where a typically small vector of scalar scores ρ_gvm common to all dimensions represents nearly the entire information about these D-dimensional processes.
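To make the truncated KL representation concrete, the following sketch simulates one draw of a bivariate random effect from scores and multivariate eigenfunctions. It is illustrative only and written in Python rather than the R implementation; the Fourier-type eigenfunctions, eigenvalues, and grid are hypothetical choices, not estimates from any data:

```python
import numpy as np

rng = np.random.default_rng(1)
D, T, M = 2, 100, 3                  # dimensions, grid points, truncation order
t = np.linspace(0, 1, T)

# Hypothetical multivariate eigenfunctions psi_m = (psi_m^(1), psi_m^(2)):
# Fourier-type functions, normalised w.r.t. the unweighted scalar product
# sum_d int psi_m^(d)(t) psi_k^(d)(t) dt, approximated on the grid.
psi = np.stack([
    np.stack([np.sin(2 * np.pi * (m + 1) * t),
              np.cos(2 * np.pi * (m + 1) * t)]) for m in range(M)
])                                   # shape (M, D, T)
psi /= np.sqrt(np.sum(psi**2, axis=(1, 2), keepdims=True) / T)

nu = np.array([1.0, 0.5, 0.25])      # decreasing eigenvalues nu_m
scores = rng.normal(0.0, np.sqrt(nu))  # rho_m ~ N(0, nu_m), independent

# Truncated KL expansion: U(t) = sum_m rho_m psi_m(t), a D-dimensional function
U = np.tensordot(scores, psi, axes=(0, 0))  # shape (D, T)
```

Note that a single vector of three scalar scores determines the entire bivariate function, which is what makes the eigenfunction basis so parsimonious compared to separate spline expansions per dimension.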

Estimation
We use a two-step approach to estimate the multiFAMM and the respective multivariate covariance operators. In a first step (Section 3.1), the D-dimensional eigenfunctions ψ_gm(t) with their corresponding eigenvalues ν_gm are estimated from their univariate counterparts following Cederbaum et al. (2018) and Happ and Greven (2018). These estimates are then plugged into (2) and we represent the multiFAMM as part of the general FAMM framework (Section 3.2) by suitable re-arrangement. We can view the estimated ψ_gm(t) simply as an empirically derived basis that parsimoniously represents the patterns in the observed data. While their estimation adds uncertainty, we are not interested in inferential statements for the variance modes, and our simulations (see Section 5) suggest that the estimated eigenfunctions are reasonable approximations that work well as a basis.

Step 1: Estimation of the Eigenfunction Basis
Step 1 (i): Univariate Mean Estimation In a first step, we obtain preliminary estimates of the dimension-specific mean functions μ^(d)(x_i, t) using univariate FAMMs. We model

y_it^(d) = μ^(d)(x_i, t_it) + ε_it^(d)    (3)

independently for all d with i.i.d. Gaussian random variables ε_it^(d). The estimation of μ^(d)(x_i, t) proceeds analogously to the estimation of the multiFAMM described in Section 3.2. It is based on the evaluation points of the y*_i^(d)(t), whose locations on the interval I can vary across dimensions. Model (3) thus accommodates sparse and irregular multivariate functional data and implies a working independence assumption across scalar observations within and across functions.
Step 1 (ii): Univariate Covariance Estimation This preliminary mean function is used to centre the data, ỹ_it^(d) = y_it^(d) − μ̂^(d)(x_i, t_it). Cederbaum et al. (2016) already find that for this purpose, the working independence assumption within functions across evaluation points in (3) gives reasonable results. The expectation of the crossproducts of the centred functions then coincides with the auto-covariance. For the independent random components specified in model (1), this overall covariance decomposes additively into contributions from each random process as

E[ỹ_it^(d) ỹ_i't'^(d)] ≈ Σ_{j=1}^q δ_{v_j v'_j} K_Uj^(d,d)(t, t') + δ_{ii'} K_E^(d,d)(t, t') + δ_{ii'} δ_{tt'} σ²_d,    (4)

using indicators δ_{xx'} that equal one for x = x' and zero otherwise. The indicator δ_{v_j v'_j} thus identifies whether the curves in the crossproduct belong to the same group v_j of the jth layer. Using t, t', and the indicators δ_{v_j v'_j}, δ_{tt'}, δ_{ii'} as covariates and the crossproducts of the centred data as responses, we can estimate the auto-covariances K_g^(d,d) of the random processes using symmetric additive covariance smoothing (Cederbaum et al., 2018). This extends the univariate approach proposed by Cederbaum et al. (2016). In particular, we also allow a nested random effects structure as required for the snooker training application in Section 4.1 by specifying the indicator of the nested effect as the product of subject and session indicators. Note that estimating (4) also yields estimates of the dimension-specific error variances σ²_d as a byproduct.
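The construction of the covariance-smoothing inputs can be sketched as follows: for each pair of evaluation points on one dimension, we form the crossproduct of centred values along with the indicator covariates. This is an illustrative Python sketch on toy data, not the multifamm implementation; all column names and values are assumptions:

```python
import pandas as pd
from itertools import combinations_with_replacement

# Toy centred data on one dimension; one row per evaluation point. 'subject'
# is the grouping variable of one functional random intercept, 'curve'
# identifies the observed function (for the smooth residual).
obs = pd.DataFrame({
    "curve":   [0, 0, 1, 1, 2, 2],
    "subject": [0, 0, 0, 0, 1, 1],
    "t":       [0.1, 0.6, 0.2, 0.9, 0.3, 0.8],
    "y":       [0.5, -0.2, 0.1, 0.4, -0.3, 0.2],
})

rows = []
for a, b in combinations_with_replacement(obs.index, 2):
    ra, rb = obs.loc[a], obs.loc[b]
    rows.append({
        "prod": ra.y * rb.y,                            # crossproduct response
        "t": ra.t, "t_prime": rb.t,                     # smoothing covariates
        "same_subject": int(ra.subject == rb.subject),  # delta_{v v'}
        "same_curve":   int(ra.curve == rb.curve),      # delta_{i i'}
        "same_point":   int(a == b),                    # delta_{t t'}
    })
pairs = pd.DataFrame(rows)  # input for symmetric additive covariance smoothing
```

The `same_point` indicator isolates the white-noise error variance on the diagonal, which is why estimating the decomposition also yields the σ²_d as a byproduct.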
Step 1 (iii): Univariate Eigenfunction Estimation Based on the covariance kernel estimates, we apply separate univariate functional principal component analyses (FPCAs) for each random process by conducting an eigendecomposition of the respective linear integral operator. Practically, each estimated process- and dimension-specific auto-covariance is re-evaluated on a dense grid so that a univariate FPCA can be conducted. Alternatively, Reiss and Xu (2020) provide an explicit spline representation of the estimated eigenfunctions. Eigenfunctions with non-positive eigenvalues are removed to ensure positive definiteness, and further regularization by truncation based on the proportion of variance explained is possible (see e.g. Di et al.; Peng and Paul). However, we suggest keeping all univariate FPCs with positive eigenvalues for the computation of the MFPCA in order to preserve all important modes of variation and cross-correlation in the data.

Step 1 (iv): Multivariate Eigenfunction Estimation The estimated univariate eigenfunctions and scores are then used to conduct an MFPCA for each of the multivariate random processes g separately. The MFPCA exploits correlations between univariate FPC scores across dimensions to reduce the number of basis functions needed to sufficiently represent the random processes. We base the MFPCA on the following definition of a (weighted) scalar product for the response space,

⟨f, g⟩ := Σ_{d=1}^D w_d ∫_I f^(d)(t) g^(d)(t) dt,    (5)

with positive weights w_d, d = 1, ..., D, and the induced norm denoted by ||| · |||. The corresponding covariance operators Γ_g : L²_D(I) → L²_D(I) of the multivariate random processes U_jv and E_v are then given by (Γ_g f)(t) = ⟨f, K_g(t, ·)⟩, g ∈ {U_1, ..., U_q, E}. The standard choice of weights in our applications is w_1 = ... = w_D = 1 (unweighted scalar product), but other choices are possible. Consider for example a scenario where dimensions are observed with different amounts of measurement error.
If variation in dimensions with a large proportion of measurement error is to be downweighted, we propose to use w_d = 1/σ̂²_d with the dimension-specific measurement error variance estimates σ̂²_d obtained from (4). Happ and Greven (2018) show that estimates of the multivariate eigenvalues ν_gm of Γ_g can be obtained from an eigenanalysis of a covariance matrix of the univariate random scores. The corresponding multivariate eigenfunctions ψ_gm can be obtained as linear combinations of the univariate eigenfunctions with the weights given by the resulting eigenvectors. The estimates ψ̂_gm are then substituted for the basis functions of the truncated multivariate KL expansions of the random effects U_jv and E_v in (2). Note that for each random process g, the maximum number of FPCs is given by the total number of univariate eigenfunctions included in the estimation process of the MFPCA of g. To achieve further regularization, and analogously to Cederbaum et al. (2016), we propose to choose truncation orders M_g for each KL expansion of the multivariate random processes using a prespecified proportion of explained variation.
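The eigenanalysis step of Happ and Greven (2018) can be sketched numerically: the covariance matrix of the stacked univariate scores is decomposed, its eigenvalues estimate the multivariate eigenvalues, and its eigenvectors give the linear combination weights. A minimal Python sketch with simulated scores, assuming the unweighted scalar product (in practice the scores come from the univariate FPCAs of Step 1 (iii)):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical univariate FPC results for one random process on D = 2
# dimensions: M1 and M2 univariate FPCs with estimated scores for n subjects.
n, M1, M2 = 200, 3, 2
xi = rng.normal(size=(n, M1 + M2))   # stacked univariate scores (xi^(1) | xi^(2))

# Eigenanalysis of the (M1+M2) x (M1+M2) score covariance matrix
Z = np.cov(xi, rowvar=False)
nu, C = np.linalg.eigh(Z)            # multivariate eigenvalues and coefficients
order = np.argsort(nu)[::-1]         # eigh returns ascending order
nu, C = nu[order], C[:, order]

# Each column c_m of C gives the weights of the linear combination of the
# univariate eigenfunctions forming the multivariate eigenfunction psi_m:
# its first M1 entries weight dimension 1, its last M2 entries dimension 2.
# The multivariate scores are projections of the stacked univariate scores:
rho = xi @ C
```

By construction, the resulting multivariate scores are uncorrelated with variances equal to the estimated multivariate eigenvalues.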
Step 1 (v): Multivariate Truncation Order We offer two different approaches for the choice of truncation orders M_g based on different variance decompositions (derivation in Appendix A):

Σ_{d=1}^D w_d ∫_I Var(y_i^(d)(t)) dt = Σ_g Σ_{m=1}^∞ ν_gm + |I| Σ_{d=1}^D w_d σ²_d,    (6)

∫_I Var(y_i^(d)(t)) dt = Σ_g Σ_{m=1}^∞ ν_gm ||ψ_gm^(d)||² + |I| σ²_d,    (7)

with |I| the length of the interval I (here equal to one) and || · || the L² norm. Multivariate variance decomposition (6) uses the (weighted) sum of total variation in the data across dimensions. We select the FPCs with highest associated eigenvalues ν_gm over all random processes g until their sum reaches a prespecified proportion (e.g. 0.95) of the total variation, thus approximating the infinite sums in (6) with M_g summands. For the approach based on the univariate variance (7), we require M_g to be the smallest truncation order for which at least a prespecified proportion of variance is explained on every dimension d. This second choice of M_g might be preferable in situations where the variation is considerably different (in amount or structure) across dimensions, whereas the first approach gives a more parsimonious representation of the random effects. Note that both approaches can lead to a simplification of the multiFAMM if M_g = 0 is chosen for some g. The simulation results of Section 5 suggest that increasing the number of FPCs improves model accuracy, which is why sensitivity analyses with regard to the truncation order are recommended.
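The first truncation rule, based on (6), amounts to a greedy selection of eigenvalues over all random processes. The following Python sketch illustrates it with hypothetical eigenvalues; the error-variance term of the decomposition is omitted for brevity:

```python
import numpy as np

# Hypothetical estimated eigenvalues nu_gm for two random processes g
eigvals = {"U1": np.array([2.0, 0.6, 0.1]), "E": np.array([1.5, 0.4, 0.2])}

def truncation_multivariate(eigvals, prop=0.95):
    """Select FPCs with largest eigenvalues over all processes until a
    prespecified proportion of the total variation is reached."""
    items = [(g, m, v) for g, vals in eigvals.items() for m, v in enumerate(vals)]
    items.sort(key=lambda item: -item[2])
    total = sum(v for _, _, v in items)
    cum, M = 0.0, {g: 0 for g in eigvals}
    for g, m, v in items:
        if cum >= prop * total:
            break
        cum += v
        M[g] = max(M[g], m + 1)  # eigenvalues per process are sorted, so the
                                 # largest selected index gives M_g
    return M
```

With these toy eigenvalues, `truncation_multivariate(eigvals, 0.95)` returns `{'U1': 2, 'E': 3}`: the smallest eigenvalue of U1 is dropped because the remaining components already explain more than 95% of the total variation.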

Step 2: Estimation of the multiFAMM

In the following, we discuss estimating the multiFAMM given the estimated multivariate FPCs.
We base the proposed model on the general FAMM framework of Scheipl et al. (2015), which models functional responses using basis representations. To make the extension of the FAMM framework to multivariate functional data more apparent, the multivariate response vectors and the respective model matrices are stacked over dimensions, so that every block has the structure of a univariate FAMM over all observations i. This gives concatenated basis functions with discontinuities between the dimensions. The fixed effects are modelled analogously to the univariate case by interacting all covariate effects with a dimension indicator. The random effects are based on the parsimonious, concatenated multivariate FPC basis.

Matrix Representation
For notational simplicity, we assume that the functions are evaluated on a fixed grid of time points (t_1, ..., t_T) over all N individuals and D dimensions. However, our framework allows for sparse functional data using different grids per dimension and per observed function as in the two applications (Section 4). Correspondingly, y = (y^(1)', ..., y^(D)')' is the DNT-vector of stacked evaluation points with y^(d) the NT-vector of evaluations on dimension d, and ε the DNT-vector of residuals. We have ε ∼ N(0, Σ) with Σ = diag(σ²_1, ..., σ²_D) ⊗ I_NT, the Kronecker product denoted by ⊗, and the (NT × NT) identity matrix I_NT. This yields the matrix representation of model (1),

y = Φθ + Ψρ + ε,    (8)

with fixed effects design matrix Φ and coefficient vector θ, and random effects design matrix Ψ and score vector ρ.
We estimate θ and ρ by minimizing the PLS criterion

(y − Φθ − Ψρ)' Σ⁻¹ (y − Φθ − Ψρ) + Σ_{l=1}^p θ_l' P_l(λ_xl, λ_tl) θ_l + Σ_g λ_g ρ_g' P_g ρ_g,    (9)

using appropriate penalty matrices P_l(λ_xl, λ_tl) and P_g for the fixed effects and random effects, respectively, and smoothing parameters λ_xl, λ_tl, and λ_g.
The model and penalty matrices as well as the parameter vectors of (8) and (9) are discussed in detail below.

Modeling of Fixed Effects
The block-diagonal matrix Φ = diag(Φ^(1), ..., Φ^(D)) models the fixed effects separately on each dimension as in a FAMM (Scheipl et al., 2015). The (DNT × b) matrix Φ consists of the design matrices Φ^(d) = (Φ_1^(d) | ... | Φ_p^(d)) that are constructed for the partial predictors f_l^(d)(x, t^(d)), l = 1, ..., p, which correspond to the NT-vectors of evaluations of the effect functions f_l^(d). The vectors of scalar covariates x_i are repeated T times to form the matrix of covariate information x = (x_1, ..., x_1, ..., x_N)'. We use the basis representations

Φ_l^(d) = Φ_xl^(d) ☐ Φ_tl^(d),

where A ☐ B denotes the row tensor product (A ⊗ 1_b') · (1_a' ⊗ B) of the (h × a) matrix A and the (h × b) matrix B with element-wise multiplication · and 1_c the c-vector of ones. This modeling approach combines the marginal basis matrices Φ_xl^(d) and Φ_tl^(d). These matrices contain the evaluations of suitable marginal bases in x and t^(d), respectively. For a linear effect of a scalar covariate, for example, the basis matrix Φ_xl^(d) simply contains the covariate values; see Scheipl et al. (2015) for an overview of possible effect types. The tensor product basis is weighted by the corresponding coefficient vector θ_l. Choosing the number of basis functions is a well-known challenge in the estimation of nonlinear or functional effects. We introduce regularization by a corresponding quadratic penalty term in (9). Let θ_l contain the coefficients corresponding to the partial predictor l, ordered by dimensions. The penalty P_l(λ_xl, λ_tl) is then constructed from the penalties on the marginal bases, P_xl^(d) for the covariate direction and P_tl^(d) for the time direction. Specifically, P_l(λ_xl, λ_tl) is a block-diagonal matrix with blocks for each d corresponding to the Kronecker sums of the marginal penalty matrices, λ_xl(P_xl^(d) ⊗ I) + λ_tl(I ⊗ P_tl^(d)) (Wood). A standard choice for these marginal penalty matrices given a B-spline basis representation are second or third order difference penalties, thus approximately penalizing squared second or third derivatives of the respective functions (Eilers and Marx, 1996). For unpenalized effects such as a linear effect of a scalar covariate, the corresponding P_xl^(d) is simply a matrix of zeroes.
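The row tensor product and the Kronecker-sum penalty can be sketched in a few lines. In this Python sketch the marginal bases are random placeholder matrices (nothing here comes from the applications), and the time penalty is a second-order difference penalty in the spirit of Eilers and Marx (1996):

```python
import numpy as np

def row_tensor(A, B):
    """Row tensor product (A kron 1_b') * (1_a' kron B): row-wise Kronecker."""
    a, b = A.shape[1], B.shape[1]
    return np.kron(A, np.ones((1, b))) * np.kron(np.ones((1, a)), B)

rng = np.random.default_rng(0)
# Placeholder marginal bases on one dimension: covariate basis (h x a) and
# time basis (h x b)
Phi_x, Phi_t = rng.normal(size=(10, 3)), rng.normal(size=(10, 4))
Phi_l = row_tensor(Phi_x, Phi_t)      # (10 x 12) tensor product design

def kron_sum_penalty(P_x, P_t, lam_x, lam_t):
    """Kronecker-sum penalty lam_x * (P_x kron I_b) + lam_t * (I_a kron P_t)."""
    a, b = P_x.shape[0], P_t.shape[0]
    return lam_x * np.kron(P_x, np.eye(b)) + lam_t * np.kron(np.eye(a), P_t)

D2 = np.diff(np.eye(4), n=2, axis=0)  # second-order difference matrix
P_t = D2.T @ D2                       # difference penalty for the time basis
P_x = np.zeros((3, 3))                # unpenalized (e.g. linear) covariate effect
P_l = kron_sum_penalty(P_x, P_t, lam_x=1.0, lam_t=0.5)
```

Column j of the tensor product design equals the element-wise product of column j // b of the covariate basis and column j % b of the time basis, which is exactly the pairing the Kronecker-sum penalty assumes.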

Modeling of Random Effects
We represent the DNT-vectors U_j(t) = (U_j(t^(1))', ..., U_j(t^(D))')' and E(t) = (E(t^(1))', ..., E(t^(D))')', with U_j(t^(d)) and E(t^(d)) containing the evaluations of the univariate random effects for the corresponding groups and time points, using the basis approximations

U_j(t) ≈ Ψ_Uj ρ_Uj,   E(t) ≈ Ψ_E ρ_E.

The vth column of the (DNT × V_g), g ∈ {U_1, ..., U_q, E}, indicator matrix δ_g indicates whether a given row is from the vth group of the corresponding grouping layer. Thus, the rows of the indicator matrix δ_g contain repetitions of the group indicators z_ij and e_i in model (1). For the smooth residual, δ_E simply indicates the respective curve, as every function forms its own group. The (DNT × M_g) matrix of basis evaluations comprises the evaluations of the M_g multivariate eigenfunctions ψ_gm, and the random effect design matrix Ψ_g is given by the row tensor product of δ_g with this evaluation matrix, yielding dimension (DNT × M_gV_g). The M_gV_g-vector ρ_g = (ρ_g1', ..., ρ_gVg')' with ρ_gv = (ρ_gv1, ..., ρ_gvMg)' stacks all the unknown random scores for the functional random effect g. The (DNT × Σ_g M_gV_g) model matrix Ψ = (Ψ_U1 | ... | Ψ_Uq | Ψ_E) then combines all random effect design matrices. Stacking the vectors of random scores in a Σ_g M_gV_g-vector ρ = (ρ_U1', ..., ρ_Uq', ρ_E')' lets us represent all functional random intercepts in the model via Ψρ.
For a given functional random effect, the penalty takes the form ρ_g' P_g ρ_g = ρ_g' (I_Vg ⊗ P̃_g) ρ_g, where I_Vg corresponds to the assumed independence between the V_g different groups. The diagonal matrix P̃_g = diag(ν_g1, ..., ν_gMg)⁻¹ contains the (estimated) eigenvalues ν_gm of the associated multivariate FPCs. This quadratic penalty is mathematically equivalent to a normal distribution assumption on the scores ρ_gv with mean zero and covariance matrix P̃_g⁻¹, as implied by the KL theorem for Gaussian random processes. Note that the smoothing parameter λ_g allows for additional scaling of the covariance of the corresponding random process.
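The equivalence between the quadratic penalty and the Gaussian score distribution can be checked numerically. In the following Python sketch the eigenvalues and the number of groups are illustrative values:

```python
import numpy as np

nu = np.array([1.2, 0.5, 0.1])  # illustrative eigenvalues nu_gm of one process
V_g = 4                         # number of groups of the grouping layer
rng = np.random.default_rng(3)
rho = rng.normal(0.0, np.sqrt(np.tile(nu, V_g)))  # rho_gvm ~ N(0, nu_gm)

P_tilde = np.diag(1.0 / nu)            # diag(nu_g1, ..., nu_gMg)^{-1}
P_g = np.kron(np.eye(V_g), P_tilde)    # I_{V_g} (x) P_tilde: independent groups

penalty = rho @ P_g @ rho              # quadratic penalty on the scores

# Up to an additive constant, this equals -2 log of the N(0, P_tilde^{-1})
# density of the scores, evaluated blockwise over all groups:
neg2logdens = np.sum(rho**2 / np.tile(nu, V_g))
```

Penalizing scores by inverse eigenvalues thus shrinks the contributions of higher-order FPCs most strongly, mirroring the decreasing variances of the KL scores.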

Estimation
We estimate the unknown smoothing parameters λ_xl, λ_tl, and λ_g using fast restricted maximum likelihood (REML) estimation (Wood). The standard identifiability constraints of FAMMs are used (Scheipl et al., 2015). In particular, in addition to the constraints for the fixed effects, the multivariate random intercepts are subject to a sum-to-zero constraint over all evaluation points.
We propose a weighted regression approach to handle the heteroscedasticity assumption contained in Σ. We weight each observation proportionally to the inverse of the estimated univariate measurement error variances σ̂²_d from the estimation of the univariate covariances (4). Alternatively, updated measurement error variances can be obtained from fitting separate univariate FAMMs on the dimensions using the univariate components of the multivariate FPC basis. In practice, we found that the less computationally intensive former option gives reasonable results. As our proposed model is part of the FAMM framework, inference for the multiFAMM is readily available based on inference for scalar additive mixed models (Wood). Note, however, that the inferential statements do not incorporate uncertainty from the estimated multivariate eigenfunction bases, nor from the chosen smoothing parameters. The estimation process readily provides, inter alia, standard errors for the construction of point-wise univariate confidence bands (CBs).

Implementation
We provide an implementation of the estimation of the proposed multiFAMM in the R package multifamm (Volkmann, 2021). It is possible to include up to two functional random intercepts in U(t), which can have a nested or crossed structure, in addition to the curve-specific random intercept E_i(t). While including e.g. functional covariates is conceptually straightforward (see Scheipl et al., 2015), our implementation is restricted to scalar covariates and interactions thereof. We provide different alternatives for specifying the multivariate scalar product, the multivariate cut-off criterion, and the covariance matrix of the white noise error term. Note that the estimated univariate error variances have been proposed as weights for two separate and independent modeling decisions: as weights in the scalar product of the MFPCA and as regression weights under heteroscedasticity across dimensions.

Applications
We illustrate the proposed multiFAMM for two different data applications corresponding to intrinsically multivariate and multimodal functional data. The presentation focuses on the first application, with a detailed description of the multimodal data application in Appendix C. We provide the data and the code to produce all analyses in the GitHub repository multifammPaper.

[Figure 1, right panel: Two-dimensional trajectories of the snooker training data set (grey curves). For both groups of skilled and unskilled participants, three randomly selected observations are highlighted. Every colour corresponds to one multivariate observation, i.e. one observation consists of three trajectories: elbow (top), shoulder (right), hand (bottom). The start of each exemplary trajectory is marked with a black asterisk, with the hand trajectory centred at the origin.]

Data Set and Preprocessing
In a study by Enghofer (2014), 25 recreational snooker players were split into two groups, one of which had instructions to follow a self-administered training schedule over the next six weeks consisting of exercises aimed at improving snooker-specific muscular coordination. The second was a control group. Before and after the training period, both groups were recorded on a high-speed digital camera under similar conditions to investigate the effects of the training on their snooker shot of maximal force. In each of the two recording sessions, six successful shots per participant were videotaped. The recordings were then used to manually locate points of interest (a participant's shoulder, elbow, and hand) and track them on a two-dimensional grid over the course of the video. This yields a six-dimensional functional observation per snooker shot, y* = (y*^(elbow.x), ..., y*^(shoulder.y))' : I = [0, 1] → R^6, i.e. a two-dimensional movement trajectory for each point of interest (see Figure 1).
In their starting position (hand centred at the origin), the snooker players are positioned centrally in front of the snooker table aiming at the cue ball. From their starting position, the players draw back the cue, then accelerate it forwards and hit the cue ball shortly after their hands enter the positive range of the horizontal x-axis. After the impulse onto the cue ball, the hand movement continues until it is stopped at the player's chest. Enghofer (2014) identifies two underlying techniques that a player can apply: dynamic and fixed elbow. With a dynamic elbow, the cue can be moved in an almost straight line (piston stroke), whereas additionally fixing the elbow results in a pendular motion (pendulum stroke). In both cases, the shoulder serves as a fixed point and should be positioned close to the snooker table.
We adjust the data for differences in body height and relative speed (Steyer et al.;2021) and apply a coarsening method to reduce the number of redundant data points, thereby lowering computational demands of the analysis. Appendix B provides a detailed description of the data preprocessing. As some recordings and evaluations of bivariate trajectories are missing, the final data set contains 295 functional observations with a total of 56,910 evaluation points. These multivariate functional data are irregular and sparse, with a median of 30 evaluation points per functional observation (minimum 8, maximum 80) for each of the six dimensions.

Model Specification
We estimate the following model with i = 1, ..., 25 the index for the snooker player, j = 1, 2 the index for the session, h = 1, ..., H_ij the index for the typically six snooker shot repetitions in a session, and t ∈ [0, 1] relative time. Correspondingly, B_i(t) is a subject-specific random intercept, C_ij(t) is a nested subject-and-session-specific random intercept, and E_ijh(t) is the shot-specific random intercept (smooth residual). The nested random effect C_ij(t) is supposed to capture the variation within players between sessions (e.g. differences due to players having a good or bad day). Different positioning of participants with respect to the recording equipment or the snooker table, as well as shot-to-shot variation, are captured by the smooth residual E_ijh(t). The white noise measurement error ε_ijht is assumed to follow a zero-mean multivariate normal distribution with covariance matrix σ²I_6, as all six dimensions are measured with the same set-up. The additive predictor is defined as μ(x_ij, t) = f_0(t) + f_1(t) skill_i + f_2(t) group_i + f_3(t) session_j + f_4(t) group_i · session_j. The dummy covariates skill_i and group_i indicate whether player i is an advanced snooker player and belongs to the treatment group (i.e. receives the training programme), respectively. Note that the snooker players self-select into training and control group to improve compliance with the training programme, which is why we include a group effect in the model. The dummy covariate session_j indicates whether a shot is recorded after the training period. The effect function f_4(t) of the group-session interaction can thus be interpreted as the treatment effect of the training programme. Cubic P-splines with eight basis functions and a first-order difference penalty, penalizing deviations from constant functions over time, are used for all effect functions in the preliminary mean estimation as well as in the final multiFAMM. For the estimation of the auto-covariances of the random processes, we use cubic P-splines with a first-order difference penalty on five marginal basis functions.
We use an unweighted scalar product (5) for the MFPCA to give equal weight to all spatial dimensions, as we can assume that the measurement error mechanism is similar across dimensions. Additionally, we find that hand, elbow, and shoulder contribute roughly the same amount of variation to the data, cf. Table 1 in Appendix B.3, where we also discuss potential weighting schemes for the MFPCA. The multivariate truncation order is chosen such that 95% of the (unweighted) sum of variation (6) is explained.
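As a schematic illustration of this cut-off rule (not the multifamm implementation; the eigenvalue inputs below are hypothetical), the truncation order is the smallest number of leading multivariate eigenvalues whose cumulative share of the total variation reaches the chosen level:

```python
import numpy as np

def truncation_order(eigenvalues, level=0.95):
    """Smallest M such that the M leading eigenvalues explain at least
    `level` of the total variation (the sum of all eigenvalues)."""
    nu = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # decreasing order
    share = np.cumsum(nu) / nu.sum()                          # cumulative shares
    return int(np.searchsorted(share, level) + 1)

# hypothetical eigenvalue estimates for one random-effect process
nu = [5.0, 2.5, 1.2, 0.6, 0.4, 0.3]
M = truncation_order(nu, level=0.95)   # here the first five FPCs are kept
```

The univariate criterion (7) works analogously, applied separately to the eigenvalues estimated on each dimension.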

Results
The MFPCA gives sets of five (for C and E) and six (for B) multivariate FPCs that explain 95% of the total variation. The estimated eigenvalues allow us to quantify their relative importance. Approximately 41% of the total variation (conditional on covariates) can be attributed to the nested subject-and-session-specific random intercept C_ij(t), 33% to the subject-specific random intercept B_i(t), 14% to the shot-specific E_ijh(t), and 7% to white noise. This suggests that day-to-day variation within a snooker player is larger than the variation between snooker players. Note that these proportions are based on estimation step 1 (see Section 3.1).
The left plot of Figure 2 displays the first FPC for C, which explains about 28% of total variation. A suitable multiple of the FPC is added (+) to and subtracted (−) from the overall mean function (black solid line, all covariate values set to 0.5). We find that the dominant mode of the random subject-and-session-specific effect influences the relative positioning of a player's elbow, shoulder, and hand, thus suggesting a strong dependence between the dimensions. Enghofer (2014) argues from a theoretical viewpoint that the ideal starting position should place elbow and hand in a line perpendicular to the plane of the snooker table (corresponding to the x-axis). The most prominent mode of variation captures deviations from this ideal starting position found in the overall mean. The next most important FPC ψ_B1 of the subject-specific random effect, which explains about 15% of total variation, represents a subject's tendency towards the piston or pendulum stroke (see Appendix Figure 8). This additional insight into the underlying structure of the variance components might be helpful for e.g. developing personalized training programmes. The central plot on the right of Figure 2 compares the estimated mean movement trajectory for advanced snooker players (red) to that in the reference group (blue). It suggests that more experienced players tend towards the dynamic elbow technique, generating a hand trajectory resembling a straight line (piston stroke). Uncertainties in the trajectory could be represented by pointwise ellipses, but inference is more straightforward to obtain from the univariate effect functions. The marginal plots display the estimated univariate effects with pointwise 95% confidence intervals.
Even though we find only little statistical evidence for increased movement of the elbow (horizontal-left and vertical-top marginal panels), the hand and shoulder movements (horizontal centre and right, vertical centre and bottom) strongly suggest that the skill level indeed influences the mean movement trajectory of a snooker player. Further results indicate that the mean hand trajectories might slightly differ between treatment and control group at baseline as well as between sessions (f_2(t) and f_3(t), see Appendix Figure 12). The estimated treatment effect f_4(t) (Appendix Figure 11), however, suggests that the training programme did not change the participants' mean movement trajectories substantially. Appendix B.3 contains a detailed discussion of all model terms as well as some model diagnostics and sensitivity analyses.

Figure 3: Index curves of the consonant assimilation data set for both ACO and EPG data as a function of standardized time t (grey curves). For every consonant order, three randomly selected observations are highlighted. Every colour corresponds to one multivariate observation, i.e. one observation consists of two index curves.

Data Set and Model Specification
Pouplier and Hoole (2016) study the assimilation of the German /s/ and /sh/ sounds such as the final consonant sounds in "Kürbis" (English example: "haggis") and "Gemisch" (English example: "dish"), respectively. The research question is how these sounds assimilate in fluent speech when combined across words such as in "Kürbis-Schale" or "Gemisch-Salbe", denoted as /s#sh/ and /sh#s/ with # the word boundary. The 9 native German speakers in the study repeated a set of 16 selected word combinations five times. Two different types of functional data, i.e. acoustic (ACO) and electropalatographic (EPG) data, were recorded for each repetition to capture the acoustic (produced sound) and articulatory (tongue movements) aspects of assimilation over (relative) time t within the consonant combination.
Each functional index varies roughly between +1 and −1 and measures how similar the articulatory or acoustic pattern is to its reference patterns for the first (+1) and second (−1) consonant at every observed time point. Without assimilation, the data are thus expected to shift from positive to negative values in a sine-like form (see Figure 3). The data set contains 707 bivariate functional observations with differently spaced grids of evaluation points per curve and dimension, with the number of evaluation points ranging from 22 to 59 with a median of 35. Note that the consonant assimilation data are unaligned, as registration of the time domain would mask transition speeds between the consonants, which are an interesting part of assimilation.
For comparability, we follow the model specification of Cederbaum et al. (2016), who analyse only the ACO dimension and ignore the second mode EPG. Our specified multivariate model is similar to (10) with i = 1, ..., 9 the speaker index, j = 1, ..., 16 the word combination index, h = 1, ..., H_ij the repetition index, and t ∈ [0, 1] relative time. Note that the nested effect C_ij(t) is replaced by the crossed random effect C_j(t) specific to the word combinations. The additive predictor μ(x_j, t) now contains eight partial effects: the functional intercept plus main and interaction effects of scalar covariates describing characteristics of the word combination such as the order of the consonants /s/ and /sh/. The white noise measurement error ε_ijht is assumed to follow a zero-mean bivariate normal distribution with diagonal covariance matrix diag(σ²_ACO, σ²_EPG). The basis and penalty specifications follow the univariate analysis in Cederbaum et al. (2016). Given different sampling mechanisms, we also compare the multiFAMM based on weighted and unweighted scalar products for the MFPCA.

Results
The multivariate analysis supports the findings of Cederbaum et al. (2016) that assimilation is asymmetric (different mean patterns for /s#sh/ and /sh#s/). Overall, the estimated fixed effects are similar across dimensions as well as comparable to the univariate analysis. Hence, the multivariate analysis indicates that previous results for the acoustics are consistently found also for the articulation. Compared to univariate analyses, our approach reduces the number of FPC basis functions and thus the number of parameters in the analysis. The multiFAMM can improve the model fit and can provide smaller confidence bands (CBs) for the ACO dimension compared to the univariate model in Cederbaum et al. (2016) due to the strong cross-correlation between the dimensions. We find similar modes of variation for the multivariate and the univariate analysis as well as across dimensions. In particular, the word combination-specific random effect C_j(t) is dropped from the model as much of the between-word variation is already explained by the included fixed effects. The definition of the scalar product has little effect on the estimated fixed effects but changes the interpretation of the FPCs. Appendix C contains a more in-depth description of this application.

Simulation Set-Up
We conduct an extensive simulation study to investigate the performance of the multiFAMM depending on different model specifications and data settings (over 20 scenarios total), and to compare it to univariate regression models as proposed by Cederbaum et al. (2016), estimated on each dimension independently. Given the broad scope of analysed model scenarios, we refer the interested reader to Appendix D for a detailed report and restrict the presentation here to the main results.
We mimic our two presented data examples (Section 4) and simulate new data based on the respective multiFAMM-fit. Each scenario consists of model fits to 500 generated data sets, where we randomly draw the number and location of the evaluation points, the random scores, and the measurement errors according to different data settings. The accuracy of the estimated model components is measured by the root relative mean squared error (rrMSE) based on the unweighted multivariate norm but otherwise as defined by Cederbaum et al. (2016), see Appendix D.1. The rrMSE takes on (unbounded) positive values with smaller values indicating a better fit.
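As a rough sketch (a simplification; the exact rrMSE definition with its normalizing constants follows Cederbaum et al. (2016) and Appendix D.1), the error of an estimated functional term relative to the truth can be computed as:

```python
import numpy as np

def rrmse(truth, estimate):
    """Root relative mean squared error between true and estimated functional
    terms, both given as (n_curves, n_gridpoints, n_dimensions) arrays on a
    common grid.  Squared norms are approximated by sums over the grid and
    over dimensions; the grid spacing cancels in the ratio."""
    t, e = np.asarray(truth, float), np.asarray(estimate, float)
    return float(np.sqrt(np.sum((e - t) ** 2) / np.sum(t ** 2)))
```

Smaller values indicate a better fit, with rrmse(truth, truth) equal to zero.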

Simulation Results
Figure 4 compares the rrMSE values over selected modeling scenarios based on the consonant assimilation data. We generate a benchmark scenario (dark blue), which imitates the original data without misspecification of any model component. In particular, the number of FPCs is fixed to avoid truncation effects. Comparing this scenario to the other blue scenarios illustrates the importance of the number of FPCs in the accuracy of the estimation. Choosing the truncation order via the proportion of univariate variance explained (Cut-Off Uni) as in (7) gives models with roughly the same number of FPCs (mean B: 2.8, E: 5) as is used for the data generation (B: 3, E: 5). The cut-off criterion based on the multivariate variance (Cut-Off Mul) given by (6) results in more parsimonious models (mean B: 2.15, E: 4) and thus considerably higher rrMSE values. The increased variation in the rrMSE values can also be attributed to variability in the truncation orders (cf. Appendix Figure 23), leading to a mixture distribution. Comparing the benchmark scenario to more sparsely observed functional data (ceteris paribus) suggests a lower estimation accuracy for the Sparse Data scenario, especially for the curve-specific random effect E_ijh(t) and, consequently, the fitted curves y_ijh(t), but pooling the information across functions helps the estimation of μ(x_ij, t) and B_i(t). In particular, the estimation of the mean μ(x_ij, t) is quite robust against the increased uncertainty of these three scenarios. Only when the random scores are not centred and decorrelated as in the benchmark scenario do we find an increase in rrMSE values for the mean (Uncentred Scores). This corresponds to a departure from the modeling assumptions likely to occur in practice when only few levels of a random effect are available (here for the subject-specific B_i(t)). The model then has difficulties to correctly separate the intercept in μ(x_ij, t) and the random effects B_i(t).
The empirical (non-zero) mean of the B i (t) is then absorbed by the intercept in µ(x ij , t), resulting in higher rrMSE values for both of these model terms. However, this shift does not affect the overall fit to the data y ijh (t) nor the estimation of the other fixed effects (cf. Appendix Figure 31). Note that the rrMSE values of the Sparse Data and Uncentred Scores scenarios are based on slightly different normalizing constants (i.e. different true data) and cannot be directly compared except for the mean.
Our simulation study thus suggests that basing the truncation orders on the proportion of explained variation on each dimension (7) gives parsimonious and well-fitting models. If interest lies mainly in the estimation of fixed effects, the alternative cut-off criterion based on the total variation in the data (6) allows even more parsimonious models at the cost of a less accurate estimation of the random effects and overall model fit. Furthermore, the results presented in Appendix D show that the mean estimation is relatively stable over different model scenarios, including misspecification of the measurement error variance structure or of the multivariate scalar product, as well as in scenarios with strong heteroscedasticity across dimensions. In our benchmark scenario, the CBs cover the true effect 89–94% of the time, but coverage can further decrease with additional uncertainty, e.g. about the number of FPCs. Overall, the covariance structure such as the leading FPCs can be recovered well, also for a nested random effect such as in the snooker training application. The comparison to the univariate modeling approach suggests that the multiFAMM can improve the mean estimation but is especially beneficial for the prediction of the random effects while reducing the number of parameters to estimate. In some cases, like strong heteroscedasticity, including weights in the multivariate scalar product might further improve the modeling.

Discussion
The proposed multivariate functional regression model is an additive mixed model, which allows modeling flexible covariate effects for sparse or irregular multivariate functional data. It uses FPC-based functional random effects to model complex correlations within and between functions and dimensions. An important contribution of our approach is estimating the parsimonious multivariate FPC basis from the data. This allows us to account not only for auto-covariances, but also for non-trivial cross-covariances over dimensions, which are difficult to adequately model using alternative approaches such as parametric covariance functions like the Matérn family or penalized splines, which imply a parsimonious covariance only within but not necessarily between functions. As a FAMM-type regression model, a wide range of covariate effect types is available, also providing pointwise CBs. Our applications show that multiFAMMs can give valuable insight into the multivariate correlation structure of the functions in addition to the mean structure.

Figure 4: rrMSE values of the fitted curves y_ijh(t), the mean μ(x_ij, t), and the random effects B_i(t) and E_ijh(t) for different modeling scenarios. Scenarios coloured in blue correspond to different model specifications in the same data setting.
An apparent benefit of multivariate modeling is that it allows answering research questions relating simultaneously to different dimensions. In addition, using multivariate FPCs reduces the number of parameters compared to fitting comparable univariate models while improving the random effects estimation by incorporating the cross-covariance in the multivariate analysis. The added computational costs are small: for our multimodal application, the multivariate approach prolongs the computation time by only five percent (104 vs. 109 minutes on a 64-bit Linux platform).
We find that the average pointwise coverage of the pointwise CBs can in some cases lie considerably below the nominal value. There are two main reasons for this: One, the CBs presented here incorporate neither the uncertainty of the eigenfunction estimation nor of the smoothing parameter selection. Two, coverage issues can arise in (scalar) mixed models if effect functions are estimated as constant when in truth they are not (e.g. Wood). To resolve these issues, further research on the level of scalar mixed models might be needed. A large body of research covering CB estimation for functional data (e.g. Choi and Reimherr; Liebl and Reimherr; 2019) suggests that the construction of CBs is an interesting and complex problem, also outside of the FAMM framework.
It would be interesting to extend the multiFAMM to more general scenarios of multivariate functional data such as observations consisting of functions with different dimensional domains, e.g. functions over time and images as in Happ and Greven (2018). This would require adapting the estimation of the univariate auto-covariances for spatial arguments t, t′. Exploiting properties of dense functional data, such as the block structure of design matrices for functions observed on a grid, could help to reduce computational cost in this case. Future research could further generalize the covariance structure of the multiFAMM by allowing for additional covariate effects. In our snooker training application, for example, a treatment effect of the snooker training might show itself in the form of reduced intra-player variance (cf. Backenroth et al.). Ideas from distributional regression could be incorporated to jointly model the mean trajectories and covariance structure conditional on covariates.

A Derivation of Variance Decompositions
This is the derivation of the variance decompositions (6) and (7) from the model equations (1) and (2). Following the argumentation of Cederbaum et al. (2016), the variation of the response can be decomposed per dimension using iterated expectations. In the second equality, we assume that each observation is in exactly one group of the jth grouping layer, i.e. exactly one indicator z_ijv = z²_ijv, v = 1, ..., V_Uj, equals one for each i and j, and we use that the scores are uncorrelated across eigenfunctions, in particular Cov(ρ_Uj vm, ρ_Uj vm′) = 0 if m ≠ m′.
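Under these assumptions, the per-dimension decomposition has the following generic form (a sketch in simplified notation, not the paper's exact display; ν and ψ denote the eigenvalues and eigenfunctions of the random processes U_j, and σ²_d the error variance on dimension d):

```latex
\operatorname{Var}\!\left(y_{i}^{(d)}(t)\,\middle|\,\text{covariates}\right)
  \;=\; \sum_{j} \sum_{m=1}^{\infty} \nu_{U_j m}
        \left(\psi_{U_j m}^{(d)}(t)\right)^{2} \;+\; \sigma_d^{2}.
```

Integrating over t ∈ I and summing over d then yields the total-variance decomposition underlying the cut-off criteria.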
Similarly, the total variance as measured by the sum of the univariate variances can be decomposed analogously.

B Additional Results for the Snooker Data

B.1 Data Description
In this section, we provide additional information on the study by Enghofer (2014). The participants of the snooker training study volunteered if they wanted to take part in the training group. The training schedule for this treatment group recommended training two to three times a week, consisting of three exercises that are supposed to improve snooker-specific muscular coordination. Note that these were autonomous training sessions, so there is no reliable information on whether the participants followed the recommendations. In order to analyse the snooker shot of maximal force, the participants were asked to place themselves in the typical playing position. However, no exact measure of the distance between the tip of the cue and the cue ball is available, and thus the time of the impulse onto the cue ball cannot be exactly determined from the data. The participants were videotaped, and the open source software Kinovea (https://www.kinovea.org) was used to manually locate points of interest (a participant's shoulder, elbow, and hand) and track them on a two-dimensional grid over the course of the video. Figure 5 shows the univariate functions that make up the two-dimensional trajectories of the participants' elbow, hand, and shoulder. The univariate functions for the x-axis of the hand show that for most observed curves, the time of impact might lie around 0.75 of the relative time of the snooker shot. That is when the hand moves into the positive range of the x dimension. For their shot, the snooker players can either try to fix their elbow or move the elbow dynamically. In order to move the hand in a straight line (piston stroke), the elbow has to move dynamically down and up as the hand is drawn back and accelerated towards the cue ball. The latter part of the movement trajectory (after the cue hits the cue ball) does not impact the snooker shot itself.
Note that a long final downwards movement of an elbow trajectory indicates that the hand is not stopped at the chest but is rather pulled through towards the shoulder. This can give insight into a player's stance at the snooker table, e.g. if the upper body is close to the table or if it is more erect.
Note that the univariate functions of Figure 5 show different characteristics over the different dimensions, and a first-order difference penalty penalizing deviations from a constant function seems to be a sensible default choice.

B.2 Data Preprocessing
In order to account for differences in body height between the participants, we first rescale the observed coordinate locations by the median distance between hand and elbow. As we are interested in the movement trajectory irrespective of phase variation, i.e. independent of the exact timing of different parts of the stroke, we apply an elastic registration approach to the data, which aligns the two-dimensional trajectory of the hand to its respective mean trajectory across all players and shots (Steyer et al.; 2021). We then reparameterize the time for the elbow and shoulder trajectories according to the results of the hand alignment. Other registration approaches are possible, and we provide both the original and the reparameterized time of the data. Due to the high frame rate of the high-speed camera and a comparatively rough grid of the tracking software, the resulting multivariate functional data are dense with redundant information. The following subsection describes the coarsening method in detail. The coarsening reduces the data set from roughly 400,000 to only 56,910 scalar observations. Note that 5 functional observations are lost due to technical problems during the recording. Even bivariate trajectories within one multivariate functional observation can have different numbers of scalar observations due to inconsistencies in the tracking programme.

Coarsening Method for the Snooker Data
Functional response variables as given by the snooker trajectories typically exhibit very high autocorrelation. Thus, up to a certain point, removing single curve measurements will inflict only very limited information loss while reducing computational cost in the model fit at least linearly. We therefore coarsen the snooker trajectories as a preprocessing step, as they were originally sampled "overly" densely in that sense. In particular, there are also evaluation points that do not carry additional information as they measure the same location as the evaluation points right before and after (e.g. because the hand does not move noticeably over these three or more time points). The coarsening should a) retain original data points via subsampling to avoid pre-smoothing losses and b) be optimal in the sense that as much information as possible is preserved given the subsample size.
Finding an optimal subsample, with respect to a general criterion and for all possible subsamples, is however typically very computationally demanding in itself. We therefore propose a fast greedy coarsening algorithm recursively discarding single points from the sample. The ultimate aim of the algorithm is to find a subsample polygon (Fig. 6) representing the whole data as well as possible.
One step of the coarsening algorithm described in the following is illustrated in Figure 7. Let T_0 = {t_1, ..., t_m} ⊂ I be the set of evaluation points t_1 < ... < t_m of a curve y : I → R^2. We select a sequence of subsamples T_0 ⊃ T_1 ⊃ T_2 ⊃ ... by recursively discarding the point t from T_k where y(t) is closest to the line segment [y(t−), y(t+)] between its adjacent points.

Figure 7: One step of the coarsening algorithm (legend: sampling points, subsample polygon, point-polygon distance). Given the current subsample (top), the loss Δ^(k)(t) inflicted when approximating y(t) by the line segment between its neighbours is computed for each index t_2, ..., t_5 (middle). The data point with the least approximation loss is discarded (bottom). Note that Δ^(k+1)(t) has to be re-computed only for the two adjacent points; all others can be passed forward.
In more detail, we proceed in the kth step of the algorithm as follows: for all t ∈ T_k, we compute the step-to-step quadratic error Δ^(k)(t), i.e. the quadratic distance of y(t) to the line segment between its adjacent points, and discard the point with the smallest error. For computing the quadratic distance Δ^(k)(t), we have to distinguish the case where the projection of y(t) onto the line through y(t−) and y(t+) lies between the two points and the case where it lies on either side of them. To do so, define ỹ(t) = y(t) − y(t−) and ỹ+ = ỹ(t+), as well as the unit vector u = ỹ+/‖ỹ+‖ pointing from y(t−) to y(t+). If the projection length ⟨ỹ(t), u⟩ lies in [0, ‖ỹ+‖], then Δ^(k)(t) = ‖ỹ(t)‖² − ⟨ỹ(t), u⟩²; otherwise, Δ^(k)(t) is the squared distance of y(t) to the nearer of the two endpoints. This procedure may be repeated until T_{k+1} is of the desired sample size. However, we consider an error-based stopping criterion more convenient in most cases. We suggest specifying a threshold for the cumulative step-to-step error S_k = Σ_{κ=1}^{k} Δ^(κ)(t^(κ)), or a relative version R_k = S_k/Δ̄ of it, with Δ̄ = (1/(m−2)) Σ_t Δ^(∞)(t) the mean quadratic distance of the interior points to the line segment [y(t_1), y(t_m)] between the only two points left in the ultimate subsample T_∞.
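A minimal sketch of this greedy coarsening (assuming a 2D trajectory given as an array of positions; unlike the description above, the sketch recomputes all losses in each step instead of only the two adjacent ones):

```python
import numpy as np

def coarsen(y, r_threshold=1e-4):
    """Greedy coarsening of a 2D trajectory y (an (m, 2) array): recursively
    drop the interior point whose removal inflicts the smallest quadratic
    approximation loss, until the cumulative loss relative to the mean
    two-point-polygon error exceeds r_threshold."""

    def seg_dist2(p, a, b):
        # squared distance from p to the segment [a, b], distinguishing
        # projections inside and outside the segment via clipping
        ab, ap = b - a, p - a
        denom = float(ab @ ab)
        s = np.clip(ap @ ab / denom, 0.0, 1.0) if denom > 0 else 0.0
        d = ap - s * ab
        return float(d @ d)

    m = len(y)
    keep = list(range(m))
    # mean squared distance to the chord [y[0], y[-1]] (ultimate subsample)
    delta_bar = np.mean([seg_dist2(y[i], y[0], y[-1]) for i in range(1, m - 1)])
    cum_loss = 0.0
    while len(keep) > 2:
        # loss for removing each interior point of the current subsample
        losses = [seg_dist2(y[keep[i]], y[keep[i - 1]], y[keep[i + 1]])
                  for i in range(1, len(keep) - 1)]
        i_min = int(np.argmin(losses))
        if delta_bar > 0 and (cum_loss + losses[i_min]) / delta_bar > r_threshold:
            break                      # stopping criterion: R_k would exceed R*
        cum_loss += losses[i_min]
        del keep[i_min + 1]            # drop the corresponding interior point
    return keep                        # indices of retained evaluation points
```

Applied to the snooker data, the retained indices of the hand trajectory would then also determine the retained points of the elbow and shoulder trajectories.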
In our experience, a threshold R* = 10^-4 for R_k yields a close-to-lossless subsample from a visual point of view. For the model, R* = 0.003 was chosen to limit computational demands. While the multiFAMM would allow for dimension-specific evaluation grids of the curves, the warping algorithm applied as part of the preprocessing does not. Thus, we decided to apply the coarsening algorithm to the hand trajectories, which we consider the most informative component with the best signal-to-noise ratio. Observed time points not selected for a hand trajectory are then also dropped for the corresponding trajectories of the shoulder and elbow. As a result, the coarsened data retain the characteristic of the snooker trajectory data that for a given time point, we observe the location of (almost) all points of interest (i.e. the grid of evaluation points is identical over the dimensions for a given observation).

Analysis of the Variance Components
The multiFAMM estimates a total of 61 univariate FPCs (20 each for B and C, and 21 for E), where each process is represented by three to five univariate FPCs on each dimension. Table 1 shows that the three points of interest (hand, elbow, and shoulder) all show a similar amount of variation. The source of variation, however, differs in interpretation. While the variation in the hand trajectories stems from differing movements, the variation in the shoulder mainly reflects different positioning. Applying higher weights to dimensions associated with the hand, for example, can shift the focus of the MFPCA to favour movement-based variation of the hand trajectories in the analysis. From Table 1 it is also apparent that movements or positions along the horizontal x-axis contribute more to the variation in the data than movements or positions along the vertical y-axis. This also suggests that variation in the x direction is the driving factor of the MFPCA. By assigning higher weights in the scalar product to dimensions associated with the y-axis, the analyst can "distort" the natural grid of observations to balance out the different variation in the axes. Note that we use an unweighted scalar product in the analysis. Also keep in mind that the variation in Table 1 is calculated based on estimation step 1. Based on the cut-off criterion (6), 16 multivariate FPCs are chosen to explain 95% of total variation. For a similar amount of variance explained, 23 univariate FPCs would be needed, which would give a more complex model that contains redundancies and ignores the multivariate nature of the data. Figures 8 and 9 show, among other things, the two most prominent modes of variation in the snooker training data. As described in the main part, the FPC ψ_C1 seems to represent variation in the starting position (explaining about 27% of total variation).
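The role of such weights can be illustrated with a minimal sketch of the weighted multivariate scalar product (schematic, not the multifamm implementation; the grid and weights below are hypothetical):

```python
import numpy as np

def mv_scalar_product(f, g, t, weights=None):
    """Multivariate scalar product <f, g> = sum_d w_d * integral f^(d) g^(d) dt
    for functions evaluated on a common grid t; f and g are
    (n_gridpoints, n_dimensions) arrays.  weights=None gives the unweighted
    product, with all dimensions entering equally."""
    f, g = np.asarray(f, float), np.asarray(g, float)
    w = np.ones(f.shape[1]) if weights is None else np.asarray(weights, float)
    prod = f * g                                  # pointwise products per dimension
    dt = np.diff(t)[:, None]
    integral = np.sum((prod[1:] + prod[:-1]) / 2 * dt, axis=0)  # trapezoid rule
    return float(np.sum(w * integral))            # weighted sum over dimensions
```

Up-weighting a dimension (e.g. the y-axis coordinates) increases its contribution to the norm and hence its influence on the estimated multivariate FPCs.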
The second leading FPC ψ B1 (explaining about 15% of total variation) shows subject-specific variation related to the two snooker techniques identified by Enghofer (2014): The red trajectory (+) shows that the elbow moves first down and then up to draw the hand back and accelerate it towards the cue ball. This then vertically contracts the hand trajectory compared to fixing the elbow during the acceleration phase (blue −), which allows the hand to swing more freely (pendulum stroke). We also find that players with a personal tendency towards the pendulum stroke seem to not stop their hands at their chest. Note that for ψ B1 in Figure 8, the red trajectory is overlapping and thus masks the down-up-and-down movement of the elbow. Figure 8 shows the multivariate FPCs of the subject-specific functional random intercept included in the multiFAMM. ψ B2 (t) (explaining about 9% of the total variation) seems to capture variation in the positioning of the shoulder, size differences of the upper arm, and the final part of the movement trajectories after the cue ball is hit. Figure 9 shows the multivariate FPCs of the subject-and-session-specific random intercept. The second FPC (explaining about 6% of the total variation) shifts the relative positioning of the elbow and shoulder. Figure 10 shows the multivariate FPCs of the curve-specific random intercept. The leading FPC (explaining about 7% of total variation) captures variation in the starting position of elbow and shoulder, and in the final hand movement.

Analysis of the Estimated Effect Functions
The left panel of Figure 11 shows the estimated covariate effect functions for the covariate skill. In addition to the effect described in the main part, we point out that for skilled players, the shoulder is positioned closer to the table as well as to the body, all other things being equal. The movement trajectories in the right panel of Figure 11 are composed of the estimated effect functions of intercept, treatment group, and session (blue trajectories), so that the interaction effect can be separated and interpreted (red trajectories). We do not find strong evidence for differences in the displayed mean trajectories, nor in the univariate effects (marginal plots). This suggests that the training programme did not considerably change the participants' mean movement trajectories. Figure 12 shows the estimated intercept (left) and the effect functions of the covariates treatment group (center) and session (right). The intercept (scalar plus functional) gives the mean movement trajectories (dark blue) in the reference group, i.e. an unskilled snooker player in the control group with a shot in the first session. In the middle panel, the red trajectories show the estimated effect of the treatment group added to the intercept. The marginal plots around the movement trajectories show the estimated univariate effects and their pointwise 95% confidence intervals. We find only minor differences between the movement trajectories of the treatment and control groups. The hand trajectories of the treatment group seem to be slightly higher and further along the x-axis than those of the control group, all other things being equal. The right panel compares the mean trajectories for the reference group of players in the first (blue) and second (red) sessions. The estimated univariate covariate effects on the y-axis seem to indicate a slight vertical shift of the trajectories. Keep in mind that the trajectories of the hand have been centred to the origin before the analysis.

Model Diagnostics and Sensitivity Analysis
We use the univariate root relative mean squared error (urrMSE) defined in Section 5 as a criterion to compare model fit between the different dimensions. Table 2 shows that the model fits comparatively well on all dimensions except for the x-axis of the elbow and the y-axis of the hand. In order to get an impression of the quality of the model fit, Figure 13 shows selected fitted trajectories in solid red together with the observed trajectories in dashed grey. We choose to present the quantiles of the contribution of each observation to the multivariate root relative mean squared error (mrrMSE), i.e. quantiles of |||ζ_i − ζ̂_i|||² for the multivariate functional trajectories ζ_i, i = 1, ..., N, and the corresponding estimates ζ̂_i (see Section 5). The presented fitted trajectories show that the estimates do not always overlap with the observed trajectories, which suggests remaining structure in the model residuals. The top panels of Figure 14 plot the scalar residuals over time for each of the dimensions. Especially on the y-axis, we can discern patterns in the residuals, even though this type of graphic is prone to overplotting. In order to investigate residual autocorrelation, we apply the following ad-hoc method, which allows us to approximate the well-known concept of an autocorrelation function in time series analysis. We first use fast symmetric additive covariance smoothing to estimate a smoothed correlation matrix for the residuals. Then we calculate the means of the off-diagonals, which corresponds to an approximated mean autocorrelation for a given time lag. Figure 15 shows the results based on the residuals for each dimension separately. Overall, we find that the model residuals (red) show clear signs of autocorrelation, in particular up to a time lag of 0.25. Compared to the autocorrelation of the original data (solid grey line), however, we see a considerable reduction, which can be attributed to the functional random effects in the model.
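The off-diagonal averaging step of this ad-hoc method can be sketched as follows. This is a simplified illustration (not the implementation used in the paper), assuming the smoothed correlation matrix is already evaluated on an equidistant grid; a synthetic AR(1)-type matrix stands in for the estimated one:

```python
import numpy as np

def approx_acf(corr, times):
    """Average the k-th off-diagonals of a correlation matrix evaluated
    on an equidistant grid to approximate the autocorrelation at lag
    k * (grid spacing)."""
    n = corr.shape[0]
    lags = times - times[0]
    acf = np.array([np.diag(corr, k).mean() for k in range(n)])
    return lags, acf

# synthetic stand-in for a smoothed residual correlation matrix
times = np.linspace(0.0, 1.0, 101)
idx = np.arange(times.size)
corr = 0.8 ** (np.abs(np.subtract.outer(idx, idx)) / 25.0)
lags, acf = approx_acf(corr, times)
```

The resulting pairs (lags, acf) correspond to one of the per-dimension curves shown in Figure 15.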
Table 3 also underlines the importance of the random effects in modeling the snooker trajectories. We use the predictor variance as an indicator for quantifying the contributions of the fixed and random effects to the model fit. Separately on each dimension, we calculate the variance of the partial predictors on the data and give the respective proportions in Table 3. We find that with the exception of the reference mean (f_0(t)), the partial predictors of the fixed effects generally contribute little to the overall predictor variance (highest proportions for f_1(t) with around 5% on most dimensions). Even the reference mean contributes little to the predictor variance on the shoulder and the x-axis of the elbow. Consequently, most of the variance of the partial predictors can be assigned to the random effects. This suggests that the snooker training data might be dominated by random processes, with little explanatory power of the available covariates.
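The predictor-variance decomposition behind Table 3 amounts to comparing the empirical variances of the partial predictors on each dimension. A minimal sketch with hypothetical (randomly generated) partial predictors; the names f0, f1, B, C, E mirror the text but the values are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # number of evaluation points on one dimension

# hypothetical partial predictors: reference mean f0, covariate effect
# f1, and the random-effect contributions B, C, E
partial_predictors = {
    "f0": rng.normal(0.0, 1.0, n),
    "f1": rng.normal(0.0, 0.3, n),
    "B":  rng.normal(0.0, 1.5, n),
    "C":  rng.normal(0.0, 0.8, n),
    "E":  rng.normal(0.0, 2.0, n),
}

variances = {k: float(np.var(v)) for k, v in partial_predictors.items()}
total = sum(variances.values())
proportions = {k: v / total for k, v in variances.items()}
```

In this toy setup the random-effect contributions dominate the proportions, mirroring the pattern reported for the snooker data.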
We conduct a sensitivity analysis, where we increase the prespecified proportion of variance to 0.99. Additionally, we assume heteroscedasticity in the model, given the residual plot in Figure 14. The model then estimates different measurement error variances, with by far the largest variance on dimension hand.x. The model almost doubles the number of FPCs and includes 30 multivariate FPCs (B: 9, C: 10, E: 11). The estimates of the fixed effects hardly change (results not shown), but we find a better fit to the data (compare Table 2 and Figure 13). The scalar residuals in the lower panels of Figure 14 seem to exhibit less structure for the model of the sensitivity analysis, but we can still discern patterns in the residuals. Considering the residual autocorrelation, Figure 15 overall shows a reduction of autocorrelation for small time lags compared to the main model. For larger time lags, we find a somewhat surprising increase of the approximated autocorrelation function. Including a large number of comparatively trivial FPCs might thus improve the overall model fit while introducing new dependencies in the residuals. Given that our main interest lies in the leading FPCs and the fixed effects, we conclude that the sensitivity analysis yields similar results at added computational complexity.

Data Description
In this section we present more information on the analysis of the consonant assimilation data. The ACO data were recorded via microphone at 32,768 Hz. In addition, all speakers wore a custom-fit palate sensor with 62 electrodes that measured, at high temporal resolution, the area where the tongue touched the palate during articulation (EPG data). These primary data were summarized and transformed into a functional index over the time of articulation for the two dimensions ACO and EPG separately. Each functional index measures how similar the articulatory or acoustic pattern is to its reference patterns for the first and second consonant at every observed time point. These similarity indices vary roughly between +1 and −1, where the value +1 indicates patterns close to the reference for the final target consonant (i.e. the consonant ending the first word) and −1 for the initial target consonant (i.e. the consonant beginning the second word). Due to the specifics of the index calculation, the index values can lie slightly outside the interval [−1, 1]. The curves on the ACO dimension are generally smoother than on the EPG dimension, which reflects the index calculation: the ACO signal is much richer in information because 256 continuously valued points are considered in the calculation of the similarity index at every time point, while only 62 binary points enter the index calculation for the EPG signal. Details on the data generation and functional index computation can be found in Pouplier and Hoole (2016) and Cederbaum et al. (2016).

Model Specification
For this application, we follow the model specification of Cederbaum et al. (2016), who analyse only the ACO dimension of the data and ignore the second mode of the consonant assimilation data. Our specified multivariate model is

y_ijh(t) = μ(x_j, t) + B_i(t) + C_j(t) + E_ijh(t) + ε_ijht,

with i = 1, ..., 9 the speaker index, j = 1, ..., 16 the word combination index, and h = 1, ..., H_ij the repetition index. Again, B_i(t) and E_ijh(t) are the person-specific and curve-specific random intercepts. C_j(t) is the word combination-specific random intercept, which gives a crossed random effects structure. Note that here, the curve-specific random component E_ijh(t) also captures speaker and word interactions. Taking the different smoothness of the dimensions into account, the white noise measurement error ε_ijht is assumed to follow a zero-mean bivariate normal distribution with diagonal covariance matrix diag(σ²_ACO, σ²_EPG). The additive predictor of the multiFAMM is specified as

μ(x_j, t) = f_0(t) + f_1(t)·order_j + f_2(t)·stress1_j + f_3(t)·stress2_j + f_4(t)·vowel_j + f_5(t)·order_j·stress1_j + f_6(t)·order_j·stress2_j + f_7(t)·order_j·vowel_j,

with dummy covariates order_j, stress1_j, stress2_j, and vowel_j indicating whether in the word combination /sh/ is followed by /s/ (instead of the reference /s#sh/), whether the final target syllable is not stressed, whether the initial target syllable is not stressed, and whether the vowel context (i.e. the adjacent vowel for each consonant sound of interest) is /a#i/ (instead of the reference /i#a/ as in "Gemisch Salbe"), respectively. The functional intercept f_0(t) and the effect function f_1(t), and their deviations from a sine-like form and from zero, respectively, are especially interesting for studying assimilation.
For comparability, we follow the univariate analysis in Cederbaum et al. (2016) by specifying cubic P-splines with a third order difference penalty and eight and five (marginal) basis functions for the covariate effect functions and the auto-covariance estimation, respectively. The MFPCA is based on an unweighted scalar product. We choose the MFPCA truncation orders so that at least 95% of the total variation in the data is explained, as presented in (6). To handle the heteroscedasticity, we use the weighted regression approach. Given the different sampling mechanisms on the dimensions, Appendix C.3 also contains an alternative analysis based on a weighted scalar product for the MFPCA.
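The (weighted) scalar product on which the MFPCA is based can be sketched as follows. This is a simplified discretised version on a common equidistant grid, not the package implementation; the function name and the example weights are illustrative:

```python
import numpy as np

def scalar_product(f, g, times, weights=None):
    """Discretised (weighted) scalar product of two multivariate
    functions, <f, g>_w = sum_d w_d * integral of f^(d)(t) g^(d)(t) dt,
    with f, g given as (D, T) arrays on the common grid `times`.
    weights=None gives the unweighted product (all w_d = 1)."""
    f, g = np.asarray(f), np.asarray(g)
    if weights is None:
        weights = np.ones(f.shape[0])
    h = f * g
    dt = np.diff(times)
    # trapezoidal rule, one integral per dimension
    integrals = np.sum(0.5 * (h[:, 1:] + h[:, :-1]) * dt, axis=1)
    return float(np.dot(weights, integrals))

times = np.linspace(0.0, 1.0, 401)
f = np.vstack([np.sin(2 * np.pi * times), np.cos(2 * np.pi * times)])
# e.g. weighting by inverse error variances down-weights a noisy dimension
w = np.array([1.0 / 0.01, 1.0 / 0.04])
unweighted_norm = np.sqrt(scalar_product(f, f, times))
weighted_norm = np.sqrt(scalar_product(f, f, times, w))
```

Setting the weights to the inverse estimated error variances corresponds to the weighted variant used in Appendix C.3.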

Analysis of the Variance Components
The univariate decomposition of the auto-covariances yields five univariate FPCs for the random components B and E on both dimensions, and one and two FPCs for C on ACO and EPG, respectively. The cut-off criterion based on the sum of total variation selects a total of eight FPCs. The estimated multiFAMM then contains five multivariate FPCs for the smooth residual (representing 64% of total variation) and three for the subject-specific random effect (20% of total variation), with 12% of total variation due to measurement error. With the chosen truncation order M_C = 0, the crossed random component C is effectively dropped from the model, as a lot of variation in the data is already explained by the included fixed effects, all of which capture characteristics of the word combinations (8 fixed effects for 16 word combinations). Table 4 shows the contribution of each random process to the total variation in the consonant assimilation data according to the fitted multiFAMM. The first row gives the eigenvalues of the random components B and E as well as the estimated univariate error variances and the total amount of variation in the data as computed in (6). The second and third rows show the univariate L2 norms of the multivariate eigenfunctions. The proportion of explained variance π is given in the final three rows. The proportions displayed in the fourth and fifth rows are computed according to (7), and the last row is computed according to (6). Note that the fitted model uses the latter, multivariate cut-off criterion to determine the number of FPCs. With different sampling mechanisms for different dimensions and thus different measurement errors, it is advisable to check whether the proportion of explained variance on each dimension (7) is adequate. For the consonant assimilation data, the selected number of FPCs explains about 97% of variation on EPG but only 94% on ACO, which we still deem acceptable.
If a greater disparity is revealed in an application, we recommend using the proportion of explained univariate variation as a cut-off criterion (in our case, this would lead to including a sixth FPC for component E).
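A simplified sketch of such a variance-based truncation rule follows. It is illustrative only: the actual criteria (6) and (7) operate per random component and, in the weighted case, on weighted variation, whereas this toy version pools all eigenvalues; the function name and the numbers are made up:

```python
import numpy as np

def truncate(eigvals, error_var, level=0.95):
    """Keep the leading FPCs (sorted by eigenvalue) until their share of
    the total variation -- sum of all eigenvalues plus measurement-error
    variance -- reaches `level`. Returns the indices of the kept FPCs."""
    order = np.argsort(eigvals)[::-1]
    total = eigvals.sum() + error_var
    cum_share = np.cumsum(eigvals[order]) / total
    n_keep = int(np.searchsorted(cum_share, level)) + 1
    return order[:n_keep]

# hypothetical eigenvalues pooled over the random components
eigvals = np.array([4.0, 2.5, 1.5, 0.8, 0.4, 0.2, 0.1])
kept = truncate(eigvals, error_var=0.5, level=0.90)
```

Because the error variance enters the denominator, a noisier data set requires more FPCs to reach the same level, which is one reason the sensitivity analysis with level 0.99 almost doubles the number of FPCs.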
We present the estimated multivariate FPCs for component B in Figure 16. The estimated leading multivariate FPC ψ_B1(t) of the subject-specific random effect B_i(t) (see left panels) shows a personal tendency to pronounce the final or initial target syllable distinctly, which explains roughly 12% of the total variation. It changes the relative time spent pronouncing each syllable and pulls the entire curve closer to the reference sound of the preferred consonant. The second FPC (accounting for 6% of total variation), however, shows the individual tendency for assimilation. It flattens or amplifies the sine-like shape of the overall mean (black solid line), thus rendering the speaker more or less prone to distinguish between the two sounds. Note that in the univariate analysis for ACO presented in Cederbaum et al. (2016), this mode of variation is identified as the leading FPC of the person-specific random effect. The three leading FPCs ψ_E1(t), ψ_B1(t), and ψ_E2(t) all show greater univariate norms on the EPG dimension (Table 4), suggesting that this dimension is the driving source of variation in the model, in contrast to the univariate ACO analysis of Cederbaum et al. (2016). Note that only ψ_B3(t) impacts the dimensions differently in that more assimilation on one dimension equals less assimilation on the other. For the curve-specific random effect E_ijh(t) (Figure 17), the leading FPC impacts primarily the final target consonant, pulling it towards or pushing it away from its reference sound.

Table 4: Variance components included in the multiFAMM. First row: Estimates of eigenvalues, univariate error variances, total variation. Second (third) row: Univariate norms of estimated FPCs on dimension ACO (EPG). Fourth (fifth) row: Proportion of univariate variation explained on dimension ACO (EPG) by eigenfunctions and error variance, total univariate variation explained. Sixth row: Proportion of multivariate variation explained by eigenfunctions and error variances, total multivariate variation explained.
ψ_E2(t) has a similar shape to ψ_B2(t). Note that the third leading FPC of E affects the mean function in opposite directions on the two dimensions, shifting the curve up on one dimension and down on the other. Figure 18 shows the estimated auto- and cross-covariance surfaces of components B and E. It is evident that the cross-covariances are about as large in magnitude as the auto-covariances for the dimension ACO. In univariate analyses, however, the cross-covariance is completely ignored. Note that when no covariates are included in the model, the estimated multiFAMM contains one FPC for component C (results not shown). Figure 19 presents the estimated covariate effects (black; top plots for consonant order /s#sh/, bottom for /sh#s/). Overall, we find similar shapes for the estimated effects on both dimensions. In the reference group, the functional intercept f_0(t) shows signs of assimilation for consonant /s/. The effect f_1(t) of covariate order, however, pushes the final target syllable (in this case /sh/) towards its reference, all other things being equal. The positive effect at the end of the observed interval then pushes the initial target syllable /s/ towards the center, thus indicating a more assimilated pronunciation. Shortly before, we find a negative effect on the dimension EPG, which might indicate that for a brief section, the articulatory pattern of /s/ is indeed close to its reference, but this does not necessarily translate to the produced sound. Thus, similar to Cederbaum et al. (2016) (red), we find that the final target syllable /sh/ is pulled towards the reference while the initial target syllable /s/ seems to be less affected. Given the similar shape on the dimension EPG, this supports their finding that assimilation is asymmetric. Since the estimated effects are similar across dimensions and similar to the univariate results, we refer to Cederbaum et al. (2016) for the interpretation of the other fixed effects.
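Auto- and cross-covariance surfaces like those in Figure 18 can in principle be assembled from eigenvalues and multivariate eigenfunctions via K^(d,p)(s, t) = Σ_m ν_m ψ_m^(d)(s) ψ_m^(p)(t). A sketch with toy eigencomponents (the shapes and values are invented for illustration, not taken from the fitted model):

```python
import numpy as np

def covariance_surfaces(eigvals, eigfuns):
    """Reconstruct auto- (d == p) and cross- (d != p) covariance
    surfaces K^(d,p)(s, t) = sum_m nu_m psi_m^(d)(s) psi_m^(p)(t)
    from eigenvalues nu_m and eigenfunctions of shape (M, D, T)."""
    M, D, T = eigfuns.shape
    K = np.zeros((D, D, T, T))
    for m in range(M):
        for d in range(D):
            for p in range(D):
                K[d, p] += eigvals[m] * np.outer(eigfuns[m, d], eigfuns[m, p])
    return K

t = np.linspace(0.0, 1.0, 50)
psi = np.stack([                       # two toy FPCs on two dimensions
    np.vstack([np.sin(np.pi * t), np.sin(np.pi * t)]),
    np.vstack([np.cos(np.pi * t), -np.cos(np.pi * t)]),
])
K = covariance_surfaces(np.array([2.0, 1.0]), psi)
```

The symmetry K^(d,p)(s, t) = K^(p,d)(t, s) of the cross-covariance surfaces holds by construction; univariate analyses use only the diagonal blocks K^(d,d).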
Figure 16: FPCs for the subject-specific functional random effect B_i(t) with the respective proportion of variance explained. The black solid line represents the mean trajectory to which a suitable multiple (2√ν_B·) of the FPC is added (red +) and subtracted (blue −).
Figure 17: FPCs for the curve-specific functional random effect E_ijh(t) with the respective proportion of variance explained. The black solid line represents the mean trajectory to which a suitable multiple (2√ν_E·) of the FPC is added (red +) and subtracted (blue −).

Comparison to Univariate Analysis
Compared to separate univariate FAMMs, the multivariate model incorporates the dependency between the dimensions, thus reducing the number of FPC bases in the analysis (for a similar amount of variance explained, ten univariate FPCs would be needed). The shaded areas in Figure 19 indicate where the standard errors of the univariate analysis presented in Cederbaum et al. (2016) are smaller than the corresponding standard errors of the multiFAMM. We find that, overall, the multivariate analysis seems to give smaller standard errors. In order to compare the model fit between the univariate and the multivariate analysis, we can use the urrMSE defined in Section 5, comparing the fitted values with the observed values for the consonant assimilation data. The multivariate analysis yields urrMSE values of 0.970 (ACO) and 0.966 (EPG), whereas independent univariate FAMMs give values of 0.978 (ACO) and 0.961 (EPG), respectively. Consequently, a univariate model specified analogously to the ACO model in Cederbaum et al. (2016) gives a slightly better model fit on the EPG data as measured by the urrMSE, whereas on the ACO dimension the multivariate analysis is slightly preferred. The added computational complexity of the multivariate analysis is also negligible in our case: fitting a univariate FAMM as proposed by Cederbaum et al. (2016) takes around 52 minutes on a 64-bit Linux platform and requires a considerable amount of RAM (32 GB is sufficient). The multivariate analysis has the same memory requirements, while the time to fit the multivariate model (109 minutes on the aforementioned platform) increases only slightly compared to sequentially fitting two univariate models.

C.3 Results of the Model Using a Weighted Scalar Product
This section gives a short description of the considered multiFAMM when a weighted scalar product is used. We only present the results of estimating the eigenfunction basis, as the effect on the estimated covariate effects is negligible. Table 5 shows the contribution of each random process to the total weighted variation and is structured similarly to Table 4. The number of eigenfunctions is chosen by the weighted sum of total variation (6), which results in the same number of eigenfunctions for each random component as with the unweighted approach. However, the total share of weighted variation explained is slightly different (about 22% and 65% explained by the components B and E, respectively). Consequently, the proportion of (weighted) variation assigned to the measurement error is smaller when applying the weighted scalar product.

Figure 19: Estimated covariate effects (black) and comparison to the univariate model (red). The corresponding 95% confidence intervals are given by the dashed lines. Upper (from left to right): reference mean and covariate effects of stress1, stress2, vowel (/s#sh/). Lower (from left to right): covariate effect of order and interactions of order with stress1, stress2, vowel (/sh#s/). Areas shaded in red indicate where the standard errors of the univariate analysis are smaller.

Table 5: Variance components included in the multiFAMM using a weighted scalar product. First row: Estimates of eigenvalues, univariate error variances, total variation. Second (third) row: Univariate norms of estimated FPCs on dimension ACO (EPG). Fourth (fifth) row: Proportion of univariate variation explained on dimension ACO (EPG) by eigenfunctions and error variance, total univariate variation explained. Sixth row: Proportion of multivariate variation explained by eigenfunctions and error variances, total multivariate variation explained.
We also find that the univariate norms of the eigenfunctions are now more evenly distributed between the dimensions ACO and EPG, with most norms now larger on the ACO dimension. We thus see that weighting the scalar product shifts emphasis to the dimension ACO, which has the lower measurement error variance. With this emphasis on ACO, the proportion of univariate variation explained on EPG now falls short of 95%, while ACO has a proportion of univariate variation explained of about 98%. For this model, too, we have specified the cut-off using the weighted sum of total variation (6), and the model explains 96% of the variation in the data. Figure 20 shows the estimated multivariate FPCs of random component B. We find that the leading FPC again depicts an individual tendency to spend more time pronouncing the final or the initial target consonant. For ψ_B2(t), almost the same mode of variation is captured on the dimension ACO as with the unweighted scalar product. On the dimension EPG, however, we find the assimilation primarily in the final target syllable, with the initial target syllable rather unaffected (more or less time spent on the pronunciation). The third FPC for the subject-specific random effect is comparable to the scenario of the unweighted scalar product. Figure 21 shows the estimated multivariate FPCs of random component E. The leading FPC is largely unchanged by the weighting of the scalar product. While in the unweighted case the effect of ψ_E2 was distributed equally across the two consonants, we find a stronger effect on the initial target consonant for dimension ACO and on the final target consonant for dimension EPG in the weighted case. This seems to be compensated for by the third FPC, where in the weighted scenario the emphasis lies on differences in the final target consonant on ACO and in the initial target consonant on EPG. The remaining FPCs are comparable for both the weighted and unweighted scalar product.
We conclude that the model based on the weighted scalar product gives similar results. However, if some dimension were considerably noisier than the rest, weighting the scalar product might be advisable (see the results of the simulation).

Figure 20: FPCs for the subject-specific functional random effect B_i(t) with the respective proportion of weighted variance explained. The black solid line represents the mean trajectory to which a suitable multiple (2√ν_B·) of the FPC is added (red +) and subtracted (blue −).
Figure 21: FPCs for the curve-specific functional random effect E_ijh(t) with the respective proportion of weighted variance explained. The black solid line represents the mean trajectory to which a suitable multiple (2√ν_E·) of the FPC is added (red +) and subtracted (blue −).

D Detailed Analysis of the Simulation Study
We conduct an extensive simulation study in order to answer the following questions: How does the performance of the proposed multiFAMM depend on different model specifications? How does the model perform in different data settings? And how does the multiFAMM compare to a univariate modeling approach, where univariate regression models as proposed by Cederbaum et al. (2016) are estimated on each dimension independently? Additionally, we evaluate the estimation of the covariance and the fixed effects. We simulate data based on the presented model fits (10) of the snooker training data (main part) and (11) of the consonant assimilation data (Appendix C). Six different settings of the data generating process are analysed, where for each data setting, we additionally compare up to seven different model specifications. One modeling scenario (data setting plus model specification) consists of independent model fits based on 500 generated data sets. Table 6 provides an overview of the analysed data settings and model specifications, which are described in the following section. Note that we use data setting 1 and model specification A, respectively, to generate benchmark estimates useful for comparison to other settings and specifications.

D.1 Simulation Description
Data Settings
Table 7 provides a detailed description of the data generating process for all data settings. For each data setting, 500 data sets have been simulated. For data setting 1, we generate new observations based on the model fit (11) (Appendix C) of the consonant assimilation data by randomly drawing the evaluation points, the random scores, and the measurement errors. Note that we center and decorrelate the random scores so that the empirical means and covariances coincide with their theoretical counterparts. As in the original data, each simulated data set contains the bivariate functional observations of i = 1, ..., 9 individuals, each repeating the j = 1, ..., 16 different word combinations five times. Setting 2 (strong heteroscedasticity) is analogous to setting 1 but with a larger difference between the measurement error variances of the two dimensions. Setting 2 mimics an application setting with (multimodal) multivariate data where some dimensions are much noisier than others, in order to evaluate whether this imbalance interferes with the variance decomposition of the multiFAMM. Setting 3 (sparse data) focuses on the estimation quality for sparse functional data. Compared to setting 1, the number of evaluation points is reduced to three to ten measurements per dimension. In setting 4 (uncentred scores), the random scores are not decorrelated or centred (otherwise identical to setting 1). Especially for covariance components with few grouping levels (particularly B), this can result in a considerable departure from the modeling assumptions, which is likely to occur in such settings. This setting explores the sensitivity of the multiFAMM to violations of its assumptions.

Table 6: Summary of the different data settings and model specifications analysed in the simulation study. Each analysed combination of data setting and model specification comprises 500 model fits.
Data settings: 1 Consonant assimilation data; 2 Strong heteroscedasticity; 3 Sparse data; 4 Uncentred scores; 5 Weighted scalar product; 6 Snooker training data.
Model specifications: A True model; B Truncation via total variation (TV); C Truncation via univariate variation (UV); D TV with alternate scalar product; E UV with alternate scalar product; F Scedastic misspecification; U Univariate models.

Table 7 (data generating processes):
1: 9 individuals, each repeating 16 crossed grouping levels 5 times. Number of observed time points is an independent random draw for each dimension from a uniform discrete distribution of natural numbers in [20, 50]. Each observed time point is an independent random draw from the uniform distribution on [0, 1]. Mean consists of the functional intercept and 7 covariate effect functions as given in Appendix C.2. Covariates are given for each observation based on the crossed grouping level (word combination). Functional random effects B, E are based on the estimated eigenfunctions of the model in Appendix C.2 (3 and 5 FPCs). Corresponding random scores are independent draws from N(0, ν··), then demeaned and decorrelated. Measurement errors are independent draws from N(0, σ²·) for each observed time point on the respective dimension.
2: Measurement errors are independent draws from N(0, σ²₁) for each observed time point on dim1 and independent draws from N(0, 16·σ²₁) on dim2. Rest as in 1.
3: Number of observed time points is a random draw for each dimension from a uniform discrete distribution of natural numbers in [3, 10]. Rest as in 1.
4: Random scores are independent draws from N(0, ν··) (no centering or decorrelation). Rest as in 1.
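The demeaning and decorrelation of the simulated scores can be sketched as follows. This is a minimal version assuming more score draws than FPCs, using inverse-Cholesky whitening (one of several equivalent transformations, not necessarily the one used in the study):

```python
import numpy as np

def center_and_decorrelate(scores, eigvals):
    """Transform raw iid N(0, nu_m) score draws (n x M) so that the
    empirical mean is exactly zero and the empirical covariance is
    exactly diag(eigvals)."""
    X = scores - scores.mean(axis=0)               # demean
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    white = X @ np.linalg.inv(L).T                 # empirical covariance = I
    return white * np.sqrt(eigvals)                # rescale to diag(eigvals)

rng = np.random.default_rng(7)
eigvals = np.array([2.0, 1.0, 0.5])
raw = rng.normal(0.0, np.sqrt(eigvals), size=(200, 3))
scores = center_and_decorrelate(raw, eigvals)
```

Data setting 4 corresponds to skipping this transformation and using the raw draws directly, so the empirical moments only approximate their theoretical counterparts.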
Setting 5 (weighted scalar product) is identical to setting 1 but all model components are simulated based on the estimated model using a weighted scalar product for the MFPCA (see Appendix C.3). This data setting helps to understand the impact of weights in the scalar product. For setting 6 (snooker training data), we generate new trajectory data according to the model fit (10) of the snooker training data. This allows us to evaluate a higher dimensional multiFAMM as well as the estimation quality of nested random effects.

Model Specifications
Table 8 provides a detailed description of the different model specifications used in the simulation study. We denote the most accurate modeling approach as specification A (true model). This standard scenario mirrors the data generation so that there is no model misspecification. Most notably, we fix the number of FPCs to the number used for generating the data in order to avoid truncation effects. Though somewhat unrealistic, specification A serves to separate the overall performance of the multiFAMM from the impact of modeling decisions, i.e. from (more realistic) situations with greater uncertainty for the user. For model specification B (truncation via total variation (TV)), the truncation orders of the FPCs are chosen so that 95% of the total variation (6) is explained. In scenario C (truncation via univariate variation (UV)), we choose the number of FPCs so that 95% of the univariate variance (7) is explained on every dimension. Specifications D (TV with alternate scalar product) and E (UV with alternate scalar product) use the cut-off criteria analogous to B and C, but we swap the scalar product on which the MFPCA is based: for data generated from a model based on an unweighted MFPCA, the scalar product used in these scenarios is weighted by 1/σ²_d, and vice versa. Model specification F (scedastic misspecification) evaluates misspecifying the multiFAMM with a homoscedasticity assumption. We also contrast the multiFAMM with the approach of fitting independent univariate models. In modeling scenarios denoted by U, we fit an independent FAMM on each dimension. We use the FAMMs proposed by Cederbaum et al. (2016) so that we can apply the same model specifications as for the multiFAMM (e.g. basis functions, penalties, etc.). The number of FPCs in the model is then chosen so that 95% of the univariate variation is explained.

Modeling Scenarios
For each of the combinations of data settings and modeling scenarios indicated in Table 9, analyses were performed giving 500 fitted models per combination. The results presented in the main part correspond to modeling scenarios 1A (Benchmark), 1B (Cut-Off Multi), 1C (Cut-Off Uni), 3A (Sparse Data), and 4A (Uncentred Scores).

Evaluation Criteria
The accuracy of the estimated model components is measured using the relative root mean squared error (rrMSE). We analyse the accuracy of the covariance estimation, i.e. eigenfunctions, eigenvalues, and measurement error variances, as well as the mean estimation. For evaluating the overall accuracy of an estimate ζ̂ of a multivariate functional component ζ with elements ζ_s, s = 1, ..., S, we define the multivariate rrMSE as

mrrMSE(ζ, ζ̂) = sqrt( Σ_{s=1}^S |||ζ_s − ζ̂_s|||² / Σ_{s=1}^S |||ζ_s|||² ),

with the multivariate norm |||·||| based on the unweighted scalar product. Note that S depends on the model component to be evaluated, e.g. for the fitted values y_ijh(t), S equals N, but for the subject-specific random effect B_i(t), S equals the number of individuals. The mrrMSE corresponds to the evaluation criterion used in Section 5 of the main part. We also define a univariate version,

urrMSE(ζ^(d), ζ̂^(d)) = sqrt( Σ_{s=1}^S ||ζ_s^(d) − ζ̂_s^(d)||² / Σ_{s=1}^S ||ζ_s^(d)||² ),

which allows us to evaluate the dimension-specific estimation accuracy as well as a straightforward comparison to the univariate modeling approach. For scalar estimates such as eigenvalues and error variances, we define the rrMSE as

rrMSE(ζ, ζ̂) = sqrt( (ζ − ζ̂)² / ζ² )

with estimate ζ̂ of the scalar value ζ. The rrMSE takes on (unbounded) positive values, with smaller values indicating a better fit. As the rrMSE is a relative measure, small differences between estimate and true component can result in large rrMSE values when the norm of the true component is close to zero. Note that eigenfunctions are defined only up to a sign change. We thus flip the estimated eigenfunction (multiply it by −1) if this results in a smaller norm for the difference between the true function and its estimate. Additionally, we evaluate the coverage of the point-wise confidence bands (CBs) of the estimated fixed effects.
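Under the simplifying assumption of a common equidistant grid, the urrMSE and the eigenfunction sign alignment can be sketched as follows (an illustration, not the evaluation code used in the study; function names are invented):

```python
import numpy as np

def urrmse(true, est, dt):
    """Univariate rrMSE: sqrt( sum_s ||true_s - est_s||^2 /
    sum_s ||true_s||^2 ), curves as (S, T) arrays with grid spacing dt."""
    num = np.sum((true - est) ** 2) * dt
    den = np.sum(true ** 2) * dt
    return float(np.sqrt(num / den))

def align_sign(true_fun, est_fun):
    """Eigenfunctions are unique only up to sign: flip the estimate if
    that brings it closer to the true eigenfunction in L2 distance."""
    if np.sum((true_fun + est_fun) ** 2) < np.sum((true_fun - est_fun) ** 2):
        return -est_fun
    return est_fun

times = np.linspace(0.0, 1.0, 100)
dt = times[1] - times[0]
psi_true = np.sqrt(2.0) * np.sin(2 * np.pi * times)
psi_est = -psi_true                  # an estimate with a flipped sign
aligned = align_sign(psi_true, psi_est)
err = urrmse(psi_true[None, :], aligned[None, :], dt)
```

Without the sign alignment, a perfectly estimated but sign-flipped eigenfunction would receive a large rrMSE, which would distort the simulation results.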

Impact of Model Specifications
The results for setting 1 (consonant assimilation data) demonstrate the importance of the number of FPCs for the accuracy of the estimation (see Figure 22 and Table 10). With specifications A (true model) and F (scedastic misspecification), the sets of FPCs are fixed, giving overall low mrrMSE values (the misspecification in F yields only slightly worse results). Similarly, choosing the truncation order via the proportion of univariate variance explained in scenario C gives models with roughly the same number of FPCs as used for the data generation. The cut-off criterion based on the total amount of variance in scenario B results in more parsimonious models and thus considerably higher mrrMSE values. The number of selected FPCs also explains the wider boxplots of scenarios B-E compared to A: rather than a larger overall variance in estimation, we find separate clusters based on the included FPC sets (see also Figure 23). For the modeling scenarios based on a weighted scalar product (1D and 1E), the number of chosen FPCs is quite similar regardless of the cut-off criterion, but the overall mrrMSE values of the fitted curves $y_{ijh}(t)$ are higher than for the unweighted approach. The estimate of the mean $\mu(x_j, t)$, however, is comparatively stable over the different model specifications. Figure 23 shows the urrMSE values of the fitted curves $y_{ijh}(t)$ for different modeling scenarios (1U, 1B, 1C, 1D, 1E) depending on the number of FPCs included in the model. For example, in the univariate modeling scenario 1U, all 500 models choose two FPCs for each random effect (B2-E2) on dimension dim1. On dim2, however, different combinations of FPCs lead to considerably different urrMSE values.

Figure 22: mrrMSE values of the fitted curves $y_{ijh}(t)$, the mean $\mu(x_j, t)$, and the random effects $B_i(t)$ and $E_{ijh}(t)$ for all modeling scenarios in data setting 1 (blue shades) and the standard modeling scenario A in data settings 2 to 5.
For the multivariate modeling scenarios, we also find that the reduction in rrMSE values depends on which additional FPC is included: $\psi_{E5}$ reduces the values more than $\psi_{B3}$. Figure 24 shows the mrrMSE values of the fitted curves $y_{ijh}(t)$, the mean $\mu(x_i, t)$, and the random effects $B_i(t)$ and $E_{ijh}(t)$ for the different modeling scenarios of data settings 1, 2, and 5. Again, we find considerable differences for different numbers of FPCs in the models when there is strong heteroscedasticity between the different dimensions (setting 2). The overall model fit seems to be better when applying an unweighted scalar product for the MFPCA (2B, 2C compared to 2D, 2E). Later, we will see that the model fit on single dimensions can be improved by weighting the scalar product. Note that for setting 2F, misspecifying the model assumption now has a larger negative impact on the fitted curves than in setting 1F. In data setting 5, the data are generated based on the model using a weighted scalar product. Then, the number of FPCs in modeling scenarios 5B and 5C is very similar. Interestingly, basing the MFPCA on a scalar product using weights of one can lead to a similar estimation accuracy as the standard modeling scenario (5A compared to 5E).
Our simulation study thus suggests that basing the truncation orders on the proportion of explained variation on each dimension gives parsimonious and well-fitting models. If interest lies mainly in the estimation of fixed effects, the alternative cut-off criterion based on the total variation in the data allows even more parsimonious models. Furthermore, an unweighted scalar product is a reasonable starting point for the multiFAMM.

Model Performance on Different Data Settings
To compare different data settings, we focus on model specification A (true model) in Figure 22. Note that the mrrMSE values of the other data settings cannot be directly compared as the denominator of the mrrMSE (slightly) changes (except for the mean of 2A, 3A, and 4A). We find that strong heteroscedasticity (2A) mainly negatively affects the fitted curves and the smooth residual. Unsurprisingly, the results for scenario 3A suggest that the estimation accuracy is lower for sparse functional data. However, the mean estimation is comparable to more densely observed data. On the other hand, the mean estimation is susceptible to violations of the assumption of uncorrelated and centred realizations of the random effects (4A). The model then has difficulty separating the intercept and the random effects (see also Figure 31), which does not necessarily translate to a worse overall fit to the data. The results of scenario 5A (weighted scalar product) suggest that the accuracy of the multiFAMM does not depend on the definition of the scalar product used for the MFPCA. Figure 25 shows the urrMSE values for the aforementioned model components of modeling scenario 6A (snooker training data), where we compare the estimation accuracy across the dimensions. The fit of the functional curves $y_{ijh}(t)$ suggests that there might be pronounced differences between the estimation accuracy of the dimensions, with dimensions dim1 (corresponding to elbow.x) and dim4 (corresponding to hand.y) giving high urrMSE values.

Figure 24: mrrMSE values of the fitted curves $y_{ijh}(t)$, the mean $\mu(x_i, t)$, and the random effects $B_i(t)$ and $E_{ijh}(t)$ for different scenarios.
We conclude that the proposed multiFAMM performs well even in more challenging data settings such as sparse functional data or data with few grouping levels for the random effects. In particular, the estimation of the fixed effects seems to be stable across the different analysed settings.

Comparison to Univariate Approach
We compare the univariate modeling approach U to the multivariate modeling scenarios C (UV) and E (UV with alternate scalar product), i.e. we make sure that the proportion of explained univariate variation is also at least 95% for each model. Table 10 shows that in data settings 1 (consonant assimilation data) and 2 (strong heteroscedasticity), the number of included FPCs tends to be higher for scenario U. Yet Figure 26 indicates that the multiFAMM yields consistently lower urrMSE values on dim1 (smaller measurement error variance). For dim2, the random effects seem to be estimated more accurately and the fixed effects similarly well, but especially with the weighted scalar product (scenarios E), the overall fit for $y$ can give higher urrMSE values. Overall, the results suggest that the unweighted scalar product is to be preferred in data situations with similar measurement error variances of the dimensions such as setting 1, where it gives reliably good results across all model components. However, downweighting the dimension dim2 with the larger error seems to be a reasonable modeling decision in setting 2 if interest lies primarily in dim1 (lowest urrMSE values for 2E). Again, we point out that the estimation of the fixed effects is relatively stable across approaches and slightly better than for the univariate modeling approach. Table 11 compares the standard errors of the fixed effects estimation for the multiFAMM in scenario 1C with the univariate modeling approach of scenario 1U. We look at the ratio of standard errors $se_m/se_u$, where $se_m$ denotes the standard error of the multiFAMM and $se_u$ the standard error of the corresponding univariate FAMM.

Figure 25: urrMSE values of the fitted curves $y_{ijh}(t)$, the mean $\mu(x_i, t)$, and the random effects $B_i(t)$, $C_{ij}(t)$ and $E_{ijh}(t)$ for modeling scenario 6A. One outlying observation with a high urrMSE value was removed from the boxplot for $E_{ijh}(t)$ and dim1.
For each dimension and partial predictor, we calculate the proportion of ratios smaller and larger than one over an equidistant grid of 100 time points and all simulation runs. Especially on dim1, we find that the multivariate approach gives smaller standard errors. On dim2, the proportions are more similar, with slightly more ratios larger than one. Overall, we find more ratios smaller than one in our simulation study, indicating that the multivariate approach can yield smaller standard errors. Table 12 further shows that the coverage of these two scenarios is very similar (see the subsection on the estimation of fixed effects).

Figure 26: urrMSE values of the fitted curves $y_{ijh}(t)$, the mean $\mu(x_i, t)$, and the random effects $B_i(t)$ and $E_{ijh}(t)$ for the univariate modeling scenario U (grey) and the multivariate modeling scenarios C and E in data settings 1 (blue) and 2 (green).
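The comparison of standard errors can be sketched as follows; this is an illustrative reading of the criterion (pooling over grid points and simulation runs), with a hypothetical function name and input layout.

```python
import numpy as np

def se_ratio_proportions(se_multi, se_uni):
    """Proportions of standard-error ratios se_m / se_u below and above
    one, pooled over simulation runs (rows) and an equidistant grid of
    time points (columns). Values below one favour the multiFAMM."""
    ratio = np.asarray(se_multi, dtype=float) / np.asarray(se_uni, dtype=float)
    return float(np.mean(ratio < 1)), float(np.mean(ratio > 1))
```

Note that ratios exactly equal to one are counted in neither proportion, so the two proportions need not sum to one.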
We conclude that the multivariate modeling approach can improve the mean estimation but is especially beneficial for the prediction of the random effects. In some cases, including weights in the multivariate scalar product might further improve the modeling.

Covariance Estimation

Figure 27 shows the mrrMSE values for the estimated eigenfunctions of the two random effects in the standard modeling scenario A of data settings 1 to 5. In general, we find that the leading eigenfunctions tend to have lower mrrMSE values. Especially the leading eigenfunctions of the smooth residual show a high accuracy. There seems to be more variance in the estimation accuracy for the subject-specific random effect. This can be confirmed with Figure 28, which contains the estimated eigenfunctions of all 500 simulation iterations for modeling scenario 1A (grey curves) and compares them to the data generating functions (red curves). Overall, we find that the modes can be reconstructed sufficiently well, albeit considerably better for the smooth residual. Figure 27 also allows us to compare the eigenfunction estimation across the different data settings. Keep in mind that scenario 5A is based on a different model, so the mrrMSE values are not directly comparable. We find that the eigenfunction estimation of the smooth residual suffers from strong heteroscedasticity (2A), whereas the subject-specific random effect seems unaffected. The same effect (even more pronounced) can be observed for sparse data (3A), where less information about the correlation within functions is available. On the other hand, $B_i(t)$ suffers from few different individuals, as this can lead to a considerable departure from the modeling assumption when the independent draws of scores are not centred and decorrelated (4A). With 720 different grouping levels of the smooth residual, we typically get an empirical covariance of the random scores that is closer to its assumption and thus smaller differences in mrrMSE values compared to 1A. The estimation of the eigenfunctions is somewhat less accurate in scenario 5A, as the weights of the scalar product have to be estimated as well. This additional uncertainty leads to a larger variance of the mrrMSE values and makes it harder to correctly identify the data generating modes of variation. Note that this does not affect the overall estimation of the random effects as discussed above.

Figure 27: mrrMSE values of the eigenfunction estimation for the random effects $B_i(t)$ and $E_{ijh}(t)$ for different scenarios.
We observe similar trends for the mrrMSE values of the estimated eigenfunctions in scenario 6A, as presented in Figure 29. Leading eigenfunctions tend to be more accurately estimated, and increasing the number of grouping levels ($B_i(t)$: 25, $C_{ij}(t)$: 50, $E_{ijh}(t)$: 300) seems to reduce the variance of the mrrMSE values. Figure 30 shows the rrMSE values of the estimated multivariate eigenvalues and measurement error variances. Compared to scenario 1A, the rrMSE values of the smooth residual are higher for scenarios 2A (strong heteroscedasticity) and 3A (sparse data), whereas the rrMSE values of the subject-specific random effect are higher for scenario 4A (correlated scores). This is in line with the findings for the eigenfunctions and the random effects. Scenario 5A is again not directly comparable, and the overall higher rrMSE values suggest that the uncertainty in the estimation of the weights of the scalar product influences the accuracy of the estimation of the eigenvalues. With regard to the error variance, the rrMSE values are comparable over the different scenarios except for the sparse data setting, where we find higher values.
We conclude that modes of variation can be recovered well in most of the model scenarios. The nested random effect and its leading modes of variation can be well captured by the multiFAMM.

Fixed Effects Estimation

Figure 31 shows the urrMSE values of the estimated effect functions in scenario 1A (blue). We find that the estimation of the functional intercept $f_0(t)$ and the covariate effect of order $f_1(t)$ yields low urrMSE values on both dimensions. The estimation of the other covariate effects ($f_2(t)$ to $f_4(t)$) and especially of the interactions of order with the other covariates ($f_5(t)$ to $f_7(t)$) gives larger urrMSE values and a higher variance of these values. The yellow boxplots show the corresponding values in scenario 4A, indicating that only the estimation of the intercept is affected by correlated and uncentred scores (as an empirical score mean different from zero times the corresponding eigenfunctions is captured by the intercept). Figure 32 plots all estimated effect functions against the data generating effect functions in scenario 1A. This suggests that the multiFAMM can overall capture the characteristics of the true underlying effect functions. Figure 33 shows the urrMSE values for each effect function in scenario 6A. Overall, the functional intercept $f_0(t)$ and the covariate effect of skill $f_1(t)$ have a higher estimation accuracy, as implied by lower urrMSE values. The other effect functions are also somewhat less pronounced, which is why the scaling with the inverse mean norm can give large urrMSE values. Figure 34 shows all estimated effect functions (grey curves) and the corresponding data generating functions in red. This plot suggests that especially on the dimensions dim1, dim5, and dim6 the effect estimation shows a larger variance across the simulation runs. These dimensions correspond to those dimensions in the data set (elbow.x, shoulder.x, and shoulder.y in Figure 5) where the response functions are relatively constant over $t$ compared to the other dimensions.
For all modeling scenarios, we use the average point-wise coverage to evaluate the 95% CBs of the estimated fixed effects (see Table 12). For scenario 1A, we find the CBs to cover the true effect 88-95% of the time. Additional uncertainty, e.g. about the number of FPCs, further reduces the coverage (for example in 1B). Overall, the coverage of the fixed effects by the corresponding CBs is comparable over the different data settings (see Appendix Table 12). In data settings 2 and 3, the uncertainty in the data is increased, which allows for wider CBs and thus a slightly better coverage. The average point-wise coverage in 4A is comparable to 1A except for the functional intercept. Here, the average point-wise coverage of the scalar and functional intercept lies well below 70%, as the data setting makes it particularly hard to identify the true underlying mean (the scores of the random effects are not centred or decorrelated, and any mean difference from zero is absorbed by the intercept). In scenario 5A, the coverage tends to be lower than for models based on an unweighted scalar product, possibly due to the added uncertainty from the estimation of the weights in the scalar product. We find the average point-wise coverage to be considerably lower than its nominal value in scenario 6A. In this scenario, we find that in many of the simulation runs the effects are wrongly estimated as constants (cf. also Figure 34), which can lead to undercoverage also in scalar additive mixed models. Figure 35 shows the point-wise coverage averaged over the 500 simulation iterations in scenario 1A. It suggests that coverage tends to lie below the nominal value in areas close to the borders of the functions.

Table 12: Average point-wise coverage of the estimated fixed effects for different scenarios. The coverage is averaged over the 500 simulation iterations and over 100 evaluation points along $t$.
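The average point-wise coverage can be computed as in the following sketch. The function name and array layout are illustrative assumptions: a true effect on a grid and per-simulation lower/upper band limits.

```python
import numpy as np

def pointwise_coverage(truth, lower, upper):
    """Average point-wise coverage of confidence bands: the share of
    (simulation run, grid point) pairs where the true effect lies
    inside the band. `truth` has shape (n_grid,); `lower` and `upper`
    have shape (n_sim, n_grid) and broadcast against `truth`."""
    inside = (lower <= truth) & (truth <= upper)
    return float(np.mean(inside))
```

For a nominal 95% band, values well below 0.95 (as reported, e.g., for the intercept in 4A or for scenario 6A) indicate undercoverage; restricting the mean over grid points to interior points would quantify the boundary effect seen in Figure 35.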