Correlation models for monitoring fetal growth

Ultrasound growth measurements are monitored to evaluate if a fetus is growing normally compared with a defined standard chart at a specified gestational age. Using data from the Fetal Growth Longitudinal Study of the INTERGROWTH-21st project, we have modelled the longitudinal dependence of fetal head circumference, biparietal diameter, occipito-frontal diameter, abdominal circumference, and femur length using a two-stage approach. The first stage involved finding a suitable transformation of the raw fetal measurements (as the marginal distributions of ultrasound measurements were non-normal) to standardized deviations (Z-scores). In the second stage, a correlation model for a Gaussian process is fitted, yielding a correlation for any pair of observations made between 14 and 40 weeks. The correlation structure of the fetal Z-score can be used to assess whether the growth, for example, between successive measurements is satisfactory. The paper is accompanied by a Shiny application, see https://lxiao5.shinyapps.io/shinycalculator/.

spaced time points -a model that allows for such irregularity is required. Correlation models have previously been derived for child data [4][5][6][7][8] but not for fetal biometry data.
We model the correlation of fetal biometry (i.e. HC, BPD, OFD, AC, and FL) and derive formulae and a Shiny application that can be used to obtain the correlation for each fetal measure between measurements made at any two time points between 14 and 40 weeks of GA. We model the correlations using fetal ultrasound data from the INTERGROWTH-21 st Project Fetal Growth Longitudinal Study (FGLS) on which the international standards for fetal growth are based. 9,10 A separate analysis of the cohort demonstrated that the FGLS cohort remained healthy with adequate growth and motor development up to 2 years of age. 11

Data
The INTERGROWTH-21 st Project was a population-based longitudinal study that measured serial fetal growth scans every 5 AE1 weeks from recruitment at 9 þ0 -13 þ6 weeks of gestation until, but not beyond, 42 þ0 weeks of gestation. The FGLS component of the INTERGROWTH-21 st Project is the largest prospective study to collect data on fetal ultrasound measurements from optimally healthy pregnant women to date, collecting data in eight geographically diverse populations and using many quality control measures. The FGLS involved measuring serial fetal growth scans every 5 AE1 weeks after the initial dating scan, so that the possible ranges after the dating scan were 14-18, 19-23, 24-28, 29-33, 34-38, and 39-42 weeks of gestation. To ensure that all sites collected highquality data that were comparable within and between the study sites, all sonographers and anthropometrists were trained, and all ultrasound measurements were performed in a standardized manner following strict protocols. 12 All sites adopted uniform methods, used identical ultrasound equipment in all of the study sites, adopted standardized methodology to take fetal measurements, and employed locally accredited ultra-sonographers who underwent standardization training and monitoring.
The FGLS screened 13,108 pregnant women attending the study clinics < 14 þ0 weeks of gestation within the project's defined geographical areas; of these, 4607 (35%) who met the eligibility criteria gave informed consent and enrolled. The most common reasons for ineligibility were low maternal height (13%), BMI ! 30 (12%), and maternal age < 18 or > 35 years (11%). Thirty-six women (0.8%) who developed severe conditions during pregnancy or took up smoking or used drugs, and 71 (1.5%) who were lost to follow-up or withdrew consent were excluded. A total of 4422 women delivered a live singleton, of which 4321 women (20,313 ultrasound scans) who had pregnancies without major complications and delivered live singletons without congenital malformations that contributed data for the construction of the INTERGROWTH-21 st international fetal growth standards, 9 international gestation-specific newborn standards, 13 gestational weight gain standards, 14 and preterm postnatal growth standards 15 were used for the present analysis. This cohort experienced very low maternal and perinatal mortality and morbidity rates, 9,13 confirming that the participants were at low risk of adverse outcome and therefore contributed to the construction of the international fetal growth standards. The baseline characteristics of the study cohort across the eight sites were very similar, which was expected because women were selected from the underlying low-risk populations using the same clinical and demographic criteria. 9,16 The median number of ultrasound scans (excluding the dating scan) was 5.0 (mean ¼ 4.9, SD ¼ 0.8, range from 4 to 7) and 97% of women had four scans. Eighty-five percent of the 20,313 ultrasound scans were performed within the expected gestational age window of the protocol as shown in Figure 1. 9 The INTERGROWTH-21 st Project was approved by the Oxfordshire Research Ethics Committee "C" (reference: 08/H0606/139), the research ethics committees of the individual participating institutions, and the corresponding regional health authorities where the project was implemented. Participants provided written consent to be involved in the study.

Statistical methodology
Consider the longitudinal data fðT ij ; Y ij Þ; 1 j m i ; 1 i ng, where T ij is the gestational age in weeks for subject i at the jth visit, Y ij is one of the five ultrasound growth measurements in millimeters at T ij , m i is the number of visits for subject i, and n is the number of subjects. The total number of visits per woman is shown in Table 1.
We estimate a correlation matrix of the ultrasound measurement at different gestational ages. A single model is fitted for both sexes as some mothers do not want to know the sex of the child they expect. Because the marginal distributions of ultrasound growth measurements may be non-normal, e.g. skewed, a suitable transformation of the raw growth measurements is first identified and applied to the data to construct a working marginal reference chart. The raw fetal measurements are then transformed accordingly to provide standardized deviations (Z-scores). Next the Z-scores are modeled by a Gaussian process with zero mean and unit variance so that the temporal correlation of the process can be estimated.

Working models for marginal reference distribution
We consider the LMS transformation 17 which could transform non-normal data to make the assumption of normality acceptable. Let Y be a positive random variable and its LMS transformation is given by Here l; r 2 R þ and 2 R are location, scale and skewness parameters, respectively. If Z has a standard normal distribution, then Y is said to follow the three-parameter Box-Cox Cole-Green distribution 17 denoted by BCCGðl; r; Þ. A fourth parameter can be added to further model kurtosis: if Z has a t distribution with degrees of freedom s 2 R þ , then Y is said to follow the Box-Cox t distribution 18 denoted by BCTðl; r; ; sÞ; if Z has a  standard power exponential distribution with parameter s 2 R þ , then Y is said to follow the Box-Cox power exponential distribution 19 denoted by BCPEðl; r; ; sÞ. Note that BCTðl; r; ; s ¼ þ1Þ and BCPEðl; r; ; s ¼ 2Þ reduce to BCCGðl; r; Þ. We model the parameters in equation (1), lðtÞ; ðtÞ and rðtÞ as a smooth function of gestational age in conjunction with BCCG. The additional parameters in BCT and BCPE are defined similarly. The GAMLSS method 20 can be used to estimate such functions. Under the LMS framework, suppose thatlðtÞ;rðtÞ;ðtÞ È É are the obtained estimates, then the transformed measurements Z ij can be computed as Under the BCCG model, marginally Z ij has approximately a standard normal distribution. Under the BCT or the BCPE models, additional transforms of Z ij are needed to make Z ij normal. For simplicity, we assume all proper transformations have been applied. The Gaussian process is fully identified by its correlation matrix, which we estimate with zero mean and unit variance.

Correlation models
In this section, we estimate a correlation matrix for the Z-scores. We compare several parametric and nonparametric models. The parametric models considered here have been applied to child growth. The exponential model 21 where a; b 2 R þ are two unknown parameters that can be interpreted as the order and the rate of the change in the correlation. This model is commonly used due to its simple form and stationarity, i.e. the correlation depends only on the distance between two gestational ages. The second model (denoted by P2), proposed by 4 for child growth, takes the form where s; b 2 R þ are two unknown parameters. The model is non-stationary, but possesses the Markovian property. Indeed, via the transformation S ij ¼ logðT ij þ sÞ and S ik ¼ logðT ik þ sÞ, the correlation becomes corðZ ij ; Z ik Þ ¼ q jS ij ÀS ik j ; where q ¼ expðÀbÞ. Because growth measurements might have non-ignorable measurement errors, a nugget effect term is usually added to the above correlation models. The exponential correlation with a nugget effect model (denoted by P1þ) takes the form where r 2 is the variance of the measurement error in the Z-scores and 1 1 fg is the indicator function which is 1 if the statement inside the bracket is true and 0 otherwise. Similarly, the P2þ correlation model has the form Note that with the nugget term, neither the stationary property nor the Markovian property holds.
Parametric models are simple and easy to interpret, but they can be subject to model misspecification. Thus, in addition to the above parametric correlation models, we also considered two nonparametric correlation models. The first one is based on functional data analysis, 22 which models the Z-score of a subject as the sum of a smooth random function of the gestational age and a random measurement error term. Specifically, the functional data model is where b i ðÁÞ is a smooth random function modeled by a zero-mean Gaussian process with a smooth covariance function CðT ij ; T ik Þ ¼ Covfb i ðT ij Þ; b i ðT ik Þg, f i1 ; . . . ; im i g are independent measurement errors with variance r 2 , and b i ðÁÞ is independent from the measurement errors. Such a covariance function involves no parametric assumptions. Since VarðZ ij Þ ¼ 1, the correlation of the Z-scores at two distinct time points T ij and T ik with j 6 ¼ k is CðT ij ; T ik Þ. Estimation of this correlation matrix is described in Section 3.3.
The correlation function from the functional data method is in general nonstationary. We also consider a stationary but nonparametric correlation function by assuming that the correlation function C satisfies CðT ij ; T ik Þ ¼ gðjT ij À T ik jÞ, where g is a smooth decreasing but unspecified univariate function. Due to the presence of measurement error in the functional data model, the overall correlation between the Z-scores is still nonstationary. The estimation of g is addressed in Section 3.3. The various correlation models are summarized in Table 2.

Estimation of the correlation models
The parametric correlation models are fitted by maximizing likelihood of the Z-scores under normality. We now focus on the estimation of the two nonparametric models. Estimation methods for the functional data model are well developed in the statistics literature and here we use the fast covariance estimation method for longitudinal data, developed in Xiao et al. 23 We briefly describe the method here, which will also be useful for explaining our estimation method for NP2. First, empirical estimates of the correlation function are constructed. Specifically, let Thus, r ijk is an unbiased estimate of CðT ij ; T ik Þ whenever j 6 ¼ k. We will conduct a bivariate smoothing of the data fðT ij ; T ik ; r ijk Þ; 1 j 6 ¼ k m i ; 1 i ng to estimate the correlation function C. We use bivariate P-splines, 24 which approximate the bivariate correlation function with tensor-product B-splines and employ a penalty to avoid overfit. The penalty ensures the smoothness of the fitted correlation function, a desirable feature for fetal growth correlations. Moreover, constraints on spline coefficients are imposed to ensure that C is symmetric; see Xiao et al. 23 for further details. Denote the corresponding estimate byĈðs; tÞ, then we estimate the error variance r 2 using the identity Eðr ijj Þ ¼ CðT ij ; T ij Þ þ r 2 for 1 j m i ; 1 i n. For the second nonparametric model, by assumption, Eðr ijk Þ ¼ gðjT ij À T ik jÞ þ 1 1 fT ij ¼T ik g r 2 . Thus, we smooth the data fðjT ij À T ik j; r ijk Þ; 1 j 6 ¼ k m i ; 1 i ng to estimate the function g. Specifically, we use univariate P-splines, 25 which approximate g using B-spline bases and also control overfit using a smoothness penalty. Then the error variance r 2 can be estimated by the equality Eðr ijj Þ ¼ gð0Þ þ r 2 for 1 j m i ; 1 i n. Kernel smoothing 26 could also be used as an alternative to fitting splines.

Marginal standard charts
The estimated location, scale and skewness parameters show that a BCCG transformation model fits the data well (see Figure 2). Our empirical results also indicate that it suffices to use BCCG rather than the more complicated BCPE or BCT, as Figure 3 suggests that the estimated parameter of kurtosis is close to 2 for BCPE model and very large for BCT model. Figure 4 plots the smoothed first to fourth moments of the Z-scores against the gestational age. Specifically, nonparametric smooth functions are fitted to the data fZ k ij ; 1 j m i ; 1 i ng for k ¼ 1; 2; 3; 4. If the Z-scores are indeed marginally normal, then the estimated curves should be close to the respective constant lines y ¼ 0, 1, 0, and 3, respectively. Figure 4 suggests that the BCCG-transformed Z-scores are marginally normally distributed. A closer look at the smoothed fourth moments under different models in Figure 5 confirms that BCPE and BCT are not necessary.
Consequently, the BCCG model will be applied to construct marginal standard charts.

Correlation models
We use the BCCG model to fit the marginal distributions of the raw ultrasound measurements and then convert the transformed measurements into Z-scores. Then different parametric and nonparametric correlation models are compared via model selection criteria: AIC and BIC. Both criteria require the degrees of freedom of the model. For parametric correlation models, it is the number of free parameters. For nonparametric correlation models, the effective degrees of freedom, which evaluates the model complexity of nonparametric smoothers, 27 will be calculated.
Model comparison results for AC, FL, HC, BPD, and OFD are summarized in Table 3. Table 3 shows that the P1þ model is overall the best model across the three fetal growth measurements. It fits the data best among all parametric models and has a simpler form than all the nonparametric models, and yields the smallest BIC. To quantify the differences among different correlation models, we use P1þ as the reference correlation and evaluate  how the other models differ from P1þ. Denote q P1þ jk the correlation coefficient at times (j, k) in P1þ correlation matrix, the mean squared error (MSE) of NP1, for example, to P1þ is defined as where L ¼ 183 is the range of gestational age in days in this study.   Table 4 demonstrates an ignorable difference between P1þ and NP2, as expected because of the stationarity nature of both models. The difference between P1þ and NP1 is small, suggesting that an exponential correlation model with nugget effect is sufficient for fetal growth measurements. Indeed, the average absolute difference in correlation is only 0.020 for AC, 0.021 for FL, 0.025 for HC, 0.031 for BPD, and 0.032 for OFD. The correlations   from the other parametric models are relatively more divergent from those of P1þ compared to the nonparametric models NP1 and NP2, indicating that P1þ is superior to other parametric models. The estimated parameters for a P1þ model are summarized in Table 5. For illustration, we plot the fitted correlation surface on a grid of gestational age by weeks for AC in Figure 6. Correlation plots for FL, HC, BPD, and OFD are given in Figures 7 to 10.

Case study: dynamic growth velocity
We study the growth velocity of a randomly selected fetus using the fitted parametric correlation model, whose AC, FL, HC, BPD, and OFD are measured on six occasions between week 15 and week 38. The observed growth trajectories are shown as linked triangles in Figure 11. Based on each observed measurement at T j , we also dynamically predict the measurement T jþ1 shown as dots, each with a 95% prediction interval. Specifically, each observed measurement at T j is transformed to Z-score using equation (2). Then, a conditional Z-score Z hT jþ1 jT j ;ÁÁÁ;T 1 i at time T jþ1 is obtained assuming joint normality with the P1þ correlation model. This conditional Z-score is then transformed back to the original measurement given marginal references. Clinicians might use this approach to compare the observed fetal growth measurements versus its expected measurements at a certain age to assess whether a fetus is growing normally. They can also calculate and compare velocity increments. We will      use the correlations studied in this paper for the subsequent clinical paper on conditional fetal velocity for use by clinicians.
For this fetus, selected as random, the growth is regular for FL and HC and can be predicted accurately. For AC, its measurements are higher (still normal) than predicted during the third visit, but much lower than expected during the fourth visit. This suggests that closer monitoring might be needed. The following visits indicate that the AC of the sampled fetus falls consistently below the population mean.
To facilitate the usage of the results in practice, a Shiny application is built along with this paper, where functionalities such as visualization, calculating correlation, prediction and cSDS are integrated for all the five fetal growth measurements (https://lxiao5.shinyapps.io/shinycalculator/). Correlation tables for fetal growth measurements are provided in Tables 6 to 10. The correlations are for weekly intervals, so the results are presented in the form of five 27 x 27 correlation matrices.

Discussion
We have modelled the correlation function of the fetal growth for transformed HC, BPD, OFD, AC, and FL. Its values are the correlations of two measurements of these five variables made at any time points between 14 and 40 weeks. The FGLS cohort remained healthy with adequate growth and motor development up to 2 years of age, hence making the characterization of the expected correlation of fetal size measurements ideal. 9,11,12,15,28 The fit of the model for the correlations is adequate. Regression models such as in Ivanescu et al. 29 may also be used but in general are more difficult to deal with when the data are highly non-normal, as is the case for fetal metrics. The proposed two-stage approach is conceptually simpler, yields easy-to-interpret results, and achieves several aims. First, it gives a marginal standard chart that well handles non-normality of the measurements. Second, the correlation model combined with the   Table 9. Correlation matrix for BPD.
Week marginal standard chart provides a parsimonious approach to prediction and inference at a future visit. Indeed, not only could we predict a future growth given the previous visits (one visit, two visits, etc.) along with a prediction interval, but also we could assess if the current growth is within normal bounds given the previous records.
Although velocity charts could be an important complement to attained size charts, 9 they are not often used clinically. For example, a clinician may be interested to know whether fetal HC at 20 weeks is a good predictor of that same fetuses HC at 30 weeks. From the correlation between 20 and 30 weeks, we can predict the value of fetal HC at 30 weeks based on its value at 20 weeks. Such prediction can identify fetuses that lag behind in growth.
A limitation of the study is paucity of data and small sample sizes for some pairs of gestational ages especially in early gestation (first trimester) and at term (40 weeks).
In summary, we provide formulae for correlation coefficients for fetal biometry using prospectively collected data in eight countries and diverse settings. They were collected using unified protocols, measurement procedures and standardization. A rigorous data quality process was in place throughout the study.  Project is the largest prospective study of fetal growth involving multiple measurements per fetus. The correlation coefficients for any pair of data between 14 and 40 weeks and consequently the calculation of a velocity Z-score provide a tool for monitoring fetal growth and development over time. To facilitate this, a web application (Shiny application for now) that calculates the expected correlation between any two time points in the interval 14 to 40 weeks for HC, AC, FL, BPD, and OFD will be made freely available on the INTERGROWTH-21 st website where other applications for fetal, preterm, and newborn size are already available (https://intergrowth21. tghn.org/).
Our proposed two-stage approach can be able to accommodate simultaneous modelling of multiple fetal metrics by adapting our two-stage approach. The marginal standard charts can be estimated the same way as the first stage. Then we treat the transformed Z-scores as multiple measurements that are longitudinally observed and model the correlations across measurements and between different times. One option is a nonparametric multivariate functional data analysis. 30