Mixed-effects location scale models for joint modelling school value-added effects on the mean and variance of student achievement

School value-added models are widely applied to study, monitor, and hold schools to account for school differences in student learning. The traditional model is a mixed-effects linear regression of student current achievement on student prior achievement, background characteristics, and a school random intercept effect. The latter is referred to as the school value-added score and measures the mean student covariate-adjusted achievement in each school. In this article, we argue that further insights may be gained by additionally studying the variance in this quantity in each school. These include the ability to identify both individual schools and school types that exhibit unusually high or low variability in student achievement, even after accounting for differences in student intakes. We explore and illustrate how this can be done via fitting mixed-effects location scale versions of the traditional school value-added model. We discuss the implications of our work for research and school accountability systems.


Introduction
School value-added models attempt to estimate school differences in student achievement and are widely applied in educational (Goldstein, 1997; Reynolds et al., 2014; Teddlie & Reynolds, 2000) and statistical research (American Statistical Association, 2014; Braun & Wainer, 2007; McCaffrey et al., 2004; Raudenbush & Willms, 1995; Wainer, 2004). They are also used in the US, UK, and other school accountability systems where the predicted school differences, often referred to as school value-added scores, provide the basis of reward and sanction decisions on schools (Amrein-Beardsley, 2014; Castellano & Ho, 2013; Koretz, 2017; Leckie & Goldstein, 2017; OECD, 2008). In educational and statistical research, there is an additional interest in identifying school policies and practices that predict the school differences and that might therefore prove effective at raising student achievement in schools in general.
The traditional school value-added model is formulated as a mixed-effects (multilevel or hierarchical) linear regression model (Goldstein, 2011; Raudenbush & Bryk, 2002; Snijders & Bosker, 2012) of student current achievement on student prior achievement measured at the start of the value-added period (typically defined as one or more school years or a phase of schooling) and a school random intercept effect to predict the school differences (Aitkin & Longford, 1986; Goldstein et al., 1993; Raudenbush & Bryk, 1986). The adjustment for student prior achievement is fundamental as simpler comparisons of unadjusted school mean achievement would in large part reflect school differences in student achievement present at the start of the value-added period. Such differences are argued to be beyond the control of the school. Student sociodemographic characteristics are often added to adjust for initial school differences in student composition more convincingly (Ballou et al., 2004; Leckie & Goldstein, 2019; Leckie & Prior, 2022; Levy et al., 2023). Schools with higher scores are described as adding more value: producing higher student achievement for any given set of students. The scores are argued to reflect the net influences of differences in the quality of teaching, availability of resources, and other policies and practices across schools which are typically unobserved to the data analyst. The regression coefficient on student prior achievement is occasionally allowed to vary across schools. The resulting random slope model is sometimes referred to as a 'differential school effectiveness' model as this extension allows schools to have different effects for different types of students (Nuttall et al., 1989; Strand, 2010; Scherer & Nilsen, 2019).
While the traditional school value-added model is widely applied (Levy et al., 2019), it is important to realize that this model is just a regression model fitted to observational data and so the effects attributed to schools may also be caused by other factors that are not captured by the model (American Statistical Association, 2014). That is, while there is consensus that the predicted school effects are fairer and more meaningful measures to compare schools than comparing simple school mean achievement scores, the additional assumptions required to interpret these predicted school effects as causal effects rather than as merely adjusted school mean differences are challenging (Amrein-Beardsley, 2019; Reardon & Raudenbush, 2009; Rubin et al., 2004). For example, the school-level exogeneity assumption (independence of covariates and school random effect) will fail if higher prior achieving students select into more effective schools, perhaps because such students are from more affluent families who are more able to buy into the catchment areas of these schools (Angrist et al., 2021; De Fraine, 2002; Thomas & Mortimore, 1996; Timmermans & Thomas, 2015). The parameter estimates of the school value-added models presented in this article should therefore be viewed as measures of association and the predicted school effects as descriptive differences in means and variances of student achievement across schools where inevitably only partial and imperfect adjustments have been made for school differences in student characteristics at intake.
In the traditional school value-added model, the difference between observed and predicted student current achievement defines the total residual, which can be viewed as a covariate-adjusted (residualized) measure of student current achievement (i.e., a controlled comparison of student achievement levels). The total residual is modelled as the summation of the school random intercept effect and the student residual. The school random effect measures the mean student adjusted achievement in each school. In contrast, the constant residual variance implicitly assumes the variance in student adjusted achievement is the same in every school. This inconsistent modelling of the mean and variance does not seem realistic. Any given school policy or practice will have different effects on students as a function of their observed and unobserved characteristics and will therefore contribute to the variance in student adjusted achievement operating in each school. Indeed, this is the motivation for the random slope extension to the traditional value-added model described above. In practice, however, this extension can only be used to account for a limited number of observed student characteristics, not for all observed and unobserved student characteristics (Raudenbush & Bryk, 2002). Thus, the different sets of school policies and practices operating in each school will lead the variance in student adjusted achievement to vary across schools, even in random slope models.
Studying the variance in student adjusted achievement in each school may therefore provide valuable new insights into the differences in student learning between schools. Consider two schools which show similarly high levels of mean student adjusted achievement. The traditional school value-added model would describe these two schools as equally effective.
Suppose, however, the first school shows higher variance in their student adjusted achievement scores than the second school. Which school should now be viewed more positively? The school with the higher variance will have more students attaining exceptionally high adjusted achievement (a positive), albeit at the expense of more students also attaining unacceptably low adjusted achievement (a negative). All else equal, the school with the higher variance will also show a weaker link between prior and current achievement, and so in this school low prior achievement students are more able to move up the achievement distribution (a positive), but equally and necessarily, high prior achievement students are more likely to fall down the distribution (a negative). Thus, in part, how higher variance should be viewed depends on value judgements regarding whether such positives outweigh such negatives. These are not simple questions to answer. Also relevant is the underlying explanation for the difference in variance.
For example, if the higher variance seen in the first school is a result of its school policies and practices having greater differential effects on different student groups versus the second school, then higher variance might be viewed as a negative, as the explanation implies that the school might not be in sufficient control of the implementation of its policies and practices and is exacerbating inequities in student learning versus the second school (Nuttall et al., 1989; Strand, 2010; Scherer & Nilsen, 2019). Here too, though, a tension lies in determining the optimal level of control. Again, these are not simple questions to answer. More generally, school differences in the adjusted variances, just like school differences in the adjusted means, may also reflect unmodelled school differences in student intake and so it is important to attempt to adjust fully for such differences.
A necessary first step to addressing these bigger questions and debates is to measure school differences in the variance in student adjusted achievement. Only then can school effectiveness and other researchers follow up individual schools which show unusually high or low variance to try to identify the specific school policies and practices which are associated with this. Similarly, only then can school accountability systems, via school inspections, ask schools to reflect on any unusual school variance scores and discuss these within the broader context of what is happening in these schools and other schools facing similar challenges. All these discussions should be alert to the descriptive rather than causal nature of the statistics and to the limitations of the data more generally, and these statistics should not be used to make automatic high-stakes judgements on schools.
The aim of this article is therefore to broaden the traditional school value-added model to study the effects of schools on not just mean student current achievement, but the variance in student current achievement. We do this by applying mixed-effects location scale (MELS) models to student current achievement. MELS models are an extension to conventional mixed-effects linear regression models that model the residual variance not as a constant, but as a function of the covariates and a new random effect. Thus, the residual variance is now allowed to vary across the schools. Hedeker et al. (2008) illustrated the MELS model in the context of studying intensive longitudinal data on mood. Subsequently, Hedeker and others further developed this class of models and applied it to a range of other longitudinal psychological and health data (e.g., Goldstein et al., 2018; Hedeker et al., 2012; Nordgren et al., 2019; Parker et al., 2021; Rast et al., 2012). Just as mixed-effects models more generally are routinely also applied to clustered cross-sectional data, so can MELS models. Indeed, several such applications have now been published, including in social science research (Brunton-Smith et al., 2017; Leckie et al., 2014; McNeish, 2020). However, the applicability of MELS models to school value-added studies has not yet been explored. We address this via an application to school value-added models for school accountability in London, England. Specifically, we examine the following research question: How does the variance in student adjusted achievement vary across schools?
This article proceeds as follows. In Section 2, we introduce our application. In Section 3, we present the traditional random-intercept and -slope linear regression school value-added models and their extensions to MELS models. In Section 4, we present the results. In Section 5, we provide a general discussion, including implications of our work for research and school accountability.

Background
In England, since 2004, the Government has published school value-added scores derived from school value-added models for all secondary schools in the country in annual school performance tables (https://www.gov.uk/school-performance-tables). These scores aim to measure the value that each school adds to student achievement between the end of primary schooling national Key Stage 2 (KS2) tests (age 11, academic year 6) and the end of compulsory secondary schooling General Certificate of Secondary Education (GCSE) examinations (age 16, academic year 11). The scores play a pivotal role in the national school accountability system, informing school inspections and judgements on schools. They are also promoted to parents as a source of information when choosing schools for their children. Their high-stakes use and public presentation have drawn sustained criticism in the academic literature (Goldstein & Spiegelhalter, 1996; Leckie & Goldstein, 2009; Prior, Jerrim, et al., 2021).
Nevertheless, these authors also argue that, when used carefully and collaboratively with schools in a sensitive and less public manner, there is still an important role for these scores in helping to identify and understand differences in student outcomes across schools, and it is in this spirit that we have carried out the current research (Goldstein, 2020).

Data, Sample, Variables
We focus on schools in London and on those students who took their GCSE examinations in 2018 and therefore their KS2 tests in 2013. The sample is drawn from the National Pupil Database (DfE, 2023), a census of all students in state education, and consists of 71,321 students in 465 schools (mean = 153 students per school, range = 14 to 330). Student current and prior achievement are measured by students' GCSE examination and KS2 test scores (DfE, 2020). We standardize these scores to have means of 0 and SDs of 1 so that the measures can be interpreted in SD units. Henceforth, we refer to these standardized scores simply as the student age 16 and age 11 scores. Figure 1 shows both scores are approximately normally distributed and linearly related with a strong Pearson correlation of 0.72. There are very slight floor and ceiling effects in age 16 scores.

FIGURE 1. Histograms and scatterplot of student age 16 and age 11 scores.

Table 1 presents summary statistics for the student characteristics. Of note, 61% of students are non-white and 35% poor (as measured by receipt of free school meals). The London sample is therefore more ethnically diverse and poorer than the full English sample where only around 25% of students are non-white and 25% poor (Leckie & Goldstein, 2019). A minority of local authorities operate selective rather than comprehensive admissions. In these areas, grammar schools select students based on high performance in entrance examinations and so by definition have high mean age 11 scores and tend also to be educationally advantaged and homogeneous in terms of student sociodemographic characteristics. Secondary modern schools take those students not admitted to grammar schools.
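The standardization step described above is a simple z-scoring. A minimal sketch (the exact scaling applied to the published DfE scores may differ in detail):

```python
import numpy as np

def standardize(scores):
    """Z-score a vector of test scores so results read in SD units."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()
```

After standardizing both the age 11 and age 16 scores in this way, regression coefficients and variance components are interpretable in SD units, as in the article.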

Model 1: Random-intercept Model
The traditional school value-added model (Aitkin & Longford, 1986; Goldstein et al., 1993; Raudenbush & Bryk, 1986) can be written as the following random-intercept linear regression

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_j + e_{ij} \quad (1)$$

where $y_{ij}$ and $x_{1ij}$ denote current and prior achievement for student $i$ ($i = 1, \ldots, n_j$) in school $j$ ($j = 1, \ldots, J$), $\beta_0$ and $\beta_1$ denote the regression coefficients, $u_j$ the school random intercept effect, and $e_{ij}$ the student residual, and where $u_j$ and $e_{ij}$ are assumed independent of one another, independent of $x_{1ij}$, and normally distributed with zero means and constant variances $\sigma_u^2$ and $\sigma_e^2$.
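The data-generating process in Equation (1) can be simulated directly. This is a hedged sketch with made-up parameter values (not the article's estimates); it checks that the school means of the total residual recover the random intercepts, up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values (assumptions, not the article's estimates)
beta0, beta1 = 0.0, 0.7          # regression coefficients
sigma2_u, sigma2_e = 0.07, 0.42  # school and student variance components

J, n_j = 200, 100                # schools and students per school
school = np.repeat(np.arange(J), n_j)

x1 = rng.normal(0.0, 1.0, J * n_j)               # standardized prior achievement
u = rng.normal(0.0, np.sqrt(sigma2_u), J)        # school random intercepts u_j
e = rng.normal(0.0, np.sqrt(sigma2_e), J * n_j)  # student residuals e_ij

y = beta0 + beta1 * x1 + u[school] + e           # current achievement y_ij

# The total residual u_j + e_ij is covariate-adjusted achievement; its
# school means estimate the value-added scores u_j
total_resid = y - (beta0 + beta1 * x1)
school_means = total_resid.reshape(J, n_j).mean(axis=1)
```

In practice the fixed coefficients are of course estimated rather than known, but the logic of predicting each school's effect from its students' adjusted achievement is the same.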
As discussed in the Introduction, the independence assumptions are unlikely to hold and so in this article we interpret the school value-added model and the predicted school effects as descriptive rather than causal. Further student and school covariates may be added to this model and we will explore this in the Results section.
The total residual $u_j + e_{ij}$ measures covariate-adjusted (residualized) student current achievement; that is, student current achievement having adjusted for prior achievement. The overall average adjusted achievement is 0. The random effect $u_j$ therefore measures the mean student adjusted achievement in each school (the traditional school value-added score), while the residual $e_{ij}$ measures the adjusted achievement of each student relative to their school mean. The random effect variance $\sigma_u^2$ measures the variation in school mean adjusted achievement across schools. The residual variance $\sigma_e^2$ measures the average variance in student adjusted achievement within schools. Crucially, this parameter is averaged across all schools. Thus, while the model allows mean student adjusted achievement $u_j$ to vary from school to school, it assumes the variance in student adjusted achievement is the same $\sigma_e^2$ in every school (homoskedasticity). Figure 2 illustrates the main details of this and subsequent models using hypothetical data on two schools. In each case, $y_{ij}$ is plotted against $x_{1ij}$. In Model 1 (Figure 2A), the two solid lines represent the school-specific relationships $\beta_0 + \beta_1 x_{1ij} + u_j$. The dotted line depicts the average relationship between the two variables, $\beta_0 + \beta_1 x_{1ij}$. The school lines are parallel to the average line because in this model only the intercept differs between schools. The line for school 1 lies above the average line, while the line for school 2 lies below it. The vertical deviations of the school lines from the average line correspond to the school-specific $u_j$. In the current example we have $u_1 > 0 > u_2$. Thus, on average, students in school 1 are predicted to score higher compared to students with the same prior achievement in the average school, while students in school 2 are predicted to score lower. The variability in these mean deviations across all schools corresponds to $\sigma_u^2$. The vertical deviation of the student current achievement scores from their relevant school line corresponds to the $e_{ij}$. The variability in these deviations corresponds to $\sigma_e^2$. This is constant across $x_{1ij}$ and constant across schools.

Model 2: Random-intercept Model with Random Residual Variance
Model 2 extends Model 1 by allowing the variance in student adjusted achievement $\sigma_e^2$ to vary across schools. We do this by specifying a MELS version of the previous model (Hedeker et al., 2008). The model can be written as

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_j + e_{ij}$$
$$\sigma_{e,ij}^2 = \exp(\alpha_0 + v_j) \quad (2)$$

where the second line of the equation specifies a log-linear function for the residual variance, with $\alpha_0$ the intercept and $v_j$ a new school random 'scale' effect, assumed normally distributed with zero mean and variance $\sigma_v^2$ and allowing the residual variance to differ from school to school (Hedeker et al., 2008). Figure 2B illustrates Model 2 where $v_2 > v_1$ and so school 2 shows greater variance in their student adjusted achievements than is the case for school 1 ($\sigma_{e,2}^2 > \sigma_{e,1}^2$).
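The role of the random scale effect can be seen in a small simulation. This is a hedged sketch with illustrative values (not the fitted estimates): each school's residual variance is $\exp(\alpha_0 + v_j)$, and the population-averaged residual variance follows the log-normal mean formula $\exp(\alpha_0 + \sigma_v^2/2)$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative values (assumptions, not the article's estimates)
alpha0, sigma2_v = -0.87, 0.04

J, n_j = 500, 200
v = rng.normal(0.0, np.sqrt(sigma2_v), J)   # school random scale effects v_j
sigma2_e_j = np.exp(alpha0 + v)             # school-specific residual variances

# Student residuals drawn with their own school's variance
e = rng.normal(0.0, np.sqrt(np.repeat(sigma2_e_j, n_j)))

# Log-normal mean formula for the population-averaged residual variance
pop_avg = np.exp(alpha0 + sigma2_v / 2)
```

Note that the population average exceeds $\exp(\alpha_0)$, the variance of the median school, because the exponential is convex.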

Model 3: Random-intercept Model with Random Residual Variance Function
Recall the reason for entering student prior achievement (and potentially further student covariates) into the mean function of the model is that schools should not be held accountable for pre-existing differences in student achievement across schools at the start of the value-added period (Ballou et al., 2004; Leckie & Goldstein, 2019; Leckie & Prior, 2022; Levy et al., 2023). A similar argument applies when comparing the variance in student adjusted achievement across schools. For example, suppose the residual variance increases with increasing student prior achievement. This would suggest that schools with higher mean student prior achievement would in general be expected to show more variable student adjusted achievement than schools with lower mean student prior achievement, even though we have adjusted for student prior achievement in the mean function. However, following the arguments underpinning the traditional value-added model, this should be viewed as a reflection of their school intake rather than reflecting their school policies and practices. By entering student prior achievement into the model for the variance, we adjust for this overall variance trend. Focus then shifts to how schools deviate from this overall trend.
Model 3 therefore extends Model 2 by adding student prior achievement to the residual variance function. The model is written as

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_j + e_{ij}$$
$$\sigma_{e,ij}^2 = \exp(\alpha_0 + \alpha_1 x_{1ij} + v_j) \quad (3)$$

where $\alpha_1$ is the residual variance function regression coefficient on $x_{1ij}$. All other terms are defined as before. Where further student and school covariates are added to the mean function, all or a subset of these may also be added to the residual variance function. However, in order to compare school intake-adjusted values of the school variance across schools we must now calculate the residual variance in each school at a common value of $x_{1ij}$ such as the mean. For example, $\sigma_{e,j}^2 = \exp(\alpha_0 + \alpha_1 \bar{x}_{1..} + v_j)$, where $\bar{x}_{1..}$ denotes the mean value for $x_{1ij}$ across all students and schools. Figure 2C illustrates Model 3 where $\alpha_1 > 0$ and so the vertical scatter in student current achievement around each school line increases with student prior achievement in both schools, and this is in addition to school 2 continuing to have greater within-school variance than school 1 ($v_2 > v_1$).
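The intake adjustment can be made concrete. A hedged sketch with hypothetical parameter values: the same school scale effect produces different raw variances at different intake levels, and evaluating the function at the grand mean (0 for standardized scores) puts schools on a common footing:

```python
import numpy as np

# Hypothetical residual variance function parameters (illustration only)
alpha0, alpha1 = -0.88, 0.03

def residual_variance(v_j, x1):
    """Log-linear residual variance exp(alpha0 + alpha1 * x1 + v_j)."""
    return np.exp(alpha0 + alpha1 * x1 + v_j)

v_j = 0.10                                   # one school's scale effect
raw_low = residual_variance(v_j, -1.0)       # evaluated at a low-intake value
raw_high = residual_variance(v_j, +1.0)      # evaluated at a high-intake value
adjusted = residual_variance(v_j, 0.0)       # intake-adjusted (grand mean)
```

Comparing `adjusted` across schools is the variance analogue of comparing covariate-adjusted school means.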

Model 4: Random-slope Model
Model 4 is the differential effects version (Nuttall et al., 1989; Strand, 2010; Scherer & Nilsen, 2019) of the traditional school value-added model (Model 1) and can be written as the following random-slope linear regression

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_{0j} + u_{1j} x_{1ij} + e_{ij} \quad (4)$$

where $u_{0j}$ and $u_{1j}$ denote the random intercept and random slope effects, assumed bivariate normally distributed and independent of the residual and covariates. The random intercept variance $\sigma_{u0}^2$ measures the variation in school mean adjusted achievement across schools when $x_{1ij} = 0$. The random slope variance $\sigma_{u1}^2$ measures the variation in the slope adjustment for prior achievement across schools. The random intercept-slope covariance $\sigma_{u01}$ measures how these two terms covary. All other terms are defined as before. Where the model includes further student covariates, their regression coefficients may also be allowed to vary across schools.
The total residual, now $u_{0j} + u_{1j} x_{1ij} + e_{ij}$, again measures covariate-adjusted student current achievement. However, school mean student adjusted achievement $u_{0j} + u_{1j} x_{1ij}$ now varies not only across schools, but also across students as a function of the covariate with the random slope, $x_{1ij}$. Thus, this version of the model allows schools to be potentially more or less effective for students as a function of their prior achievement. Figure 2D illustrates Model 4 where $u_{1,1} > u_{1,2}$ and so school 1 shows a steeper regression line than the average line, while school 2 shows a shallower line. The school lines are given by $\beta_0 + \beta_1 x_{1ij} + u_{0j} + u_{1j} x_{1ij}$. The vertical deviations of each school line from the average line correspond to $u_{0j} + u_{1j} x_{1ij}$ and so are a linear function of $x_{1ij}$. The figure shows the school value-added score for school 1 is positive in general, but especially positive for students with high $x_{1ij}$. In contrast, the school value-added score for school 2 is negative in general, but especially negative for students with high $x_{1ij}$.
School mean student adjusted achievement, averaging over all students in each school, is $u_{0j} + u_{1j} \bar{x}_{1.j}$, while the variance in student adjusted achievement in each school is $u_{1j}^2 \mathrm{Var}_j(x_{1ij}) + \sigma_e^2$, where the last term remains constant across schools. Thus, adding random slopes only partially recognizes that the variance in student adjusted achievement varies across schools.
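The within-school variance decomposition for the random-slope model can be checked by simulation. A hedged sketch with illustrative values for a single school (not fitted estimates): the slope effect contributes $u_{1j}^2 \mathrm{Var}_j(x_{1ij})$ on top of the constant residual variance:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative values for one school (assumptions, not fitted estimates)
u0_j, u1_j = 0.2, 0.1
sigma2_e = 0.42

x1 = rng.normal(0.0, 1.0, 200_000)               # this school's prior scores
e = rng.normal(0.0, np.sqrt(sigma2_e), x1.size)  # student residuals

# Student adjusted achievement within school j under the random-slope model
adjusted = u0_j + u1_j * x1 + e

# The decomposition: u1_j^2 * Var_j(x1) + sigma2_e
implied = u1_j**2 * x1.var() + sigma2_e
```

With a modest slope effect, the slope component is small relative to the residual component, anticipating the empirical finding reported later for Model 5.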

Model 5: Random-slope Model with Random Residual Variance
Model 5 extends Model 4 by allowing the variance in student adjusted achievement to vary across schools. (Equally, Model 5 extends Model 2 by adding a random slope on student prior achievement to the mean function.) We do this by specifying a MELS version of the previous model. The model can be written as

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_{0j} + u_{1j} x_{1ij} + e_{ij}$$
$$\sigma_{e,ij}^2 = \exp(\alpha_0 + v_j) \quad (5)$$

where the second line of the equation specifies the log-linear function for the residual variance (see also Model 2). The three random effects $u_{0j}$, $u_{1j}$, $v_j$ are assumed trivariate normally distributed and independent of the residuals and covariates. Figure 2E illustrates Model 5 where $v_2 > v_1$ and so school 2 shows greater variance in their student adjusted achievements than is the case for school 1 ($\sigma_{e,2}^2 > \sigma_{e,1}^2$), as well as a shallower slope (due to $u_{1,1} > u_{1,2}$).
School mean student adjusted achievement (averaging over all students) is then given by $u_{0j} + u_{1j} \bar{x}_{1.j}$, as it was in the constant residual variance case (Model 4), and so we will again need to evaluate this at a common value of $\bar{x}_{1.j}$ for all schools. The variance in student adjusted achievement in each school (over all their students) is now given by $u_{1j}^2 \mathrm{Var}_j(x_{1ij}) + \sigma_{e,j}^2$ and so differs from the constant residual variance case (Model 4) in that the last term also now varies across schools.

Model 6: Random-slope Model with Random Residual Variance Function
Model 6 extends Model 5 by adding student prior achievement to the residual variance function. (Equally, Model 6 extends Model 3 by adding a random slope on student prior achievement.) The model is written as

$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_{0j} + u_{1j} x_{1ij} + e_{ij}$$
$$\sigma_{e,ij}^2 = \exp(\alpha_0 + \alpha_1 x_{1ij} + v_j) \quad (6)$$

where $\alpha_1$ is the residual variance function regression coefficient on $x_{1ij}$ (see also Model 3). All other terms are defined as before. Figure 2F illustrates Model 6 where $\alpha_1 > 0$ and so the vertical scatter in student current achievement around each school line increases with student prior achievement, and this is in addition to school 2 continuing to have a shallower slope ($u_{1,1} > u_{1,2}$) and greater within-school variance than school 1 ($v_2 > v_1$).
As in Model 5 (and Model 4), school mean student adjusted achievement (averaging over all students) is once again given by $u_{0j} + u_{1j} \bar{x}_{1.j}$, while the variance in student adjusted achievement in each school (over all students) is now given by $u_{1j}^2 \mathrm{Var}_j(x_{1ij}) + \mathrm{E}_j(\sigma_{e,ij}^2)$, where $\mathrm{E}_j(\sigma_{e,ij}^2)$ is the mean of the student-specific residual variances in school $j$. Crucially, this mean is free to vary across schools.
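This decomposition can also be verified numerically. A hedged sketch with hypothetical values for one school: student-specific variances come from the log-linear function, and their school mean enters the decomposition:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical Model 6 quantities for one school (illustration only)
u1_j = 0.1
alpha0, alpha1, v_j = -0.88, 0.03, 0.10

x1 = rng.normal(0.2, 0.9, 200_000)          # this school's age 11 scores

# Student-specific residual variances and their school mean E_j(.)
sigma2_e_ij = np.exp(alpha0 + alpha1 * x1 + v_j)
mean_resid_var = sigma2_e_ij.mean()

# Variance of adjusted achievement: u1_j^2 * Var_j(x1) + E_j(sigma2_e_ij)
school_var = u1_j**2 * x1.var() + mean_resid_var
```

Because both the intake distribution and the scale effect differ across schools, both terms of `school_var` are school-specific under Model 6.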

Software
The traditional school value-added models (Models 1 and 4) are typically fitted via maximum likelihood estimation using conventional mixed-effects linear regression routines in standard software (R, SAS, SPSS, Stata). However, the MELS versions of these models (Models 2, 3, 5, 6) cannot be fitted using these routines, nor can they be fitted in specialized mixed-effects modeling packages (HLM, MLwiN). Hedeker and colleagues have developed the MixWILD software to fit MELS models by maximum likelihood estimation. These models can also be fitted via Markov chain Monte Carlo (MCMC) methods in Stata and Mplus (McNeish, 2020), as well as in dedicated Bayesian software such as Stan (including via the brms package in R; e.g., Parker et al., 2021), WinBUGS, OpenBUGS, and JAGS (including via the R2jags package in R; e.g., Barrett et al., 2019). To support readers wishing to implement these models, we present annotated MixWILD, R, and Stata instructions, syntax, and simulated data (Section S4 of the Supplemental information).
We fit all models using Stata. Specifically, we use the bayesmh command, which implements an adaptive Metropolis-Hastings MCMC algorithm. We use hierarchical centering reparameterizations to improve mixing. We specify vague (diffuse) normal priors for all regression coefficients and minimally informative inverse Wishart priors for the random effects variance-covariance matrices. We specify overdispersed initial values for all parameters. We fit all models with four chains, each with 5,000 burn-in iterations and 10,000 monitoring iterations. We judge convergence using Gelman-Rubin convergence diagnostics (Gelman & Rubin, 1992) and trace, autocorrelation, and scatter plots. All models converged and all parameters had effective sample sizes > 400. We compare model fit using the deviance information criterion (DIC; Spiegelhalter et al., 2002), where smaller values are preferred.
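For readers implementing this workflow themselves, the Gelman-Rubin diagnostic is straightforward to compute from stored chains. A minimal sketch of the basic (non-split) potential scale reduction factor; production analyses should use a library implementation (e.g., Stata's own diagnostics or the coda package in R):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: 2-D array, one row per chain, equal chain lengths."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(5)
# Four well-mixed chains targeting the same distribution give R-hat near 1
good = rng.normal(0.0, 1.0, (4, 10_000))
```

Chains whose means disagree (poor mixing or non-convergence) inflate the between-chain term and push the statistic well above 1.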

Model 1: Random-intercept Model
Model 1 (Equation 1) is the traditional school value-added model; in other words, the random-intercept model. For simplicity, and because not all researchers wish to additionally include student sociodemographics (Leckie & Prior, 2022; Levy et al., 2023), we only adjust for student prior achievement in this and subsequent Models 1-6, but we do explore the role of further covariates in Models 7 and 8. For the purpose of comparing to subsequent models, we parameterize the residual variance $\sigma_e^2$ as $\exp(\alpha_0)$.

Table 3 presents the results. The estimated slope coefficient on student age 11 score is $\hat{\beta}_1 = 0.678$, and so a 1 SD difference in age 11 score is associated with a 0.678 SD difference in age 16 score. The estimated residual variance is $\hat{\sigma}_e^2 = \exp(-0.870) = 0.419$. The estimated total variance in student adjusted achievement is $\hat{\sigma}_u^2 + \hat{\sigma}_e^2 = 0.487$, and so student age 11 scores account for 51% of the variation in student age 16 scores ($= 100\{1 - (\hat{\sigma}_u^2 + \hat{\sigma}_e^2)\}$; Snijders & Bosker, 2012). The estimated between-school variance in school mean adjusted achievement is $\hat{\sigma}_u^2 = 0.067$ and so 14% of the total variation in student adjusted achievement ($= 100 \hat{\sigma}_u^2 / (\hat{\sigma}_u^2 + \hat{\sigma}_e^2)$; Snijders & Bosker, 2012) is variation in the school means. The between-school variance implies a 95% plausible values range (PVR) for the school means of $\pm \Phi^{-1}(0.975)\sqrt{\hat{\sigma}_u^2}$ (Raudenbush & Bryk, 2002). Thus, students in what would be deemed the most effective schools (operating at the 97.5th percentile of the distribution of all schools) are predicted to score 1.02 SD higher at age 16 than equivalent students in the least effective schools (operating at the 2.5th percentile). In contrast, the estimated student residual variance $\hat{\sigma}_e^2 = \exp(\hat{\alpha}_0) = 0.419$ is assumed constant, naively implying the variance in student adjusted achievement is the same in every school. Plots confirm that the random effect and residual normality assumptions for this and subsequent models are reasonable (Supplemental information). In Table 3, Est. and SE denote the posterior means and SDs of the parameter chains, and DIC denotes the deviance information criterion.
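The derived quantities above follow mechanically from the two reported variance estimates. A short check using the values as reported in the text:

```python
import numpy as np

# Model 1 estimates as reported in the text
alpha0_hat = -0.870    # log residual variance intercept
sigma2_u_hat = 0.067   # between-school variance of the school means

sigma2_e_hat = np.exp(alpha0_hat)          # residual variance, ~0.419
total = sigma2_u_hat + sigma2_e_hat        # total adjusted variance, ~0.487
r2 = 100 * (1 - total)                     # ~51% explained by age 11 scores
vpc = 100 * sigma2_u_hat / total           # ~14% of adjusted variance between schools
pvr_half = 1.96 * np.sqrt(sigma2_u_hat)    # ~0.51, giving a PVR of about (-0.51, 0.51)
```

The reported 1.02 SD gap between the most and least effective schools is simply twice the PVR half-width, up to rounding.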

Model 2: Random-intercept Model with Random Residual Variance
Model 2 (Equation 2) extends the random-intercept model (Model 1, Equation 1) to allow the residual variance, and therefore the variance in student adjusted achievement, to vary across schools. Model 2 shows a reduction in the DIC of 972 points, confirming that this variation in variances is statistically significant. The mean function parameter estimates are largely unchanged. The estimated residual variance function intercept and estimated variance of the new school random effect are $\hat{\alpha}_0 = -0.881$ and $\hat{\sigma}_v^2 = 0.037$. The model-implied population-averaged school variance in student adjusted achievement is estimated as $0.422 = \exp(\hat{\alpha}_0 + \hat{\sigma}_v^2/2)$ (Hedeker et al., 2008), which, as expected, is close to the Model 1 estimate of 0.419. The estimated population 95% PVR of school variances of student adjusted achievement is $(0.28, 0.61) = \exp\{\hat{\alpha}_0 \pm \Phi^{-1}(0.975)\sqrt{\hat{\sigma}_v^2}\}$. This range is substantial. For example, the estimated difference in student adjusted achievement between students performing at the 97.5th and 2.5th percentiles within the most variable schools ($\hat{\sigma}_{e,j}^2 = 0.61$) is 3.05 SD, while in the least variable schools ($\hat{\sigma}_{e,j}^2 = 0.28$) it is 2.09 SD (Raudenbush & Bryk, 2002).
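These derived figures can be reproduced (up to rounding of the published estimates) from the two reported parameters:

```python
import numpy as np

# Model 2 estimates as reported in the text
alpha0_hat, sigma2_v_hat = -0.881, 0.037

z = 1.959964                                         # Phi^{-1}(0.975)
pop_avg = np.exp(alpha0_hat + sigma2_v_hat / 2)      # ~0.422 (log-normal mean)
lo = np.exp(alpha0_hat - z * np.sqrt(sigma2_v_hat))  # ~0.28
hi = np.exp(alpha0_hat + z * np.sqrt(sigma2_v_hat))  # ~0.60

# Within-school 95% spread of adjusted achievement at each extreme
spread_hi = 2 * z * np.sqrt(hi)                      # ~3.05 SD
spread_lo = 2 * z * np.sqrt(lo)                      # ~2.09 SD
```

The upper PVR limit computes to 0.604 from these rounded inputs; the reported 0.61 presumably reflects the unrounded estimates.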

Model 3: Random-intercept Model with Random Residual Variance Function
Model 3 (Equation 3) further extends the random-intercept model to allow the residual variance to vary not just across schools (Model 2, Equation 2), but additionally as a function of student prior achievement. Model 3 is preferred to Model 2 (ΔDIC = 34), showing the residual variance significantly increases with student age 11 scores ($\hat{\alpha}_1 = 0.029$). Thus, schools with in general higher age 11 scores are predicted to show higher variance in student adjusted achievement. However, this relationship is very weak. The estimated population 95% PVR of school intake-adjusted variances of student adjusted achievement, $(0.28, 0.61) = \exp\{\hat{\alpha}_0 + \hat{\alpha}_1 \bar{x}_{1..} \pm \Phi^{-1}(0.975)\sqrt{\hat{\sigma}_v^2}\}$ where $\bar{x}_{1..} = 0$ denotes the London-wide average value for $x_{1ij}$, is effectively the same as in the previous model where we did not adjust for school intake. That is, the variation in the variance in student adjusted achievement across schools is not simply explained by some schools showing in general higher age 11 scores and therefore higher variances than others.

Model 4: Random-slope Model
Model 4 (Equation 4) is the differential effectiveness version of the traditional school value-added model; in other words, the random-slopes model. Recall that this model, like the traditional random-intercepts model (Model 1, Equation 1), assumes the residual variance $\sigma_e^2$ is once again constant across all students and schools. As in Model 1, we parameterize $\sigma_e^2$ as $\exp(\alpha_0)$.

Model 5: Random-slope Model with Random Residual Variance

Model 5 (Equation 5) extends the random-slope model (Model 4, Equation 4) to allow the residual variance to vary across schools. Thus, the move from Model 4 to 5 for the current random-slope model mirrors the move we explored from Model 1 to 2 for the earlier random-intercept versions of these models.
Model 5 allows us to quantify the relative importance of the differential school effects with respect to prior achievement as a component of the overall variance in student adjusted achievement in each school. We calculate the estimated variance for each school in our sample for a common reference distribution of students with student age 11 score variance $\bar{s}^2_x = 0.83$ (the mean of the sample school variances of student prior achievement). The resulting expression is $\hat{\sigma}^2_{u_1}\bar{s}^2_x + \hat{\sigma}^2_{e_j}$, where $\hat{\sigma}^2_{e_j} = \exp(\hat{\alpha}_0 + \hat{v}_j)$ (see Section 3, Model 5). The first component accounts for less than 1% of the variance in nearly all schools. In sum, the inclusion of the random slope on prior achievement has done very little to explain the variance in student adjusted achievement in each school.
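To make the decomposition concrete, the following sketch computes the share of a school's variance attributable to the random slope. The slope variance and school effect values here are hypothetical (the article reports only that the share is under 1%); only the reference variance of 0.83 comes from the text.

```python
import math

mean_school_var_x = 0.83   # mean sample school variance of age 11 scores (from text)
var_slope = 0.003          # hypothetical random-slope variance (not reported above)
alpha0 = -0.881            # hypothetical residual variance intercept, for illustration
v_j = 0.10                 # hypothetical school scale random effect

slope_component = var_slope * mean_school_var_x   # sigma2_u1 * s2_x
residual_component = math.exp(alpha0 + v_j)       # sigma2_e,j
share = slope_component / (slope_component + residual_component)
print(f"{share:.2%}")   # well under 1% for values of this order
```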

Model 6: Random-slope Model with Random Residual Variance Function
Model 6 (Equation 6) further extends the random-slope model to allow the residual variance to vary not just across schools (Model 5, Equation 5), but additionally as a function of student prior achievement. Thus, the move from Model 5 to 6 for the current random-slope model mirrors the move we explored from Model 2 to 3 for the earlier random-intercept versions of these models.
As with the sequence of random-intercept models, Model 6 shows that the residual variance in the random-slope model significantly increases with student age 11 scores ($\hat{\alpha}_1 = 0.036$). However, as with the random-intercept models, this effect is slight and does little to explain the variation in school variances across schools. Given that adding the random slope has little practical importance, and in order to illustrate the subsequent models as simply as possible, we return to the sequence of random-intercept models.

Model 7: Adjusting for Student Background Characteristics
Model 7 extends Model 3 by adding student age, gender, first language, SEN status, and FSM status into the mean and residual variance functions (Table 1). Adding these characteristics to the mean function implies students are now compared to other students across London who not only share the same age 11 score, but who also share the same sociodemographic characteristics. The aim is to ensure that schools do not appear more or less effective simply as a result of recruiting more or less educationally advantaged students (Leckie & Goldstein, 2019).
The resulting improved accuracy of the predicted age 16 scores will lead the student adjusted achievement scores, in general, to reduce in absolute magnitude (and reorder), leading the overall variance in student adjusted achievement to decrease. In turn, the school means and variances of student adjusted achievement scores will also change, again in general reducing in magnitude and reordering. We then further adjust the school variances of student adjusted achievement by including the student characteristics in the student residual variance function. This ensures that if there are any London-wide relationships between the variance in student adjusted achievement and particular student characteristics, this again will not benefit or count against schools with disproportionate numbers of these students.

Turning to the mean function results, Black and Asian students (relative to White), and students who speak English as a second language, are all predicted to score higher at age 16 than otherwise equivalent students. SEN and FSM students, in contrast, are predicted to score lower than otherwise equivalent students. These results are established and consistent with the literature (Leckie & Goldstein, 2019). What is not known is whether there are also sociodemographic differences in the variance in student adjusted achievement. The results show that, all else equal, the residual variance, and therefore the variance in student adjusted achievement, again increases with age 11 scores, but is now also shown to be higher for SEN and FSM students than for otherwise equal students. Thus, it proves harder to reliably predict the age 16 scores of these student groups relative to other student groups. In contrast, summer-born students, girls, and Black and Asian students show lower variance in student adjusted achievement and therefore appear to perform in a more consistent fashion than otherwise equal student groups within schools. (Est. and SE denote the posterior means and SDs of the parameter chains.)
Student ethnicity reference group is White. School type reference group is standard. School admissions reference group is comprehensive. School gender reference group is mixed-sex school.

The figure compares the school means and variances of student adjusted achievement from Model 7 with those from the earlier model which ignores student background. The purpose of this figure is to explore the sensitivity of the school means and variances to the additional adjustments for student background, and therefore to assess the importance of making such adjustments or not (Leckie & Prior, 2022; Levy et al., 2023). We calculate the estimated school variances in each model by plugging the sample mean values of the covariates (Table 1) into the residual variance function, giving $\hat{\sigma}^2_{e_j} = \exp(\hat{\alpha}_0 + \hat{\alpha}_1 \times 0.258 + \cdots + \hat{\alpha}_{11} \times 0.348 + \hat{v}_j)$. The plots show that both the school means and the school variances are correlated 0.94 across the two models. Thus, schools which show high mean adjusted achievement when one ignores student background nearly always still show high mean adjusted achievement after adjustment. The same applies for school variances of student adjusted achievement. However, even with such high correlations, the rank ordering of those schools whose social mix differs most markedly from the London-wide average still changes considerably, as shown by the schools located furthest away from the 45-degree line in the bottom plots. Thus, the decision of whether to adjust for student background has a bearing on the manner in which many individual schools are viewed in terms of their school variances as well as their school means.
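The plug-in calculation of a school's variance at the sample covariate means can be sketched as follows. The coefficient and random effect values are hypothetical; only the first and last covariate means, 0.258 and 0.348, appear in the text.

```python
import math

alpha0 = -0.9                            # hypothetical intercept
coefs = [0.03, -0.02, 0.01]              # hypothetical alpha_1, ..., alpha_3
covariate_means = [0.258, 0.120, 0.348]  # first and last from Table 1; middle hypothetical
v_j = 0.15                               # hypothetical school scale random effect

# sigma2_e,j = exp(alpha0 + alpha1 * m1 + ... + v_j),
# evaluated at the sample covariate means
log_var = alpha0 + sum(a * m for a, m in zip(coefs, covariate_means)) + v_j
school_var = math.exp(log_var)
print(round(school_var, 3))
```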

Model 8: Adjusting for School Characteristics
We now shift from attempting to best define and measure student adjusted achievement, and therefore the school means and variances of student adjusted achievement, to attempting to explain why some schools show higher mean student adjusted achievement and lower variance in student adjusted achievement than others. Unfortunately, we do not observe school policies and practices in our data. However, we do observe some school characteristics (Table 2). Model 8 extends Model 7 by adding school type, school admissions, school gender (mixed, boys, girls), and school religion to the mean and residual variance functions.
The results (Table 5) for the existing mean and residual variance function regression coefficients are very similar to before, and so we restrict our interpretation here to the new results. First, consider the mean function. Relative to standard school types, school mean adjusted achievement is somewhat higher in sponsored and converter academies, having adjusted for the other covariates. Similarly, school mean adjusted achievement is higher in girls schools and religious schools, all else equal. However, the most sizeable differential relates to school admissions: school mean adjusted achievement is considerably higher in grammar schools and lower in secondary modern schools relative to comprehensive schools. These results agree with the literature (Leckie & Goldstein, 2019). With respect to the residual variance function, we see new findings. School variances in student adjusted achievement tend to be lower in converter academies compared to standard school types, lower in grammar schools versus comprehensive schools, and lower in religious schools versus non-religious schools, and this is after adjusting for London-wide relationships between the variance in student adjusted achievement and student characteristics. Thus, students in converter academies, grammar schools, and religious schools not only tend to show higher student adjusted achievement on average, but also tend to show more consistent student adjusted achievement.

Discussion
In this article, we have argued that the focus of school value-added models should broaden to measure not just school mean differences in student adjusted achievement (student achievement beyond that predicted by student prior achievement and other student background characteristics), but also school variance differences in student adjusted achievement. To study school variance differences, we have proposed extending the traditional school value-added model, a random-intercept mixed-effects linear regression of student current achievement on prior achievement and other student background characteristics, by modeling the residual variance as a log-linear function of the student covariates and a new random school effect. The school random intercept effect and random residual variance in this model measure the school mean and variance in student adjusted achievement. This model can be viewed as an application of the MELS model popular in biostatistics (Hedeker et al., 2008). It is, however, important to reiterate that school value-added models and their predicted school effects should be viewed as descriptive rather than causal, since these models do not address the complex processes of selection into schools that will be at play in many school systems.
We have illustrated this extended school value-added model with an application to schools in London. In response to our research question, our results suggest meaningful differences in the variance in student adjusted achievement across schools. We also find a moderate to large negative association between the school mean and variance in student adjusted achievement. Thus, schools which show the highest mean student adjusted achievement also tend to be the schools which show the lowest variance in student adjusted achievement. One process by which school variance differences may arise is if there is a London-wide negative relationship between the variance in student adjusted achievement and student prior achievement. We adjusted for this by entering student prior achievement into the residual variance function. A second process by which school variance differences may arise is via interaction effects between the different school policies and practices envisaged to be represented by the school random intercept effect and observed and unobserved student characteristics. Previous research has studied this by entering a school random slope on student prior achievement, and this showed schools to be differentially effective for students with low, middle, and high prior achievement.
In our application, however, these school-by-student prior achievement interactions are small and explain little of the variation in school variances between schools. We then turned our attention to entering student characteristics into the model, both in the mean and residual variance functions, to better measure student adjusted achievement. In terms of new results, we find that FSM and SEN students show greater variance in student adjusted achievement and therefore less predictable age 16 scores than otherwise equal students. The resulting predicted school means and variances of student adjusted achievement, however, are similar to those based on the model which only adjusts for student prior achievement. Nevertheless, schools whose sociodemographic student mix differ most from the average school still move up and down the London-wide rankings considerably, demonstrating the importance of adjusting for student background at least for some schools (Leckie and Goldstein, 2019;Leckie and Prior, 2022;Levy et al., 2023). Finally, we shifted our emphasis from measuring school means and variances of student adjusted achievement to seeking to explain them. We find converter academies and grammar schools tend to show lower variances in student adjusted achievement than other school types. Importantly, here too we adjusted for any overall relationship between the variance in student adjusted achievement and student prior achievement and background characteristics and so these differences in school variances lie beyond this simple explanation.
Future studies might seek to identify whether school variance differences can be predicted by specific school policies and practices. It will also be interesting and important to explore the role of school composition covariates such as the school mean and school SD of the student prior achievement (Raudenbush and Bryk, 2002). One issue that such studies should bear in mind is that some student current achievement measures may exhibit floor or ceiling effects.
Where these are pronounced, they may bias the model parameters relative to fitting models to measures without such effects. Tobit versions of the models might be considered to address this issue (Lu, 2018). Another issue is sample-size requirements. In general, we found that the residual variance function regression coefficients and predicted school effects were less precisely estimated than their analogous quantities in the mean function. This suggests that larger sample sizes are needed for these models than traditionally used for school value-added studies. Future studies might therefore use power calculations to guide such decisions (Walters et al., 2018).
More generally, however, expanding the focus of school value-added models to consider school effects on the variance in student achievement raises value judgements and interpretational challenges that future work will need to engage with. Fundamentally, it is not clear how positively or negatively higher or lower variances should be viewed in general.
Similarly, where a given school policy or practice is identified as driving school differences in variance via differential effects on students as a function of their observed and unobserved characteristics, it will not typically be clear what the optimal degree of differential impact might be. Even if it is decided that higher variance should be interpreted in a particular way, faced now with two summaries of school effects on student learning (mean and variance effects), researchers and school accountability systems must make further value judgements as to how to best combine them into any overall summary of school effectiveness for the purpose of making overall inferences, judgements and decisions about schools (Prior, Goldstein, et al., 2021).
Crucially, it is only by extending the school value-added model to allow for school effects on the variance in student adjusted achievement that such debates are made possible. The extension we have presented paves the way for new substantive research into the reasons behind differences in variability and therefore how best such differences should be interpreted and addressed.
The school value-added model presented here can be further extended in various ways beyond simply adding further covariates and random slopes suggesting avenues for new methodological research. First, in the school effectiveness literature, there is interest in studying the consistency of school effects across academic subjects (Goldstein, 1997;Reynolds et al., 2014;Teddlie & Reynolds, 2000). We can further develop the school value-added model to study this phenomenon with respect to the school variance in student adjusted achievement.
Essentially, we would fit a multivariate response version of this model for multiple student achievement scores (Kapur et al., 2015;Leckie, 2018;Pugach et al., 2014). The model would have multiple residual variance functions, one for each academic subject. We can then study the correlations of the school means and variances of student adjusted achievement across subjects.
Second, the same multivariate response version of the model can be used to study the stability of school effects over time. Here we would fit a multivariate response model to a single achievement score, but for multiple student cohorts (Leckie & Goldstein, 2009). Third, we could include a random slope in the residual variance function (Goldstein et al., 2018;McNeish, 2021) to study whether schools exacerbate or mitigate any overall relationship between the variance in student adjusted achievement and student prior achievement. Fourth, while we have flexibly modelled the residual variance, we have not modelled the random intercept variance (the random slope model relaxed this, but in a rather specific way). It is also possible to model the random intercept variance as a log-linear function of school covariates (Hedeker et al., 2008). For example, the variability of school mean adjusted achievement scores across schools may appear greater for some school groups than others, and this could then be tested by introducing the school group variable as a covariate in this second variance function. Fifth, we can expand the model to three levels to incorporate an additional random effect into the mean and residual variance functions relating to, for example, school district, and thereby study school district differences in the mean and variance in student adjusted achievement. This then raises the possibility of entering school district random effects into the school random intercept variance function, since school mean adjusted achievement might vary more in some school districts than in others; with this extension we can potentially study differential school-level inequalities in the education system by school district (Leckie and Goldstein, 2015). Alternatively, teacher random effects could be introduced as a new level between the student and school levels.
Finally, our focus has been on shifting attention from studying the school mean of student adjusted achievement to additionally focusing on the variance in student adjusted achievement. In future work it would be interesting to explore further ways the distribution of student adjusted achievement might vary across schools, for example, with respect to skewness.

References

Leckie, G., & Goldstein, H. (2017). The evolution of school league tables in England 1992-2016: 'Contextual value-added', 'expected progress' and 'progress 8'. British Educational Research Journal, 43, 193-212.
Leckie, G., & Goldstein, H. (2019). The importance of adjusting for student background in school value-added models: A study of Progress 8 and school accountability in England. British Educational Research Journal, 45, 518-537.
Leckie, G., & Prior, L. (2022). A comparison of value-added models for school accountability.

Fitting the Models
In this section, we describe Stata and R syntax to fit the models explored in this article by MCMC methods and MixWILD point-and-click instructions to fit these models by maximum likelihood estimation. To support readers, we provide script files and data to replicate the presented analysis.

Example Model
For simplicity, we focus on the two-level random-intercept model with a random residual variance function presented in Section 3. To illustrate the syntax as simply as possible, we consider a version of this model with only one student characteristic (student prior achievement).
This model, which we refer to as model S1, can be written as

$y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}$, where $e_{ij} \sim N(0, \sigma^2_{e_{ij}})$
$\ln(\sigma^2_{e_{ij}}) = \alpha_0 + \alpha_1 x_{ij} + v_j$
$\begin{pmatrix} u_j \\ v_j \end{pmatrix} \sim N\left\{\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_u & \sigma_{uv} \\ \sigma_{uv} & \sigma^2_v \end{pmatrix}\right\}$

Simulated Data
As we cannot share the data analyzed in the article, we analyze here simulated data, using the above model as the data-generating model. We simulate a single dataset with 100 schools and 25 students per school. We simulate $x_{ij}$ as a standard normal variate with an intraclass correlation of 0.2. We specify the true parameter values as $\beta_0 = 0$, $\beta_1 = 0.7$, $\sigma^2_u = 0.05$, $\alpha_0 = -0.8$, $\alpha_1 = 0.05$, $\sigma^2_v = 0.05$, $\sigma_{uv} = 0.025$. The resulting data can be found in the files data.dta and data.csv.
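The data-generating process can be sketched as follows; this is a minimal simulation under the stated parameter values, not the authors' own script, and writing the data files is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
J, n = 100, 25                   # schools, students per school
b0, b1 = 0.0, 0.7                # mean function: beta_0, beta_1
a0, a1 = -0.8, 0.05              # residual variance function: alpha_0, alpha_1
var_u, var_v, cov_uv = 0.05, 0.05, 0.025

# Bivariate normal school random effects (u_j, v_j)
Sigma = np.array([[var_u, cov_uv], [cov_uv, var_v]])
u, v = rng.multivariate_normal([0.0, 0.0], Sigma, size=J).T

# Standard normal x with intraclass correlation 0.2
x = rng.normal(0, np.sqrt(0.2), (J, 1)) + rng.normal(0, np.sqrt(0.8), (J, n))

# School-and-student-specific residual variance, then the outcome
sigma2_e = np.exp(a0 + a1 * x + v[:, None])
y = b0 + b1 * x + u[:, None] + rng.normal(0, np.sqrt(sigma2_e))
```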

Stata: The bayesmh Command
We focus on the bayesmh Stata command, which implements an adaptive Metropolis-Hastings MCMC algorithm. We present the simplest possible syntax, noting that mixing can be improved via model reparameterization (e.g., hierarchical centering) and by specifying various estimation options (initial values, blocking), and we encourage readers to consult the comprehensive documentation for further details.
The syntax to specify and fit this model is provided in the accompanying script files. The parameter estimates are similar to their true values and to those provided by brms in R and MixWILD (see below).

R: The brms Package
We focus on the brm function of the brms R package (Bürkner, 2017, 2018). The brms package specifies model S1 using the following alternative parameterization:

$\ln(\sigma_{e_{ij}}) = \alpha'_0 + \alpha'_1 x_{ij} + v'_j$

Thus, the residual variance function is now specified in terms of the residual SD. Fortunately, the parameter and random effect values of the original parameterization can be easily recovered as follows: $\alpha_0 = 2\alpha'_0$, $\alpha_1 = 2\alpha'_1$, $v_j = 2v'_j$, $\sigma^2_v = 4\sigma'^2_v$.
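Applied to the posterior chains, the rescaling amounts to doubling every log-SD-scale quantity; a sketch with made-up draw values (purely for illustration):

```python
import numpy as np

# Made-up posterior draws on brms' log-SD scale
a0_prime = np.array([-0.41, -0.39, -0.40])    # draws of alpha_0'
a1_prime = np.array([0.026, 0.024, 0.025])    # draws of alpha_1'
sd_v_prime = np.array([0.095, 0.105, 0.100])  # draws of sd(v')

# ln(sigma2) = 2 * ln(sigma), so every log-SD-scale quantity doubles
a0_draws = 2 * a0_prime          # alpha_0 = 2 * alpha_0'
a1_draws = 2 * a1_prime          # alpha_1 = 2 * alpha_1'
var_v_draws = 4 * sd_v_prime**2  # sigma2_v = 4 * sigma_v'^2

print(a0_draws.mean(), a1_draws.mean(), var_v_draws.mean())
```

As the text notes, these transformations are best applied draw by draw to the chains, with posterior summaries computed afterwards.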
The syntax to specify and fit this model is as follows

brm(bf(y ~ 1 + x + (1 |s| school), sigma ~ 1 + x + (1 |s| school)), data = mydata)

where for further simplicity we use the default priors for all model parameters and random effects. These include improper flat priors for the regression coefficients and half student-t priors for the random effect SDs. The printed output notes that draws were sampled using NUTS and that, for each parameter, Bulk_ESS and Tail_ESS are effective sample size measures and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
The model ran four chains, each with 1000 warmup (burn-in) iterations and 1000 monitoring iterations. Recall that the model is a reparameterized version of model S1. The output additionally shows that the elements of the random effect covariance matrix used in this alternative parameterization are presented as SDs and correlations rather than as variances and covariances. We can recover the random effect variances of this alternative parameterization by squaring the random effect SDs. The random effect covariance can be recovered by multiplying the random effect correlation by the two random effect SDs. We can then recover the parameter and random effect values associated with parameterization S1 using the transformations listed previously. All these calculations are best applied to the underlying chains rather than to the posterior means displayed in the output.
Having carried out these steps, the results are as follows. The parameter estimates are similar to their true values and to those provided by bayesmh in Stata (above) and MixWILD (below).

MixWILD
The MixWILD software is freely available at https://reach-lab.github.io/MixWildGUI/. MixWILD fits models using maximum likelihood estimation via adaptive quadrature. We use the default estimation options, which specify 11 quadrature points. We encourage readers to consult the comprehensive documentation for further details.
The MixWILD software specifies model S1 using the following alternative parameterization:

$\ln(\sigma^2_{e_{ij}}) = \alpha_0 + \alpha_1 x_{ij} + \tau\theta_j + \omega'_j$

The mean function random effect variance is specified on the log scale, $\sigma^2_u = \exp(\gamma_0)$. The mean and residual variance function random effects are assumed independent. The association between the mean and residual variance functions is instead allowed for via entering the mean function random effect as a standardized latent covariate $\theta_j$ in the residual variance function and estimating its regression coefficient $\tau$. Fortunately, the parameter and random effect values of the original parameterization are easily recovered as follows: $\sigma^2_u = \exp(\gamma_0)$, $v_j = \tau\theta_j + \omega'_j$, $\sigma^2_v = \tau^2 + \sigma'^2_\omega$, $\sigma_{uv} = \tau\sigma_u$.
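The recovery of the S1 quantities, together with a delta-method standard error for the squared SD, can be sketched as follows; all numeric values here are made up for illustration.

```python
import math

# Made-up MixWILD estimates
gamma0 = -3.0        # log of the mean function random effect variance
tau = 0.11           # coefficient on the standardized mean random effect
sd_omega = 0.18      # scale random effect SD (MixWILD reports an SD)
se_sd_omega = 0.04   # its made-up standard error

var_u = math.exp(gamma0)         # sigma2_u = exp(gamma0)
var_v = tau**2 + sd_omega**2     # sigma2_v = tau^2 + sigma2_omega'
cov_uv = tau * math.sqrt(var_u)  # sigma_uv = tau * sigma_u

# Delta method for the variance of the squared SD: SE(s^2) ~= 2 * s * SE(s)
se_var_omega = 2 * sd_omega * se_sd_omega
print(round(var_u, 4), round(var_v, 4), round(cov_uv, 4), round(se_var_omega, 4))
```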
The model is specified and fitted via point-and-click instructions in the MixWILD interface. During estimation, a "Please wait …" window will appear; once the estimation has completed, the window will automatically close and the output will open. The MixWILD output is presented in five sections. The first four sections present the estimation options, descriptive statistics (confirming, for example, that each of the 100 clusters contains 25 observations), and the results of two simpler versions of the full model used to generate starting values for the full model. The fifth and final block of output, titled "Model WITH RANDOM Scale", presents the results for the full model. Recall that the model is a reparameterized version of model S1. The output additionally shows that the residual variance function random effect variance used in this alternative parameterization is presented as an SD rather than a variance. We can recover the random effect variance of this alternative parameterization by squaring the random effect SD. We can then recover the parameter and random effect values associated with the original S1 parameterization using the transformations listed previously. The corresponding standard errors can be recovered via the delta method.
Having carried out these steps, the results presented in tabular form are as follows. The parameter estimates are similar to their true values and to those provided by bayesmh in Stata and brms in R (above).