Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation

In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results which suggest that the bandwidth variability impact on the standard error of equating is minimal.

paramount goal of adjusting the scores on the test forms so that they yield interchangeable results (Kolen & Brennan, 2014).
Observed-score equating is one of the fundamental methods used in test equating. Rooted in classical test theory, it is concerned with establishing the equivalence of the observed scores on two test forms and includes both linear and equipercentile equating functions (von Davier, 2011). In this study, we focus on an equipercentile observed-score equating method called kernel equating, which was initially introduced by  and further developed by von .
The conceptual framework of kernel equating follows that of equipercentile observed-score equating and posits a series of steps to obtain the equated scores: (1) pre-smoothing of the data to reduce sampling variability, (2) obtaining discrete score probability distributions, (3) obtaining continuous approximations to the discrete score distributions, (4) calculating the equating function, and (5) calculating the standard errors of equating (SEEs) (von Davier, 2011;von Davier et al., 2004). A feature that distinguishes kernel equating from other equipercentile methods is that the continuous approximations of the score probability distributions are achieved through kernels that utilize bandwidth parameters. The bandwidths allow the density functions to be as smooth as possible while retaining the properties of the original distributions. Estimating such parameters, however, introduces additional sampling variability. This variability is typically not accounted for when calculating the standard errors of kernel equating, and therefore constitutes a threat to their accuracy von Davier et al., 2004).
Accurate estimation of the SEE is integral to making correct inferences and comparisons. When estimated incorrectly, it can lead to unjustified certainty. One previous study derived modified standard errors of kernel equating when using a variant of the Silverman's rule of thumb for bandwidth estimation (Andersson et al., 2014). The current study derives modified SEEs when using the more commonly applied approach to estimate the bandwidth by minimizing a penalty function. Such an approach is more generally appropriate and does not rely on a particular distributional assumption for the test scores. Thus, the objective of this article is to introduce a modified method of calculating the SEE which accounts for the additional variability stemming from the bandwidth estimation. The new approach is compared via simulations to the current method of calculating the SEE  across several sample sizes and test lengths.
We structure this article as follows. In the subsequent section, we give a brief background to the kernel method of equating and expand on the issue of bandwidth estimation and how it influences sampling variability. We also discuss how the standard errors of kernel equating are currently estimated. Next, the asymptotic variance of the bandwidth parameter estimator is derived and is incorporated in a modified method for calculating the SEE. This modified method is further verified and compared to the existing method in a simulation study. Lastly, the results are reported and discussed.

Data Collection Designs
An observed-score equating procedure consists of two fundamental components, namely, the data collection design and the equating method . Hence, before we focus on the equating itself, it is essential to review, if only briefly, the common approaches to collecting the data. There are several data collection designs widely used in practice and they can roughly be divided into two categories: designs which use examinees from a common population taking both test forms and designs which use common items on the test forms . The first category of data collection designs includes the equivalent groups, the single group, and the counterbalanced designs, and the second category includes the common-item non-equivalent groups design. In this study, we focus on the equivalent groups design, where two independent random samples are drawn from a common population, P, and one group takes test form X, while the other takes test form Y. In the following, we will use X and Y to denote both the test forms and the random variable corresponding to the test scores from each of the test forms.
The choice of an appropriate data collection design is subject to considerations like the available sample size, time, and costs. The designs subsequently affect the equating procedure implying that some designs, such as the equivalent groups design, allow for a relatively straightforward comparison between the test forms. Other designs are much more complex, such as the common-item non-equivalent groups design. A more detailed account of the considerations and procedures involved in various data collection designs can be found in von .

Kernel Equating
In the following, we adopt the notation of von . Let the target population be T, the possible score values on the test form X be x j for j = 1, …, J; and let the possible score values on the test form Y be y k for k = 1, …, K. Thus, we define the score probabilities as and Further, an equipercentile equating function is defined in terms of the cumulative distribution functions (CDFs) which are given by and When the CDFs are continuous, we obtain the equipercentile equating function of X to Y from Strictly speaking, however, most score distributions are discrete, and their continuous approximations are required. Kernel equating addresses this problem by introducing a series of steps which can be applied to various data collection designs and which provides continuous CDFs. The steps of kernel equating are pre-smoothing, estimation of the score probabilities, continuous approximation to the discrete score distributions, equating, and calculating the SEE . We now briefly review the first two steps and dedicate subsequent subsections to present the remaining steps in more detail as they pertain to the subject at hand.

Pre-Smoothing.
In the pre-smoothing step, a parametric statistical model is fitted to the observed data. This can be done by fitting log-linear or item response theory (IRT) models to the data. The methods are described in detail in Andersson and Wiberg (2017), and Holland and Thayer (1987), and are not repeated here.
Estimation of the Score Probabilities. Having estimated the score distributions with a pre-smoothing model, the score probabilities can be obtained using a linear or non-linear transformation which, following von , we call the design function. The design function depends on the data collection design. For instance, consider the equivalent groups design and let r ¼ ðr 1 ,…, r J Þ t denote the column vector of the score probabilities of X and s ¼ ðs 1 ,…, s K Þ t denote the column vector of the score probabilities of Y. The design function (DF) is then a simple identity function, that is where I J and I K are J × J and K × K identity matrices. Design functions for other data collection designs are given explicitly in von .

Continuous Approximation and Equating
The third step in kernel equating, distinguishing it from other equipercentile methods, is how the continuous approximations to the discrete CDFs, F h X ðxÞ and G h Y ðyÞ to F(x) and G(y), are obtained.
In kernel equating, this is achieved by applying a kernel with a smoothing bandwidth parameter . There are different kernels available in the literature (Lee et al., 2008), but in this study we focus on the Gaussian kernel, which is most commonly used. Following the notation of von , let Φ(Á) denote the CDF of the Gaussian distribution, and let h X denote the bandwidth parameter. Then, the Gaussian kernel smoothing of the distribution of X has the CDF defined by where R jX (x) is given by and a X , μ X , and σ 2 X are functions of r, that is It is evident from equations (7)-(11) that for the continuous approximation to be carried out, the bandwidth parameters h X and h Y have to be estimated. The primary goal of introducing such parameters is to make the density functions as smooth as possible while retaining the properties of the original distributions. Various methods of estimating the bandwidth parameter have been suggested in previous research (Andersson et al., 2014;von Davier et al., 2004). Of particular interest to this study is a method described in  and von , which estimates the bandwidth parameter by minimizing a penalty function, named PEN 1 (h X ) in von , with respect to the bandwidth. The penalty function itself is based on the squared distances between proportions and the density function and is given by where f h X ðx j Þ is the density function, found by differentiating Equation 7 with respect to x, that is and R jX (x) is given in Equation 8. The density obtained as a result of estimating the bandwidth by minimizing PEN 1 is typically a good approximation of the discrete score distribution. Note that sometimes it can be beneficial to smooth the density function further, in which case an additional component named PEN 2 in von Davier et al. (2004) can be applied combined with PEN 1 . However, because of the complexity in accounting for estimation variability with PEN 2 , we focus exclusively on PEN 1 in the present study.
Once the continuous approximations are obtained, the equating function estimator for equating X to Y is given by The equating function for equating Y to X is analogous and found by substitution.

Standard Error of Kernel Equating
The SEE is the measure of random equating error or uncertainty which stems from the equating function being subject to estimation and thereby sampling variability. We largely base this subsection on the work of , who derived the asymptotic standard error for the kernel method of equating using the standard delta method for computing large sample approximations to the sampling variances of functions of statistics. Before proceeding, we see it appropriate to briefly introduce the multivariate delta method (Rao, 1973). Adopting the notation of Rao (1973), let the (k × 1)-dimensional random vector ffiffi ffi n p ðT kn À θ k Þ converge to a multivariate normal distribution with zero mean and covariance Σ, where T kn is an estimator, θ k is the true parameter vector, and n denotes the sample size. Let g denote a vectorvalued function with components g 1 ,…, g q , such that all the entries of g are differentiable. Then, ffiffi ffi n p ðgðT kn Þ À gðθ k ÞÞ converges to a multivariate normal distribution with zero mean and co- where G is the (q × k) Jacobian matrix of partial derivatives of g with respect to θ k . In this study, the equivalent of T kn , θ k , and g are the estimator of the score probabilities, the true score probabilities, and the equating function, respectively. The delta method was employed by , where they defined the SEE for equating X to Y by The SEE for equating Y to X is defined analogously. Treating the bandwidth parameters h X and h Y as constants,  assert that all the uncertainty in the equating function comes from the estimation of the score probabilities r and s. Hence, the variance of the equating function, and in turn the SEE, reflects the data collection design, the choice of the pre-smoothing technique used in the estimation of the population score probabilities, and the equating function itself.
Reiterating the notation used previously, let r and s define the vectors of the pre-smoothed score distributions. The calculation of the SEE per  then requires two components: the vector ∂ e Y with derivatives of the equating function e Y with respect to r and s, and the asymptotic covariance matrix Σ ðr, ŝÞ . Using the delta method (Rao, 1973), the variance of the equating function ê Y can then be expressed as where s and Σ ðr, ŝÞ is the covariance matrix of the independently estimated score probabilities, given by The matrix Σ ðr, ŝÞ has dimensions (J + K) × (J + K) where J is the dimension of r and K is the dimension of s (von . The calculation of Σ ðr, ŝÞ for different equating designs can be found in Andersson and Wiberg (2017),  and von . The second component, ∂ e Y , can be defined as follows Recalling Equation 14, the derivatives needed to compute ∂ e Y are defined in  as where ∂e Y ∂r is a row vector with dimensions 1 × J, ∂e Y ∂s is a row vector with dimensions 1 × K, and G 0 is the density evaluated at e Y (x), that is and ∂Fðx; rÞ where ∂Fðx; rÞ ∂x is given in Equation 13, R jX (x; r) in Equation 8, and where z jX is defined as The derivatives of e X are analogous to those above and can be computed by substitution. At this point, it is important to emphasize that the method of SEE calculation described above treats the bandwidth parameters h X and h Y as fixed and not as functions of r and s. Hence, the additional variability introduced by the bandwidth estimation is currently not accounted for in the calculation of the SEEs, and consequently poses a challenge with respect to their accuracy von Davier et al., 2004).

Accounting for Bandwidth Estimation Variability in Kernel Equating
In this section, we first derive the bandwidth parameter estimator variance and then introduce a modified method for the calculation of the analytical SEE that accounts for bandwidth estimation variability.

Asymptotic Variance and Standard Error of the Bandwidth Parameter Estimator
Recalling the standard delta method restated in the previous section (Rao, 1973), it is important to note that the bandwidth parameter estimator is not defined explicitly but rather as an implicit function of other asymptotically normal variables. Therefore, we use a generalization of the delta method presented by Benichou and Gail (1989) which facilitates computing the asymptotic variance of the implicitly defined bandwidth parameter estimator. Following the notation of von , h X denotes the bandwidth parameter selected to minimize PEN 1 defined by Equation (12) and r denotes the vector of estimated score probabilities. Consider further that PEN 1 is a continuously differentiable function of the estimated score probabilities r in h X , and the function is minimized so that ∂PEN 1 ∂h X ¼ 0. Applying the implicit function theorem (Rao, 1973), we can then define h X as a function of r such that h X ¼ g h X ðrÞ, and compute the partial derivatives of g h X ðrÞ with respect to r as where ∂ 2 PEN 1 ∂h 2 X is a scalar second order partial derivative of PEN 1 with respect to h X and ∂ 2 PEN 1 ∂h X ∂r 0 is a 1 × J vector of second-order partial derivatives of PEN 1 with respect to r. The ∂ 2 PEN 1 ∂h 2 X and ∂ 2 PEN 1 ∂h X ∂r 0 derivatives are unequivocally calculated using the chain rule and implicit differentiation. The equations, however, are lengthy, and we summarize them in the appendix.
Let Σ r denote the asymptotic covariance matrix of the estimated score probabilities r with dimensions J × J where J is the dimension of r. By applying the delta method for implicit functions (Benichou & Gail, 1989), we can define the asymptotic variance of the bandwidth parameter estimator b h X as and its standard error as The variance and the standard error of b h Y are analogous to those given for b h X and can be computed by substituting X by Y and r by s.

Standard Error of Equating Accounting for Bandwidth Variability
To account for the bandwidth estimation variability when computing the SEEs, we apply the chain rule together with the delta method and obtain a modified expression for the SEEs (Cox, 1984). Treating b h X and b h Y as functions of the score probability estimators r and ŝ, we redefine Equation 14 by adding a term which accounts for the bandwidth estimation variability to obtain where ∂e Y ðxÞ ∂ðr,sÞ is presented in equations (19)-(25) and Σ ðr,ŝÞ in Equation 18. When evaluating these expressions in practice, the true parameters are replaced by the parameter estimates. The additional components are then a (2 × (J + K))-matrix of partial derivatives of the bandwidth parameters as functions of the estimated score probabilities with respect to the estimated score probabilities, and ∂e Y ðxÞ ∂ðh X ,h Y Þ , a (2 × J)-matrix of first-order derivatives of the equating function with respect to the bandwidth parameters, h X and h Y , defined by where and where and G 0 is defined in Equation 22, with R jX (x) and R kY (y) given in Equation 8. Lastly, we define the SEE which accounts for bandwidth variability by

Simulation Study
To confirm the accuracy of the presented derivations, we conducted a simulation study to evaluate the estimators of the standard error of the bandwidth parameter estimator and the modified SEEs. We evaluated the estimated standard errors with respect to the Monte Carlo standard errors and compared the modified standard errors to the original standard errors that do not account for the bandwidth estimation.

Simulation Design
Data for two test forms X and Y were simulated using the two-parameter logistic (2-PL) model within the framework of IRT (de Ayala, 2009), where test lengths of 20, 40, and 80 items were considered. The discrimination parameters for both test forms were selected from the U (0.5, 2)distribution and the difficulty parameters for one test form were selected from the N (0.25, 1)distribution and the other from the N (À0.25, 1)-distribution. These distributions were considered to mimic realistic item parameters found in standardized testing (National Center for Education Statistics, 2004). The equivalent groups design was used in which two independent random samples of individuals are drawn from a single common population and where each random sample takes either of the test forms X and Y (von . Dictated by the design, no differences in the latent distributions were present between the groups. The latent distributions were set to the standard normal distribution. The equivalent groups design was considered because of its simplicity. Relative to other data collection designs, it provided an opportunity for direct comparison of the results on the test forms X and Y without additional considerations or assumptions. The score distributions for the tests X and Y with 20, 40, and 80 items are provided in Figure 1. The means (SD) of the test score distributions with 20, 40, and 80 items were 10.35 (4.52), 21.40 (8.43), and 44.03 (16.06) for test X and 7.58 (4.24), 18.17 (8.22), and 35.95 (16.89) for test Y.
In order to systematically verify the accuracy of the modified method of calculating the SEE as well as to explore how well it performs in a variety of sample sizes, sample sizes 1000, 4000, and 16,000 were considered. The study was conducted using version 3.6.2 of the statistical software environment R (R Core Team, 2019), primarily employing the packages kequate (Andersson et al., 2013), mirt (Chalmers, 2012), and numDeriv (Gilbert & Varadhan, 2019), while also utilizing newly written code implementing the modified SEEs (available in the Supplementary material). In each simulation setting, we used 10,000 replications which enabled us to all but eliminate the simulation random error. The convergence rate for all simulation settings was 100%.
The study followed the recommended kernel equating procedure , albeit with a few adjustments to verify the derivations presented in this article. For each generated data set, the following steps were carried out: (1) Pre-smoothing. The package mirt (Chalmers, 2012) was used to pre-smooth the irregularities of the raw data by estimating two separate 2-PL models to obtain item parameter estimates pertaining to each of the two groups and tests. The expectation-maximization (EM) algorithm was used for estimation (Bock & Aitkin, 1981), with the tolerance level 0.0001 and maximum number of iterations equal to 500. The asymptotic covariance matrix was estimated based on the method described in Oakes (1999).
(2) Estimation of score probabilities. Under the equivalent groups design, the score probabilities b r j and b s k were estimated based on the item parameter estimates and the assumed distribution of the latent variable (Andersson & Wiberg, 2017;von Davier et al., 2004).
(3) Continuous approximation. By adapting code from the package kequate (Andersson et al., 2013), continuous approximations to the discrete distributions were obtained by applying a Gaussian kernel with an optimal bandwidth parameter. Optimal bandwidth parameters b h X and b h Y were obtained by minimizing the first part of the penalty function, PEN 1 . When optimizing the penalty function, a golden section search with successive parabolic interpolation (Brent, 1973) using the default tolerance of 1.50 × 10 À8 was used.
The analytical derivations for the bandwidth parameter estimator variance were paramount to the study. Hence, upon obtaining the optimal bandwidths, the average standard errors of the bandwidth parameters were computed following the equations introduced in the previous section, and their accuracy was assessed using the Monte Carlo standard error (MCSE) as the criterion. When calculating the asymptotic variance of the bandwidth parameter estimator, the bandwidth parameters h X and h Y were replaced with the estimated parameters b h X and b h Y , and the asymptotic covariance matrices of the estimated score probabilities Σ r and Σ ŝ were calculated based on the implementation in kequate (Andersson et al., 2013).
(4) Equating. Upon obtaining continuous CDFs, an equipercentile equating function was applied to equate the test forms X and Y.
(5) Calculating the SEE. The average analytical SEEs were computed using the original method for calculating the SEE without accounting for the bandwidth variability , and the modified method of calculating the SEE accounting for the bandwidth variability. The Monte Carlo SEEs (MCSEE) were used as a criterion for comparing the accuracy of the modified and the original methods of the SEE calculation. Furthermore, two measures were used to assess the performance and the accuracy of the modified method as compared to the original method. We computed the absolute differences of the means of the SEEs calculated with the original and the modified methods. Additionally, the average coverage probabilities were considered which explored the average proportion of time that the 95% confidence intervals calculated employing the original and the modified methods contained the true values of the equated results. The confidence intervals were estimated with b e Y ± z 0:975 × c SEðb e Y Þ, with z 0.975 indicating the 0.975 quantile of the standard normal distribution. The analytical derivations used in computing the bandwidth estimator variance and standard errors, as well as the SEE, were verified numerically using the R package numDeriv (Gilbert & Varadhan, 2019). The R code is available for review in the Supplementary material.

Simulation Results
The study largely depended on the accuracy of the asymptotic variance and standard error of the bandwidth parameter estimator derivations. The results of the simulation for the standard errors of the bandwidth parameter estimators b h X and b h Y given in Table 1 confirmed that the derivations were correct, and the asymptotic standard errors of the bandwidth parameter estimator (ASE) were accurate as witnessed by the comparison to the MCSE. As can be expected for asymptotic variance approximation (Ferguson, 1996), the differences between the ASEs and the MCSEs were larger in smaller sample sizes.
Subsequently incorporating the bandwidth estimation variability into computing the modified SEEs, Table 2 presents two performance measures used to compare the accuracy of the original standard errors of equating (ASEE) and the modified asymptotic standard errors of equating (ASEE mod ). These measures are the absolute aggregate differences between the SEEs for two pairs, ASEE -MCSEE and ASEE mod -MCSEE, and the average coverage for both the original and the modified methods. From the average differences, it was evident that when compared to the MCSEE estimates, the modified asymptotic SEEs which take bandwidth variability into account were accurate for all sample sizes and test lengths. Furthermore, the modified asymptotic SEEs in most cases appeared to be nearly identical to those not accounting for bandwidth variability, suggesting that the bandwidth estimation influence on the SEEs was minimal. This finding was further supported by the average coverage for both the original and the modified methods. Although the modified method performed better in most settings, the differences in coverage were small.

Discussion
The kernel method of equating is an equipercentile equating method in which number-correct scores are transformed into percentile rank scores from test form X to the scale of test form Y, and the scores from the two test forms with the same percentile rank are considered to be equivalent . However, in order to obtain those equivalent scores, continuous approximations to the discrete score distributions are needed. To satisfy this requirement, kernel equating uses a Gaussian kernel with a smoothing bandwidth parameter that determines the characteristics of the continuous approximations to the raw discrete distributions . The most commonly used method for bandwidth estimation is minimizing a penalty function with respect to the bandwidth parameter . The bandwidth, in turn, is influenced by the estimated score probabilities and therefore is subject to variability. This variability, however, is not currently accounted for when calculating the SEE , challenging its accuracy and, ultimately, the fairness of the equated results. The present study explored the issue of the additional variability stemming from the bandwidth estimation and its impact on the SEE. Building on the existing methodology of  and von , we derived the asymptotic variance of the bandwidth parameter estimator using the delta method for implicit functions (Benichou & Gail, 1989) and incorporated those derivations to expand the existing formulas for calculating the SEE . Thus, we have introduced SEEs that account for bandwidth estimation variability. A simulation study with 18 data sets generated for a wide range of sample sizes and test lengths was used to illustrate the results of the modified method as compared to the current method of the SEE calculation ) and the MCSEEs.
The results offered several observations which are valuable to the testing industry. Firstly, the newly introduced SEE were accurate and close to the MCSEE estimates for all sample sizes and test lengths, suggesting that the method is suitable for practical use. Secondly, using the MCSEE as a criterion, the results of the study indicate that the original ) and the modified SEEs produce similar results, suggesting that the bandwidth estimation impact on the SEE is minimal. The presented results apply directly to any pre-smoothing method, provided that the asymptotic covariance matrix of the score probabilities has been defined for such a method. However, in this study we only utilized IRT as the pre-smoothing method and the results may be different if instead using, for example, log-linear models. However, previous research has indicated that the SEEs are fairly accurate even when not accounting for the bandwidth estimation with the penalty function, and so we do not anticipate that the results will differ substantially when using pre-smoothing with log-linear models instead of IRT models.
The method for accounting for bandwidth estimation that we used in the present study can be generalized to additional kernels and equating designs by modifying the presented results to account for the different expressions of the equating function and score probabilities with such approaches. It is furthermore possible to utilize the delta method for implicit functions with other bandwidth estimation methods provided that those specify a function that is minimized which fulfills the properties required for the implicit function theorem and the delta method. One approach which does not fulfill these requirements is the method based on PEN 2 , since the function PEN 2 is not differentiable and can have multiple local minima.
It is important to note that in this study, we derived the modified asymptotic SEE for two test forms in the setting of the equivalent groups data collection design. It can be the case that the bandwidth estimation influence on the SEE is greater for other data collection and equating designs. It would, therefore, be beneficial for future theoretical and empirical studies to focus on determining the bandwidth estimation impact on the SEE in these additional scenarios.
As a final note, we believe that it is theoretically more sound to use a method which successfully accounts for all sources of variability, however negligible those may be. Introducing the modifications to the formulas for the SEE calculation akin to those explored in this study can improve the accuracy of the standard errors of equating, and consequently, facilitate fairness and comparability of the equated results.
The partial derivatives of the PEN 1 (h Y ) with respect to h Y , ∂PEN 1 ∂h Y , ∂ 2 PEN 1 ∂h Y and ∂ 2 PEN 1 ∂h Y ∂s ι , are computed analogously.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.