Median bias reduction in random-effects meta-analysis and meta-regression

The reduction of the mean or median bias of the maximum likelihood estimator in regular parametric models can be achieved through the additive adjustment of the score equations. In this paper, we derive the adjusted score equations for median bias reduction in random-effects meta-analysis and meta-regression models and derive efficient estimation algorithms. The median bias-reducing adjusted score functions are found to be the derivatives of a penalised likelihood. The penalised likelihood is used to form a penalised likelihood ratio statistic which has known limiting distribution and can be used for carrying out hypothesis tests or for constructing confidence intervals for either the fixed-effect parameters or the variance component. Simulation studies and real data applications are used to assess the performance of estimation and inference based on the median bias-reducing penalised likelihood and compare it to recently proposed alternatives. The results provide evidence on the effectiveness of median bias reduction in improving estimation and likelihood-based inference.

property of these estimators, not shared with the mean bias-reduced ones, is that any monotone component-wise transformation of the estimators results automatically in median bias-reduced estimators of the transformed parameters. 7 Such equivariance property can be useful in the context of random-effects meta-analysis where the Fisher information and, hence, the asymptotic variances of various likelihood-based estimators depend only on the heterogeneity parameter.
In this paper, we derive the median bias-reducing adjusted score functions for random-effects meta-analysis and meta-regression. The adjusted score functions are found to correspond to a median BRPL, whose logarithm differs from the logarithm of the mean BRPL in Kosmidis et al. 3 by a simple additive term that depends on the heterogeneity parameter. Since the adjustments to the score function for mean and median bias reduction are both of order O(1), the same arguments as in Kosmidis et al. 3 are used to obtain a median BRPL ratio statistic with known asymptotic null distribution that can be used for carrying out hypothesis tests and constructing confidence regions or intervals for either the fixed-effect or the heterogeneity parameter. Simulation studies and real data applications are used to assess the performance of estimation and inference based on the median BRPL, and compare it to recently proposed alternatives, including the mean BRPL. The results provide evidence on the effectiveness of median bias reduction in improving estimation and likelihoodbased inference.
The rest of the paper is organised as follows. Section 2 considers the cocoa intake dataset 8 and a random-effects meta-analysis model as a motivational example that demonstrates how conclusions in frequentist inference may vary using different methods when the number of studies is small. Section 3 defines the random-effects metaanalysis and meta-regression model and establishes notation. Section 4 gives a formal statement of the proposed median bias-reducing adjusted score function and penalised likelihood for these models, gives the algorithm used for computing the median BRPL estimates, and briefly discusses the median BRPL ratio statistic used for inference. Section 5 revisits the example in Section 2 and gives some simulations comparing the median BRPL method to alternative approaches. Section 6 gives more extensive simulations under the random-effects metaanalysis model that evaluate the performance of median BRPL and compare it with that of ML and mean BRPL methods. An application to meat consumption data 9 and a random-effects meta-regression model is given in Section 7. Section 8 concludes with a brief discussion, and Appendix 1 contains some technical details on the derivation of the median bias-reducing adjusted score functions.

Cocoa intake and blood pressure reduction data
Consider the setting in Bellio and Guolo 10 who carry out a meta-analysis of five randomised controlled trials from Taubert et al. 8 on the efficacy of two weeks of cocoa consumption on lowering diastolic blood pressure. Figure 1 is a forest plot with the estimated mean difference in diastolic blood pressure before and after cocoa intake from each study, and the associated 95% Wald-type confidence intervals. Four out of the five studies reported a reduction of diastolic blood pressure from cocoa intake.
The random-effects meta-analysis model is used to synthesise the evidence from the five studies. In particular, let Y i be the random variable representing the mean difference in the diastolic blood pressure after two weeks of cocoa intake in the ith study. We assume that Y 1 , . . . , Y 5 are independent random variables, where Y i has a Normal distribution with mean the overall effect and variance 2 i þ , with 2 i the estimated standard error of the effect from the ith study and the heterogeneity parameter.
The forest plot in Figure 1 has been enriched with several nominally 95% confidence intervals for using various alternative methods. As is apparent, the conclusions when testing the hypothesis ¼ 0 can vary depending on the method used. More specifically, the Wald test using the ML estimates, the DerSimonian and Laird method, 1 double resampling, 6 and the likelihood ratio (LR) test give evidence that there is a relationship between cocoa consumption and diastolic blood pressure, with p-values 0.005, 0.006, 0.016, 0.030, respectively. On the other hand, Knapp and Hartung's method, 11 the mean BRPL ratio, 3 the Bartlett-corrected LR, 12 and Skovgaard's test suggest that the evidence that cocoa consumption affects diastolic blood pressure is weaker, with p-values of 0.050, 0.053, 0.058, and 0.067, respectively.

Random-effects meta-regression model
Let y i and 2 i denote the estimate of the effect from the ith study ði ¼ 1, . . . , KÞ and the associated within-study variance, respectively, and x i ¼ ðx i1 , . . . , x ip Þ T denote a p-vector of study-specific covariates that can be used to account for the heterogeneity across studies.
The within-study variances 2 i are usually assumed to be estimated well enough to be considered as known and equal to the values reported in each study. Then the observations y 1 , . . . , y K are assumed to be realisations of the random variables Y 1 , . . . , Y K , which are independent conditionally on independent random effects U 1 , . . . , where is an unknown p-dimensional vector of fixed effects. The random effects U 1 , . . . , U K are typically assumed to be independent with U i having a Nð0, Þ distribution, where is a parameter that attempts to capture the unexplained between-study heterogeneity. In matrix notation, the random-effects meta-regression model has where Y ¼ ðY 1 , . . . , Y K Þ T , X is the K Â p model matrix with x T i in its ith row, and ¼ ð 1 , . . . , K Þ T is a vector of independent errors each with a Nð0, 2 i Þ distribution and independent of U ¼ ðU 1 , . . . , U K Þ T . Under this specification, the marginal distribution of Y is multivariate normal with mean X and variance-covariance matrixAE þ I K , where I K is the K Â K identity matrix andAE ¼ diagð 2 1 , . . . , 2 K Þ. The random-effects metaanalysis results as a special case of meta-regression, by setting X to be a column of ones.
The log-likelihood function for ¼ ð T , Þ T is l ðÞ ¼ flog jWð Þj À RðÞ T Wð ÞRðÞg=2, where jWð Þj denotes the determinant of Wð Þ ¼ ðAE þ I K Þ À1 and RðÞ ¼ y À X. The gradient of the log-likelihood (score function) is  12 the Skovgaard's statistic, and the median BRPL ratio statistic. The confidence intervals n are ordered according to their length. The estimate of has not been reported, as is commonly done in forest plots, because some of the methods considered (e.g. Skovgaard, Bartlett-corrected LR, and double resampling) are designed to produce directly p-values and/or confidence intervals and do not directly correspond to an estimation method.

Median bias reduction 4.1 The method
A popular method for reducing the mean bias of ML estimates in regular statistical models is through the adjustment of the score equation. 13,14 Kenne Pagui et al. 7 propose an extension of the adjusted score equation approach which can be used to obtain median bias-reduced estimators. Specifically, under the model, the new estimator has a distribution with median closer to the ''true'' parameter value than the ML estimator. Kenne Pagui et al. 7 consider the median as a centering index for the score, and the adjusted score function for median bias reduction then results by subtracting from the score its approximate median, obtained using a Cornish-Fisher asymptotic expansion. Let j ðÞ ¼ À@ 2 l ðÞ=@@ T be the observed information matrix (see Appendix 1 for its expression), and iðÞ be the expected information matrix with tth column i t ðÞ. Let also i t ðÞ and i tt ðÞ be the tth column and the tth diagonal element of fiðÞg À1 , with t 2 f1, . . . , p þ 1g. Kenne Pagui et al. 7 show that a median bias-reduced estimator y can be obtained by solving an adjusted score equation of the form s y ðÞ ¼ sðÞ þ A y ðÞ ¼ 0, where the additive adjustment to A y ðÞ is O(1), in the sense that A y ðÞ is bounded in absolute value by a fixed constant after a sufficiently large value of K. The median bias-reducing adjustment A y ðÞ has tth element The quantities P t ðÞ ¼ E ½sðÞs T ðÞs t ðÞ and Q t ðÞ ¼ E ½Àj ðÞs t ðÞ in equation (4) are those introduced by Kosmidis and Firth 14 for mean bias-reduction, and K y ðÞ is a ð p þ 1Þ-vector with tth element In the context of meta-regression values of t and u in f1, . . . , pg correspond to the elements of parameter , and t, u ¼ p þ 1 correspond to parameter . Given that A y ðÞ is of order O(1), y has the same asymptotic distribution as, 7 i.e. multivariate normal with mean and variance-covariance matrix fiðÞg À1 , which can be consistently estimated with fið y Þg À1 .
After some algebra (see Appendix 1 for details), the median bias-reducing adjustment for the random-effects meta-analysis and meta-regression models has the form where Hð Þ ¼ XðX T Wð ÞXÞ À1 X T Wð Þ. Substituting equation (5)

Computation of median bias-reduced estimator
A direct approach for computing the estimator y ¼ ð yT ,^ y Þ T is through a modification of the two-step iterative process in Kosmidis et al. 3 At the jth iteration (j ¼ 1, 2, . . .) In the above steps, ð j Þ is the candidate value for y at the jth iteration and ð jÀ1Þ is the candidate value for^ y at the ð j À 1Þth iteration. The equation in step 2 is solved numerically, by searching for the root of the function s y ð ð j Þ , Þ in a predefined positive interval. For the computations in this manuscript, we use the DerSimonian and Laird 1 estimate of as starting value ð0Þ . The iterative process is then repeated until the components of the score function s y ðÞ are all less than ¼ 1 Â 10 À6 in absolute value at the current estimates.

Median bias-reducing penalised likelihood
Although it is not generally true that s y ðÞ is the gradient of a suitable penalised log-likelihood, in this case s y ðÞ is the gradient of the median BRPL Hence, y is also the maximum median BRPL estimator. The median BRPL in equation (6) differs from the mean BRPL derived in Kosmidis et al. 3 by the term À log½trðWð Þ 2 Þ=6. An advantage of the median BRPL estimators over mean BRPL ones is that the former are equivariant under monotone component-wise parameter transformations. 7 In the context of random-effects meta-analysis and metaregression, this equivariance implies that not only we get a median bias-reduced estimator of , but we also get median bias-reduced estimates of the standard errors for by calculating the square roots of the diagonal elements of fiðÞg À1 in equation (3) at y . This is because iðÞ is a function of only, and moreover the square roots of the diagonal elements of fiðÞg À1 are monotone functions of .

Penalised likelihood-based inference
For inference about either the components of the fixed-effect parameters or the between-study heterogeneity , we propose the use of the median BRPL ratio. If ¼ ð T , T Þ T and y is the maximiser of l y ðÞ for fixed , then the same arguments as in Kosmidis et al. 3 can be used to show that the logarithm of the median BRPL ratio statistic 2fl y ð y , y Þ À l y ð, y Þg ð7Þ has a 2 dimðÞ asymptotic distribution, as K goes to infinity. Specifically, the adjustment to the score function is additive and of order O(1). As a result, the extra terms in the asymptotic expansion of the logarithm of the median BRPL that depend on the penalty and its derivatives disappear as information increases, and the expansion has the same leading term as that of the log-likelihood (see, for example, Pace and Salvan 15 Section 9.4).

Cocoa intake and blood pressure reduction data (revisited)
The ML estimate, the maximum mean BRPL estimate and the maximum median BRPL estimate of the heterogeneity parameter in the meta-analysis model in Section 2 are^ ¼ 4:199,^ Ã ¼ 5:546, and^ y ¼ 6:897, respectively. The estimates of the common effect are ¼ À2:799, Ã ¼ À2:811, and y ¼ À2:818, with standard errors 1.000, 1.128, and 1.242, respectively. The bias-reduced estimates of and, as a consequence, the corresponding estimated standard errors for are larger than their ML counterparts, which is typical in random-effects meta-analysis. The iterative process used for computing the ML, maximum mean BRPL, and maximum median BRPL estimates converged in 4, 5, and 11 iterations, respectively. The computational runtime for the two-step iterative process which computes the ML, maximum mean BRPL, and maximum median BRPL estimates is 1:1 Â 10 À2 , 1:8 Â 10 À2 , and 1:1 Â 10 À2 seconds, respectively. Figure 2 shows the value of LR, mean BRPL and median BRPL ratio statistic in equation (7) for a range of values of , when is either or . Here and in the following simulation studies, we compare median BRPL ratio statistic with only LR and mean BRPL ratio statistics because the mean BRPL ratio statistic is a strong competitor against other alternatives in terms of inferential performance. 3 The horizontal line in Figure 2 is the 95% quantile of the limiting 2 1 distribution, and its intersection with the values of the statistics results in the endpoints of the corresponding 95% confidence intervals. For both and , the confidence intervals based on the LR statistic are the narrowest and the confidence intervals based on the median BRPL ratio statistic are the widest. Specifically, the 95% confidence intervals for are ðÀ6:21, 0:52Þ, ðÀ5:73, 0:05Þ, and ðÀ5:26, À 0:40Þ for the median BRPL ratio statistic, mean BRPL ratio statistic, and LR statistic, respectively. The corresponding 95% confidence intervals for are ð1:4, 58:0Þ, ð1:0, 38:5Þ, and ð1:1, 23:5Þ, respectively. Contrary to the LR test, the mean BRPL and median BRPL ratio tests suggest that there is only weak evidence that cocoa consumption affects diastolic blood pressure with p-values of 0.053 and 0.077.
In order to further investigate the performance of the three approaches to estimation and inference, we performed a simulation study where we simulated 10,000 independent samples from the random-effects metaanalysis model with parameter values set to the ML estimates reported earlier, i.e. 0 ¼ À2:799 and 0 ¼ 4:199.  Figure 3. Boxplots for the ML, the maximum mean BRPL, and the maximum median BRPL estimates of and as calculated from 10,000 simulated samples under the ML fit using the cocoa data. 8,10 The square point is the empirical mean of the estimates. The dashed grey horizontal line is at the parameter value used to generate the data.  (7) when is (left) and (right). The horizontal line is the 95% quantile of the limiting 2 1 distribution, and its intersection with the values of the statistics results in the endpoints of the corresponding 95% confidence intervals.
large negative mean bias, maximum median BRPL tends to over-correct for that bias, while maximum mean BRPL almost fully corrects for the bias of ML estimator. The distribution of the median BRPL estimates has the heaviest right tail. The simulation-based estimates of the probabilities of underestimation for , P 0 ð^ 0 Þ, P 0 ð^ Ã 0 Þ and P 0 ð^ y 0 Þ are 0.708, 0.591, and 0.493 for the ML, maximum mean BRPL, and maximum median BRPL, respectively, illustrating how effective maximising the median BRPL in equation (6) is in reducing the median bias of the maximum likelihood estimator of .
The simulated samples were also used to calculate the empirical p-value distribution for the two-sided tests that each parameter is equal to the true values based on the LR statistic, the mean BRPL ratio statistic, and the median BRPL ratio statistic. Table 1 shows that the empirical p-value distribution for the mean and median BRPL ratio statistics are closest to uniformity, with the latter being slightly more conservative than the former. The coverage probability of the 95% confidence intervals of based on the mean BRPL ratio and the median BRPL ratio are notably closer to the nominal level than those based on the likelihood ratio. Specifically, the coverage probabilities for are 88%, 93%, and 96% for LR, mean BRPL ratio, and median BRPL ratio, respectively, and the corresponding coverage probabilities for are 88%, 94%, and 96%, respectively.

Simulation study
More extensive simulations under the random-effects meta-analysis model (1) are performed here using the design in Brockwell and Gordon. 16 Specifically, the data y i , i 2 f1, . . . , Kg are simulated from model (1) with true fixedeffect parameter ¼ 0:5. The within-study variances 2 i are independently generated from a 2 1 distribution and are multiplied by 0.25 before restricted to the interval ð0:009, 0:6Þ. Eleven values of the between-study variance ranging from 0 to 0.1 are chosen, and the number of studies K ranges from 5 to 200. For each combination of and K considered, we simulated 10,000 data sets initialising the random number generator at a common state. The within-study variances were generated only once and kept fixed while generating the samples.
Zeng and Lin 6 compared the performance of their proposed double resampling method with the DerSimonian and Laird 1 method, the profile likelihood method in Hardy and Thompson, 17 and the resampling method in Jackson and Bowden 5 and showed that the double resampling method improves the accuracy of statistical inference. Based on these results, Kosmidis et al. 3 compared the performance of their mean BRPL approach with the double resampling method and illustrated that the former results in confidence intervals with coverage probabilities closer to the nominal level than the alternative methods.
We take advantage of the results reported in Zeng and Lin 6 and Kosmidis et al. 3 and evaluate the performance of estimation and inference based only on the median BRPL with that based on the likelihood and the mean BRPL. The estimators of the fixed and random-effect parameters obtained from the three methods are calculated using variants of the two-step algorithm described in Section 4.2. In the second step of the algorithm, the candidate values for the ML, and maximum mean and median BRPL estimators of the between-study variance are calculated by searching for the root of the partial derivatives of l ðÞ, l Ã ðÞ, and l y ðÞ with respect to , in the interval (0, 3).
First, we compare the performance of the ML, maximum mean BRPL and maximum median BRPL estimators in terms of percentage of underestimation. Figure 4 shows that the median bias-reducing adjustment is the most effective in reducing median bias even for small values of K. As expected, the ML and maximum mean BRPL estimators also approach the limit of 50% underestimation as K grows, with the latter being closer to 50% than the former. Figure 5 shows that maximum median BRPL is also effective in reducing the mean bias of the ML estimator of but only for moderate to large values of K, while maximum mean BRPL results in estimators with the smallest bias. Figures 6 and 7 show the estimated coverage probability for the one-sided and two-sided confidence intervals for based on the LR, mean BRPL ratio and median BRPL ratio statistics at the 95% nominal level.  shows the estimated coverage probability for the two-sided confidence intervals for based on the LR, mean BRPL ratio and median BRPL ratio statistics at the 95% nominal level. For small values of or small and moderate number of studies K, the empirical coverage of the intervals is larger than the nominal 95% level. In general, the confidence intervals based on mean and median BRPL ratio have empirical coverage that is closer to the nominal level with the latter having generally better coverage. The differences between the three methods diminish as the number of studies K increases. Figures 9 and 10 give the power of the LR, the mean BRPL ratio, and the median BRPL ratio tests for testing the null hypothesis ¼ 0:5 against various alternatives. Specifically, we simulated 10,000 data sets under the alternative hypothesis that parameter is equal to b ¼ 0:5 þ K À1=2 , where ranges from 0 to 2.25. In Figure 9 the power is calculated using critical values of the asymptotic null 2 1 distribution of the statistics. In Figure 10, the power is calculated using critical values based on the exact null distribution of each statistic, obtained by simulation under the null hypothesis. In this way, the three tests are calibrated to have size 5%.    shows that the three tests have monotone power and for small values of K the LR test yields the largest power. This is because the LR test is oversized, while the mean and median BRPL ratio tests are slightly more conservative and this conservativeness comes at the cost of lower power. As the number of studies K increases, the three tests approach the nominal size and provide similar power. The use of the exact critical values in Figure 10 allows us to compare the performance of the tests without letting the oversizing or the conservativeness of a test skew the power results. Figure 10 shows that the power of the median BRPL ratio test is almost identical to that of the mean BRPL ratio test, and both tests have larger power than the LR test. Again, inference based on either of the two penalised likelihoods becomes indistinguishable from classical likelihood inference as the number of studies increases.
Across all and K values considered, the average number of iterations taken per fit for the two-step iterative process to converge is 6.20, 5.75, and 5.86 iterations for ML, maximum mean BRPL, and maximum median estimation is achieved rapidly and after a small number of iterations for all three methods, with only negligible overhead with the two bias reducing methods.

Meat consumption data
Larsson and Orsini 9 investigate the association between meat consumption and relative risk of all-cause mortality. The data consists of 16 prospective studies, eight of which are about unprocessed red meat consumption and eight about processed meat consumption. Figure 11 displays the information provided by each study in the metaanalysis. The results from the studies point towards the conclusion that high consumption of red meat, in particular processed red meat, is associated with higher all-cause mortality. We consider the random-effects meta-regression model, assuming that Y i has a Nð 0 þ 1 x i , 2 i þ Þ, where Y i is the random variable representing the logarithm of the relative risk reported in the ith study, and x i takes value 1 if the consumption in the ith study is about processed red meat and 0 if it is about unprocessed meat ði ¼ 1, . . . , 16Þ. Table 2 gives the ML estimates, the mean BRPL estimates, and the median BRPL estimates of 0 , 1 and , along with the corresponding standard errors for 0 and 1 . The median BRPL estimate of and the standard errors of the fixed-effect parameters are the largest. The iterative process for computing the ML, maximum mean BRPL, and maximum median BRPL estimates converged in eight, nine, and twelve iterations, in 1:2 Â 10 À2 , 2:4 Â 10 À2 , and 1:5 Â 10 À2 seconds, respectively.
The LR test indicates some evidence for a higher risk associated to the consumption of red processed meat with a p-value of 0.047. On the other hand, the mean and median BRPL ratio tests suggest that there is weaker evidence for higher risk, with p-values of 0.066 and 0.074, respectively. Skovgaard's test also gives weak evidence for higher risk with p-value 0.073.
Similar to Section 5, we performed a simulation study in order to further investigate the performance of the three methods in a meta-regression context. We simulated 10,000 independent samples from the meta-regression model at the ML estimates reported in Table 2. Figure 12 shows boxplots of the estimates of 0 , 1 , and . Maximum likelihood underestimates the parameter , while mean BRPL and median BRPL almost fully compensate for the negative bias of ML estimates, with the latter having a slightly heavier right tail. The percentages of underestimation are 72.6%, 56.6%, and 49.9% for the ML, maximum mean BRPL, and maximum median BRPL estimators, respectively. The simulated samples were also used to calculate the empirical p-value distribution for the tests based on the likelihood, mean BRPL and median BRPL ratio statistics. Table 3 shows that the empirical p-value distribution for the median BRPL ratio statistic is the one closest to uniformity.

Concluding remarks
In this paper we derive the adjusted score equations for the median bias reduction of the ML estimator for random-effects meta-analysis and meta-regression models and describe the associated inferential procedures.   Figure 11. The meat consumption data. 9 Outcomes from 16 studies are reported in terms of the logarithm of the relative risk (Log RR) of all-cause mortality for the highest versus lowest category of unprocessed red meat, and processed meat consumption. Squares represent the mean effect estimate for each study; the size of the square reflects the weight that the corresponding study exerts in the meta-analysis. Horizontal lines represent 95% Wald-type confidence intervals (CI) of the effect estimate of individual studies. Table 2. ML, maximum mean BRPL, and maximum median BRPL estimates of the model parameters for the meat consumption data. 9 Method 0 1 We show that the solution of the median bias-reducing adjusted score equations is equivalent to maximising a penalised log-likelihood. The logarithm of that penalised likelihood differs from the logarithm of the mean BRPL in Kosmidis et al. 3 by a simple additive term. The computation of the maximum median BRPL estimators can be performed through a two-step iteration that involves a weighted least squares update and the solution of a nonlinear equation with respect to a scalar parameter, and which converges rapidly, as illustrated by the computational times and number of iterations reported in the paper. The reported times and number of iterations were computed using a workstation with 24 cores at 2.90 GHz and 80 GB memory running under the CentOS 7 operating system, using one core per data set.
Using various settings we were able to retrieve enough information on the performance of the maximum median BRPL estimators. All our simulation studies illustrate that use of the median BRPL succeeds in achieving median centering in estimation, and results in confidence intervals with good coverage properties. Furthermore, while tests based on the LR suffer from size distortions, the median BRPL ratio statistic results in tests with size and power properties, sometimes better to those of the mean BRPL ratio statistic in Kosmidis et al. 3 The main advantage of the maximum median BRPL estimators from the maximum mean BRPL ones is their equivariance under monotone component-wise parameter transformations, which, in the case of random-effects meta-regression, leads to median bias-reduced standard errors. . Boxplots for the ML, maximum mean BRPL, and maximum median BRPL estimates of 0 , 1 , and as calculated from 10,000 simulated samples under the ML fit using the meat consumption data. 9 The square point is the mean of the estimates obtained from each method. The dashed grey horizontal line is at the parameter value used to generate the data.