Pairwise meta-analysis of aggregate data using metaan in Stata

A few years ago, we developed metaan, a package to perform fixed- or random-effects meta-analysis. For random-effects meta-analysis, it offered a wide choice of models, including maximum likelihood, profile likelihood, and restricted maximum likelihood, in addition to the established DerSimonian–Laird method available in metan or Cochrane's RevMan software. Other unique features included a wide range of reported heterogeneity measures and a plot of the maximum likelihood function. Since then, metaan has been continuously updated to offer improved graphics, more options, and more meta-analysis models. In this update, we describe these additions and discuss the new models and the evidence behind them.


Introduction
Meta-analysis is a statistical methodology that combines the results of several independent studies, clinical trials or observational studies, that are considered appropriate to combine (Huque 1988). Traditionally, the term "meta-analysis" refers to pairwise meta-analysis of aggregate data. Pairwise implies a two-arm comparison, usually an intervention and a control arm, as opposed to the more recently developed network meta-analysis methods (Lumley 2002; Salanti, Ades, and Ioannidis 2011). Aggregate implies pooling reported results, usually from research articles, and involves extracting data on effects (or associations) and their variability, as opposed to the more challenging but also more promising individual patient data meta-analysis (Riley, Lambert, and Abo-Zaid 2010). Meta-analysis of aggregate data focuses on the second step of a two-step process, the "pooling" of the extracted summary statistics into a weighted average; the metaan and metan packages in Stata deal with pairwise analyses (Harris et al. 2008; Kontopantelis and Reeves 2010). A new set of commands, meta, is also available in Stata 16, offering a comprehensive approach with advanced meta-analysis methods. In R (R Core Team 2018), the metafor package offers a comprehensive suite of meta-analysis methods.

How to appropriately address heterogeneity is another key validity issue. Fixed-effects models will not account for identified heterogeneity, assuming a single "true" effect, and thus limit the generalizability of the findings to the pooled studies. Random-effects models will incorporate identified heterogeneity by assuming multiple "true" effects (some will assume they are normally distributed as well); they are typically more conservative and allow generalization.
Considering that in the presence of heterogeneity, the performance of a fixed-effects approach deteriorates rapidly (Brockwell and Gordon 2001) and that random-effects models are robust even when the "true" effects deviate from normality (Kontopantelis and Reeves 2012a,b), we think that heterogeneity should always be modeled for the results to be generalizable, at any level and not only when above a certain arbitrary threshold (for example, I² > 50%, as is sometimes the case). Although the methods to model heterogeneity are robust, the underlying problem is how to accurately estimate existing heterogeneity. While homogeneity has been found to be the exception rather than the rule and some degree of "true" effect variability between studies is to be expected (Thompson and Pocock 1991), unobserved heterogeneity is a real problem, especially for meta-analyses of a few studies, which tend to be the norm (Kontopantelis, Springate, and Reeves 2013).
To estimate heterogeneity, the standard approach is the method proposed by DerSimonian and Laird (1986), which has been widely implemented and is the default random-effects approach in generic and specialist meta-analysis statistical software. However, the between-study variance component can be estimated using alternative approaches, including iterative or nonparametric methods. These include maximum likelihood, profile likelihood, restricted maximum likelihood (REML) (Hardy and Thompson 1996; Thompson and Sharp 1999), and the nonparametric "permutations" method proposed by Follmann and Proschan (1999). A nonparametric bootstrap of the DerSimonian–Laird (DL) estimator was also shown to be a better performer, especially in small meta-analyses that were falsely assumed to be homogeneous under the standard DL model (Kontopantelis, Springate, and Reeves 2013), a finding recently verified in a more complete and independent simulation comparison (Petropoulou and Mavridis 2017). Because of the difficulty in detecting underlying heterogeneity in small meta-analyses and the resulting inaccurate estimates, sensitivity analyses using a range of heterogeneity levels have also been recommended (Kontopantelis, Springate, and Reeves 2013). Small meta-analyses are also more likely to end up with a sample of studies that is not representative, which can result in incorrect estimates of true heterogeneity.
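To make the default method concrete, the DL moment estimator can be computed by hand in a few lines of Stata. The sketch below assumes hypothetical variables eff and se holding the study effects and their standard errors; it is an illustration of the formula, not metaan's internal implementation.

```stata
* Sketch of the DerSimonian-Laird moment estimator of tau-squared,
* assuming hypothetical variables eff (effects) and se (standard errors).
tempvar w weff q w2
generate double `w'    = 1/se^2                     // inverse-variance weights
generate double `weff' = `w'*eff
quietly summarize `w'
scalar sw  = r(sum)
quietly summarize `weff'
scalar swe = r(sum)
scalar muF = swe/sw                                 // fixed-effects pooled mean
generate double `q'  = `w'*(eff - muF)^2
generate double `w2' = `w'^2
quietly summarize `q'
scalar Q = r(sum)                                   // Cochran's Q
quietly summarize `w2'
scalar sw2 = r(sum)
scalar k = _N
scalar tausq = max(0, (Q - (k-1))/(sw - sw2/sw))    // DL estimate, floored at 0
display "tau-squared (DL) = " tausq
```

The truncation at zero in the last step is the reason small, truly heterogeneous meta-analyses can be falsely declared homogeneous, which motivates the bootstrap DL alternative discussed above.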
We have implemented all these methods in metaan, offering numerous alternatives to the DL random-effects model. Since the first appearance of metaan, we have been continuously updating it, and the latest version offers improved graphics, dichotomous-outcome analyses, and more meta-analysis models. Besides the standard fixed-effects model, we have also implemented the Mantel–Haenszel and Peto methods for dichotomous outcomes (Greenland and Robins 1985; Yusuf et al. 1985), which have been shown to perform well for rare and very rare events, respectively (Bradburn et al. 2007). The command requires the study-specific estimates and standard errors, or arm-specific event and nonevent counts, for each study to be meta-analyzed. A complementary command, metaeff (Kontopantelis and Reeves 2009a), can be used to calculate the effect size and its standard error from the input parameters supplied by the user for each study, using one of the methods described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins and Green 2011; Kontopantelis and Reeves 2009b). The metaan command can also perform stratified meta-analysis and meta-analysis of proportions with the Freeman–Tukey arcsine transformation, as well as provide elaborate forest plots using the dispgby package and likelihood plots for estimates obtained from the maximum likelihood models. Numerous measures of heterogeneity and their confidence intervals are also reported, including I² (0 to 100% range), H² (≥ 1), and the between-study variance estimate τ² (Higgins and Thompson 2002; Mittlböck and Heinzl 2006). In the following sections, we describe these additions through several examples.

The metaan command

Updates to metaan
Since metaan first appeared in 2010, we have added numerous features. We felt that these additions necessitated this article, which serves as a more comprehensive guide for users than the command's help file. Major additions included the following:

• February 25, 2013: new meta-analysis models, bootstrap DL, and sensitivity analysis using preset values for I².
• September 18, 2014: support for back-transforming effects to odds ratios (ORs) for binary outcomes.
• June 20, 2015: major update of the forest plot to use the program dispgby by Ross Harris and Mike Bradburn (the same program used by metan).
• July 25, 2018: added support for binary data with all commonly used methods (Mantel-Haenszel, Peto, or inverse variance), with a new four-variable syntax.
These additions, along with the command's original capabilities and options, are described in detail below.

Description
The metaan command performs a meta-analysis on a set of studies and estimates the overall effect μ and its confidence interval. The command also displays various heterogeneity measures: Cochran's Q, I² (in the [0, 100%) range, with larger scores indicating higher levels of heterogeneity), H² (which equals 1 in the case of homogeneity), and the between-study variance (τ²) estimate. Cochran's Q is the same across all methods, but the between-study variance estimate (and hence I² and H²) can vary between, for example, the DL (dl) and maximum likelihood (ml) methods. Only one method option may be selected, with the exception of the four methods for event data: Mantel–Haenszel OR (mhor), Mantel–Haenszel risk ratio (mhrr), Mantel–Haenszel risk difference (mhrd), and Peto OR (por). Each of these four approaches for binary data can be combined with the other models for different weighting and random-effects options. To calculate the effects and effect variances for a group of studies from various statistical parameters, please see the metaeff package. The command automatically detects the confidence level set in Stata and uses it (change it by typing, for example, set level 99).
The command requires the user to first specify either two or four variable names. When only two are specified, metaan assumes they refer to study effects and standard errors; when four are specified, the command assumes they represent group-specific event and nonevent counts for a binary outcome. When study effects and standard errors are specified, metaan uses the standard fixed-effects model or one of many inverse-variance weighting methods to account for heterogeneity. Binary and continuous outcomes can be analyzed using this syntax, provided the effect and its variability are available. In this case, the command expects as input the log() of the measure (OR, risk ratio, or hazard ratio [HR]) in varname1 and its standard error in varname2 (see the appendix tables in Kontopantelis and Reeves [2009b] regarding relevant methods, which are available in metaeff). The default output is then the overall log() of the measure, unless the user specifies the exp option, which returns exponentiated results (ORs, risk ratios, HRs, etc.). The name of the measure can be provided with the forest-plot option effect(), the default being OR.
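As an illustration of the two-variable syntax with a ratio measure, a call might be sketched as follows; logor and selogor are hypothetical variable names assumed to hold the study log ORs and their standard errors.

```stata
* Hypothetical two-variable call: study log ORs and standard errors,
* DL random-effects model, results exponentiated back to ORs.
metaan logor selogor, dl exp
```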
When group-specific event and nonevent counts are specified, metaan calls Mantel–Haenszel and Peto fixed-effects meta-analysis methods for binary outcomes. These methods necessarily ignore heterogeneity but have been shown to work well in rare (Mantel–Haenszel) and very rare (Peto) event settings, where heterogeneity is difficult to measure anyway. Three Mantel–Haenszel approaches are provided, in terms of the effect calculation: OR, risk ratio, and risk difference. The Peto "OR" is not really an OR, but it has come to be known as that. For the Mantel–Haenszel approaches, zero-cell corrections are applied. Corrections are not needed for Peto because of the way the effects are calculated. Weightings with these approaches differ from standard inverse-variance weighting and are closer, though not identical, to fixed-effects weighting, where the size of the study is the primary determinant (Peto uses an inverse-variance approach but with different weights and a different estimand). Each of these four methods can be combined with any of the inverse-variance models or the fixed-effects approach. In that case, the effect and its variance are calculated using the respective method for event data (one of mhor, mhrr, mhrd, or por) and then meta-analyzed. This approach allows for modeling heterogeneity, but the weighting can be different, as previously mentioned (these "hybrid" methods are provided in Cochrane's RevMan). The command always meta-analyzes ORs and risk ratios on the log() scale and, by default, returns results on that scale. Users can specify the exp option to obtain exponentiated results (note that the risk difference is not a ratio, so the log() scale and the exp option do not apply).
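A hybrid call of the kind just described might be sketched as follows, with hypothetical count variables: the Mantel–Haenszel risk-ratio method computes each study's effect and variance, which are then pooled under a bootstrap DL random-effects model.

```stata
* Hypothetical four-variable call: events/nonevents for each group,
* Mantel-Haenszel RR effects pooled under a bootstrap DL model.
metaan ev1 nev1 ev2 nev2, mhrr bdl reps(1000) seed(1) exp
```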
For meta-analyses of time-to-event outcomes, the use of adjusted HRs is advised with the two-variable syntax. A reasonable alternative is the calculation of the log HR from a log-rank analysis. The log HR is estimated by ln(HR) = (O − E)/V, with variance 1/V, where O is the observed number of events on the intervention, E is the log-rank expected number of events on the intervention, O − E is the log-rank statistic, and V is the variance of the log-rank statistic. The Peto method can also be used in this context, because it is O − E based, to pool HRs (or ORs) with option poe, which is equivalent to a fixed-effects approach. Dichotomous-data approaches (mhor, mhrr, mhrd, or por) are not advised here, even if the follow-up times are identical across studies, because these methods cannot account for censored data.
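The calculation above can be sketched in Stata; OmE and V are hypothetical variables assumed to hold O − E and the log-rank variance for each study.

```stata
* Hedged sketch: deriving log HR inputs from log-rank statistics.
generate double loghr   = OmE/V        // ln(HR) = (O - E)/V
generate double seloghr = 1/sqrt(V)    // SE of ln(HR) = sqrt(1/V)
metaan loghr seloghr, poe exp          // Peto O-E pooling, reported as HRs
```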

Syntax
metaan varname1 varname2 [varname3 varname4] [if] [in] [, fe dl bdl ml reml pl pe sa poe mhor mhrr mhrd por grpby(varname) label(varlist) varc prp exp backt(varlist) pscl reps(#) seed(#) sens(#) plplot(string) forest forest-plot options]

Two syntaxes are possible. In the first, the user is expected to provide varname1 and varname2 only, which hold information on study-effect sizes and study-effect variation (the default being standard error/deviation), respectively. If the prp option is selected for meta-analysis of proportions, varname1 and varname2 should hold numerators and denominators, respectively. The second syntax is used for binary outcomes with event data, and four variables are expected: varname1 is the number of events in group 1, varname2 the number of nonevents in group 1, varname3 the number of events in group 2, and varname4 the number of nonevents in group 2.

Options
Meta-analysis model

fe fits a fixed-effects model that assumes there is no heterogeneity between the studies. The model assumes that within-study variances may differ but that there is homogeneity of effect size across all studies. In practice, the homogeneity assumption is often unlikely to hold, and variation in the true effect across studies is to be expected. Therefore, caution is required when using this model. Reported heterogeneity measures are estimated using inverse-variance weighting and the DL model.
dl fits a DL random-effects model, which is the most commonly used model. The model assumes heterogeneity between the studies; that is, assumes that the true effect can be different for each study. The method assumes that the individual-study true effects are distributed with a variance τ 2 around an overall true effect but makes no assumptions about the form of the distribution of either the within-study or the between-study effects. Inverse-variance weighting is used as opposed to fixed-effects weighting, which is based on study size, and small studies may be given disproportionally large weights. Reported heterogeneity measures are estimated using the DL model.
bdl fits a bootstrap DL model, which is similar to DL but better performing, especially for small meta-analyses. It uses a nonparametric bootstrap to estimate the between-study variance and other heterogeneity parameters. Reported heterogeneity measures are estimated using the bootstrap DL model.
ml fits a maximum-likelihood random-effects model. This model makes the additional assumption (necessary to derive the log-likelihood function and also true for reml and pl below) that both the within-study and the between-study effects are normally distributed. It maximizes the log-likelihood function iteratively to produce an estimate of the between-study variance. However, the method does not always converge; in some cases, the between-study variance estimate is negative and set to zero, in which case the model is reduced to the fe specification. Estimates are reported as missing in the event of nonconvergence. Reported heterogeneity measures are estimated using the maximum likelihood model.
reml fits an REML random-effects model. This model is similar to ml and uses the same assumptions. The log-likelihood function is maximized iteratively to provide estimates as in ml. However, under reml, only the part of the likelihood function that is location invariant is maximized (that is, maximizing the portion of the likelihood that does not involve µ if estimating τ 2 , and vice versa). The method does not always converge; in some cases, the between-study variance estimate is negative and set to zero, in which case the model is reduced to the fe specification. Estimates are reported as missing in the event of nonconvergence. Reported heterogeneity measures are estimated using the REML model.
pl fits a profile-likelihood random-effects model. This model uses the same likelihood function as ml but accounts for the uncertainty associated with the between-study variance estimate when calculating an overall effect, which is done using nested iterations to converge to a maximum. The confidence intervals provided by the method are asymmetric and hence so is the diamond in the forest plot. However, the method does not always converge. Values that were not computed are reported as missing. Reported heterogeneity measures are estimated using the maximum likelihood model (the effect µ and τ 2 estimates are the same; only the confidence intervals are reestimated) but also provide a confidence interval for the between-study variance estimate.
pe fits a permutations random-effects model. This is a nonparametric random-effects method, which can be described in three steps. First, in line with a null hypothesis that all true study effects are zero and observed effects are due to random variation, a dataset of all possible combinations of observed study outcomes is created by permuting the sign of each observed effect. Then, the DL method is used to compute an overall effect for each combination. Finally, the resulting distribution of overall effect sizes is used to derive a confidence interval for the observed overall effect. The confidence interval provided by the method is asymmetric and hence so is the diamond in the forest plot. Study weights and heterogeneity measures are estimated using the DL model.
sa fits a sensitivity analysis model. This allows sensitivity analyses to be performed by varying the level of heterogeneity, with I² taking values in the [0, 100) range. Undetected heterogeneity is the norm rather than the exception, and we encourage users to test the sensitivity of their results in the presence of moderate (50%) and high (80–90%) levels of heterogeneity (please see Kontopantelis, Springate, and Reeves [2013]). Reported heterogeneity measures are based on the preset I² level.
poe fits a Peto fixed-effects approach for time-to-event O − E (observed minus expected) data. By default, the log() of HRs is assumed to be provided for meta-analysis, which can be exponentiated to HRs with the exp option. ORs can also be meta-analyzed [again, expecting log(OR) as input and the variance of the effect]; in this case, if a forest plot is requested, the outcome label should be edited with the effect() option. The method is identical to fe and is provided only for completeness and clarity.
Binary outcome data with group-specific event and nonevent counts

mhor fits a Mantel–Haenszel fixed-effects model with effect calculation based on the OR.
It can be combined with inverse-variance models to account for heterogeneity or a fixed-effects model, for different weighting approaches. If a random-effects or fixed-effects approach is used in combination with the mhor option, weighting follows the random-effects or fixed-effects approach used, and Mantel-Haenszel is relevant only in the way the effect is calculated within each study from events and nonevents.
The exp option needs to be used to report effects as ORs and not on the log() scale.
mhrr fits a Mantel-Haenszel fixed-effects model with effect calculation based on risk ratio (RR). It can be combined with inverse-variance models to account for heterogeneity or a fixed-effects model, for different weighting approaches. If a random-effects or fixed-effects approach is used in combination with the mhrr option, weighting follows the random-effects or fixed-effects approach used, and Mantel-Haenszel is relevant only in the way the effect is calculated within each study from events and nonevents.
The exp option needs to be used to report effects as RRs and not on the log() scale.
mhrd fits a Mantel–Haenszel fixed-effects model with effect calculation based on risk difference. It can be combined with inverse-variance models to account for heterogeneity or a fixed-effects model for different weighting approaches. If a random-effects or fixed-effects approach is used in combination with the mhrd option, weighting follows the random-effects or fixed-effects approach used, and Mantel–Haenszel is relevant only in the way the effect is calculated within each study from events and nonevents.
por fits a Peto OR fixed-effects model. It can be combined with inverse-variance models to account for heterogeneity or a fixed-effects model for different weighting approaches. In that case, the por option is relevant only to the effect calculation (which is technically not an OR, but the name has stuck). The exp option needs to be used to report effects as ORs and not on the log() scale.
General modeling options

grpby(varname) specifies the grouping variable for subgroup analyses. An integer numeric variable is expected, ideally with appropriate value labels; see [D] label. The groups will be ordered according to the variable provided, and headers for the results and the forest plot (if requested) will be obtained from the variable's value labels (if no labels are present, the relevant numbers will be used). All analyses are repeated for each group category and overall. Results are presented as separate analyses in the Results window but are all aggregated into a single forest plot (if one is requested).
label(varlist) selects labels for the studies. One or two variables can be selected and converted to strings. If two variables are selected, they will be separated by a comma. Usually, the author names and the year of study are selected as labels. The final string is truncated to 20 characters.
varc specifies that the study-effect variation variable, varname2, holds variance values. If this option is omitted, metaan assumes the variable contains standard-error values (the default).
prp specifies that numerators (varname1) and denominators (varname2) are provided and a meta-analysis of proportions will be executed. The Freeman–Tukey arcsine transformation is used, variance is calculated as 1/(varname2 + 1), and effects and confidence intervals (study and overall) are back-transformed using {sin(x/2)}².
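The transformation applied by the prp option can be sketched as follows, assuming hypothetical variables r (numerators) and n (denominators); this is the Freeman–Tukey double-arcsine form consistent with the variance and back-transformation given above, not metaan's internal code.

```stata
* Freeman-Tukey double-arcsine transformation (sketch).
generate double y  = asin(sqrt(r/(n+1))) + asin(sqrt((r+1)/(n+1)))
generate double se = sqrt(1/(n+1))      // variance 1/(n+1)
* a pooled value yhat is back-transformed as sin(yhat/2)^2
```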
exp specifies that the results will be exponentiated for dichotomous outcomes. For dichotomous outcomes, the log() of the measure (OR, RR, or HR) and its standard error are expected as input and the overall log() of the measure is returned by default unless this option is specified. If it is, the input is still expected to be the log() of the measure in varname1, but results are exponentiated.
backt(varlist) specifies the back-transformation to percentages, from the simple or empirical logit (Stevens et al. 2016). One or two variables can be selected. The first holds the study "anchor" percentages, which are essential for the back-transformation to take place. The second holds the relevant denominators. If only percentages are provided, they need to be in the (0, 1) range for back-transformation from the simple logit. However, if any zeros or ones are present in the percentages variable, the empirical logit will need to be used, and in this case the denominators are needed as well. The overall effect is back-transformed using the mean percentage across all studies as the "anchor". If denominators are provided, then the mean percentage is weighted on denominators; if not, it is unweighted.
pscl scales percentages to the [0%, 100%] range when using the prp or backt() option. By default, percentages in the output and forest plot are displayed in the [0, 1] range.

Bootstrap DerSimonian-Laird options
reps(#) specifies the integer number of repetitions for the bootstrap DL method. Fewer than 100 repetitions are not permitted.
seed(#) specifies the random-number seed to be used in the bootstrap DL method.

Sensitivity analysis options
sens(#) specifies the preset heterogeneity level, with I² taking values in the [0, 100) range. The default is sens(80).
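For example, a sensitivity analysis assuming a moderate heterogeneity level could be sketched as follows, with hypothetical variables eff and se holding the study effects and standard errors.

```stata
* Sensitivity analysis assuming I-squared = 50%.
metaan eff se, sa sens(50)
```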

Graphs
Only one graph output is allowed in each execution.
plplot(string) requests a plot of the likelihood function for the μ or τ² estimates of the ml, pl, or reml model. The plplot(mu) option fixes μ to its model estimate in the likelihood function and creates a two-way plot of τ² versus the likelihood function. The plplot(tsq) option fixes τ² to its model estimate in the likelihood function and creates a two-way plot of μ versus the likelihood function.
forest requests a forest plot. The weights from the specified analysis are used for plotting symbol sizes. Since version 2.0 of the command, we use the dispgby program by Ross Harris and Mike Bradburn, which is integrated with other popular meta-analysis commands (for example, metan). We allow all relevant options (see below).

Forest plot options
Various graph options can be used to specify overall graph options that would appear at the end of a twoway command. This allows the addition of titles, subtitles, captions, etc.; it also allows control of margins, plot regions, graph size, aspect ratio, and the use of schemes. Specific options are listed below.

dp(#) specifies the decimal points for the reported effects. The default is dp(2).

effect(string) allows the graph to name the summary statistic used [for example, OR, RR, standardized mean difference (SMD)].
favours(string # string) applies a label saying something about the treatment effect to either side of the graph (strings are separated by the # symbol).
null(#) displays the null line at a user-defined value rather than 0 or 1.
nulloff removes the null hypothesis line from the graph.
noverall prevents display of overall effect size on the graph (automatically enforces the nowt option).
nowt prevents display of study weights on the graph.
nostats prevents display of study statistics on the graph.
nowarning switches off the default display of a note warning that studies are weighted from random-effects analyses.
nohet prevents display of heterogeneity statistics in the graph.
nobox prevents a weighted box from being drawn for each study; only markers for point estimates are shown.

boxsca(#) controls box scaling. The default is boxsca(100) (as in a percentage) and may be increased or decreased as such (for example, 80 or 120 for 20% smaller or larger, respectively).
xlabel(numlist) defines x-axis labels. Any number of points may be defined, and the range can be enforced with the use of the force option. Points must be comma separated.
xtick(numlist) adds tick marks to the x axis. Points must be comma separated.
force forces the x-axis scale to be in the range specified by xlabel().
textsize(#) specifies font size for the text display on the graph. The default is textsize(100) (as in a percentage) and may be increased or decreased as such (for example, 80 or 120 for 20% smaller or larger, respectively).
astext(#) specifies the percentage of the graph to be taken up by text. The default is astext(50), and the percentage must be in the range [10, 90].
summaryonly shows only summary estimates in the graph.
classic specifies that solid black boxes without point estimate markers are used as in previous versions.
lcols(varlist) and rcols(varlist) define columns of additional data to the left or right of the graph. The first two columns on the right are automatically set to effect size and weight unless suppressed using the options nostats and nowt. textsize() can be used to fine-tune the size of the text to achieve a satisfactory appearance. The columns are labeled with the variable label or the variable name if the label is not defined. The first variable specified in lcols() is assumed to be the study identifier, and this is used in the table output.
double allows variables specified in lcols() and rcols() to run over two lines in the plot. This may be of use if long strings are to be used.
boxopt(options), diamopt(options), pointopt(options), ciopt(options), and olineopt(options) specify options for the graph routines within the program, allowing the user to alter the appearance of the graph. Any options associated with a particular graph command may be used except some that would cause incorrect graph appearance.

Stored results
metaan stores the following in r():

Scalars
  r(eff)          overall effect size
  r(effvar)       variance of the overall effect
  r(efflo)        lower confidence limit for the overall effect
  r(effup)        upper confidence limit for the overall effect
  r(Q)            Cochran's Q statistic
  r(df)           degrees of freedom
  r(Qpval)        p-value for Cochran's Q statistic
  r(Isq)          I² statistic
  r(Isq_lo)       lower confidence limit for the I² statistic
  r(Isq_upper)    upper confidence limit for the I² statistic
  r(Hsq)          H² statistic
  r(Hsq_lo)       lower confidence limit for the H² statistic
  r(Hsq_upper)    upper confidence limit for the H² statistic
  r(tausq_dl)     between-study variance estimated using the DL model

If random-effects models other than DL are requested, model-specific outputs are also obtained.

Scalars
  r(tausq_bdl)    between-study variance estimated with bootstrap DL
  r(tausq_ml)     between-study variance estimated with maximum likelihood
  r(conv_ml)      convergence of the maximum likelihood model
  r(tausq_reml)   between-study variance (τ²) estimated using the REML model
  r(conv_reml)    convergence of the REML model
  r(tausq_pl)     between-study variance (τ²) estimated using the profile likelihood
  r(tausqlo_pl)   lower confidence limit for τ² with profile likelihood
  r(tausqup_pl)   upper confidence limit for τ² with profile likelihood
  r(cloeff_pl)    convergence for the lower confidence limit of the overall effect using the profile likelihood model
  r(cupeff_pl)    convergence for the upper confidence limit of the overall effect using the profile likelihood model
  r(ctausqlo_pl)  convergence for the lower confidence limit of τ² with profile likelihood
  r(ctausqup_pl)  convergence for the upper confidence limit of τ² with profile likelihood
  r(exec_pe)      permutations method successfully executed
  r(tausq_sa)     assumed τ² under a sensitivity analysis
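The stored results can be picked up for further processing after any metaan call; the sketch below assumes hypothetical variables eff and se and the underscore-separated r() names listed above.

```stata
* Retrieve the pooled effect and heterogeneity estimates after fitting.
metaan eff se, dl
display "overall effect = " r(eff) " (" r(efflo) ", " r(effup) ")"
display "I-squared = " r(Isq) "%, tau-squared = " r(tausq_dl)
```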

Examples
We provide a few examples with the two-and four-variable syntaxes to give users a better understanding of the required data and the analysis options.

Meta-analysis based on effects and their standard errors
Let us assume an artificial dataset of 11 studies with information on a particular intervention, for example, a specific psychological intervention to reduce depression levels in populations diagnosed with severe depression in primary care. Let us also assume that various scales to quantify depression levels have been used across these studies and we have collected data on the mean difference in depression levels between the intervention and control groups, six months following the intervention. In other words, for study i, we have the mean difference

  MDi = MNa6Ai − MNa6Bi

where MNa6Ai and MNa6Bi are the mean depression levels in study i, six months postintervention, for the intervention (A) and control (B) groups, respectively. Assuming the studies are randomized controlled trials, as they often are, available preintervention data points are not used, because the groups should be well balanced. If there is no balance in the outcome preintervention, as in many observational studies, then the last available preintervention time point needs to be accounted for. For example, the mean difference would be

  MDi = (MNa6Ai − MNb0Ai) − (MNa6Bi − MNb0Bi)

where MNb0Ai and MNb0Bi are the mean depression levels in study i, just before the administering of the intervention, for the intervention (A) and control (B) groups, respectively. However, severely unbalanced group comparisons can be problematic for the validity of such analyses. Getting back to our example, we assume good preintervention group balance, but the outcome is measured across numerous scales (for example, BDI, CES-D, or PHQ-9). Therefore, we would need to calculate and meta-analyze the SMD for each study as

  SMDi = MDi / √(SDa6²Ai + SDa6²Bi)

We use the pooled variance at the postintervention time point that is the focus of our analysis (in this case, six months), with SDa6Ai and SDa6Bi denoting the standard deviation (SD) in study i, six months postintervention, for the intervention (A) and control (B) groups, respectively.
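When group summaries are available, these quantities can be computed directly; the sketch below assumes hypothetical variables for the six-month group means and SDs and follows the pooled-SD definition used in this example.

```stata
* Sketch: SMD and its denominator from postintervention group summaries
* (hypothetical variable names, per-study rows).
generate double MD  = MNa6_A - MNa6_B               // mean difference
generate double SDp = sqrt(SDa6_A^2 + SDa6_B^2)     // pooled SD, as defined here
generate double eff = MD/SDp                        // standardized mean difference
```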
In our example dataset below, variable eff is assumed to hold the SMD values, and variable SEeff the pooled SD values, that is, sqrt(SDa6_Ai^2 + SDa6_Bi^2).
First, we meta-analyze the studies using a random-effects model with the bootstrap DL estimator, using 1,000 repetitions.
. metaan eff SEeff, bdl reps(1000) seed(16)

High levels of heterogeneity were observed and modeled. The overall effect is not statistically different from zero; the conclusion would therefore be that this hypothesized psychological intervention for severe depression is not effective overall, although there was great variability in its effectiveness across the included studies. Figure 1 shows the requested forest plot.

Next, we meta-analyze the studies using the profile likelihood model, requesting a plot of the likelihood function. The heterogeneity estimates are similar to those observed with the bootstrap DL model, as is the overall effect. Figure 2 shows the requested likelihood plot for µ. The profile likelihood method converged for all estimates and their confidence intervals.
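The bootstrap DL model resamples the classic DerSimonian–Laird (DL) estimator, which the following sketch illustrates for the inverse-variance random-effects setting. This is a minimal illustration of the standard DL formulas, not metaan's internal implementation, and the function name is hypothetical.

```python
import math

def dl_pool(effects, ses):
    """DerSimonian-Laird random-effects pooling of effects and their SEs."""
    w = [1 / se ** 2 for se in ses]                       # fixed-effect weights
    mu_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q statistic around the fixed-effect estimate
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                         # DL between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]           # random-effects weights
    mu_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # I^2 heterogeneity (%)
    return mu_re, se_re, tau2, i2
```

The bootstrap DL variant repeatedly resamples studies with replacement, applies the DL step to each resample, and summarizes the resulting distribution of estimates.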

Meta-analysis using event and nonevent counts
Let us assume that we are interested in treatment for heart attacks, particularly primary percutaneous coronary intervention. We hypothesize that the outcome of interest is five-year postoperative mortality and that there are no censored values (all patients are followed up for five years or until death). Our aim is to evaluate the role of a new blood-thinning medication, so we use a second dataset with events and nonevents for the binary outcome. Variable Ine is the number of operated cases who were administered the medication and survived, and Ie the number of deaths in that group. Similarly, variable Cne is the number of operated cases who were not administered the medication and survived, and Ce the number of deaths in that group.
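From these four counts, a per-study risk ratio and the standard error of its logarithm can be computed as sketched below, using the variable naming from the text. This is an illustrative sketch of the standard (unadjusted) formulas, not metaan code.

```python
import math

def log_risk_ratio(Ie, Ine, Ce, Cne):
    """Log risk ratio and its SE from event/nonevent counts.

    Ie/Ine: deaths/survivors in the medicated group;
    Ce/Cne: deaths/survivors in the unmedicated group.
    """
    nI, nC = Ie + Ine, Ce + Cne                  # group sizes
    rr = (Ie / nI) / (Ce / nC)                   # risk ratio (death as event)
    log_rr = math.log(rr)
    # standard large-sample SE of the log risk ratio
    se = math.sqrt(1 / Ie - 1 / nI + 1 / Ce - 1 / nC)
    return log_rr, se
```

For example, 10 deaths among 100 medicated patients against 20 deaths among 100 unmedicated patients gives a risk ratio of 0.5 (log RR about −0.69).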
First, we meta-analyze the studies using a Peto odds-ratio (OR) fixed-effects model. Results are exponentiated, and a forest plot is requested. We used the Peto fixed-effects approach (with Peto weighting), although heterogeneity was identified (under an inverse-variance approach). Assuming we are analyzing adverse events, the blood-thinning agent appears to have a modest prophylactic effect on five-year survival, which is also statistically significant. Figure 3 shows the requested forest plot.

Figure 3. Peto fixed-effects model

Next, we account for heterogeneity using the bootstrap DL random-effects model with inverse-variance weighting. In the first step, we use the Mantel–Haenszel risk-ratio approach to calculate the effect size and its variance for each study. When we account for heterogeneity, the overall effect is closer to 1.0 and not statistically significant. This is not surprising, because random-effects models are generally more conservative as a result of incorporating between-study variability. Finally, we wish to examine how results would change if this sample of studies underrepresented the actual heterogeneity. For this, we use the sensitivity analysis model with a preset heterogeneity level of 70% (the upper limit of I² in the previous model, although other values could be chosen).
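The Peto log odds ratio for a single study is the classic (O − E)/V statistic, with V the hypergeometric variance; the sketch below illustrates this formula (an illustration only, with a hypothetical function name, not metaan's implementation).

```python
def peto_log_or(Ie, Ine, Ce, Cne):
    """Peto (one-step) log odds ratio for one study: (O - E) / V."""
    n1, n2 = Ie + Ine, Ce + Cne                      # treated / control totals
    n = n1 + n2
    e = Ie + Ce                                      # total events
    E = n1 * e / n                                   # expected treated events
    # hypergeometric variance of the treated event count
    V = n1 * n2 * e * (n - e) / (n ** 2 * (n - 1))
    return (Ie - E) / V                              # its SE is 1 / sqrt(V)
```

The Peto approximation works well for rare events and balanced arms; with 10 versus 20 deaths in two arms of 100, it gives a log OR of about −0.78 versus −0.81 for the exact log OR.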
. metaan Ie Ine Ce Cne, mhrr exp sa sens(70)

Under this higher heterogeneity assumption, the overall effect is slightly further reduced, although the greater impact is a wider confidence interval. This approach can be useful in evaluating how sensitive analyses are to undetected or underestimated heterogeneity, which is rather common in small meta-analyses in particular.
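One way to carry out such a sensitivity analysis is to fix I² at the preset level, back out the implied between-study variance τ² via the "typical" within-study variance of Higgins and Thompson, and re-pool with the corresponding random-effects weights. The sketch below illustrates this idea under those assumptions; it is not necessarily the exact procedure implemented in metaan, and the function name is hypothetical (i2 is supplied as a fraction, for example, 0.7 for 70%).

```python
import math

def pooled_at_preset_i2(effects, ses, i2):
    """Inverse-variance pooling with tau^2 implied by a preset I^2 (fraction)."""
    k = len(effects)
    w = [1 / se ** 2 for se in ses]
    # "typical" within-study variance s^2 (Higgins-Thompson)
    s2 = (k - 1) * sum(w) / (sum(w) ** 2 - sum(wi ** 2 for wi in w))
    tau2 = s2 * i2 / (1 - i2)              # tau^2 consistent with the preset I^2
    w_re = [1 / (se ** 2 + tau2) for se in ses]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return mu, math.sqrt(1 / sum(w_re))
```

Raising the preset I² inflates τ², which flattens the weights across studies and widens the confidence interval, matching the behavior described above.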

Discussion
The metaan package for Stata was first published in 2010. Since then, we have incorporated numerous additions, particularly features for conducting random-effects meta-analysis. It is now an extensive meta-analysis package for binary, continuous, or prevalence (proportion) data, allowing numerous options and offering attractive graphical outputs. In this article, we have described and demonstrated many of the package's capabilities and have highlighted the methodological evidence behind the implemented methods. Currently unique to metaan are some of the best-performing random-effects models, such as the nonparametric bootstrap DL, allowing users to potentially obtain more reliable results. Unique features also include the profile likelihood plot and the backtransformation options, while the heterogeneity sensitivity analysis was first presented in metaan and is now available in the built-in meta command in Stata 16. We should highlight that meta, Stata's new and comprehensive suite of meta-analysis commands, will be suitable for the needs of most users and includes advanced methods; for example, it does not use the DL estimator for its default random-effects model but instead uses restricted maximum likelihood. However, metaan includes some unique features that still make it an attractive alternative, such as the best-performing nonparametric bootstrap DL (Petropoulou and Mavridis 2017). In addition, it will serve users of Stata 15 or earlier well, because the meta suite of commands is not available in those versions. We will continue developing and updating metaan hand in hand with emerging best-practice evidence. The latest version is available from http://www.statanalysis.co.uk/.

Funding
This study was partially funded by the National Institute for Health Research School for Primary Care Research through a personal fellowship for Evangelos Kontopantelis. The study represents independent research supported by the National Institute for Health Research. The views expressed in this publication are those of the authors.