Evaluating Legislative Districts Using Measures of Partisan Bias and Simulations

A well-developed body of research offers ways to measure the partisan advantages that result from legislative districting. Although useful to researchers and legal practitioners, those studies also suffer from theoretical, empirical, and legal limitations. In this essay, we review measures of partisan bias and methods for both simulating election results under existing maps and simulating hypothetical maps. We start by describing the concept of partisan bias and how it has been measured. Then, we turn to new simulation methods that generate hypothetical election results or districts to judge the fairness of a map. While both kinds of evaluation are useful, we point to some unresolved questions and areas for future research.


The Concept of Partisan Bias
At the root of evaluating the fairness of a set of legislative districts is whether the map in question is biased in favor of one political party or the other. Foundational work by Tufte (1973) and others helped to distinguish two key elements of a single member districting system. First is responsiveness, which is how much a change in seat shares results from a change in vote shares. This is also sometimes called the "swing ratio," but that term implies a constant relationship between votes and seats (Niemi & Fett, 1986). Second is bias, which is the inherent advantage for one party due to asymmetry in the responsiveness curve.
We shall address responsiveness only briefly, focusing mainly on measures of partisan bias. The focus on bias is in part because symmetry is an easier standard to accept than a specific amount or form of responsiveness. For example, a one-to-one relationship between votes and seats would indicate responsiveness that is proportional. Although proportional representation might have some normatively appealing properties, it is not likely to be realized in the United States. The traditional use of single member simple plurality across federal and most state legislative elections usually results in the winning party gaining disproportionately greater seats. Congress and the courts have made clear that proportional representation is not mandated by law and empirical studies typically find much bigger swings in seats than a proportional response would generate. Even if this issue is set aside, judging responsiveness requires the analyst to make assumptions about exactly how swings in the vote operate across districts. Many analysts simply assume that the swing is uniform across districts so that they all move in parallel as the statewide vote shifts toward one party or the other. Yet it is possible that some districts are more responsive than others. This raises the broader question of whether the right unit of analysis is the district or the state.
It is also possible that bias and responsiveness are connected in systematic ways. Although Stephanopoulos and McGhee (2018) contend that their preferred measures of partisan bias are highly stable across elections, Goedert (2015) shows that partisan bias varies substantially between subsequent elections using the same districts. Maps designed to protect a party in one political environment can fail when the environment changes. McGann et al. (2016) show that maximum responsiveness is often not at the point of a 50-50 split in the partisan vote as many analysts would assume. In addition, Katz et al. (2020) suggest that partisan bias generally declines as responsiveness increases. These findings indicate that bias and responsiveness might interact rather than being independent. We avoid this question for now but highlight it as an important topic for further study.
A series of articles in the 1980s and 1990s established methods for estimating parameters for both responsiveness and partisan bias based on a seats-votes curve (Gelman & King, 1994b; King & Browning, 1987). The basic form of this model defines S as the proportion of seats and V as the proportion of votes within a state. The seats-votes curve relationship is defined with the equation $S/(1 - S) = \beta\,[V/(1 - V)]^{\rho}$, where the parameter β estimates bias and the parameter ρ estimates responsiveness. In this article, we limit our focus to the two-party vote between the Democrats and Republicans without consideration for minor parties or independent candidates. Models for multiparty elections are possible (e.g., King, 1990).
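To make the functional form concrete, the following minimal sketch solves the seats-votes equation for S given V, β, and ρ. The function name and default parameter values are illustrative assumptions, not part of the cited models; the default ρ = 3 simply mirrors the familiar "cube law."

```python
def seat_share(vote_share: float, beta: float = 1.0, rho: float = 3.0) -> float:
    """Seat share S implied by S/(1 - S) = beta * (V/(1 - V))**rho.

    beta and rho are the bias and responsiveness parameters of the
    seats-votes curve; the default values are illustrative assumptions.
    """
    if not 0.0 < vote_share < 1.0:
        raise ValueError("vote_share must lie strictly between 0 and 1")
    # Odds of seats are a scaled power of the odds of votes.
    odds = beta * (vote_share / (1.0 - vote_share)) ** rho
    # Convert odds back into a proportion.
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for v in (0.45, 0.50, 0.55):
        print(v, round(seat_share(v), 3), round(seat_share(v, beta=1.2), 3))
```

Under this form, β = 1 yields exactly half the seats at half the vote, and any other value of β shifts the curve toward one party at every vote share.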
The seats-votes curve is created by simulating changes in the vote for the two parties to observe how the division of seats between the parties varies as the statewide vote shifts toward one party or the other. This approach, which is still in wide use, investigates partisan bias using the concept of symmetry. Symmetry occurs when a set of districts awards a party a specific number of seats for earning a specific share of the vote, regardless of which party it is. For example, if Democrats win 55% of the vote in a given election but take 60% of the seats, then Republicans should also have gotten 60% of the seats if they had won 55% of the vote. This fundamental idea that parties should be rewarded to equal degrees for earning equal numbers of votes has generally been adopted by researchers as a theoretical standard for judging the partisan fairness of a legislative map. As this example reminds us, a seats-votes curve that meets the symmetry standard need not result in proportional representation or any other specific functional form other than treating the two parties equally (Nagle, 2017).
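The symmetry comparison can be sketched in a few lines. The example below assumes a uniform swing and equal turnout across districts (so that the statewide vote equals the mean of the district shares); both are simplifying assumptions, and the function names and district values are hypothetical.

```python
def seats_at(dem_shares, statewide_target):
    """Democratic seat share after a uniform swing to statewide_target.

    Assumes equal turnout, so the statewide share is the district mean.
    """
    base = sum(dem_shares) / len(dem_shares)
    swing = statewide_target - base
    won = sum(1 for v in dem_shares if v + swing > 0.5)
    return won / len(dem_shares)


def symmetry_gap(dem_shares, vote_level):
    """Democratic seats at vote_level minus Republican seats at vote_level.

    Zero means both parties receive the same seat share for the same
    vote share; negative values mean Democrats are rewarded less.
    """
    dem_seats = seats_at(dem_shares, vote_level)
    # Republicans at vote_level means Democrats at 1 - vote_level.
    rep_seats = 1.0 - seats_at(dem_shares, 1.0 - vote_level)
    return dem_seats - rep_seats


if __name__ == "__main__":
    packed = [0.30, 0.42, 0.44, 0.46, 0.88]   # Democrats packed into one seat
    balanced = [0.32, 0.44, 0.56, 0.68]
    print(symmetry_gap(packed, 0.55), symmetry_gap(balanced, 0.55))
```

In the hypothetical packed map, Democrats would take two of five seats with 55% of the vote, whereas Republicans would take four of five with the same share, so the gap is negative. The balanced map returns zero.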
Starting in the 1990s, scholars became more focused on the role of race in redistricting. This new focus came in response to U.S. Supreme Court cases, gerrymandering efforts in the 1990 and 2000 cycles that used racial groups in strategic fashion, and normative debates about symbolic and descriptive representation. The political community and broader public have since developed a renewed interest in the partisan consequences of legislative maps as a response to recent events including the cases that left open the door for more partisan gerrymandering, aggressive efforts by Republicans via the REDMAP initiative to take advantage of success in the 2010 elections to skew maps in their favor, commission-based redistricting reforms in states such as Arizona and California, and high-profile federal and state lawsuits over maps in Florida, Maryland, Michigan, North Carolina, Pennsylvania, and Wisconsin. Since the 2010 round of redistricting, scholars and other interested parties have proposed a plethora of new measures to assess partisan advantage in legislative maps.

Measures of Partisan Bias
Here, we provide descriptions of the most prominent measures of partisan bias, describing briefly how they are formulated and what concerns have been raised about them. One thing all the measures have in common is that they purport to capture both the direction of the bias (for either the Democrats or the Republicans) and the magnitude of the bias, two important considerations for understanding whether a set of districts deviates from the symmetry standard.
First, the efficiency gap (EG) seeks to measure the relative number of votes for Democratic and Republican candidates that are "wasted." Wasted votes are those that do not contribute to the election of a legislator, either because they fall short of a majority or go above the majority level needed for election. These are theoretically important sources of inefficiency because they may be the result of "packing" supporters of the disfavored party into a small number of districts or "cracking" them across many districts to dilute their influence. The efficiency of each party's vote is the total number of its votes that are wasted by going to losing candidates or contributing to winning candidates beyond winning 50% of the vote. The measure is calculated as the difference in wasted votes between the two parties divided by the total number of votes cast.
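The wasted-vote calculation just described can be illustrated with a short sketch. The sign convention (positive values favor Republicans), the treatment of exact ties, and the district vote totals are assumptions made for illustration only.

```python
def efficiency_gap(districts):
    """Efficiency gap from raw two-party votes.

    districts: iterable of (dem_votes, rep_votes) pairs.
    Returns (wasted Democratic votes - wasted Republican votes) / total votes,
    so positive values indicate a map that favors Republicans
    (an assumed sign convention).
    """
    wasted_dem = wasted_rep = total = 0.0
    for dem, rep in districts:
        district_total = dem + rep
        needed = district_total / 2.0          # the 50% level needed to win
        if dem > rep:
            wasted_dem += dem - needed         # surplus votes for the winner
            wasted_rep += rep                  # all votes for the loser
        else:
            # Exact ties are arbitrarily counted as Republican wins here.
            wasted_rep += rep - needed
            wasted_dem += dem
        total += district_total
    return (wasted_dem - wasted_rep) / total


if __name__ == "__main__":
    # One packed Democratic district and three narrow Republican wins.
    plan = [(80, 20), (45, 55), (45, 55), (45, 55)]
    print(efficiency_gap(plan))
```

In this hypothetical plan Democrats waste 165 votes and Republicans waste 35 out of 400 cast, giving an EG of 0.325 even though Democrats won a majority of the statewide vote.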
Conveniently, proponents of the EG contend that it has an intuitive interpretation as the percentage of seats a party won above what would be expected in a neutral map with no partisan bias. The EG's defenders have offered a vigorous defense of its superiority over other measures for the specific purpose of detecting a partisan asymmetry (Stephanopoulos & McGhee, 2018).
A second prominent measure of partisan bias is the mean-median gap, which captures the skewness in district outcomes. Skew is indicated by an asymmetric distribution with right and left sides that are not mirror images of one another. The measure assumes that the true support for a party can be assessed using its statewide vote share, which happens to be the same as the mean vote for the party across all districts. If a legislative map discriminates against a party, then the median district will be more favorable to the opposition than is the statewide mean. The measure's computation is even simpler than the EG. After arranging districts in order by vote share for a party, one subtracts the percentage of the vote for the party in the mean district from that in the median district. Although not always accurate, the difference between the mean and median of a distribution is often used as an indicator of its skew. If the mean and median are equal, the gap is zero, and the distribution is presumed to be symmetric. (It is possible for the mean to be equal to the median even when the distribution is asymmetric.) The mean-median gap is thus seen as a meaningful indicator because the goal of intentional gerrymandering is to allow a party to win more districts than the vote alone would imply. This happens because the votes for the disfavored party carry less weight in determining the election outcome.
The mean-median gap has a particularly nice substantive interpretation: How much improvement (swing) above 50% is needed by the losing party to break even. For example, a pro-Democratic mean-median gap of 4.8% would indicate that Republicans need to win 54.8% of the statewide vote to win half the seats. Particular advocates of the measure are McDonald and Best (2015), who promote it for being transparent, simple, and reliant on observed facts rather than hypotheticals.
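A minimal sketch of the computation follows. It takes a list of district-level Democratic two-party vote shares and, as the measure itself does, treats the unweighted district mean as the statewide share; the function name and the district values are hypothetical.

```python
from statistics import mean, median


def mean_median_gap(party_shares):
    """Median district vote share minus mean district vote share.

    Negative values mean the median district is less favorable to the
    party than its average support, that is, the map works against it.
    """
    return median(party_shares) - mean(party_shares)


if __name__ == "__main__":
    dem_shares = [0.30, 0.42, 0.44, 0.46, 0.88]
    gap = mean_median_gap(dem_shares)
    # Under a uniform swing, the party needs roughly 50% minus the gap
    # statewide to carry the median district.
    print(round(gap, 3), round(0.5 - gap, 3))
```

For these illustrative districts the gap is about -0.06, so Democrats would need roughly 56% of the statewide vote to carry the median district under a uniform swing.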
A third, and closely related, measure is the lopsided margins test, which focuses on asymmetry in victory margins for the two parties. The test is based on this idea: If the controlling party has packed the opposition, the disadvantaged party should have higher margins of victory (Wang, 2016a). As with the EG, smaller margins of victory are seen as advantageous because they result in a larger number of efficient wins for the advantaged party. This measure compares the mean margins of victory in districts won by Democrats versus Republicans. Wang (2016a) recommends a t test to evaluate the statistical significance of the difference.
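The comparison can be sketched with the standard library alone. The version below computes an unequal-variance (Welch) t statistic on the winners' vote shares; whether this matches the exact variant of the t test Wang recommends is an assumption, and obtaining a p value would additionally require a t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def lopsided_margins(dem_shares):
    """Difference in mean winning vote share (Democratic minus Republican)
    and the corresponding Welch t statistic.

    dem_shares: Democratic two-party vote share in each district.
    A large positive statistic suggests Democrats win by bigger margins,
    which is consistent with Democratic voters being packed.
    """
    dem_wins = [v for v in dem_shares if v > 0.5]
    rep_wins = [1.0 - v for v in dem_shares if v < 0.5]
    if len(dem_wins) < 2 or len(rep_wins) < 2:
        raise ValueError("each party must win at least two districts")
    diff = mean(dem_wins) - mean(rep_wins)
    # Standard error without assuming equal variances in the two groups.
    se = sqrt(variance(dem_wins) / len(dem_wins)
              + variance(rep_wins) / len(rep_wins))
    return diff, diff / se


if __name__ == "__main__":
    shares = [0.80, 0.76, 0.45, 0.44, 0.46, 0.43]
    print(lopsided_margins(shares))
```

In this hypothetical map Democrats win their seats with 78% on average while Republicans win theirs with about 55.5%, producing a large positive statistic.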
Fourth, declination reveals whether there is a discontinuity in district vote shares between those where one party wins and those where the other party wins. Advocated by Warrington (2018), the measure uses geometry to identify whether Republican and Democratic districts are created with different concentrations of supporters. Without intentional gerrymandering, this approach assumes a relatively smooth increase in the share of votes going to a party if the districts were lined up from most Democratic to most Republican. However, when one party takes advantage of the redistricting process, it should pack supporters of the opposing party in a small number of districts where the vote share for the opposing party jumps well above 50%. The measure thus compares results from below the 50-50 threshold where the parties fare equally to results above that threshold. The measure compares the angle of the line segment connecting the median district won by a Democrat with the district where the vote is divided equally (the 50-50 point) to the line segment connecting the median district won by a Republican with that same 50-50 point. If the system treats the two parties equally, the lines should have an equal slope. The degree to which the second segment is angled away from the first segment is declination, a measure of partisan asymmetry.
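The geometry can be sketched as follows. This version assumes that each party's reference point sits at the horizontal midpoint of the districts it won and at the mean vote share in those districts, and that the angle difference is rescaled to lie between -1 and 1; these details, the sign convention, and the function name are assumptions rather than a verified reproduction of Warrington's implementation.

```python
from math import atan, pi
from statistics import mean


def declination(dem_shares):
    """Declination computed from Democratic two-party vote shares.

    Districts are imagined sorted from least to most Democratic.
    Positive values (assumed convention) indicate that Democratic
    districts rise more steeply from the 50-50 point, as happens when
    Democratic voters are packed. Exact ties at 0.5 are ignored.
    """
    n = len(dem_shares)
    rep_won = [v for v in dem_shares if v < 0.5]
    dem_won = [v for v in dem_shares if v > 0.5]
    if not rep_won or not dem_won:
        raise ValueError("undefined when one party wins every district")
    # Angle of the segment from the Republican-won midpoint to the 50-50 point.
    theta_rep = atan((1.0 - 2.0 * mean(rep_won)) / (len(rep_won) / n))
    # Angle of the segment from the 50-50 point to the Democratic-won midpoint.
    theta_dem = atan((2.0 * mean(dem_won) - 1.0) / (len(dem_won) / n))
    return 2.0 * (theta_dem - theta_rep) / pi


if __name__ == "__main__":
    print(declination([0.30, 0.42, 0.44, 0.46, 0.88]))  # packed Democrats
    print(declination([0.32, 0.44, 0.56, 0.68]))        # mirror-image map
```

The hypothetical packed map yields a clearly positive value, while the mirror-image map yields a value of essentially zero because the two segments have the same slope.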
Other measures continue to appear in the literature. For example, Nagle (2015) suggests that bias is revealed by the area between the existing seats-votes curve and the curve when it has been symmetrically inverted. This is an extension of the idea of partisan symmetry, the concept that both parties should benefit to the same degree if they were to win equal shares of the vote. It is clever but also faces the question of how to precisely estimate a smooth curve so that areas are highly exact.
The measures we have highlighted here sometimes yield conflicting conclusions, but most of them are also closely related in practical applications, especially in settings with highly competitive parties. There are in fact tight mathematical relationships between the EG and mean-median gap (Stephanopoulos & McGhee, 2018), which means the strengths and weaknesses of the two measures are similar. As Cover (2018) establishes, the simplified form of the EG is always equal to the difference between the parties' average margins of victory, at least if two small tweaks are made to the latter measure. Nagle (2015) and others have demonstrated that the mean-median difference is equal to partisan bias divided by the slope of the seats-votes curve. In real data, the measures are highly correlated (Stephanopoulos & McGhee, 2018; Warrington, 2019).
To get a sense of the similarity and differences of these measures, Table 1 compares estimates of partisan bias (King & Browning, 1987) with the EG and mean-median gap measures across 40 available state house plans from 2012, taken from the website PlanScore.org. All three measures are coded such that positive numbers indicate a plan is biased toward Republicans and negative numbers indicate a plan favors Democrats. Although the three measures differ in their scales and interpretations, they usually agree in their assessment of the direction and degree of a plan's partisan advantage. The mean-median gap measure has a strong correlation with estimates of partisan bias (.91), but the EG has a slightly lower correlation with both the mean-median gap (.74) and the partisan bias estimate (.70). And for only 15% of the state plans do these measures conflict in their assessment of the direction of bias. As noted, conflicts among these measures seem to correspond to states that lack uniform party competition across the state. The EG measure places New York as strongly biased in favor of Republicans, but the other two measures do not. In contrast, Tennessee shows a high partisan bias score, but a much smaller value for EG. Rhode Island ranks as strongly favoring Republicans by partisan bias estimates, but largely favoring Democrats by the EG. So while the measures are in agreement in the large majority of cases, research should continue to interrogate the settings where they lead to different assessments.

Concerns About the Measures
A common concern about the measures is that they are essentially providing circumstantial evidence. None of these measures on their own can demonstrate why partisan bias appears. Although favoritism in a legislative map might result from intentional partisan discrimination, it is widely understood that bias might also occur unintentionally because of the underlying residential patterns of Democratic and Republican voters (e.g., Chen & Rodden, 2013). Each measure needs a reference point to make these kinds of determinations. The reference point might be the same measure prior to redistricting to show a before-and-after effect (see McGann et al., 2016), other states with similar geographic features, or simulations of counterfactual legislative elections, which we discuss below. Thus, the presence of bias and its causes are theoretically and empirically distinct.
All the measures must reckon with uncontested races where either the Democrats or Republicans failed to field candidates. At a theoretical level, uncontested races might actually reveal something about gerrymandering. For example, if a map advantages the Democrats by packing Republicans heavily into a small number of districts, then Democrats might not bother running candidates in the packed districts because they are seen as nearly guaranteed wins for the opposition. As a result, the advantaged party might ironically run fewer total candidates. At a practical level, the measures cannot be computed in uncontested districts, at least not in the same way they are in contested districts. Wang (2016b) reports that the presence of uncontested districts reduces the magnitude of the mean-median gap. Yet Stephanopoulos and McGhee (2015) contend that either dropping uncontested races or assuming a 0% to 100% split is inappropriate. To rectify this problem, researchers often impute the expected vote shares that the two parties' candidates would receive had they both fielded candidates. Some have simply imputed a default split of 25% to 75% (rather than 0% to 100%) in favor of the party that runs a candidate (Gelman & King, 1994b; Wang, 2016a). Others have turned toward statistical modeling to impute values, often using the vote for a higher level office to forecast missing values (Stephanopoulos & McGhee, 2015). McGann et al. (2016) impute values using a regression model based on the average of presidential elections in the district for the decade that the districts are in place. Jackman (2018) imputes both the expected Democratic vote share and turnout and allows the uncertainty of these predictions to be reflected in his estimates of the EG. Indeed, turnout variation across districts may also be a consequence of gerrymandering as a result of creating uncompetitive seats (McDonald & Best, 2015), a point we highlight below.
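The simplest of these approaches, the default 25% to 75% split, can be sketched as below. The encoding of uncontested races as recorded shares of exactly 0.0 or 1.0, and the function name, are assumptions for illustration; model-based imputation would replace the constant with a district-specific prediction.

```python
def impute_uncontested(dem_shares, winner_share=0.75):
    """Replace uncontested results with a default two-party split.

    dem_shares: Democratic two-party vote shares, where 1.0 marks a race
    with no Republican candidate and 0.0 a race with no Democrat
    (an assumed encoding). Contested races are returned unchanged.
    """
    imputed = []
    for share in dem_shares:
        if share == 1.0:
            imputed.append(winner_share)          # unopposed Democrat
        elif share == 0.0:
            imputed.append(1.0 - winner_share)    # unopposed Republican
        else:
            imputed.append(share)
    return imputed


if __name__ == "__main__":
    print(impute_uncontested([1.0, 0.0, 0.58, 0.41]))
```

For the illustrative input, the two uncontested races become 0.75 and 0.25 while the contested results are left as observed.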
There is some disagreement, both legally and theoretically, about whether partisan bias is better measured in individual districts or in statewide maps. McDonald et al. (2018) discuss this as a distinction between an "exclusion" standard, which is due to vote dilution, and an "entrenchment" standard, which is due to unequal weighting of votes. They conclude that both types of measures have value. Aside from the EG, the statewide measures often have the benefit of being known before maps are used. Grofman (2019) has made the most explicit attempt to make logical connections between measures at the district and statewide levels.
A related question, especially for the EG, is whether partisan bias should be measured using the outcomes of the legislative elections themselves or with respect to another statewide office such as governor. One benefit of using an "exogenous" office is that it avoids the problem of uncontested races that we just highlighted. It thus provides a common baseline for evaluating voter preferences across the state. But Krasno et al. (2019) have questioned how researchers are to identify the proper statewide elections to use as a baseline for measuring bias. The choice of which race to use might merely be a nuisance because it seems arbitrary and each race generates different values for the EG. However, having a set of statewide elections to choose from might actually be valuable for examining counterfactuals if Democrats win some races and the opposition party wins others. This would allow for observation of results with different degrees of swing in the vote. The EG is sensitive to small changes in vote in highly competitive districts because it relies on them to distinguish seats won (Campisi et al., 2019), but the mean-median gap is more robust to the choice of election used (Krasno et al., 2019).
However, Stephanopoulos and McGhee (2018) are emphatic that "it is poor methodological form to analyze plans with exogenous election results" "for offices that are unrelated to the map in question" (1544-5). Niemi and Deegan (1978) were agnostic on this point but suggested measuring the "normal vote" (Converse, 1966), an old concept capturing the expected vote that is presumably latent and thus neither from an "unrelated" office nor contaminated by candidate choices that are endogenous to the districts.
The equation for the most common version of the EG is simple to compute because it assumes equal population and equal turnout of districts. It has been criticized as misleading in cases where turnout varies across districts (Veomett, 2018), but advocates contend that it can handle those variations using the more complex original formulation that relies on raw votes rather than percentages (McGhee, 2014).
The simplified equation is the seat margin for a party minus two times the vote margin for that party. Both of these margin variables are calculated as the percentages for one party (usually the Democrats) minus 50%. Some scholars criticize the measure for imposing a swing ratio of exactly two, as requiring this form of "responsiveness" is artificial and constraining. For example, Grofman and Cervas (2018) are opposed to the EG's required slope of two between seats and votes at the 50-50 mark where other measures of partisan symmetry allow more flexible relationships. As they note, the well-known "cube law" relating seats to votes implies a slope of about three at the crossover point. Empirical estimates of the seats-votes curve indicate that the slope appeared to be stable close to two in congressional elections from 1952 to 1982 (Jacobson, 1987), but analysis of a longer time frame suggests that the slope is not constant over time and has in fact flattened in recent years (Ansolabehere et al., 1992; Kastellec et al., 2008).
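The simplified equation is easy to verify by hand, and a short sketch shows why critics describe it as building in a swing ratio of two. The function name is hypothetical, shares are expressed as proportions, and the sign here follows the party whose shares are supplied (positive values mean that party holds more seats than the formula treats as neutral).

```python
def simplified_eg(seat_share, vote_share):
    """Seat margin minus twice the vote margin for one party.

    Both margins are measured relative to 0.5. The result is zero only
    when each extra point of vote share yields two extra points of seats.
    """
    return (seat_share - 0.5) - 2.0 * (vote_share - 0.5)


if __name__ == "__main__":
    # 55% of votes and 60% of seats: treated as neutral by the formula.
    print(round(simplified_eg(0.60, 0.55), 6))
    # A strictly proportional result (55% of votes, 55% of seats) is not.
    print(round(simplified_eg(0.55, 0.55), 6))
```

A party with 55% of the vote and 60% of the seats scores zero, whereas a strictly proportional outcome at 55% scores -0.05, which is the sense in which the measure presumes a particular form of responsiveness.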
A second complaint about the EG is in its binary distinction of votes as either wasted or not wasted. In particular, the measure assumes that votes cast for the losing candidate are of equal importance as are excess votes for the winner. Challenging this view, one might reason, for example, that while winning seats with 50% + 1 votes is highly efficient for a party, it is also quite risky because it allows chance to be a more important factor in selecting the winner. A shrewd political party controlling the redistricting process might instead prefer to give many of its members districts with expected vote shares around 55% or 60%, which lowers the EG but actually helps the party maintain its majority. However, McGhee (2017) emphasizes that the EG is designed to measure efficiency in the use of votes to win seats, not other criteria such as entrenchment or competitiveness. Some critics have argued excess votes for the winner should be defined relative to the loser's vote share rather than the 50% baseline (e.g., Cover, 2018). Another view is that efficiency should be based only on votes for the loser, as they suffer the most from the system. In addition, equating excess votes for the winner and all the votes for the loser constrains the range of values that the EG may take (Cho, 2017).
Recent studies have also highlighted other areas of concern. Baas and McAuliffe (2017) complain that existing measures have flawed methods for assessing statistical significance or have no statistical modeling at all behind them. Katz et al. (2020) contend that prominent measures such as the EG and mean-median gap are not in fact measures of partisan asymmetry at all. It seems likely that neither researchers nor practitioners will ever settle on a single measure as the definitive indicator of partisan gerrymandering, but that does not mean that partisan bias is undetectable. Rather, discontent is likely to lead to development of new measures, further demonstration of existing measures, and triangulation by multiple indicators as is routinely done in other areas of inquiry such as education and criminal justice.

Future Research on Measures of Partisan Bias
In June 2019, a five-four majority on the U.S. Supreme Court declared in a pair of federal cases out of Maryland (Lamone v. Benisek) and North Carolina (Rucho v. Common Cause) that partisan gerrymandering is a nonjusticiable issue, a move that appears to halt the use of partisan bias measures in federal litigation. However, state-level courts in states including North Carolina and Pennsylvania have recently struck down legislative maps by relying in part on scholarly measures of partisan bias. Understanding and improving such measures is thus essential both to academic understanding of how the redistricting process works and because of the potential legal weight the measures may carry. In addition, such measures are likely to inform questions that courts have considered about how partisan bias may affect the associational rights of candidates, volunteers, and campaign contributors.
Beyond resolving the concerns and disagreements highlighted above, there are several promising frontiers for future research on partisan bias. As noted in our discussion about the measures, researchers are still conflicted about how to handle uncontested races. Some view districts with only one candidate on the ballot as a methodological nuisance to either ignore or impute, but others would take the lack of opposition as a substantive consequence of gerrymandering. For example, extensive packing should discourage candidates from the disadvantaged party from running. Yet some states with highly gerrymandered maps also see much contestation. A future area of research might examine not just whether candidates run but their "quality" or fitness for office, characteristics that could be endogenous to the design of districts. A party might field a candidate in a district where it is a small minority but only be able to run an amateur with little chance of victory rather than a more professional nominee who would have a better shot of winning. In other words, strategic candidate entry, which affects vote outcomes, should be an explicit consideration in judging the fairness of outcomes.
Measures of bias and responsiveness typically assume traditional single member districts with plurality rules for determining winners, or what is often called first-past-the-post. But we need to know more about gerrymandering in other settings where the electoral structures are different. Multimember districts, used in about a dozen state legislative chambers today and much more in the past, should get specific attention. Studies of gerrymandering in state legislatures generally make no mention of this fact, leaving a mystery as to how multimember districts in states such as Maryland and New Hampshire are handled. Ranked choice voting, already in use in several cities and statewide in Maine, also deserves specific analysis to understand how measures of partisan bias operate when voters rank multiple candidates rather than choosing either a Democrat or a Republican on the ballot. Finally, "top-two" primaries sometimes create situations where both general election candidates are from the same party. This situation needs to be scrutinized differently than traditional general election contests where candidates are from opposing parties. States such as California are seeing more general elections in which both candidates are Democratic. Such elections may distort the input data used to create measures of partisan bias and force researchers to think more about what the presence of two candidates of the same party indicates about partisan bias.
As we briefly noted above, legislative maps may produce both a conventional gerrymandering bias and a turnout bias that also favors one party over the other (see also J. Campbell, 1996). For example, packing high turnout supporters of an opposing party into a small number of districts dilutes their influence both because it wastes excess votes for the winner and because it allows the advantaged party to win other districts with a small number of votes cast. Earlier studies outlined methods for estimating partisan bias due to wasted votes as well as differentials in turnout and apportionment (Grofman et al., 1997). The two may be correlated, as when partisan packing makes a district uncompetitive and results in lower turnout there. Researchers should pursue comprehensive models that estimate both factors and allow them to interact.
As noted at the outset of our essay, responsiveness and bias are viewed as distinct, but they are in fact aspects of the same maps. Yet scholars do not know much about how they interact. Although researchers such as McGann et al. (2016) suggest how the two might work in tandem to foster protection of incumbents, we think this connection also deserves more attention from scholars as a matter of normative, empirical, and legal relevance.
Although racial and ethnic factors in redistricting continue to be important and unresolved, the purpose of this essay is to examine methods for assessing undue partisanship in legislative districts. Nonetheless, scholars would be wise to revisit some debates from the 1980s and 1990s to consider how the measures of bias might illuminate gerrymandering based on race and ethnicity rather than (or in addition to) party. In contemporary politics, it is difficult empirically to separate how racial and partisan considerations influence redistricting (Hasen, 2018), but it is essential to do so considering the different legal standards (e.g., Voting Rights Act) that apply to racial rather than partisan discrimination.

Election Simulations
Measures of partisan bias apply to existing maps and election results, but they can also apply to computer-generated scenarios to provide baselines for evaluating real outcomes. We distinguish between two kinds of simulations. In this first section, we describe computational simulations that are used to evaluate existing districts. These simulations generate hypothetical results for the purposes of evaluating counterfactual scenarios. Even within this category of simulations, we will see the differences in the types of goals and approaches that researchers emphasize.
What we call election simulations are studies that take the characteristics of a set of districts as given. The method then uses these characteristics to simulate from a hypothetical set of possible election outcomes that could occur in these fixed districts based on either observed or counterfactual campaign scenarios. For example, the analyst might consider how the outcome in each district would differ if the Republicans had won one percentage point more of the vote in the last election. This is simulating from a real outcome to one that is slightly different than what was observed. Methods for estimating seats-votes curves operate in this way by starting with the real election results and then imagining how seat shares would differ if the votes had been cast differently.
As our discussion about the handling of uncontested races illustrates, simulations typically assume that election results are a noisy indicator of a redistricting plan's overall tendency to produce a certain result (Gelman & King, 1994b). After all, an election is just one realization of a stochastic voting process. This view also affects how some researchers conceptualize the measures of partisan bias we reviewed. As a result, some researchers are now applying measures such as the mean-median gap and the EG to simulated maps (see Krasno et al., 2019). This additional step allows scholars to determine the degree to which observed electoral conditions reflect a state's "natural" tendency to favor one party due to residential segregation and to make inferences concerning what is an extreme result across the states.
Election simulation methods more broadly seek to estimate a model that accounts for observed attributes (such as national or state party swings or whether incumbents are running), so researchers can forecast the expected outcome under different campaign scenarios. This method is akin to efforts to standardize the data so that a particular set of circumstances or election results does not skew the estimates.
One example of the utility of election simulations examines the partisan consequences of the 1992 congressional districting. The effects were initially unclear because many long-serving Democratic incumbents ran (and won) in districts favorable to Republicans. However, simulation estimates of party support in those districts indicate that Republicans could expect to gain more seats as these incumbents retired (e.g., Hill, 1995). Gelman and King (1994a) present a notable application of this method to estimate party bias and responsiveness at the state legislative level for every state election cycle from 1967 to 1988. They generate measures of bias and responsiveness from these 237 different simulated state election years using both the observed characteristics and the counterfactual of no incumbents running in any election. The authors found that the act of partisan redistricting advantaged a party in power but generally decreased partisan bias and increased responsiveness in the following election compared with elections in which no redistricting occurred previously. McGann et al. (2016) offer the most prominent use of these same simulation methods to estimate and categorize the partisan bias and responsiveness in recent congressional elections. They find many state maps were more biased toward Republicans after the 2010 round of redistricting. They call on legal arguments to justify a majority rule standard: A majority of voters should elect a majority of legislators. McGann and colleagues contend that if all voters are given equal weight in the electoral system, then the party that wins the majority of the vote must also win the majority of the seats. Inspired by Gelman and King (1994a, 1994b), they generate various seats-votes curves assuming an "approximately" uniform swing that is augmented by a small random element.
The random element (a normally distributed term with mean zero and standard deviation of five) facilitates simulations to create a seats-votes curve for any particular set of elections. They then compute standard bias, symmetry, and responsiveness measures, as well as associated statistical tests to distinguish trivial deviations from meaningful ones. They measure the degree of partisan bias as the average symmetry between the parties between 45% and 55% of the vote, viewing that as a plausible range for actual election outcomes. They assert that this approach is flexible, reflecting reality within each application rather than assuming an amount or form of responsiveness as is required by measures such as the EG. They generate 1,000 simulations to calculate standard deviations and thus measures of statistical significance. This addition of uncertainty is important, but its statistical foundation is not well developed.
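A rough sketch of this kind of procedure may help fix ideas. It is not McGann et al.'s implementation. It assumes that the "standard deviation of five" refers to five percentage points (0.05 on a proportion scale), that the swing is applied to the average district vote, and that bias at vote share v is measured as half the difference between the seats a party wins with v and the seats its opponent would win with the same v. All function names are hypothetical.

```python
import random
from statistics import fmean, stdev


def bias_for_one_draw(vote_shares, grid, noise_sd, rng):
    """Average seat asymmetry over `grid` for one noisy version of the election."""
    # One hypothetical election: each district gets its own random shock.
    noisy = [v + rng.gauss(0.0, noise_sd) for v in vote_shares]
    baseline = fmean(noisy)

    def seats_at(target_mean):
        # Uniform swing that moves the average district vote to target_mean.
        shift = target_mean - baseline
        return fmean([1.0 if v + shift > 0.5 else 0.0 for v in noisy])

    # Asymmetry at v: seats won with vote v versus seats the other party
    # would win with the same vote, halved so that 0 means a symmetric curve.
    return fmean([(seats_at(v) - (1.0 - seats_at(1.0 - v))) / 2.0 for v in grid])


def simulate_bias(vote_shares, n_sims=1000, noise_sd=0.05, seed=0):
    """Return the mean and standard deviation of bias across simulations."""
    rng = random.Random(seed)
    grid = [0.45 + 0.01 * k for k in range(11)]  # 45% to 55% of the vote
    draws = [bias_for_one_draw(vote_shares, grid, noise_sd, rng) for _ in range(n_sims)]
    return fmean(draws), stdev(draws)
```

Calling `simulate_bias` on a list of district vote shares returns an estimate and a simulation-based standard deviation. The latter reflects only the assumed noise model, which is one reason the statistical foundation of such uncertainty statements deserves scrutiny.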

The Challenges and Value of Election Simulations
The main challenge of election simulation modeling is that it requires making model assumptions that may not account for the true distribution of possible electoral outcomes. This challenge is greatest in predictive applications, when evaluating proposed maps or plans for what will be a set of future elections. Because the goal of the method is to predict what the range of outcomes will be over multiple types of elections, accurately accounting for future variability in party support makes assessment more an act of forecasting than evaluation. In comparison, model estimation and simulations are more precise for retrospective assessment of a plan's effects across many observed elections (Gelman & King, 1994b), allowing bias and responsiveness parameters to be measured across a range of contextual campaign factors and partisan swings. Although certainly of value to social science understanding of the effects of redistricting, waiting two or three cycles is of less use to parties seeking to prevent the harm of a proposed map.
Another drawback to election simulation estimates is in their limited consideration of feasible alternatives given a state's political geography. Such simulations allow for evaluations based on a definable standard (e.g., zero partisan bias), by comparing across states (e.g., where a state's mean-median gap ranks), or by comparing before-and-after a change in districts. But the simulations have difficulty evaluating the degree of intentional gerrymandering in a state. Election simulations take districts as given and only sometimes have a few other feasible maps for comparison (Ramachandran & Gold, 2018). This is a challenge for those evaluating existing or proposed maps that either seek evidence of illegal intent or need to outline a possible legal remedy.

Map-Based Simulations
A prominent way to address this second shortcoming of election simulation is to use map-based simulations. These simulations generate a large sample of feasible maps for each state. Rather than simulate results from an existing map, this approach actually creates the maps. Others further propose comparing predicted results under maps that optimize various criteria to define a range of "reasonable bias" for a map's features (Cain et al., 2018). Scholars describe these types of methods using terms such as simulation, sampling and outlier, ensemble, or post hoc comparison studies. Geographic-centric approaches can extend beyond map simulations. For instance, Eguia (2019) uses the time-constant nature of county lines to estimate the relative contribution that the geographical clustering of like-minded partisans has made within a map's observed partisan bias.
The promise of computational methods for legislative districting is not new (e.g., Nagel, 1965; Thoreson & Liittschwager, 1967; Vickrey, 1961; Weaver & Hess, 1963). But the exponential growth in computing power, computational methods, and data accessibility (as well as growing public familiarity) makes this an increasingly viable approach. Recent years have witnessed renewed scholarly efforts to develop computational approaches for generating sets of feasible district maps, either for the use of publicly proposing alternative maps or for the assessment of partisan gerrymandering effects (Bangia et al., 2017; Chen & Rodden, 2013, 2015; Cirincione et al., 2000; Fifield et al., in press; Herschlag et al., 2017; Liu et al., 2016; Macmillan, 2001; Magleby & Mosesson, 2018; Mattingly & Vaughn, 2014). Public accessibility is likely to grow too as scholars take the additional step of developing free, open-source software modules to make these methods open to the public (Altman & McDonald, 2011; Fifield et al., in press).
Scholarly assessments of the value of these computational methods can be confusing to the outside observer. At first glance, the assessments of respected experts range from skepticism (Altman et al., 2015), to selective acceptance (Katz et al., 2020) and optimistic support of certain uses (Tam Cho, 2018). It is thus important to clarify where consensus on the current uses and future promise of these types of methods does and does not exist. We largely refrain from discussing the exact ways these proposed approaches differ (see Ramachandran & Gold, 2018), instead choosing to emphasize how these different methods face similar challenges in their implementation.

The Challenges and Value of Map-Based Simulations
Given the complexity of the criteria used for drawing a map, and the exponential number of possible maps to explore, most computational approaches seek to evaluate a representative sample of all possible maps to approximate what the full set of feasible maps looks like. As Duchin (2018, p. 57; emphasis in the original) states, "the norm undergirding the sampling standard is that districting plans should be constructed as though just by the stated principles." A primary intention for generating these samples is to quantify how much a proposed map differs from the set of all possible maps to document evidence of partisan bias. For example, an analysis might be able to conclude that an enacted map has a mean-median gap greater than 99% of simulated maps. But reaching such a seemingly straightforward conclusion remains a computationally onerous task, and the ability of these algorithms to randomly sample from all feasible maps remains a challenge (e.g., Fifield et al., in press). Recent empirical and theoretical work shows that some prominently proposed algorithms fail to select from the whole set of possible districts with equal probability and cannot give an unbiased sample of all possible maps, even for relatively simple problems and after a large number of simulations. Their resulting estimates thus cannot be used for making claims of how much bias exists within a proposed map compared with a representative set of neutrally drawn maps.
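The outlier comparison itself is simple once an ensemble exists. The sketch below assumes the mean-median gap is computed as the median district vote share minus the mean, and it compares absolute gaps. Both choices are assumptions made for illustration, as are the function names and the invented vote shares.

```python
from statistics import fmean, median


def mean_median_gap(vote_shares):
    """Median district vote share minus mean district vote share for one party."""
    return median(vote_shares) - fmean(vote_shares)


def share_of_ensemble_below(enacted_value, ensemble_values):
    """Fraction of simulated maps whose statistic is strictly below the enacted map's."""
    return sum(x < enacted_value for x in ensemble_values) / len(ensemble_values)


# Hypothetical district vote shares: one enacted plan and a tiny stand-in ensemble.
# Every plan has the same average vote share, as if drawn over the same electorate.
enacted = [0.35, 0.36, 0.37, 0.38, 0.80]
ensemble = [
    [0.40, 0.42, 0.45, 0.48, 0.51],
    [0.38, 0.44, 0.46, 0.47, 0.51],
    [0.41, 0.43, 0.44, 0.49, 0.49],
    [0.39, 0.42, 0.47, 0.48, 0.50],
]

enacted_gap = abs(mean_median_gap(enacted))
ensemble_gaps = [abs(mean_median_gap(plan)) for plan in ensemble]
print(share_of_ensemble_below(enacted_gap, ensemble_gaps))
```

The resulting fraction is only as meaningful as the ensemble is representative, which is the difficulty the rest of this section takes up.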
An example test of these map-based simulators illustrates the challenge of random sampling. Consider a simple example: a square grid with units of equal population, such that it is symmetric vertically and horizontally. If a map simulator needs to sample all possible two-district maps with equal population, the expectation is that the simulator should be as likely to bisect the square grid horizontally as it is to bisect it vertically. In one application of this type of test, Altman et al. (2015, p. 31) find that two prominent algorithms for map generation create a disproportionate number of vertically bisected maps even after one million iterations. This is troublesome considering that the map and constraints in practical applications are far more challenging.
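For very small grids, the benchmark against which a sampler is judged can be computed exhaustively. The sketch below is not the test used by Altman et al. (2015). It is a hypothetical illustration that enumerates every split of an n-by-n grid into two contiguous, equal-sized districts, so that a sampler's output frequencies could be compared against equal probability for each plan. All names are invented, and brute-force enumeration is feasible only for tiny grids.

```python
from itertools import combinations


def neighbors(cell, n):
    """Yield the cells sharing an edge with `cell` on an n-by-n grid."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n and 0 <= cc < n:
            yield (rr, cc)


def is_contiguous(cells, n):
    """Check that `cells` form one edge-connected block (depth-first search)."""
    cells = set(cells)
    start = next(iter(cells))
    seen, stack = {start}, [start]
    while stack:
        for nb in neighbors(stack.pop(), n):
            if nb in cells and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == cells


def all_two_district_plans(n):
    """Every equal-size, contiguous two-district plan, keyed by the district holding (0, 0)."""
    grid = [(r, c) for r in range(n) for c in range(n)]
    plans = set()
    for district in combinations(grid, len(grid) // 2):
        if (0, 0) not in district:
            continue  # each plan is counted once, via the district containing (0, 0)
        first = frozenset(district)
        second = frozenset(grid) - first
        if is_contiguous(first, n) and is_contiguous(second, n):
            plans.add(first)
    return plans


def straight_cuts(n):
    """The top-half district (horizontal cut) and left-half district (vertical cut)."""
    top = frozenset((r, c) for r in range(n // 2) for c in range(n))
    left = frozenset((r, c) for r in range(n) for c in range(n // 2))
    return top, left
```

Under uniform sampling, every plan returned by `all_two_district_plans(4)` should appear equally often, so the two straight cuts from `straight_cuts(4)` should be drawn at the same rate. A sampler that favors one of them fails the symmetry expectation described above.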
Recent advancements respond to this challenge by identifying and accounting for sources of such bias. Newly proposed computational algorithms now use randomness tests to verify their suitability (Magleby & Mosesson, 2018). And a key advancement has been to characterize the type of elements within sampling algorithms that theoretically have the potential for creating truly random samples (Fifield et al., in press). Specifically, techniques that Bangia et al. (2017, p. 12) refer to as "constructive" Markov chain Monte Carlo (MCMC) algorithms (those that grow new districts from random seeds) are likely to generate biased samples, whereas "moving boundary" MCMC algorithms are theoretically capable of providing unbiased and representative samples. Even with theoretically appropriate algorithms, there remains the problem of knowing what number of simulations is sufficient to allow a confident assessment of what is an atypical outcome compared with all possible neutral maps. Recognizing this limitation, scholars have proposed tests that confidently assess if an existing map's characteristics significantly differ from what is expected, even when the distribution of all feasible maps has yet to be definitively sampled (Chikina et al., 2017).
Computational challenges still exist, but progress in meeting these challenges has been rapid and is likely to continue. Assuming simulation methods accomplish the stated intention of representing what a typical map might look like based only on a small number of stated principles, there remain qualifications to the practical or theoretical value of these analyses.
First, the technical demands and effort required to establish suitable and efficient methods may limit any legal claims of a party deliberately optimizing partisan bias (Altman et al., 2015). It may be possible to use extensive computer power to show that a set of optimally fair maps exists, but such efforts might be so intensive that they ultimately undermine claims that a ruling party deliberately ignored or rejected these specific maps.
Likewise, the use of a random sample of maps to make probabilistic claims remains biased, or at least considerably naive, unless the laws of a state exhaustively define what principles can and cannot be considered and how they are to be formulated (Altman et al., 2015; Katz et al., 2020). When state laws require that the redistricting process prioritize compactness without stating how to assess it, or fail to prohibit the consideration of additional principles beyond what is explicit, then there is less reason to believe computational methods will be sampling from the set of feasible maps that were actually under consideration. Such biases call into question quantifiable claims about the rarity of a proposed or existing map. As Katz et al. (2020) note, this concern can be eliminated if a redistricter were forced to make an exhaustive list of all the criteria that were considered. Assuming those criteria could be operationalized, one could then probabilistically assess how likely it is that a map would be drawn with the observed partisan advantage of the proposed or existing map.
The contrast between popular ideas of compactness and how map-based simulations operationalize these concepts sharply illustrates this challenge. Numerous measures of compactness have been used to assess district shapes (e.g., Ansolabehere & Palmer, 2016; Niemi et al., 1990). State laws and federal courts, however, often only require that districts be compact without defining the term. Kaufman et al. (in press) recently found agreement among the public and practitioners on what makes something appear compact, but also show that this concept is multidimensional and correlates with prominent measures of compactness in selective ways. Future map-based algorithms could use this study's results to design compactness evaluations that better correspond to the intent of compactness requirements. But the study also demonstrates the way simple concepts do not directly translate into quantifiable assessments of what is probable or best. This translation challenge is especially apparent as new approaches seek to develop methods of accounting for a district's racial composition in a manner consistent with the Voting Rights Act.
Even with a defined set of principles, a corresponding concern is that probabilistic measures of extremity alone, as a relative indicator, have unclear value compared with a fairness standard. As Katz et al. (2020, p. 176) ask, "Is a plan fair if it is at the 50th percentile of all possible plans but, when the parties split the vote equally, the Republicans receive 85% of the seats? Is a plan unfair if a party receives more seats than 99.99% of other plans but, when the parties split the votes equally, they split the seats equally?"
If some extreme maps are deemed acceptable because of a fairness criterion, then the documentation of extremity is an insufficient basis for rejection. For instance, simulation evidence from McDonald et al. (2018) points to some Illinois congressional districts as certainly advantaging Democrats, but additional analyses found those same districts failed to demonstrate a partisan advantage in practice or produce bias statewide. Map-based simulation methods only provide a potentially valuable source of supportive evidence for demonstrating gerrymanders if they are consistent with other sources of evidence.
Despite these limitations, map-based computation methods provide considerable current value and more promise in the future. The clearest value of these methods is their ability to demonstrate what is possible or impossible within a state. Without making probability claims of what is an extreme result, these methods allow evaluations of whether fairer representation is even legally feasible. Recent studies have used these methods to provide evidence that, unless their methods are in error, the political geographies of states such as Florida and Massachusetts essentially favor one party in holding a representation advantage (Chen & Rodden, 2013; Duchin et al., 2019). Altman and McDonald (2018) evaluate publicly proposed alternative maps to demonstrate that Ohio's map could be redrawn to maintain its number of majority-minority districts while remaining unbiased and satisfying popular criteria.
Another value is in their ability to document what factors were likely taken into consideration by mapmakers by purposely not building them into the simulations. Magleby and Mosesson (2018) compare rates of majority-minority districts in maps drawn requiring only equal population and contiguity with what is observed in Mississippi, Texas, and Virginia. This evaluation allows some insight into the extent race was taken into account in drawing those state maps. Notably, they fail to find a single simulated Virginia congressional map that produces as many as the two majority-minority districts Virginia had in 2012, thus suggesting such a map was unlikely to be drawn without considering race. This is exceptionally helpful information for evaluating claims about vote dilution under the Voting Rights Act.
Map-based computational methods are also of value for generating "demonstration maps" that offer clear improvements over alternative proposals. For instance, Tam  propose redefining the set of counterfactual maps to those that are at least as good or better than a proposed map along nonpartisan characteristics. Instead of attempting to make probabilistic claims, this path allows practitioners or evaluators to demonstrate the extent to which Pareto improvements exist across multiple criteria (Cain et al., 2018). The implementation of evolutionary algorithms, which prioritize optimizing a single or set of objectives (e.g., most or least competitive, most compact) when searching over a set of maps, can then help define the possible range of observed values of a feature without necessarily stating what is typical or should be expected.
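The idea of restricting attention to maps that are at least as good as a proposal on every nonpartisan criterion can be expressed as a simple dominance filter. The sketch below is hypothetical: the criterion names and scores are invented, and it assumes every criterion is scored so that higher values are better.

```python
def weakly_dominates(candidate, proposed):
    """True if the candidate is at least as good as the proposal on every criterion."""
    return all(candidate[k] >= proposed[k] for k in proposed)


def pareto_improvements(candidates, proposed):
    """Candidates that match or beat the proposal everywhere and beat it somewhere."""
    return [
        m for m in candidates
        if weakly_dominates(m, proposed) and any(m[k] > proposed[k] for k in proposed)
    ]


# Invented scores for a proposed map and three alternatives (higher is better).
proposed = {"compactness": 0.40, "county_integrity": 0.60}
candidates = [
    {"compactness": 0.50, "county_integrity": 0.60},  # better on one, equal on the other
    {"compactness": 0.30, "county_integrity": 0.90},  # a trade-off, not an improvement
    {"compactness": 0.40, "county_integrity": 0.60},  # identical to the proposal
]
print(pareto_improvements(candidates, proposed))
```

Only the first alternative survives the filter. The partisan properties of the surviving maps can then be compared with those of the proposal without any claim about what a typical map looks like.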

Future Research on Election and Map Simulations
Map-based simulation approaches recognize the need to incorporate and operationalize public and legal demands related to race, ethnicity, compactness, communities of interest, and the like, but there are a host of other practical demands as well. Past studies found that practitioners uniformly reported that their top priority was finding satisfactory districts for incumbents (Gelman & King, 1994a). The academic and public community may also need to consider how simulation methods can best account for the practical issues of where incumbents live and whether a sufficient number of qualified or serious challengers live in a potential district.
Scholars using map-based simulations often wish to keep such techniques public and accessible to counter their use by political insiders for private gain (Tam Cho, 2018). This challenge will likely remain unmet, however, if extensive computational resources and training are needed to apply these methods to individual contexts. Although there might already be a broad enough community of academic experts who are willing to consult with state governments and leaders, redistricting is also a consequential process in local municipalities and school districts (e.g., Richards, 2014). The development of open-source algorithms is certainly a step forward in equipping concerned citizens with the ability to propose and evaluate alternative proposals (Altman & McDonald, 2011; Bradlee, 2020; Fifield et al., in press), as are scholarly attempts to find methods that require fewer computational resources (Magleby & Mosesson, 2018). But it is unclear whether these approaches are as practical in settings outside U.S. or state legislative contexts. Indeed, it is an open question whether other redistricting applications share enough commonality in their guidelines to realize the promise of generalizable methods.
A related and perhaps clearer concern for map-based simulation approaches is how to reconcile practical data constraints with needs for precision and accuracy. Social scientists are well aware of hurdles such as Simpson's paradox, ecological inference fallacies, or the modifiable areal unit problem, wherein relationships observed at one level of aggregation can be of the opposite direction when looking at a larger or smaller level of aggregation. Similar to jigsaw puzzles that can have the same image but differ in the number or size of their pieces, map-based simulations all start with the same state but vary in how many divisions they consider and the population sizes of those districts. While scholars value finer disaggregation for units of political geography, such as precincts rather than counties, evaluations of these boundaries also rely on measures of party support based on smaller numbers, and they are more likely to develop boundaries that do not adequately align with census block estimates of demographics such as racial composition. Ironically, more precise disaggregation within these algorithms increases both the computational demands and the need for scholars to account for the uncertainties and imprecision of the input data.
These practical data challenges also point to the possible value of integrating the two types of simulation approaches. It is common for many map-based simulation studies to use a small set of observed election results to document partisan advantage (although see Krasno et al., 2019). Herschlag et al. (2017) provide one of the more extensive evaluations by utilizing observed votes from 11 elections for offices such as the U.S. House, U.S. Senate, U.S. President, and the Wisconsin Secretary of State. This effort is impressive, but it still represents a small sample considering these 11 elections come from only three distinct cycles, and there are clear dependencies across cycles for those offices having the same candidates running for reelection. It may be too demanding a task to integrate election simulation estimates of party support within these already complex map-based simulation methods. Nevertheless, there is a disconnect between sophisticated map simulations that consider only a small set of election results and the work on election simulations showing that small sets of observed election data can be imperfect indicators of a plan's expected partisan advantage (Gelman & King, 1994b). When seeking to evaluate a wide range of possible maps, it also seems prudent to evaluate them across a wide range of possible election results.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.