A Conservative Metric of Power Creep

Collectible card games are taking up more space in popular culture, with traditional paper card games even embracing e-sports. However, longevity in such games is not as common, with some suspecting power creep as a culprit behind why some of these games fail. Yet, Magic: the Gathering has not just survived but thrived for over 25 years, with the game's designers publicly stating their aim to keep curbing power creep. Therefore, it is of interest to determine the rate of power creep in the game. Herein, we formally define a conservative metric of power creep and calculate its occurrence in the game of Magic: the Gathering. Although its rate is increasing, power creep appears low, with an average of 1.56 strictly better card faces released per year.


Introduction
Magic: the Gathering (MtG) is a successful collectible trading card game (TCG), started in 1993 and owned by Wizards of the Coast (WotC), that has continued expanding for over 25 years. With over 20 billion cards printed between 2008 and 2016 and well over an estimated 35 million players, MtG as a game continues to grow (Duffy, 2015; Webb, 2018; W. of the Coast, n.d.). This is evident in the increasing number of newly designed cards released each year (Figure 1). Yet, with an unprecedented number of card bannings in 2020, the question of how this game has remained successful over its long lifetime is worth consideration. Mark Rosewater, head designer at WotC since 2003, may have hinted that MtG's longevity stems from avoiding power creep (Rosewater, 2016). Power creep, as its name implies, is the strengthening of the game and its pieces over time, possibly to the point where new pieces invalidate older ones.
While power creep is generally seen through the lens of how it affects game play (which may in turn affect who plays the game), Perreault, Daniel, and Tham (Perreault et al., 2021) look at how power creep may first alter those who play the game. In other words, with constant change of the game, those who play it must play more frequently to "keep up" irrespective of the game play quality (Falcão & Marques, 2019; Perreault et al., 2021; Zuin & Veloso, 2019). Given MtG's longevity, Perreault, Daniel, and Tham's perspective is of particular interest, as MtG players acknowledge the phenomenon that one never quits Magic; rather, one takes a break (Cordell, 2020; Woods, 2014). A low rate of power creep could result in the game remaining similar enough that returning players recognize it. Conversely, a high rate of power creep, while suggested to increase how frequently players must play, may leave a returning player unable to recognize the game and therefore unlikely to return (Ashton & Verbrugge, 2011).
An alternative perspective is that of the new player. Ben Brode, game director of Hearthstone until 2018, acknowledges that producing new content at the very least increases the game complexity, which may make the game more daunting to new players who might wish to become enfranchised (Brode, 2015). Further, while it is often the enfranchised player complaining about power creep, game designers wish to keep their games exciting by producing new and powerful cards and mechanics (Ashton & Verbrugge, 2011; Brode, 2015; Stoddard, 2019). It seems that, regardless of one's perspective on power creep, it should be avoided. To that end, MtG's success in abating power creep has not gone unnoticed by players of other TCGs, such as Yu-Gi-Oh (Williams, 2020).
One of the ways MtG handles power creep is through the concept of rotating formats (Stoddard, 2013a; Rosewater, 2005). These formats use only the latest cards, so older "mistakes" are no longer relevant. In addition, it may be easier to keep the power relatively flat when focusing on a limited card pool instead of the over 20,000 cards (although these cards are still taken into account) (Stoddard, 2013a). However, that does not address constructed formats where "mistakes" still exist. Therefore, WotC has tried an "Escher Stairwell" approach, where some aspects are lowered in power and others raised (Blogatog, 2012, 2019; Rosewater, 2005).
Power creep is on the minds of players and designers alike, yet how "power" is defined is more nebulous (Brode, 2015; Stoddard, 2013b). Sam Stoddard, senior designer at WotC, makes it clear that power is relative and dependent on the environment; for example, power is different in limited versus constructed formats (Stoddard, 2013b). Further, Stoddard declares power creep to be relative due to the many formats of MtG. Additionally, he stipulates that two separate cards with the same mana cost, power, and toughness, but different abilities, may both be interpreted as "power creep" by players depending on the context, regardless of print order (Stoddard, 2013a). It is clear that power creep is a topic with much nuance and is difficult to define (Figure 2).
With no publicly disclosed metric of power creep, analysis of MtG's longevity and of how it relates to power creep in the game is limited to players' feelings about the strength and "health" of a format. With a historic number of bannings in 2020, is Mark Rosewater's premonition of the game collapsing due to top-heaviness coming to pass (Rosewater, 2016)? To investigate the health of the game, we constructed a conservative metric for power creep. As alluded to by many prominent designers from WotC and even Hearthstone, power is relative; accordingly, our definition is based on a relative relation as well. This is achieved by framing power creep in relation to cards that are strictly better than one another.
The concept of associating a score to game pieces from which to rank them is not new (Chen et al., 2018; Fancher, 2015; Karsten, 2015; Zuin and Veloso, 2019; Zuin et al., 2020). While selecting cards by how often they win may identify powerful ones, it may be insufficient to address power creep in a game as a whole (Chen et al., 2018; Fancher, 2015; Karsten, 2015). For example, two popular MtG formats, "penny dreadful" and "pauper," are non-rotating formats with restricted card pools (the latter officially supported) (Rasmussen, 2019). The cards therein are restricted via price and rarity (common), respectively. Cards with the highest win rate in such formats may be outright banned or "useless" in others. Thus, win ratio is insufficient to encapsulate power creep for a game as a whole. Zuin and colleagues take a fundamentally different approach by attempting to tabulate the resource cost for a card's effect (Zuin and Veloso, 2019; Zuin et al., 2020). Conceptually, comparing the resource cost for an effect could allow one to check whether the cost has changed over time.
While inspiring, resource cost alone is unfortunately insufficient for determining power creep. Consider a "singleton" format (one copy per game piece allowed) versus a non-singleton format (e.g., at most four copies of a game piece per constructed deck). Consider that the power of a game piece depends on the consistency with which the player can access it in a given game (which in turn depends on the number of copies of the card in one's deck). An example of such a card is "Relentless Rats," which scales in power according to the number of other "Relentless Rats" in play. As an alternative example, the resource cost of all cards per set could be used to see whether the average cost of cards is decreasing. However, looking only at the cost of a card fails to acknowledge cases where effects are getting cheaper and more of them are being used, while the average cost per expansion remains constant.

Figure 2. A universal metric for the "power" of a card in Magic: the Gathering is difficult to define. Facets of a card's power may be more easily tabulated. While not every card can be cleanly ordered by its power, there do exist a few strictly better cards (sub 2a). Additionally, functional reprints are an easily identifiable facet of power (sub 2b). Defining the power of a card in a format-agnostic way is challenging, as the card pool for synergies and the number of copies one can run vary (sub 2c and 2d).
Additionally, one must still identify which cards are even contenders for being improvements over others. A card simply requiring fewer resources than another is insufficient for determining power creep. What makes Zuin and colleagues' work of interest is that, since power creep is seemingly an interplay of resource cost and the effect for that cost, utilizing a Word2Vec embedding of the rules text to predict the cost opens the opportunity to use the rules text embedding to find which cards should be compared (e.g., via a distance metric to find similar effects) (Mikolov et al., 2013; Zuin and Veloso, 2019). Unfortunately, utilizing only distance in the embedding space may result in disagreement between the algorithm and players.
Since power has such a complex and contextual definition, can one define power creep for a game as a whole without defining power? Herein, we attempt to do that. We shall build up to such a definition by re-framing what power creep is. Let power creep be dependent on the re-visitation of design space at equal or lower resource cost. Then, to calculate power creep without calculating power, one must (1) assign a cost for the resource, (2) identify re-visitation of design space, that is, which cards to compare, and (3) assess change over time. The section Definitions, Terminology and Notation starts by formalizing the anatomy of a card's face (Figure 3), from which those cards that are clear improvements on rate for resource can be defined as StrictlyBetter. With the cards that have improved on rate found, the flux of their release over time can finally be analyzed (PowerCreep).

Definitions, Terminology and Notation
While care is taken to define the relevant attributes of MtG cards, the reader would benefit from some prior familiarity with the game and its rules. Additionally, it may benefit the reader to search for mentioned cards via WotC's official card search engine Gatherer or the fan-favorite Scryfall (Scryfall, 2021; W. of the Coast, 2021a).

Card Pool
The pool of cards under consideration is the set of cards with at least one officially supported format legality and is represented as $\mathcal{C}$.

Card
A card, $C \in \mathcal{C}$, is what most players would know as a physical piece of card-stock paper measuring about 63 mm × 88 mm with a front/up side ($C_f$) and a back/down side ($C_b$).
However, such an understanding will be insufficient for this analysis (see Figure 4). Let a card be the ordered set of faces

$$C = \{\mathrm{face}^C_1 \prec \mathrm{face}^C_2 \prec \cdots \prec \mathrm{face}^C_n\},$$

where $A \prec B$ denotes that "A" precedes "B" in the set. At first, the notion that a card can comprise $n$ faces where $\max(n) \neq 2$ may seem jarring. However, WotC has shown interest in printing more faces on cards, as evidenced by the playtest card "Smelt//Herd//Saw" and the Unsanctioned card "Who//What//When//Where//Why." Combined with Kaldheim's modal double-faced cards (MDFCs), it is plausible we may one day see a card with 16 faces (or even more). Of note, all cards have a minimal face cardinality of 2: if WotC only defines $\mathrm{face}^C_1$, the second face is the null face, $\mathrm{face}_{\varnothing}$, consisting of the iconic Deckmaster card backing.
Further, in the case of "Smelt//Herd//Saw," one can partition $C$ into two subsets, $C_f$ and $C_b$, where the card front $C_f$ is the subset of $C$ holding the faces printed on the front of the card and the card back $C_b$ holds the remaining faces. With $\mathcal{C}$ representing the card pool, let $F$ be the set of all card faces in said pool, that is, $F = \bigcup_{C \in \mathcal{C}} C$. In summary, the definition of a card face and the number of faces a card might have is related to that of being a game object rather than the physical construct of the piece of paper (see Figure 4).

Figure 4. Examples of the multiple faces of a card. Whereas the word face may refer to a side of a geometric rectangular cuboid, here it refers to a functional game object. Partially, this is motivated by the fact that printing on the four thinnest sides of a playing card is impractical. More generally, WotC has experimented with a singular playing card having multiple faces on the front (e.g., "Smelt//Herd//Saw"). Therefore, it is conceptually possible to formulate a card with more than two faces.

Card Face
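When an MtG player thinks of a prototypical card, perhaps "Llanowar Elves," they are thinking of a card with two faces. Let $C$ be "Llanowar Elves"; then $C = \{\mathrm{face}^C_1 \prec \mathrm{face}_{\varnothing}\}$, where $\mathrm{face}^C_1$ is the familiar front face. In this context it may help to further formalize the contents of the face.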
Pips. Pips are the symbols used to define a face's casting cost (amongst other things, such as the cost of activating an ability). Specifically, the casting cost is the mana cost (defined shortly) required for casting the card, as printed on the card's title bar. Mana symbols are a subset of pips that specifically relate to generic, colored, and colorless mana. Briefly, generic mana is mana for which any color or colorless mana can be used to satisfy the cost. Whether mana is colored or colorless is a property of the pip. When a colored pip is present, only the color(s) matching the pip's designation will satisfy any accompanying cost. Likewise, colorless pips strictly require that the mana used to pay for those costs is equally devoid of color. For example, the mana cost 1 represents one generic mana, for which one green mana (G), one white mana (W), or one colorless mana (C) would satisfy the cost. A review of this terminology can be found under the section Mana Abbreviations (Rosewater, 2009); alternatively, a comprehensive table of these symbols can be found on Scryfall's Color and Costs API documentation page (Scryfall, 2021). The reader is assumed to be familiar with the UTF-8 variant of pips in the accompanying examples.
Let $P$ be the set of all mana-related pips, $M$ the set of all mana symbol pips (e.g., G), $N$ the set of all numeric mana pips (generic mana), and $V$ the set of all variable mana pips (e.g., X), then

$$M \cup N \cup V = P, \quad \text{and} \quad M \cap N \cap V = \varnothing$$

Mana Cost. Let the function pips return the set of pips required to cast face $i$ of a card $C$, then the mana cost of a face is the multiset of the pips required to cast the card

$$\mathrm{manacost}(\mathrm{face}^C_i) = \{p^{m(p)} : p \in \mathrm{pips}(\mathrm{face}^C_i)\},$$

where $m(p)$ is the multiplicity of the pip $p$, that is, how many times that pip appears. For clarity, consider a mana cost for a face $\mathrm{face}^C_i$ written as 4GG. The mana cost 4GG would be formalized as $\{4^1, G^2\}$. If a face has no pips, it does not have a mana cost, that is, $\mathrm{manacost}(\mathrm{face}^C_i) = \varnothing$. This is most familiar to players in the form of faces with type "land." Notably, players play lands, rather than cast them. Similarly, the sorcery "Ancestral Vision" lacks a mana cost and cannot be cast from hand. This is expanded upon briefly in an example under Playability. Lastly, in the introduction we speak of power creep generally in terms of resource cost for an effect. The mana cost is the most overt, but not the only, resource cost a card's face might have (e.g., consider a creature that requires you to sacrifice a land when it enters the battlefield).
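Converted Mana Cost. The converted mana cost (cmc) of a face is the total amount of mana (generic, colorless, and colored) required to cast the face

$$\mathrm{cmc}(\mathrm{face}^C_i) = \sum_{p \in \mathrm{manacost}(\mathrm{face}^C_i)} \mathrm{cmc}(p) \times m(p)$$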
In other words, the cmc is the sum of the generic value (amount of mana) of a pip multiplied by the number of times that pip appears in the mana cost. Further, the cmc of a pip, $p$, follows the comprehensive rules (202.3) when a face is not in the game zone referred to as the stack (W. of the Coast, 2021b), for example, $\mathrm{cmc}(G) = 1$, $\mathrm{cmc}(4) = 4$, and $\mathrm{cmc}(X) = 0$. Generally, the cmc of a card face is that as written, whereas on the stack it may be altered; for example, the pip X, valued at zero, is a variable pip which on the stack can often take on a non-zero value (sometimes zero inclusive). For an overview of pips, mana, and cmc, readers are directed to the comprehensive rules (W. of the Coast, 2021b) or, for a cursory review, Scryfall's Color and Costs API documentation page (Scryfall, 2021).
Returning to the example of a mana cost written as 4GG, $\mathrm{cmc}(\{4^1, G^2\}) = \mathrm{cmc}(4) \times 1 + \mathrm{cmc}(G) \times 2 = 4 + 2 = 6$. Of note, $\mathrm{cmc}(\mathrm{face}^C_i) \neq \mathrm{cmc}(C)$ in general; for example, a fuse card with $|C_f| > 1$ has a cmc equal to the sum of the cmc of its faces. While the rules are used as the definition for the cmc of a singular pip, the definitions of cmc and mana cost used here are not the same as those in the rules (W. of the Coast, 2021b); namely, this is because we are inspecting cards at the face level, whereas in some instances the cmc and mana cost of a face, depending on zone, are defined at the card level (e.g., fuse cards).
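As a rough illustration of the mana cost and cmc definitions, the following Python sketch parses a compact cost string into a multiset of pips and sums the per-pip values. The helper names (`mana_cost`, `pip_cmc`, `cmc`) and the simple tokenizer are assumptions made for this example only; hybrid, Phyrexian, and other special pips are deliberately not handled.

```python
import re
from collections import Counter


def mana_cost(cost: str) -> Counter:
    """Parse a compact cost such as '4GG' into a multiset of pips.

    Hypothetical tokenizer: a run of digits is one numeric pip and each
    capital letter is one pip. An empty string yields an empty multiset,
    mirroring a face without a mana cost.
    """
    return Counter(re.findall(r"\d+|[A-Z]", cost))


def pip_cmc(pip: str) -> int:
    """Value of a single pip off the stack (assumed simplification)."""
    if pip.isdigit():
        return int(pip)      # numeric (generic) pips count their value
    if pip in {"X", "Y", "Z"}:
        return 0             # variable pips are valued at zero off the stack
    return 1                 # colored and colorless mana symbols count one


def cmc(cost: Counter) -> int:
    """Sum of cmc(p) * m(p) over the pips of the multiset."""
    return sum(pip_cmc(p) * m for p, m in cost.items())


example = mana_cost("4GG")
print(dict(example))  # {'4': 1, 'G': 2}
print(cmc(example))   # 6
```

Storing the multiplicity alongside each pip keeps the code close to the multiset notation used in the text.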
Mana Efficient. Let the function colors return the subset of only mana symbols from a face's pips, that is, those pips that belong to $M$. With an understanding of mana cost and cmc, we can define a Boolean function to determine if one face is mana efficient in relation to another. By this definition, lower cmc alone is insufficient to be considered mana efficient; the mana cost must also not add additional mana symbol pips. Consequently, a card face cannot introduce a different color pip and reduce cmc to be considered mana efficient; for example, $\{U^1\}$ is not mana efficient in comparison to $\{1^1, B^1\}$. Additionally, a face cannot "drop" a color to be considered mana efficient; for example, $\{1^1, U^2\}$ is not mana efficient in comparison to $\{1^1, U^2, G^1\}$. Notably, mana efficiency by this definition also means that:
1. Two faces with equivalent mana cost are equally mana efficient in relation to one another, and
2. Two faces with equal cmc can be equally mana efficient if the former reduces colored mana symbols.
This definition may be contentious, as it means that the mana cost 4 is not more efficient than 3G; that is, a cost consisting of only generic mana, while it may loosely be considered an improvement, is insufficient to qualify. Our rationale for this decision stems from our aim to provide a conservative definition, especially as a designer's choice to require a color at all may bias the resource cost for an effect (Zuin & Veloso, 2019).
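The constraints above can be sketched as a small Python predicate. This is one possible reading of the prose, not the paper's formal definition: it assumes a face is mana efficient relative to another when its cmc is no greater, it uses exactly the same set of mana symbols, and no mana symbol appears more often. The names `MANA_SYMBOLS`, `colors`, and `mana_efficient` are hypothetical, and variable or hybrid pips are out of scope.

```python
from collections import Counter

# Assumed contents of the set M of mana symbol pips.
MANA_SYMBOLS = {"W", "U", "B", "R", "G", "C"}


def cmc(cost: Counter) -> int:
    # Numeric pips count their value; every other pip handled here counts one.
    return sum((int(p) if p.isdigit() else 1) * m for p, m in cost.items())


def colors(cost: Counter) -> Counter:
    """Keep only the mana symbol pips of a mana cost."""
    return Counter({p: m for p, m in cost.items() if p in MANA_SYMBOLS})


def mana_efficient(a: Counter, b: Counter) -> bool:
    """Assumed reading: a is mana efficient in relation to b."""
    ca, cb = colors(a), colors(b)
    return (
        cmc(a) <= cmc(b)                      # no more total mana
        and set(ca) == set(cb)                # no color added or dropped
        and all(ca[p] <= cb[p] for p in ca)   # no mana symbol appears more often
    )


# The examples from the text, written as multisets of pips.
print(mana_efficient(Counter({"U": 1}), Counter({"1": 1, "B": 1})))          # False
print(mana_efficient(Counter({"4": 1}), Counter({"3": 1, "G": 1})))          # False
print(mana_efficient(Counter({"2": 1, "G": 1}), Counter({"1": 1, "G": 2})))  # True
```

Requiring the same set of mana symbols is what makes 4 fail against 3G, matching the conservative stance described above.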
Playability. The playability of a face, playable, is whether or not that face can be played (in case it is a land) or cast directly from a player's hand. This is based on the comprehensive rules (W. of the Coast, 2021b). For example, the land "Agadeem, the Undercrypt" of "Agadeem's Awakening//Agadeem, the Undercrypt" can be played (as it is an MDFC), whereas "Adanto, the First Fort" of "Legion's Landing//Adanto, the First Fort" cannot be played directly from hand. For clarity outside of the comprehensive rules (W. of the Coast, 2021b), this stems from the at first confusing similarity between the definitions of "playing" a card's face and "casting" one. To cast a face of a card, a cost needs to be paid, even if that cost is zero. Some faces have zero as a mana cost, while others have an undefined mana cost entirely (most notably faces bearing the type Land). The "Agadeem, the Undercrypt" and "Adanto, the First Fort" example, however, requires a more complex look at the rules (W. of the Coast, 2021b). At the time of writing, a card might be playable or castable from hand provided that (1) the card has a mana cost or (2) the card is either a land on the front of a card or an MDFC. While both "Agadeem, the Undercrypt" and "Adanto, the First Fort" are defined on $C_b$, the card "Legion's Landing//Adanto, the First Fort" in its entirety is not an MDFC. Thus, "Adanto, the First Fort" is accessible only by first casting "Legion's Landing," then triggering the rules text to change the card's face. The definition of playability may require adjustment in the future depending on changes to the rules.
In short, the example of "Agadeem, the Undercrypt" and "Adanto, the First Fort" serves to emphasize the requirement of constraining the comparison to directly playable faces; for example, $\mathrm{face}^A_i$ is not power crept compared to $\mathrm{face}^B_j$ if the latter can be played directly and the former has additional requirements before it is accessible to the player.
Rules Text. The rules text of a face, $R_{\mathrm{face}^C_i}$, is the set of lines $\{l_1, \ldots, l_n\}$, reminder text excluded, as defined by the "Oracle Text," which can be found on Gatherer (W. of the Coast, 2021a).
Two faces are rules equivalent when each corpus of rules text is a subset of the other, that is, $R_{\mathrm{face}^A_i} \subseteq R_{\mathrm{face}^B_j}$ and $R_{\mathrm{face}^B_j} \subseteq R_{\mathrm{face}^A_i}$. Notably, this excludes card faces which may have additional lines with "upside." Similar to our definition of mana efficiency, the exclusion of upside may cause uproar. However, the aim is a comprehensive definition of power creep for the game.
Including "upside" requires (1) that there be an effect which is always beneficial to the player and (2) defining what constraints qualify. In the most extreme example, the text "You lose the game." could be added to an otherwise functional reprint of a card. However, the existence of a card with the rules text "The next time you would lose the game," suggests an admittedly niche but nevertheless existent interplay in which a player could trigger even the arguably most detrimental effect (losing the game) for their own benefit. In other words, the rules of MtG are such that, if not now then in a future expansion, any attribute of the card could be an upside or a downside.
Supertypes, Types, and Subtypes. Each card face has at least one type. Additionally, it may have a supertype and/or a subtype. Generally speaking, faces with "creature" amongst their types have at least one subtype. Let $T^s$ represent the set of supertypes, $T$ the set of types, and $T_s$ the set of subtypes, then

$$T^s = \{\text{basic}, \text{elite}, \text{host}, \text{legendary}, \text{ongoing}, \text{snow}, \text{world}\}$$

Additionally, types are subdivided into permanent ($T_p$) and non-permanent ($T_{np}$) types. Let the functions supertypes, types, and subtypes return the corresponding set of each for the given face. For example, if $C$ is "Ambassador Laquatus," then $\mathrm{supertypes}(\mathrm{face}^C_1) = \{\text{legendary}\}$, $\mathrm{types}(\mathrm{face}^C_1) = \{\text{creature}\}$, and $\mathrm{subtypes}(\mathrm{face}^C_1) = \{\text{merfolk}, \text{wizard}\}$.

Combat Stats. If $\text{creature} \in \mathrm{types}(\mathrm{face}^C_i)$, then the face has two additional numeric attributes: power and toughness. Let $\mathrm{combatstats}(\mathrm{face}^C_i)$ be the ordered set of power, $p$, and toughness, $t$, of the face, that is, $\mathrm{combatstats}(\mathrm{face}^C_i) = \{p \prec t\}$. Then we can say a face $\mathrm{face}^A_i$ is combat worthy in comparison to $\mathrm{face}^B_j$ if both the power and toughness of $\mathrm{face}^A_i$ are each greater than or equal to the power and toughness of $\mathrm{face}^B_j$

$$\mathrm{combatworthy}(\mathrm{face}^A_i, \mathrm{face}^B_j) = (p_A \geq p_B) \land (t_A \geq t_B)$$

If exactly one of the faces is not a creature, then $\mathrm{combatworthy}(\mathrm{face}^A_i, \mathrm{face}^B_j) = \text{False}$. If neither face is a creature, then $\mathrm{combatworthy}(\mathrm{face}^A_i, \mathrm{face}^B_j) = \text{True}$.
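The combat worthy relation, including its two non-creature cases, can be sketched in Python as follows. Representing a non-creature face's stats as `None` and restricting power and toughness to integers are assumptions of this sketch; cards with variable stats would need extra handling.

```python
from typing import Optional, Tuple

# (power, toughness) for a creature face, or None for a non-creature face.
Stats = Optional[Tuple[int, int]]


def combat_worthy(a: Stats, b: Stats) -> bool:
    """True when face a is combat worthy in comparison to face b."""
    if a is None and b is None:
        return True    # neither face is a creature
    if a is None or b is None:
        return False   # exactly one face is a creature
    # Both power and toughness must be greater than or equal.
    return a[0] >= b[0] and a[1] >= b[1]


print(combat_worthy((2, 2), (2, 1)))  # True
print(combat_worthy((3, 1), (2, 2)))  # False
print(combat_worthy(None, None))      # True
```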
Rarity. At the time of writing there are four rarities: common, uncommon, rare, and mythic. Additionally, regardless of the number of faces a card has, all faces on a card share the same rarity. Rarity, with the exclusion of the card printed for charity "Rarity" (which has no legal formats), impacts only two formats: draft and sealed. This impact is that of the card's frequency in the card pool, not whether or not the card is in it at all. As we attempt to address power creep of the game as a whole, which includes many more constructed formats (vintage, legacy, modern, pioneer, historic, brawl, commander, etc.), whether or not a face has a different rarity than another face is of lesser concern.
However, there are formats, such as pauper, where the card pool is limited by rarity. Additionally, depending on the intent of the limited environment (draft or sealed) a reprinted card may undergo a rarity shift, that is, being reprinted with a different rarity than its initial printing (e.g., "Alabaster Mage" had the uncommon rarity in the expansion Magic 2012 but was reprinted with a common rarity in Double Masters). This furthers the notion that rarity is a construct for a subset of formats rather than of impact during a game where the card is legal. Lastly, MtG was designed so that rarity was not a correlate of power (Rosewater, 2005). Hence, for the purpose of this article rarity is excluded.
Release. There are many expansions to MtG. Let the initial release date of a card be represented as $\mathrm{release}_0(C)$. Then we can represent a card $A$ being released after another card $B$ as $\mathrm{release}_0(A) > \mathrm{release}_0(B)$. As all faces of a card are released at the same time, $\mathrm{release}_0(\mathrm{face}^C_i) = \mathrm{release}_0(C)$ for every face of $C$. It is worth mentioning that not all newly released cards are available in limited environments, like draft. Additionally, these cards may also not enter a rotating format.
Preexisting Pool. Let the preexisting card pool $\mathcal{C}_{\mathrm{pre}}$ of card $C$ be the subset of cards that were released prior to it, that is, $\mathcal{C}_{\mathrm{pre}}(C) = \{A \in \mathcal{C} : \mathrm{release}_0(A) < \mathrm{release}_0(C)\}$.

Reprints. So far, we have discussed many aspects of a card's face; however, we have yet to discuss a face's name. Normally, a format limits the number of copies of a card (and accordingly its faces) to up to four copies in a deck, or up to one if it is a "singleton" format. While there are cards with rules text that get around this limitation (e.g., "Persistent Petitioners"), there is another way to get around it. WotC has often printed "functional reprints" of previously existing cards. Here, we define a card's face to be a functional reprint of another if:
1. Both are directly playable,
2. It has equivalent mana cost,
3. It has equivalent supertypes, types, and subtypes,
4. It has equivalent rules text,
5. If both faces are creatures, they have equivalent combat stats, and
6. It has been released after the other.

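Thus

$$\begin{aligned}
\mathrm{FunctionalReprint}(\mathrm{face}^A_i, \mathrm{face}^B_j) ={} & \mathrm{playable}(\mathrm{face}^A_i) \land \mathrm{playable}(\mathrm{face}^B_j) \\
& \land \mathrm{manacost}(\mathrm{face}^A_i) = \mathrm{manacost}(\mathrm{face}^B_j) \\
& \land \mathrm{supertypes}(\mathrm{face}^A_i) = \mathrm{supertypes}(\mathrm{face}^B_j) \\
& \land \mathrm{types}(\mathrm{face}^A_i) = \mathrm{types}(\mathrm{face}^B_j) \\
& \land \mathrm{subtypes}(\mathrm{face}^A_i) = \mathrm{subtypes}(\mathrm{face}^B_j) \\
& \land R_{\mathrm{face}^A_i} = R_{\mathrm{face}^B_j} \\
& \land \big(\text{creature} \in \mathrm{types}(\mathrm{face}^A_i) \Rightarrow \mathrm{combatstats}(\mathrm{face}^A_i) = \mathrm{combatstats}(\mathrm{face}^B_j)\big) \\
& \land \mathrm{release}_0(A) > \mathrm{release}_0(B)
\end{aligned}$$

Additionally, we define a relaxed variant where the subtype requirement is waived. We will use $\mathcal{F}$ to represent the set of all functional reprints in the card pool $\mathcal{C}$ and $\mathcal{F}_r$ the relaxed set of functional reprints.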
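The six conditions for a functional reprint translate fairly directly into code. The Python sketch below uses a hypothetical `Face` record and a `relaxed` flag for the variant that waives the subtype requirement. The card data are typed in by hand for illustration (release years in particular are illustrative, not pulled from Gatherer or Scryfall), and comparing whole years is an assumed simplification of release dates.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Face:
    """Hypothetical face record used only for this sketch."""
    name: str
    mana_cost: FrozenSet[Tuple[str, int]]  # multiset as (pip, multiplicity) pairs
    supertypes: FrozenSet[str]
    types: FrozenSet[str]
    subtypes: FrozenSet[str]
    rules: FrozenSet[str]                  # rules text lines, reminder text excluded
    stats: Optional[Tuple[int, int]]       # None for non-creature faces
    playable: bool
    release: int                           # year of initial release


def functional_reprint(a: Face, b: Face, relaxed: bool = False) -> bool:
    """True when face a is a functional reprint of face b."""
    return (
        a.playable and b.playable
        and a.mana_cost == b.mana_cost
        and a.supertypes == b.supertypes
        and a.types == b.types
        and (relaxed or a.subtypes == b.subtypes)
        and a.rules == b.rules
        and a.stats == b.stats             # both None when neither is a creature
        and a.release > b.release
    )


RULES = frozenset({"Equipped creature gets +1/+1.", "Equip {1}"})


def equipment(name: str, generic: int, year: int) -> Face:
    return Face(name, frozenset({(str(generic), 1)}), frozenset(),
                frozenset({"artifact"}), frozenset({"equipment"}),
                RULES, None, True, year)


scimitar = equipment("Leonin Scimitar", 1, 2003)
sidearm = equipment("Veteran's Sidearm", 2, 2015)
short_sword = equipment("Short Sword", 1, 2018)

print(functional_reprint(short_sword, scimitar))  # True
print(functional_reprint(scimitar, short_sword))  # False, released earlier
print(functional_reprint(short_sword, sidearm))   # False, mana cost differs

# A hypothetical variant without the subtype only qualifies under the relaxed rule.
no_subtype = replace(short_sword, subtypes=frozenset())
print(functional_reprint(no_subtype, scimitar, relaxed=True))  # True
```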

Strict Comparisons
Strictly Better and Strictly Worse. Here we will attempt to codify a sufficient, albeit not all-encompassing, definition of "strictly better." Normally, when a player says a card is "strictly better," they are roughly saying that card A has the same effects as card B for a reduced cmc (resource cost), or that A has the same effects as B at the same cmc with "upside." There is a lot of wiggle room within the word "upside." Therefore, we will use a far more conservative definition. Given face $i$ of card $A$ and face $j$ of card $B$, $\mathrm{face}^A_i$ is strictly better than $\mathrm{face}^B_j$ if it provides equal or better mana cost (mana efficient) for equal or better combat stats (combat worthy) with equivalent rules text, without being a functional reprint of $\mathrm{face}^B_j$.

Strictly Better and Strictly Worse at Release. While the above definition of strictly better clearly captures a card face that provides equal or better mana cost for equal or better combat stats and equivalent rules text (without being a functional reprint), it falls short; power creep occurs over time, and it is therefore relevant to also define strictly better at time of release in order to prevent counting a face $\mathrm{face}^B_j$ released after $\mathrm{face}^A_i$ which would satisfy this relation. Furthermore, as functional reprints are their own form of power creep, it is insufficient for $\mathrm{face}^A_i$ to not be a functional reprint of $\mathrm{face}^B_j$; rather, $\mathrm{face}^A_i$ must not be a functional reprint at all. Additionally, WotC also prints strictly worse cards. Thus, we further constrain the strictly better "at release" definition such that the face must not itself be strictly worse than a face that predates it.
We define the function $\mathrm{StrictlyBetter}_{\text{at release}}$ accordingly, assuming no continuous effects or triggers; it also has a relaxed variant.
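The "at release" constraints can be sketched in Python as below. This is an interpretation of the prose under stated simplifications, not the paper's formal function: only non-creature, directly playable faces are modeled (so combat stats and playability are omitted), mana efficiency follows the assumed reading of "same mana symbols, none more often, cmc no greater," and release is compared by year. The `Face` record and all helper names are hypothetical, and the card data are hand-entered for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List

MANA_SYMBOLS = {"W", "U", "B", "R", "G", "C"}  # assumed contents of M


@dataclass(frozen=True)
class Face:
    name: str
    cost: str              # compact mana cost, e.g. "2GG"
    types: FrozenSet[str]
    rules: FrozenSet[str]
    release: int           # year of first release (illustrative)


def pips(face: Face) -> Counter:
    return Counter(re.findall(r"\d+|[A-Z]", face.cost))


def cmc(face: Face) -> int:
    return sum((int(p) if p.isdigit() else 1) * m for p, m in pips(face).items())


def mana_efficient(a: Face, b: Face) -> bool:
    ca = {p: m for p, m in pips(a).items() if p in MANA_SYMBOLS}
    cb = {p: m for p, m in pips(b).items() if p in MANA_SYMBOLS}
    return cmc(a) <= cmc(b) and ca.keys() == cb.keys() and all(ca[p] <= cb[p] for p in ca)


def same_design(a: Face, b: Face) -> bool:
    return a.types == b.types and a.rules == b.rules


def functional_reprint(a: Face, b: Face) -> bool:
    return same_design(a, b) and pips(a) == pips(b) and a.release > b.release


def strictly_better(a: Face, b: Face) -> bool:
    # Same effect at an equal-or-better cost, excluding identical costs.
    return same_design(a, b) and mana_efficient(a, b) and pips(a) != pips(b)


def strictly_better_at_release(a: Face, pool: List[Face]) -> bool:
    pre = [f for f in pool if f.release < a.release]  # preexisting pool
    return (
        any(strictly_better(a, b) for b in pre)             # improves on an older face
        and not any(functional_reprint(a, b) for b in pre)  # is not a reprint at all
        and not any(strictly_better(b, a) for b in pre)     # is not worse than an older face
    )


sorcery = frozenset({"sorcery"})
text = frozenset({"Return target card from your graveyard to your hand."})
regrowth = Face("Regrowth", "1G", sorcery, text, 1993)
elven_cache = Face("Elven Cache", "2GG", sorcery, text, 1997)
recollect = Face("Recollect", "2G", sorcery, text, 2005)
# Two made-up faces, included only to show a positive case.
old = Face("Hypothetical Old", "3G", sorcery, frozenset({"hypothetical effect"}), 2000)
new = Face("Hypothetical New", "2G", sorcery, frozenset({"hypothetical effect"}), 2010)
pool = [regrowth, elven_cache, recollect, old, new]

print(strictly_better(recollect, elven_cache))       # True
print(strictly_better_at_release(recollect, pool))   # False: "Regrowth" predates it
print(strictly_better_at_release(new, pool))         # True
```

The "Recollect" case mirrors the second worked example later in the text: it improves on "Elven Cache" yet is not counted, because "Regrowth" was printed earlier.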

Sets of Faces
With these definitions of strictly better at release, strictly worse at release, and functional reprints, we can define the sets of card faces we will use to define power creep. Let $\mathcal{B}$ be the set of card faces that are strictly better at release, and $\mathcal{B}_r$ the corresponding relaxed version. These sets comprise the card faces that at their release date were both original and strictly better than another preexisting face, and that could not themselves be classified as strictly worse than a preexisting face. Similarly, let $\mathcal{W}$ be the set of card faces that are strictly worse at release, and $\mathcal{W}_r$ its relaxed variant. Although it is implied that any functional reprint is a factor of power creep, we postulate that a functional reprint of a strictly worse at release card face ought to be handled differently, so such reprints are tracked separately. Lastly, we define the release window variants of each set, for example $\mathcal{B}_{y_1,y_2}$ and $\mathcal{W}_{y_1,y_2}$, as the faces of that set released between years $y_1$ and $y_2$.

Power Creep
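Before defining power creep, it is worthwhile to consider the tenets that such a definition ought to reflect. In line with the cumulative rate reported in the results, the base model is the total number of strictly better faces released divided by the duration of the game's existence

$$\mathrm{PowerCreep}(\mathcal{B}, y) = \frac{|\mathcal{B}_{1993,y}|}{y - 1993}$$

Functional reprints are themselves regarded as a form of power creep (Stoddard, 2013b). Thus, an extension to this model can be defined by adding the functional reprints released over the same window to the numerator, again dividing by $y - 1993$. Each of these models has a corresponding relaxed version (replace $\mathcal{B}$ with $\mathcal{B}_r$ and $\mathcal{F}$ with $\mathcal{F}_r$).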
Escher Stairwell. The role of strictly worse at release card faces in power creep is unclear. WotC may be implying that these faces are part of the Escher Stairwell approach to help manage power; however, players might argue that they are not, as players would not play a strictly worse card if the alternative is available. Should the power creep of the game be allowed to arrest, it is doubtful that the power of the game as a whole would ever recede. Yet, the goal here is to provide a conservative metric to see the base rate of power creep. Thus, we define the base Escher Stairwell Power Creep model as

$$\mathrm{PowerCreep}_{es}(\mathcal{B}, \mathcal{W}, y_1, y_2) = |\mathcal{B}_{y_1,y_2}| - |\mathcal{W}_{y_1,y_2}|$$

Additionally, as MtG has increased the number of new card faces released each year, let the normalized Escher Stairwell Power Creep model be defined as

$$\mathrm{PowerCreep}_{es}^{norm}(\mathcal{B}, \mathcal{W}, F, y_1, y_2) = \frac{\dfrac{|\mathcal{B}_{y_1,y_2}| - |\mathcal{W}_{y_1,y_2}|}{|F_{y_1,y_2}|}}{y_2 - y_1}$$

with an analogous variant that includes functional reprints.
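To make the models concrete, the following Python sketch computes the cumulative rate and the two Escher Stairwell variants from lists of release years. The half-open window convention $[y_1, y_2)$, the function names, and the toy data are all assumptions of this sketch; the numbers are made up and are not the paper's results.

```python
from typing import Sequence


def in_window(years: Sequence[int], y1: int, y2: int) -> int:
    """Count faces first released in the half-open window [y1, y2) (assumed convention)."""
    return sum(1 for y in years if y1 <= y < y2)


def power_creep(better: Sequence[int], y: int, start: int = 1993) -> float:
    """Cumulative rate: strictly better faces released per year of the game's existence."""
    return in_window(better, start, y) / (y - start)


def power_creep_es(better: Sequence[int], worse: Sequence[int], y1: int, y2: int) -> int:
    """Base Escher Stairwell model: strictly better minus strictly worse in the window."""
    return in_window(better, y1, y2) - in_window(worse, y1, y2)


def power_creep_es_norm(better: Sequence[int], worse: Sequence[int],
                        faces: Sequence[int], y1: int, y2: int) -> float:
    """Normalized by the number of faces released in the window and by its length."""
    released = in_window(faces, y1, y2)
    if released == 0:
        return 0.0  # nothing released, so nothing to normalize
    return (power_creep_es(better, worse, y1, y2) / released) / (y2 - y1)


# Made-up release years for illustration only.
better = [1997, 2015, 2015, 2016]
worse = [1998, 2015]
faces = [year for year in range(1993, 2020) for _ in range(10)]  # 10 faces per year

print(power_creep(better, 2017))                             # 4 faces over 24 years
print(power_creep_es(better, worse, 2015, 2016))             # 1
print(power_creep_es_norm(better, worse, faces, 2015, 2016))
```

With a year span of one, the normalized value is simply the net count divided by the number of faces released that year.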

Results
The first few years of the game saw no strictly better card faces released (Figure 5). From 1997 to ≈2014, strictly better faces were released sporadically. However, starting in 2015 we see a marked shift in the number being released. This stands in stark contrast to the functional reprint policy, which spiked much earlier on. As for why this is the case, one might hypothesize the creation of a new format such as modern (2011) or WotC's attention being drawn to what is now the most popular format, commander, with the release of pre-constructed commander decks (2011). Since WotC designers generally work two to three years in advance, the (official) adoption of these two formats may have motivated the creation of more powerful cards later. Hypotheses aside, for one of the longest-standing card games, the cumulative number of strictly better card faces over its 25 years seems remarkably low (Figure 6). The strict version of our conservative metric for solely strictly better cards does not even pass one hundred card faces. Relaxing the metric sees an influx; however, even including reprints, the count only nears 500. With more than 20,000 unique cards (even more card faces), this is less than 2% of the card pool. Including functional reprints pushes this over the 2% mark.
While the cumulative rate of power creep is not monotonic, it has a rising trajectory in the case of our base model. This is not surprising, as to stabilize the rate, fewer than one new strictly better card face can be released in a year, and to reverse it, none for several years. Looking at the expanded model with functional reprints included, the cumulative rate of power creep takes a substantial dip in the early 2000s; yet this rate is now recovering. Interestingly, although players point toward a new design philosophy starting in 2019 named "F.I.R.E" as the source of the mass bannings in 2020, the increase in the rate of power creep predates F.I.R.E. design (see Figure 7) (Stoddard, 2019). With a year span of one, with the exception of the strict, reprint-inclusive normalized Escher Stairwell Power Creep model, all models show an upward trajectory for the number of power crept faces released each year (Figure 8). This reflects the functional reprint influx from the Portal expansion block. Additionally, these models show that there are years where the number of strictly worse card faces at release and functional reprints thereof is greater than the number of strictly better card faces. However, even under the assumption that worse card faces can weaken the game, the cumulative normalized Escher Stairwell Power Creep model shows that power creep is nonetheless on an upward trajectory. Of note, this trajectory does not exist solely due to more cards being released each year (Figure 9). For a breakdown of power creep by type, see Table 1.

Curious Examples of Power Creep
To highlight how exactly these definitions work in practice we find it useful to walk through two examples.
For the first example consider the faces, in release order, "Leonin Scimitar," "Veteran's Sidearm," "Honed Khopesh," and "Short Sword." The only differences between these cards are:
1. "Veteran's Sidearm" costs two cmc, whereas all others cost only one, and
2. The name of each card.
"Leonin Scimitar" is not strictly better at release than "Veteran's Sidearm" as it was printed prior to it; consequently, "Veteran's Sidearm" is strictly worse at release than the scimitar. "Honed Khopesh" and "Short Sword" are functional reprints of the scimitar and are therefore part of $\mathcal{F}_{\mathcal{B}}$.

Figure 7. The cumulative rate of power creep, that is, the total number of strictly better faces released divided by the duration of the game's existence with (subfigures 7c, 7d) and without (subfigures 7a, 7b) functional reprints. While functional reprints substantially spike the cumulative rate, this rate nearly recovers by 2020.
For the second example consider the faces, in release order, "Regrowth," "Elven Cache," "Recollect," and "Bala Ged Recovery." "Elven Cache" is strictly worse at release than "Regrowth." While "Recollect" is strictly better than "Elven Cache," it is not strictly better at release, as it is strictly worse than "Regrowth," which was printed prior. Thus, "Recollect" is strictly worse at release. "Bala Ged Recovery" is a functional reprint of "Recollect," as this metric operates at the face level. Of note, "Recollect" has face Ø for its second face, whereas "Bala Ged Recovery" has the second face "Bala Ged Sanctuary." Players may say that this makes the card strictly better. However, as discussed below with companions, wordiness, etc., it is not without reason that cards without the face Ø may have an intrinsic downside, for example, if WotC prints a card that prohibits opponents from casting them.
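The two walkthroughs above can be sketched in code under a deliberately simplified assumption: all faces in a group are identical apart from name and cmc, so the comparison reduces to cmc alone. This is only a toy stand-in for StrictlyBetter, which considers more than cmc. The `Face` class and `classify_at_release` function are hypothetical names, and the cmc values for the second example are supplied here for illustration since this section does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    name: str
    cmc: int


def classify_at_release(faces):
    """Label each face relative to the equivalent faces released before it.

    faces: list of Face in release order. Assumption: the faces differ
    only in name and cmc, so a lower cmc means strictly better.
    """
    labels = {}
    prior_cmcs = []
    for face in faces:
        if not prior_cmcs:
            labels[face.name] = "first printing"
        elif face.cmc in prior_cmcs:
            labels[face.name] = "functional reprint"
        elif face.cmc < min(prior_cmcs):
            labels[face.name] = "strictly better at release"
        else:
            # Some earlier face is cheaper, so this one is worse at release.
            labels[face.name] = "strictly worse at release"
        prior_cmcs.append(face.cmc)
    return labels


equipment = [
    Face("Leonin Scimitar", 1),
    Face("Veteran's Sidearm", 2),
    Face("Honed Khopesh", 1),
    Face("Short Sword", 1),
]
# cmc values below are assumed for illustration.
recursion = [
    Face("Regrowth", 2),
    Face("Elven Cache", 4),
    Face("Recollect", 3),
    Face("Bala Ged Recovery", 3),
]

equipment_labels = classify_at_release(equipment)
recursion_labels = classify_at_release(recursion)
```

Checking for a functional reprint before checking for a cheaper earlier face is what makes "Bala Ged Recovery" a functional reprint of "Recollect" rather than a second strictly worse face.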

StrictlyBetter as a metric
With any metric there are strengths and weaknesses; our definition is no exception. Foremost, as it is largely dependent on the conservative metric of StrictlyBetter, this value could be manipulated by WotC moving forward. Knowing that their game could be evaluated in such a way may intentionally or unintentionally bias the company or its employees to design cards with slight tweaks to the rules text to reduce the number of strictly better card faces. Additionally, as mentioned in the generally conceived notion of "strictly better," this metric does not attempt to address card faces with equivalent cmc and "upside." Further, if a card face is strictly better than several others, it is only counted once. In this metric's favor, as "upside" is not considered, it can be calculated exhaustively without human annotation. To its benefit, our definition of strictly better does not punish WotC's innovation; for example, when a new expansion adds cards to the card pool that have new keywords or rules text, those card faces cannot be strictly better. Further, the expanded model, which includes functional reprints, adheres to WotC's own interpretation of power creep and shows substantial mitigation during the early 2000s (Stoddard, 2013b).
To continue questioning our notion of strictly better, WotC has shown repeated interest in card faces with rules text that addresses the cmc of a card, for example, "Void Winnower." The existence of such cards required the addition of "given no continuous effects or triggers" to the function StrictlyBetter, as otherwise an even cmc face that is strictly better than an odd cmc face would, in some instances, be unplayable. This is even more evident with the more recently printed companion cards.
Companion is a keyword in the rules text that can affect deck construction before the game is actually played. Since "Gyruda, Doom of Depths," "Obosh, the Preypiercer," and "Lurrus of the Dream-Den" are concerned with cmc, one could argue that any change in cmc is actually not strictly better.
Along this line of reasoning, one could postulate that any change in mana cost would be potentially detrimental due to another companion, "Jegantha, the Wellspring," which cares specifically about the type of pips in the mana cost. That is, despite mechanics, like devotion, which care about the types of pips in a face's mana cost, our definition generally overlooks this through the given statement in the StrictlyBetter definition (namely, the function manaefficient). Fortunately, there is not yet a card face with companion that is concerned with power and toughness.
Thus, a possibly sounder definition would consider only card faces which are equivalent in all ways except for a) combat worthiness and b) having rules text specifically with "upside." However, WotC has shown putative interest in designing faces with effects that concern the amount of rules text as well, for example, "Alexander Clamilton." While players may dismiss this as a "joke" card, the companion "Zirda, the Dawnwaker" concerns itself with other cards having activated abilities (a form of "wordiness"). Therefore, the notion of a "strictly better" card face, which is true 100% of the time regardless of context, is moot. Tangentially, while this metric is not directly usable in other TCGs, its core tenets might be, that is, finding card faces which are both directly playable and otherwise identical except for combat worthiness and the cost to cast or play the card. Lastly, there are instances where WotC prints two faces on separate cards during different years, each of which is strictly better than a third face. In other words, there are two (or more) equivalent designs with different names that are better than another older design, but one of the first card faces is a functional reprint of the other(s). The expanded model will count each unique instance, while the base model does not, although such occurrences are rare. This may be a point of contention with players' perspective of what power creep is. Nevertheless, we feel confident that the conservative metric of strictly better, as well as the base and expanded models of power creep, introduced here are sufficiently stringent to be of use.

PowerCreep as a metric
Given the definition provided here, one could easily ask the converse, that is, "What about faces which are strictly worse?" While a fascinating question, adding "strictly worse" cards to the card pool does not weaken the game. Outside of the two limited formats, players are not obligated to play with these cards. Given the choice between two card faces, one being strictly worse than the other, the choice for competitive play is trivial. In other words, introducing weaker faces does not mitigate the existence of stronger faces in the game. Correspondingly, it does not contribute to the definition of PowerCreep. Nonetheless, as we attempt to provide a conservative metric to find the base level of power creep in the game, we included it in the Escher Stairwell model (Figures 8 and 9). While greatly reduced, in three of the four variants the yearly power creep maintains a positive slope (Figure 9), and it always has a positive slope in the cumulative variants (Figure 8).
On the other hand, many of the card faces identified here as strictly better do not see much, if any, play. Therefore, although these cards are strict evidence of power creep, players may attest that these are not the cards which will cause the game to collapse inward on itself (Rosewater, 2016).
In the introduction, we allude to resource cost being relevant to the definition of power creep but not its sole determinant. This raises the question of whether or not this formulation (power creep as the re-visitation of design space at equal or lower resource cost) generalizes to mediums outside of TCGs and CCGs. For example, consider a first-person shooter (FPS) video game where a gun, in an update, has its reload time lowered. The "resource cost" of the gun might be defined by any number of additional variables (paid downloadable content, time to unlock via a quest or mission, weight to equip the item, etc.), or the item might have been free and accessible since the game began. Surely, a tweak to the item's properties qualifies as re-visitation of design space. While the definition provided here is tailored for MtG, an adjusted form may be applicable elsewhere.

Power Creep as Defined versus Public Perception
As our definition of power creep utilizes a conservative definition of card faces which are strictly better (StrictlyBetter), PowerCreep generally addresses card faces which are not the source of scrutiny. The first face of the 2018 card "Wishcoin Crab" being strictly better than the first face of the 2011 "Armored Cancrix" is not what pops into players' heads when they accuse the game of power creep. Rather, they think of cards like "Oko, Thief of Crowns" and "Omnath, Locus of Creation," cards they believe were printed to increase sales (Chen et al., 2018; Perreault et al., 2021; Zuin & Veloso, 2019). Consequently, given how restrictive our metric is, PowerCreep is the floor. Further, it is generally accepted that power creep will exist in any long-running game; it is the rate of power creep that concerns players. Finding an increasing rate, with a notable spike around 2014-2015, shows a putative shift in design prior to WotC announcing F.I.R.E. design in 2019 (Stoddard, 2019).

Future Directions
Here, we presented three core models: the base model, which solely concerns itself with card faces that are strictly better at release; the functional reprint inclusive model, which factors in non-detrimental functional reprints; and the Escher Stairwell model, which considers the quantity of strictly better cards at release versus those that are strictly worse. Each of these models may relax the sub-type restriction as well as normalize the rate to the newly released card faces each year. Currently, they conservatively capture the existence of power creep within the game, although it is marginal. Yet, there are a myriad of other factors to incorporate into these models to more accurately represent power creep. These include the rate of card faces banned and the number of formats in which the bans occur, card faces with "upside," card faces with "downside," format specific terms, etc.
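To make the relationship between the three models concrete, the following minimal sketch computes a single year's value under each variant. It assumes, for illustration only, that the reprint inclusive model adds functional reprints to the count, that the Escher Stairwell model subtracts strictly worse faces from strictly better ones, and that normalization divides by the number of faces released that year. The names `YearTally` and `yearly_power_creep` and all numbers are hypothetical and are not taken from the paper's data or formal definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearTally:
    released: int         # new card faces released that year
    better: int           # faces strictly better at release
    better_reprints: int  # functional reprints of strictly better faces
    worse: int            # faces strictly worse at release
    worse_reprints: int   # functional reprints of strictly worse faces


def yearly_power_creep(tally, include_reprints=False, escher=False,
                       normalize=False):
    """Return one year's power creep value under the chosen variant.

    Assumed formulas (illustrative only):
      base             -> better
      reprint variant  -> better + better_reprints
      Escher Stairwell -> better side minus worse side
      normalized       -> value / released
    """
    value = tally.better
    if include_reprints:
        value += tally.better_reprints
    if escher:
        value -= tally.worse
        if include_reprints:
            value -= tally.worse_reprints
    if normalize:
        return value / tally.released if tally.released else 0.0
    return value


# Hypothetical tally for a single year, NOT the paper's data.
tally = YearTally(released=200, better=3, better_reprints=1,
                  worse=5, worse_reprints=0)

base = yearly_power_creep(tally)
expanded = yearly_power_creep(tally, include_reprints=True)
escher = yearly_power_creep(tally, include_reprints=True, escher=True)
expanded_normalized = yearly_power_creep(tally, include_reprints=True,
                                         normalize=True)
```

In this hypothetical year the Escher Stairwell value is negative because more strictly worse faces than strictly better ones were released, which is the situation in which that model lowers the measured power creep.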

Summary and Conclusions
Here, we introduce a conservative metric for defining strictly better faces of cards, as well as base and expanded models of power creep in the game MtG. Although it does not touch the cards which players most likely imagine it would, it does capture a swath of cards reasonably distributed across types. With the most relaxed definition suggesting that fewer than 2.5% of cards printed are strictly better or functional reprints thereof, this metric seemingly confirms that WotC has successfully limited the occurrence of power creep, thus preventing the game from "failing" due to it. Conversely, it also shows a stark increase in the printings of strictly better cards and an increasing rate of power creep that may relate to the unprecedented number of bannings in 2020 (partially due to more cards being released per year). Our work provides a solid foundation for 1) more encompassing definitions of power creep as well as 2) analysis of power creep in relation to other pertinent topics (e.g., correlating with sales data, number of bannings, etc.). We believe these metrics are a good step toward codifying power creep.
If we could press upon the reader three points to take away from this paper, they are as follows:
1. A perspective shift from viewing power creep as "greater power" in the game toward the re-visitation of design space at equal or lower resource cost may be beneficial in conceptualizing what constitutes power.
2. A sensible definition of power creep is achievable without defining power, by placing extreme constraints on qualifying game piece candidates.
3. Although many of the power crept game pieces identified here will likely be scoffed at by the players of the game due to "lack of power," an automated search for such pieces at the perceived lowest levels of power may further claims of power creep being pervasive. Power creeping the worst pieces in the game strongly suggests that such is necessary due to an overall increase of power in the game.
Even when factoring in intentional decreases of power, the number of strictly better card faces released results in a positive rate of power creep.