Querying feminicide data in Mexico

The full extent of feminicide in Mexico remains unknown. When available, data on the gender-related killing of women and girls are often incomplete, inaccurate, or inexplicable. In this article, a sociologist (Saide) and a statistician (Maria) query feminicide data in Mexico. Drawing on Timnit Gebru et al.’s ‘datasheets for datasets’ and Sarah Holland et al.’s ‘data nutrition label’ frameworks, we zoom in on the two primary governmental sources measuring feminicide in the country: the mortality records processed by the Instituto Nacional de Estadística, Geografía e Informática (INEGI) and the alleged feminicide investigation files published by the Secretariado Ejecutivo del Sistema Nacional de Seguridad Pública (SESNSP). In the discussion, we highlight two points. The first is the discordance between INEGI and SESNSP data, for which we outline four crucial variations: naming, underreporting, comparability, and availability. The second is the shortcomings of these data sources in measuring feminicide as we understand it sociologically. In other words, neither explicitly gauges the ‘gender-related’ motivation underlying the crime. Instead, what data from INEGI and SESNSP currently provide us with are discordant approximations of the phenomenon, aligning with what Sandra Walklate and Kate Fitz-Gibbon define as ‘thin’ feminicide counts. This contribution seeks to act as a guide to better understand feminicide data in Mexico, to enhance effective communication between data creators and users concerned with data-making practices, and to ignite the querying of data engaging with social justice and accountability against feminicide and beyond.


Introduction
Feminicide (or femicide) is broadly defined as the gender-related killing of women and girls. Despite increased awareness about the issue worldwide, many fundamental questions about feminicide, such as its magnitude and trends, are not well understood because feminicide data are currently incomplete, inaccurate, or inexplicable in many instances. Globally, some underlying reasons for the current state of the data relate to the lack of political will, institutional patriarchy, wilful ignorance, limited resource allocation, and the difficulties of translating 'gender-related' motivation to a statistical category (Dawson and Carrigan, 2020; D'Ignazio, 2024; Gargiulo, 2022; UNODC, 2022).
In this article, a sociologist (Saide) and a statistician (Maria) query Mexico's feminicide data produced by the state. The structure of the article is as follows. First, we provide a brief background of the first attempts to document feminicide in the country and present the normative frameworks used to criminalise the intentional killing of women and girls. Next, we draw inspiration from the 'datasheets for datasets' (Gebru et al., 2021) and the 'data nutrition label' (Holland et al., 2020) frameworks to explore the two primary governmental sources measuring feminicide in Mexico: the mortality records processed by the Instituto Nacional de Estadística, Geografía e Informática (National Institute of Statistics and Geography; INEGI) and the alleged feminicide investigation files published by the Secretariado Ejecutivo del Sistema Nacional de Seguridad Pública (Executive Secretary of the National Public Security System; SESNSP). In the discussion, we highlight two points. The first is the discordance between INEGI and SESNSP data, for which we outline four crucial variations: naming, underreporting, comparability, and availability. The second is the shortcomings of INEGI and SESNSP in measuring feminicide as we understand it sociologically. In other words, neither explicitly gauges the 'gender-related' motivation underlying the crime. The killing of a woman 'because she is a woman' is the result of a continuum of violence rooted in multiple and intersectional forms of discrimination and oppression against women. Yet, despite unprecedented efforts, translating atavistic stereotyped gender roles and ongoing unequal power dynamics into statistical variables remains a challenge. Instead, what data from INEGI and SESNSP currently provide us with are discordant approximations of the phenomenon, aligning with what Walklate and Fitz-Gibbon (2023) define as 'thin' feminicide counts. This contribution seeks to act as a guide to better understand feminicide data in Mexico, to enhance effective communication between data creators and users concerned with data-making practices, and to ignite the querying of data engaging with social justice and accountability against feminicide and beyond.

Counting feminicide in Mexico
The first sketches of feminicide data originated from activists on the ground. In 1993, Esther Chávez Cano, a journalist working for a local newspaper in Ciudad Juárez, started to document instances of extreme gender-based violence, including feminicides, in the face of omissions and negligence on the part of local authorities (see Tabuenca, 2014). Chávez Cano triggered what became the first feminicide documentation effort in the country and the only one to record the killings of women and girls in Ciudad Juárez from 1993 to 2003. During this period, Chávez Cano collected reports from local newspapers about female murder victims in Ciudad Juárez in a physical archive. She organised the archive chronologically and systematised the information into four categories: victims found dead, non-governmental organisations, government statements, and arrests (New Mexico Archives Online, n.d.).
Thus, for activists, counting feminicide was (and still is) one of the many strategies available to make it visible, and efforts to document the extent of feminicide grew with time. Noted feminists, activists, and scholars, such as Marcela Lagarde y de los Ríos and Julia Monárrez Fragoso, together with local, national, and international organisations, worked to bring feminicide into normative frameworks that criminalise these killings.

Normative frameworks to criminalise feminicide
Not until the violent killing of women and girls was recognised and typified as a crime at a federal level did the Mexican State begin to count feminicide as such. On 14 July 2012, 'feminicidal violence' (violencia feminicida) and 'feminicide' (feminicidio) were added to the Ley General de Acceso de las Mujeres a una Vida Libre de Violencia (General Law on Women's Access to a Life Free of Violence) and the Federal Criminal Code. 'Feminicidal violence' (Art. 21) is defined in the former as 'the most extreme form of gender violence against women, produced by the violation of their human rights in public and private spheres and formed by the set of misogynist actions that can lead to the impunity of society and the State and culminate in homicide and other forms of the violent death of women'. 'Feminicide' (Art. 325) is the crime committed by anyone who deprives a woman of her life for gender-based reasons. Gender-based reasons exist when any of the following circumstances are present:
1. The victim shows signs of sexual violence;
2. The victim has been inflicted with shameful or degrading injuries or mutilations before or after the deprivation of life or acts of necrophilia;
3. There are antecedents or information about any type of violence in the perpetrator's family, work, or school environment against the victim;
4. There has been a sentimental, affective, or trusting relationship between the perpetrator and the victim;
5. There is information that establishes there were threats related to the criminal act, harassment, or injuries by the perpetrator against the victim;
6. The victim was held incommunicado for any amount of time before being deprived of life;
7. The victim's body is exposed or exhibited publicly.
While feminicide was incorporated into the Federal Criminal Code in 2012, it is worth noting that it was not until 2020 that all states included feminicide as a criminal offence within their respective criminal codes; Chihuahua, where Ciudad Juárez is located, was the last to follow suit. Mexico's federal system of government, consisting of 31 states and Mexico City, poses a significant challenge in generating and comparing crime data (Mobayed Vega et al., 2023). The autonomy, freedom, and sovereignty granted to each state mean 32 slightly different ways of defining, categorising, and regulating feminicide (Araiza Díaz et al., 2020). While some states adhere to most of the seven circumstances outlined in the Federal Criminal Code, others have either introduced additional ones or modified certain aspects within their state's criminal code. By 2021, a total of 18 different circumstances were considered to classify the intentional homicide of a woman as a feminicide in the country (Data Cívica and Intersecta, 2022). In other words, what is regarded as a feminicide in Quintana Roo might not be in Querétaro. Such variances hinder sub-national comparisons of feminicide trends and patterns.
In 2018, to circumvent legislation variations, the Centro Nacional de Información (National Information Centre; CNI) agreed on guidelines for registering and classifying alleged feminicide crimes specifically for statistical purposes. This initiative seeks to ensure statistical integration despite the heterogeneity of the states' criminal codes. The circumstances set out in these guidelines derive from the federal and 32 states' criminal codes and INEGI's Norma Técnica para el Registro y Clasificación de los Delitos del Fuero Común para Fines Estadísticos (Technical Standard for the Recording and Classification of Common Crimes for Statistical Purposes). To summarise, the Mexican State has one set of circumstances to define feminicide for legal purposes, which varies slightly according to each state's criminal code, and another set of circumstances explicitly designed to standardise feminicide for statistical purposes, irrespective of judicial proceedings.

Querying feminicide data in Mexico
Based on who produces the information, feminicide data can be categorised as 'official' and 'citizen-led'. Official feminicide data are generated by government institutions, typically sourced from administrative records and reports from police and law enforcement agencies. Citizen-led data, also framed as 'counterdata' (D'Ignazio et al., 2022) or 'data activism' (Milan and Gutiérrez, 2015), are generated by engaged civil society using a variety of sources, including news media reports. Citizen-led data against feminicide aim to account for what has yet to be counted by official means (D'Ignazio, 2024; Madrigal et al., 2019; Mobayed Vega, 2022).
Catherine D'Ignazio (2024) thoroughly explores these practices in Counting Feminicide: Data Feminism in Action, in which she considers 'missing data' a twin of official data. D'Ignazio defines missing data as those that are 'neglected to be prioritized, collected, maintained and published by institutions, despite political demands that such data should be collected and made available'. Missing data take shape either as a lack of contextual information in documented records or as records that are absent altogether (Gargiulo, 2022). Indeed, there are missing data in official data about feminicide 'precisely because grassroots feminist, Indigenous, Black, queer and women's groups and social movements make demands that such data be produced' (D'Ignazio, 2024).
Our contribution recognises and values the paramount influence of citizen-led data practices for making feminicide visible and accountable in Mexico (Collectif Féminicides Par Compagnons ou Ex et al., 2023; Mobayed Vega, 2022). However, we have chosen to query government data because they are often used for global comparability, research, advocacy, and policymaking. Understanding precisely what is (and is not) accounted for by these official data sources underscores the vital influence citizen-led efforts continue to have.
Two primary official sources are central to feminicide data as they give us an estimate of how many women are killed because of their gender: (1) administrative data on open investigation files about alleged feminicides from state Attorney General's offices published by SESNSP, and (2) mortality data produced by the Department of Health and processed by INEGI.
Statistics about feminicide derived from the mortality data published by INEGI and the aggregate investigation file information published by SESNSP tend to mismatch, creating what Tavera Fenollosa (2008) refers to as a 'guerra de cifras', a war of statistics. The discordance between these two sources (Suárez Val, 2020) is unsurprising: they measure different concepts using different information sources and methodologies and were designed with distinct uses in mind. Researchers and practitioners wishing to use official data sources related to feminicide in Mexico need to understand why these sources diverge and make informed choices about whether and how to use information from a particular source.

Data documentation frameworks: Datasheets for data sets and data nutrition label
Data users may not always have access to detailed contextual information when trying to make sense of feminicide data, depending on the time, locale, or sub-population they are examining. To aid researchers and practitioners in making informed choices, we examine the data published by INEGI and SESNSP, drawing inspiration from the 'datasheets for datasets' (Gebru et al., 2021) and the 'data nutrition label' (Holland et al., 2020) frameworks. Both represent complementary efforts to formalise data documentation practices. The datasheets of Gebru et al. (2021) were designed for data producers to communicate information about database construction to data users. Datasheets are an adaptable and flexible tool arranged around a series of questions documenting the motivation, composition, collection, pre-processing/cleaning/labelling, use, and distribution of data. Datasheets are meant to be published alongside data, but the authors note that the contents of the datasheet should be considered before any data collection. Rather than focusing on data creators, the nutrition labels of Holland et al. (2020) were created for 'data specialists', that is, anyone who produces or uses data. Like the datasheet, the nutrition label is meant to be a flexible tool and includes modules on metadata, provenance, variables, statistics, pair plots, probabilistic models, and ground truth correlations.
While both frameworks were designed with specific applications to machine learning or artificial intelligence in mind, their utility is model agnostic, and their contents provide valuable tools for analysing feminicide data in Mexico. We adapt the general structure of the datasheet for our analysis, querying the motivation, composition, collection process, distribution, and maintenance of the data sets published by INEGI and SESNSP. To these categories, we add a section focused on how the data sets are utilised to study feminicide and the associated challenges with doing so. We draw the framing for our analysis from the nutrition label. While the datasheets approach was designed as a tool for data set creators, we approach this inquiry from the perspective of downstream actors who analyse and make sense of the data published by INEGI and SESNSP in their work. Our inquiry helps expand on official documentation from these sources and provides a user-centred analysis of how these data sets can aid in studying feminicide in Mexico.

For what purpose was the data set created?
The mortality data published by INEGI form part of Mexico's civil registration and vital statistics system (CRVS). CRVS systems register information about vital events such as births, deaths, marriages, and divorces. Robust CRVS systems can provide real-time information about health and disease, making them an invaluable tool for designing public health policies, monitoring progress towards health goals, allocating financial and material resources, and targeting interventions at high-risk sectors of the population (Cobos Muñoz et al., 2020; Lopez et al., 2020). In addition to their utility for monitoring health outcomes, CRVS systems are essential for ensuring that public policies are designed to serve the needs of all members of the population and to guarantee human rights protections.

Who created the data set?
The data set is published by the Instituto Nacional de Estadística, Geografía e Informática (National Institute of Statistics and Geography; INEGI); however, INEGI does not produce the data. Instead, INEGI compiles mortality data from third-party informants drawing on various sources (INEGI, 2016).

What do the observations that comprise the data set represent?
The observations in the data set are derived from death certificates, and each row represents information about one individual's death.

Does the data set contain all possible observations, or is it a sample (not necessarily random) of observations from a larger set?
The data set aims to capture information about the universe of all deaths in the country. However, mortality statistics derived from the CRVS system are seldom 100% complete in any context, and Mexico is no exception (e.g. Glei et al., 2021; Murray et al., 2010). In addition, incompleteness is not necessarily randomly distributed over the entire universe of deaths, meaning that not all deaths have the same probability of generating a death certificate. For example, in the current militarised context, enforced disappearance can obfuscate deaths due to homicides because a death certificate cannot be issued if a body is never located and identified (Castro and Riquer, 2022; Intersecta, 2020).

What does each observation consist of?
As of the 2021 data, each individual observation contains information about 59 distinct variables, which generally map onto the death certification form, the primary input data source. These fields include basic socio-demographic information about the deceased, such as their name (which is not published), age, sex, location of residence, whether they spoke an indigenous language, education level, occupational status, marital status, and affiliation with various healthcare providers.
They also include information about the death, such as the location and date, whether the individual was receiving medical care at the time, whether an autopsy was performed on the body, and codes for the primary cause of death and up to three underlying causes. Additional information is also collected about the deaths of pregnant individuals.
All violent deaths (deaths due to suspected homicides, suicides, or accidents) warrant the completion of additional fields. These fields include whether the death occurred while the individual was working, the type of location where the death occurred (e.g. at home, school, work, in public), the relationship with the presumed aggressor (in the case of homicide), the location where the injury that led to the death occurred (which may differ from the location of the death), and an indicator of whether the death was connected to domestic violence. The totality of information in these data aids our understanding of mortality trends in Mexico. However, even with this high level of detail, these records are still often missing information: nearly all fields contain some missing values, although the levels of missingness vary.

Is any information missing from individual observations?
Yes, nearly all variables have at least some missing information. The level of incompleteness varies; some have relatively small shares of missing information, whereas others have relatively high levels of missingness. Data documentation published by INEGI outlines the relevant missing value codes for each variable.

How was the data associated with each observation acquired?
INEGI does not produce its data but assembles, standardises, and publishes data from various third parties. The data associated with each observation are derived from many sources, including certificados de defunción (death certificates) and actas de defunción (death records) issued by Civil Registry Offices, forensic doctors and forensic medical services depending on the jurisdiction, and mortality and statistical notebooks issued by Public Prosecutor's Offices (Data Cívica and Intersecta, 2022; INEGI, 2023).

Over what time frame was the data collected?
Mortality data are available back to 1990; however, the system used to classify causes of death has changed over time. For data from 1990 to 1997, INEGI utilised the ninth revision of the WHO's International Statistical Classification of Diseases and Related Health Problems (ICD-9). Starting in 1998, INEGI began utilising the 10th revision (ICD-10) to classify causes of death and continues to do so. INEGI publishes preliminary aggregate mortality statistics in July and the finalised mortality microdata files in October each year. This information refers to deaths registered in the preceding year.
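For users building a series that spans both periods, the change of classification means the code range used to identify homicides has to switch with the year of the data. The following is a minimal sketch under stated assumptions: the function and variable names are hypothetical, and the ICD-9 range E960-E969 (homicide and injury purposely inflicted by other persons) is not given in this article; it is added here as the usual counterpart to the ICD-10 block X85-Y09 mentioned later in this datasheet.

```python
def icd_revision(year: int) -> str:
    """Return the ICD revision the text associates with a given data year."""
    if year < 1990:
        raise ValueError("the mortality data are described as starting in 1990")
    # 1990-1997 -> ICD-9; 1998 onward -> ICD-10 (as stated in the text).
    return "ICD-9" if year <= 1997 else "ICD-10"


# Assumption: the ICD-9 range below is not taken from the article.
HOMICIDE_BLOCKS = {
    "ICD-9": ("E960", "E969"),
    "ICD-10": ("X85", "Y09"),
}


def homicide_block(year: int) -> tuple:
    """Return the (first, last) homicide codes to use for a given data year."""
    return HOMICIDE_BLOCKS[icd_revision(year)]


print(homicide_block(1995))
print(homicide_block(2021))
```

Keeping the year-to-revision rule in one place makes it harder to apply ICD-10 codes to pre-1998 records by accident.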

MAINTENANCE AND DISTRIBUTION
How is the data set distributed?
The data are distributed by INEGI's Subsistema de Información Demográfica y Social (Subsystem for Demographic and Social Information): https://www.inegi.org.mx/programas/mortalidad/?ps=Microdatos#microdatos.

How frequently is the data set updated to include new observations?
New mortality microdata for deaths registered in the preceding year are available every October.

USING THE DATA SET TO STUDY FEMINICIDE
How does the data set measure feminicide?
The data set does not measure feminicide as such. The mortality statistics can be used to identify killings of women and girls using information about the sex of the individual (information about gender is not collected) and their specific cause of death. Intentional killings (homicides) are typically identified from mortality data using base codes X85-Y09 from ICD-10.
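The selection rule described above can be sketched in a few lines. This is an illustration only: the record layout, the field names `sex` and `cause`, and the value `"F"` are hypothetical stand-ins, since INEGI's microdata use their own variable names and codes, which should be taken from the official documentation.

```python
import re


def is_assault_code(icd10_code: str) -> bool:
    """Return True if an ICD-10 code falls within the assault block X85-Y09."""
    match = re.match(r"([A-Z])(\d{2})", icd10_code.strip().upper())
    if match is None:
        return False
    letter, number = match.group(1), int(match.group(2))
    # X85-X99 and Y00-Y09 together make up the block X85-Y09.
    return (letter == "X" and number >= 85) or (letter == "Y" and number <= 9)


def female_homicides(records):
    """Keep records coded as female whose cause of death is in X85-Y09."""
    return [r for r in records if r["sex"] == "F" and is_assault_code(r["cause"])]


# Toy records with hypothetical field names and values.
sample_records = [
    {"sex": "F", "cause": "X954"},
    {"sex": "F", "cause": "I219"},
    {"sex": "M", "cause": "X99"},
    {"sex": "F", "cause": "Y09"},
]

print(len(female_homicides(sample_records)))
```

Only the first three characters of the code are inspected, so four-character subcategories fall into the same block as their parent category.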

What relevant contextual information is available in the data set?
The data set contains information about the sex of the individual, the specific cause of the violent death, the location where the person was killed (e.g. public spaces vs. private residence), the relationship between the victim and the presumed perpetrator, and whether the killing occurred in a context of domestic violence.

How has the data previously been used to study feminicide (or related topics)?
Information about the sex of the individual and their specific cause of death (ICD-10 base codes X85-Y09) has previously been used to analyse killings of women and girls by scholars, national NGOs such as Data Cívica, Intersecta, and México Evalúa, and international and regional organisations. Because INEGI does not measure feminicide as such, some researchers have proposed methods to better proxy feminicide from the subset of mortality data that describes violent deaths of women and girls. Torreblanca and Merino (2017) proposed operationalising a feminicide proxy as killings of females in the mortality data that correspond to one or more of the following characteristics: the homicide took place in the home (a proxy for the relationship between the victim and the presumed perpetrator), the death was due to sexual aggression, or the homicide was related to domestic violence. Frías (2023) further expanded this operationalisation strategy by including additional factors consistent with the definition of feminicide in the Federal Criminal Code using information captured by cause of death (e.g. deaths due to assault by corrosive substances). In addition, the co-edited volumes Violencia contra mujeres: Sobre el difícil diálogo entre cifras y acciones de gobierno (Castro and Riquer, 2020, 2022) offer a thorough contextual analysis to design better public policies against violence and feminicide.
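The 'one or more of the following characteristics' logic of the Torreblanca and Merino proxy can be illustrated as a simple any-of rule. This is a hedged sketch, not the authors' implementation: the field names and values are hypothetical, and the use of ICD-10 category Y05 (sexual assault by bodily force) to stand for 'death due to sexual aggression' is an assumption made here for illustration.

```python
def meets_proxy(record: dict) -> bool:
    """Apply an any-of rule to a record already identified as a female homicide."""
    at_home = record.get("place") == "home"
    # Assumption: Y05 is used here to represent sexual aggression.
    sexual_assault = str(record.get("cause", "")).upper().startswith("Y05")
    domestic = record.get("domestic_violence") is True
    return at_home or sexual_assault or domestic


# Toy female-homicide records; None marks an unspecified value.
sample_homicides = [
    {"cause": "X95", "place": "home", "domestic_violence": None},
    {"cause": "Y05", "place": "public", "domestic_violence": None},
    {"cause": "X99", "place": "public", "domestic_violence": True},
    {"cause": "X95", "place": "public", "domestic_violence": None},
    {"cause": "X91", "place": None, "domestic_violence": None},
]

proxy_cases = [r for r in sample_homicides if meets_proxy(r)]
print(len(proxy_cases), "of", len(sample_homicides))
```

Note that a record with unspecified fields, like the last one above, can never satisfy the rule, so a proxy of this kind inherits whatever missingness the underlying variables have.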

Does the data set have any limitations that may impact research about feminicide?
Concerning the study of feminicide, two key variables have extremely high levels of underreporting. The domestic violence indicator, first introduced into the database in 2003, has been left unspecified in over 99% of suspected homicides of women from 2019 to 2021; overall, this information is missing in approximately 94% of all alleged female homicides from 2003 to 2021. The variable documenting the relationship between the victim and the presumed perpetrator, introduced to the database in 2012, is unspecified in over 96% of suspected homicides of women from 2012 to 2021. Beyond missing contextual information, not all deaths result in a death certificate or death record and, therefore, cannot be included in the mortality statistics published by INEGI. The current militarised context creates additional circumstances that may result in the underreporting of homicides (and feminicides) in the CRVS system: enforced disappearances, clandestine and common graves, and the ongoing forensic crisis (Intersecta, 2020). If a body cannot be located and identified, a death certificate cannot be issued, and a corresponding record cannot be published in the mortality statistics. The extent and patterns of underreporting of homicides (and, by extension, feminicides) in mortality statistics are currently unknown.
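Shares of unspecified values such as those reported above are straightforward to compute once the relevant missing-value codes are known. The sketch below uses toy records, a hypothetical field name, and hypothetical missing-value markers; the actual codes for each variable are listed in INEGI's documentation and are not reproduced here.

```python
def share_unspecified(records, field, unspecified_values):
    """Return the fraction of records whose `field` holds an unspecified value."""
    if not records:
        raise ValueError("no records supplied")
    missing = sum(1 for r in records if r.get(field) in unspecified_values)
    return missing / len(records)


# Toy records; the field name and markers are illustrative assumptions.
sample_records = [
    {"relationship": None},
    {"relationship": "unspecified"},
    {"relationship": "partner"},
    {"relationship": "unspecified"},
    {"relationship": None},
]

share = share_unspecified(sample_records, "relationship", {None, "unspecified"})
print(f"{share:.0%} unspecified")
```

Running a check like this per variable and per year before any analysis shows which contextual fields can realistically support conclusions.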

For what purpose was the data set created?
The data set forms part of the Mexican government's crime statistics reporting.

Who created the data set?
The data set is published by the Secretariado Ejecutivo del Sistema Nacional de Seguridad Pública (Executive Secretary of the National Public Security System; SESNSP); however, SESNSP does not produce the data set itself. Instead, it aggregates and publishes data from the Public Prosecutor's and Attorney General's Offices across Mexico's 32 states.

What do the observations that comprise the data set represent?
The observations in the data set are aggregate statistics of the number of investigation files (carpetas de investigación) opened in a particular jurisdiction (state or municipality) in a specific month for a particular type of crime (feminicide, homicide, etc.). These statistics refer to the number of opened cases, not the number of victims whose cases are being investigated, as a single case may refer to more than one victim.

Does the data set contain all possible observations or is it a sample (not necessarily random) of observations from a larger set?
The data set should include all investigation files opened by distinct entities across jurisdictions, assuming this information is shared by Public Prosecutor's Offices and Attorney General's Offices with SESNSP. While the data may represent a complete enumeration of all open cases, many crimes do not result in an investigation and, therefore, an investigation file. Impunity and resource constraints are two barriers to opening investigation files (Castro and Riquer, 2022).

What does each observation consist of?
SESNSP publishes three distinct files that provide information about individual crimes such as homicide or feminicide: counts disaggregated at the municipal (Cifras de Incidencia Delictiva Municipal), state (Cifras de Incidencia Delictiva Estatal), and federal level (Cifras de Incidencia Delictiva Federal), as well as a summary of state-level statistics compiled directly from reports to the local authorities in each state (that is, excluding any files opened at the federal level; Cifras de Víctimas del Fuero Común).
Each file provides slightly different information based on the level of geographic disaggregation (e.g. state- vs municipality-level data). The Cifras de Víctimas del Fuero Común file is often used in research because it provides more information about victims' identities. In its current form, this file contains 21 variables that give information on the month and year the case was opened, the state where the investigation file was opened, the type and subtype of crime (13 types and 15 subtypes in total), the legal right impacted by the crime, the modality of the violence, the sex of the victim, and whether the victim was a minor or an adult. In the case of feminicide, only the umbrella category of feminicidio is used, and no additional subtypes are identified. Still, four distinct modalities of violence are considered: firearms, cold weapons, other implements, and not specified. The Cifras de Incidencia Delictiva Municipal and Cifras de Incidencia Delictiva Estatal files do not currently provide information about the sex of the victims or whether they are minors or adults.
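To make the structure described above concrete, the following sketch selects rows whose crime type is feminicidio and totals the counts by state. It is an illustration under stated assumptions: the one-row-per-month layout, the field names (`state`, `crime_type`, `count`), and the toy values are hypothetical and do not reproduce the published file, whose column names and layout should be checked against SESNSP's own documentation.

```python
from collections import defaultdict


def feminicide_counts_by_state(rows):
    """Sum counts per state for rows whose crime type is 'feminicidio'."""
    totals = defaultdict(int)
    for row in rows:
        if row["crime_type"].strip().lower() == "feminicidio":
            totals[row["state"]] += row["count"]
    return dict(totals)


# Toy rows with hypothetical field names; real data have many more variables.
sample_rows = [
    {"year": 2021, "month": 1, "state": "State A", "crime_type": "Feminicidio", "count": 2},
    {"year": 2021, "month": 2, "state": "State A", "crime_type": "Feminicidio", "count": 1},
    {"year": 2021, "month": 1, "state": "State A", "crime_type": "Homicidio", "count": 7},
    {"year": 2021, "month": 1, "state": "State B", "crime_type": "Feminicidio", "count": 1},
]

print(feminicide_counts_by_state(sample_rows))
```

Because the state refers to where the investigation file was opened, totals of this kind describe prosecutorial activity by jurisdiction rather than where the violence took place.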

Is any information missing from individual observations?
Some information is not fully specified for all observations. For example, in the Cifras de Víctimas del Fuero Común file, some types and subtypes of crimes are grouped into 'other' categories, and the modality of violence, sex, or age group may be unspecified or unidentified.

COLLECTION PROCESS
What actors were involved in the data collection process?
Various actors are involved in the data collection, namely agents from the Public Prosecutor's and Attorney General's Offices nationwide (referred to as 'los/as MPs' in Mexico). While the prosecutor heads decision-making and oversees the team in these offices, the agents conduct and coordinate the crime investigation, decide on the exercise of criminal action, order the relevant proceedings to prove or refute any accusation, and further categorise a crime and the responsibility of the perpetrators.

How were the data associated with each observation acquired?
Every month, SESNSP solicits information about the number of investigation files opened for various criminal acts from the Public Prosecutor's and Attorney General's Offices across Mexico. This information forms the basis for the aggregate statistics published by SESNSP, although the methods of soliciting and synthesising this information from the various entities are not entirely clear.

Over what time frame were the data collected?
Data are available monthly going back to January 2015, when, through the CNI, a new methodology to record crime incidence was implemented (which resulted in the Instrumento de Registro, Clasificación y Reporte de los Delitos y las Víctimas CNSP/38/15). The new methodology expanded the offences catalogue to include gender-related crimes such as feminicide.

How frequently is the data set updated to include new observations?
Crime incidence statistics are updated monthly to include information about new investigation files opened in the previous month.

USING THE DATA SET TO STUDY FEMINICIDE
How does the data set measure feminicide?
Despite using the word feminicidio directly in the data, the statistics about feminicide published by SESNSP do not measure feminicide as we understand the phenomenon sociologically (as we will discuss in the next section).

What relevant contextual information is available in the data set?
Contextual information is scarce in the data set.Limited information about the victims is only available in the Cifras de Víctimas del Fuero Común (sex of the victim and whether the victim is a minor or an adult).Geographic information in all three files refers to the locale where the investigation file was opened rather than where the violence occurred or where the victim resided.None of the files distinguish between different types of feminicides or normative hypotheses being investigated (e.g.whether the body was found exposed or demonstrated signs of sexual violence). (Continued)

Discussion
Applying the data documentation frameworks shed light on two noteworthy remarks that we will address in this section.First, following Helena Suárez Val's (2020) concept of discordant data in Uruguay, we outline four crucial variations between INEGI and SESNSP's data.Second, we examine the shortcomings of INEGI and SESNSP in measuring feminicide as we understand it sociologically.

INEGI and SESNSP: Discordant data
Building on the preceding section, INEGI and SESNSP data sources are distinctly discordant.Although the variations stand out in the data frameworks, we find it significant to highlight the following.
Naming.INEGI does not count for 'feminicide' per se in their database, but rather the intentional killing of women.INEGI's data can be used to identify killings of women and girls using information about the sex of the individual 10 and their specific cause of death, 11 but there is no checkbox for whether a particular death was a feminicide, nor is feminicide neatly defined by a specific ICD-10 code or values of other existing fields.Thus, when using the mortality data, referring to 'violent killings of women and girls' rather than feminicides is most correct, although the two are related.
Pertaining to the SESNSP, although their database does include feminicide, naming the crime as such remains a challenge. Despite efforts of standardisation, what might be considered a feminicide in one state may not meet the normative hypotheses outlined in the criminal code of another state.
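As a minimal sketch of what working with SESNSP's feminicidio category could look like, the Python snippet below filters an invented extract by type of crime and sums investigation files per state. The column names (entidad, tipo_de_delito, mes, carpetas) and all values are assumptions made for illustration; they do not reproduce the layout of the files SESNSP publishes.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract: column names and values are invented for illustration
# and do not reproduce the layout of the files published by SESNSP.
SAMPLE_CSV = """entidad,tipo_de_delito,mes,carpetas
Estado A,Feminicidio,Enero,3
Estado A,Homicidio doloso,Enero,40
Estado B,Feminicidio,Enero,1
Estado A,Feminicidio,Febrero,2
Estado B,Robo,Febrero,15
"""


def feminicide_files_by_state(csv_text, crime_label="feminicidio"):
    """Sum investigation files per state for rows whose crime type matches."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalise case and whitespace before comparing the crime type.
        if row["tipo_de_delito"].strip().lower() == crime_label:
            totals[row["entidad"]] += int(row["carpetas"])
    return dict(totals)


totals = feminicide_files_by_state(SAMPLE_CSV)
print(totals)
```

Because each state defines feminicide in its own criminal code, totals obtained this way count investigation files opened under differing legal definitions rather than a uniformly defined phenomenon.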
How has the data previously been used to study feminicide (or related topics)?
To study feminicide, researchers can filter the data only to include observations where the type of crime (tipo de delito) is feminicidio.

Does the data set have any limitations that may impact research about feminicide?
The crime incidence statistics published do not directly measure the number of feminicide victims but rather allow data users to identify the aggregate number of investigation files for feminicide opened in a particular locale during a particular month. These data are likely to undercount the actual number of victims because they count cases rather than victims (a case may relate to more than one victim) and because many feminicides do not result in investigations due to resource constraints and impunity, among other factors. These statistics may also erroneously include some cases that, upon investigation, were determined not to be feminicides. However, it is impossible to identify how many cases have been reclassified as other crimes because of the structure of the database.
Finally, alongside measurement challenges, the aggregate nature of the data and the general lack of contextual information about victims and the violence they suffered make it difficult to examine the dynamics of violence in much detail.

Underreporting. The 'cifra negra' (dark number) in Mexico, that is, the percentage of unreported crimes, is as high as 93.2% (INEGI, 2022). Considering that missing data take shape as either absent contextual information from documented records or records that are totally lacking (Gargiulo, 2022), both INEGI and SESNSP grossly underreport feminicide. Although we have addressed these limitations in detail in the previous section, two key differences between them, which echo their primary data sources, are worth stressing.
In the case of INEGI, death certificates in their current form are not well suited to capture this information in its full complexity, and the data published by INEGI reproduce these limitations. For example, the presence of signs of sexual violence is considered a characteristic of feminicide in the criminal codes of all 32 states and the Federal Criminal Code. However, information about sexual violence only appears in the mortality statistics when the specific cause of death has been attributed to 'sexual assault by bodily force', missing any instances where an individual experienced sexual violence but died from a different cause. Furthermore, the mortality statistics cannot address deaths for which a death certificate was never issued. Even if the operationalisations proposed by Torreblanca and Merino (2017) or Frías (2023) perfectly proxied the phenomenon of feminicide, feminicide statistics calculated using documented data alone would still undercount the true magnitude of the phenomenon due to death certificate incompleteness.
Regarding SESNSP, crime incidence, including feminicide, can only be measured insofar as an investigation file is opened. Rather than directly providing information about the phenomenon of feminicide, these data narrate a partial story about the investigation of such cases. The story is partial because not all instances of feminicide result in an investigation file opened under this criminal categorisation. In addition, as shown in various reports, evidence about the context during, before, and after the feminicide either gets lost or is never registered throughout the investigation process (Barragán and Zerega, 2022; Castro and Riquer, 2020, 2022; Fragoso, 2019; Impunidad Cero, 2022). Lack of financial, institutional, or service-related resources, combined with impunity and lack of political will, impedes the completeness of both data sources, hindering an intersectional analysis.
Comparability. Comparing feminicide geographically and longitudinally persists as an obstacle. Regarding geographic comparison, Roberto Castro and Florinda Riquer's co-edited volumes on VAWG and GBV in Mexico thoroughly explore the difficulties of determining the impact of public policies addressing VAWG and GBV in certain municipalities. Castro and Riquer (2020, 2022) also highlight the need to reinterpret gender violence in a context where insecurity and drug-related violence have drastically changed the landscape of certain regions. Evidently, this changed landscape has a profound impact on how information is recorded. On that note, data published by SESNSP can be challenging to compare across jurisdictions as each state defines feminicide distinctly in its state criminal code, and different Attorney General and Public Prosecutor's Offices have distinct protocols for investigating feminicide.
When it comes to comparing feminicide longitudinally, INEGI is a more comprehensive source as it has been recording the intentional killing of women and girls since 1990. In contrast, SESNSP has only been generating data on feminicide since 2015.
Availability. Data access and transparency are of utmost importance for accountability. Although one can download INEGI and SESNSP data relatively easily, accessing the metadata is less straightforward, particularly for SESNSP. Also, the methodology followed by this source to make feminicide data commensurable across states remains somewhat opaque. Another discordance relates to timing, that is, when their information is made available. While SESNSP publishes updates monthly, INEGI does so yearly. In addition, the lag between issuing a death certificate and the appearance of a record in the microdata file varies between 10 months (if the death occurred in December) and 22 months (if the death occurred in January). Finally, SESNSP data are more likely to change considerably, as investigation files opened as alleged feminicide might be closed under a different crime once the investigation is concluded.

INEGI and SESNSP: Sociological shortcomings in measuring feminicide
Sociology has long been questioning the role of quantification in governing social life and social order (see Mennicken and Espeland, 2019; Nelson Espeland and Stevens, 2008). Although these are theoretical debates outside the scope of this article, it is worth recognising a crucial epistemological tenet: numbers do not objectively measure social phenomena, but they are socially and politically co-constituted and co-produced.
Scholars have applied this lens to critique the shortcomings of how VAWG and GBV are measured (Dawson and Carrigan, 2020; D'Ignazio, 2024; Fuentes and Cookson, 2020; Gargiulo, 2022; Merry, 2016; True, 2015; Walklate et al., 2020). For example, in The Seductions of Quantification, Sally Engle Merry (2016) thoroughly critiques the consequences of translating complex social phenomena into simplified, stripped-of-context, comparable metrics. Quantification seduces, argues Merry, and it establishes value-free numbers and facts that stand as solid truths. Indeed, because 'the technical is always political', it is essential to understand how data are collected, compared, and socialised.
Feminicide emerged as a category from the bottom up. We would not be discussing how to measure this crime for statistical purposes had it not been for the constant leverage of activists in the past four decades. Yet translating feminicide into data remains an ongoing challenge. Neither SESNSP nor INEGI currently measures feminicide as we understand it sociologically, that is, as a repertoire of violent practices that necessitates deep contextual information to understand the role that gender plays in a particular death.
In an article concerned with feminicide and social science research, Corradi et al. (2016) consider the sociological approach to feminicide as that focusing 'on the examination of the features special to the killing of women that make it a phenomenon, per se' (p. 979). Looking at detailed, contextual evidence that could point us to the 'gender-related' motivation underlying the crime, including data, remains vital for this endeavour. Following this line of reasoning, we argue neither INEGI nor SESNSP data allow us to either fully or explicitly grasp the continuum of violence rooted in multiple and intersectional forms of discrimination and oppression against women. Their methods are proxies of the phenomenon. Given their degree of underreporting and missingness, both operationalisations include some cases that a data user would not consider as feminicides and exclude others that would otherwise be regarded as feminicides if existing fields were not missing values or additional contextual information was available. Moreover, statistical data are categorised based on the victim's sex assigned at birth using the binary male/female, troubling an intersectional analysis that would account for transfeminicides.
Finally, despite the Mexican state's efforts to produce rigorous data on feminicide (which we recognise and value), what is currently available aligns with what Sandra Walklate and Kate Fitz-Gibbon (2023) define as 'thin' feminicide counts: those that lack context or meaning. In contrast, drawing on the conceptualisation by Walklate et al. (2020) of 'slow femicide', 'thick' counts would comprehensively include a wide range of sources to underscore the everyday consequences of male violence against women leading to their killing.

Conclusion
Despite growing efforts, feminicide data remain incomplete, inaccurate, and inexplicable in Mexico (and globally). In this article, we queried the two primary governmental sources measuring feminicide in Mexico. Drawing on the 'datasheets for datasets' (Gebru et al., 2021) and the 'data nutrition label' (Holland et al., 2020) frameworks, we shed light on the discordance between government data sources and the limitations they pose to understanding feminicide sociologically. It is worth noting that we are not suggesting feminicide cannot be statistically measured, but that we are yet to achieve this in its full scope given the discordances we highlighted previously.
To conclude, we would like to bring in a valuable insight shared by a data activist when asked what made data actionable: 'when data are socialised, that is, when we [the citizens] become the owners of that information'. Querying the motivation, composition, collection process, distribution, and maintenance of the data sets published by INEGI and SESNSP is both a way of 'owning the data' and a tool to demand that governments name, document, and count feminicide. It is a tool we hope trickles beyond the Mexican context.

9. Based on our own calculations using the data published by INEGI (2000-2022).
10. The mortality data do not provide information about gender, a barrier to intersectional analyses, or an understanding of the gendered dimensions of feminicide inherent in both legal and sociological conceptions of this form of violence.
11. Intentional killings (homicides) are typically identified from mortality data using base codes X85-Y09 from the external causes of morbidity and mortality chapter of ICD-10, which relate to various forms of assault where the intent of the perpetrator is known.
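The code range in note 11 can be illustrated with a short Python sketch that checks whether a cause-of-death code falls within X85-Y09 and keeps records coded as female. The field names (sex, cause), the sample records, and the helper is_assault_code are hypothetical and are not taken from INEGI's microdata layout.

```python
def is_assault_code(icd10_code):
    """Return True if the code's three-character category lies in X85-Y09."""
    # Keep only the category, e.g. "X95.4" or "X954" -> "X95".
    category = icd10_code.strip().upper().replace(".", "")[:3]
    if len(category) < 3 or not category[1:].isdigit():
        return False
    letter, number = category[0], int(category[1:])
    # The range runs from X85 to X99 and continues from Y00 to Y09.
    return (letter == "X" and number >= 85) or (letter == "Y" and number <= 9)


# Hypothetical records; field names and values are invented for illustration.
records = [
    {"sex": "female", "cause": "X954"},
    {"sex": "male", "cause": "X954"},
    {"sex": "female", "cause": "I219"},
    {"sex": "female", "cause": "Y09"},
    {"sex": "female", "cause": "X840"},
]

killings_of_women = [
    r for r in records if r["sex"] == "female" and is_assault_code(r["cause"])
]
print(len(killings_of_women))
```

A filter of this kind yields intentional killings of women and girls rather than feminicides, since no field in the mortality data records whether a death was gender-related.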

What actors were involved in the data collection process?
Civil Registry Offices, Forensic Medical Services, and Public Prosecutor's Offices nationwide. The mortality data published by INEGI in 2022 was based on information from 378 Civil Registry Offices, 106 Forensic Medical Services, and 227 Public Prosecutor's Offices that had recorded at least one homicide (INEGI, 2023).
2. At present, digital maps of feminicide created by activists such as Maria Salguero (Yo Te Nombro), Sonia Madrigal (La Muerte Sale por el Desierto), and Ivonne Ramírez (Ellas Tienen Nombre) remain a vital source for documenting feminicide in Mexico. Like Chávez Cano, their information is sourced primarily from news reports.
3. The CNI operates as the technical and administrative area of the SESNSP.
4. You can access the Lineamientos para el registro y clasificación de los presuntos delitos de feminicidio here: https://www.gob.mx/sesnsp/documentos/lineamientos-para-el-registro-y-clasificacion-de-los-presuntos-delitos-de-feminicidio?state=published
5. Which, in turn, aligns with UNODC's (2015) International Classification of Crime for Statistical Purposes.
6. The 59 variables currently published by INEGI have grown over time and have responded to the requirements and necessities of the information context in Mexico. For example, a question about whether the deceased spoke an indigenous language was only first collected on the 2012 version of the death certification form. Deaths registered in years before 2012 will not have this information because it was not collected.
7. The ICD-10 catalogue contains 4,027 possible cause of death categories, each divided into a variable number of sub-categories. Sixty-eight of these categories are related to aggressions that are characteristic of homicide (Data Cívica and Intersecta, 2022).
8. Based on our own calculations using the data published by INEGI (2000-2022).

Maria Gargiulo is a PhD student at the London School of Hygiene and Tropical Medicine. She holds an MPhil in sociology and demography from the University of Oxford and a BS in statistics and Spanish from Yale University.

Résumé
To date, it remains difficult to define what falls within the scope of feminicide in Mexico. When data on the gender-related killing of women and girls are available, they are often incomplete, inaccurate, or inexplicable. In this article, a sociologist and a statistician examine feminicide data in Mexico. Drawing on the 'datasheets for datasets' framework of Gebru et al. (2021) and the 'data nutrition label' framework of Holland et al. (2020), we focus on the two primary governmental sources measuring feminicide in the country: the mortality records processed by the Instituto Nacional de Estadística, Geografía e Informática (INEGI) and the alleged feminicide investigation files published by the Secretariado Ejecutivo del Sistema Nacional de Seguridad Pública (SESNSP). Two important observations emerge from this study. First, the discordance between INEGI and SESNSP data, which is attributed to four crucial variations: naming, underreporting, comparability, and availability. Second, the shortcomings of INEGI and SESNSP in measuring feminicide as we understand it sociologically. In other words, neither explicitly takes into account the 'gender-related' motivation underlying feminicide. Instead, data from INEGI and SESNSP currently provide only discordant approximations of the phenomenon, in line with what Sandra Walklate and Kate Fitz-Gibbon (2023) define as 'thin' feminicide counts. This article aims to serve as a guide for evaluating and understanding feminicide data in Mexico, to improve communication between those who produce the data and users concerned with how they are produced, and to encourage the querying of data that engage with social justice and accountability in cases of feminicide and beyond.