Abstract
The social sciences are increasingly addressing the quality of research data and debating ways to improve data transparency, that is, the availability of original research data to corroborate claims made in academic publications. This article offers a systematic discussion of related problems and challenges with the example of post-Soviet area studies. It goes on to examine ways to improve data transparency. Although the Internet has a huge potential for linking research with resulting publications and underlying data as well as for organizing a collective discussion around the research, current data repositories do not truly go beyond basic upload and download functions for datasets. With the example of the Discuss Data project, this article gives an overview of more elaborate features that can easily be implemented to improve the visibility and quality assessment of data collections. Finally, it discusses ethical concerns about data transparency related to privacy protection and copyrights.
Introduction
The social sciences are increasingly addressing the quality of research data and debating ways to improve data transparency. In this context, data refer not just to quantitative (numerical) data but also to all kinds of qualitative data ranging from different forms of texts and artifacts (or pictures of them) to audio and video files.
Making data available to other researchers has two main purposes. First, it allows other researchers to check whether the claims made on the basis of specific data are correct. Spectacular cases in which the quality assessment of research data by the academic community has failed repeatedly make the headlines. However, such cases concern only academic studies that have gained public attention. A typical example is the study by two renowned US economists, published in 2013, which claimed on the basis of an analysis of worldwide economic statistics that economic growth declines drastically as soon as a country’s state deficit exceeds 90% of its gross domestic product. In times of austerity, the study was frequently cited in the mass media. Because the authors made their dataset publicly available, mistakes in their statistics were found and, therefore, could be corrected. The results of the original study were no longer supported (Cassidy, 2013; “The 90% Question,” 2013; “Trouble at the Lab,” 2013).
However, it is not only mistakes in the data that cause problems. The interpretation of data can also be misleading. An assessment of 6,700 empirical studies in economics came to the conclusion that in half of the research areas, nearly 90% of the studies used samples that were too small for reliable conclusions, while of the remaining studies, the vast majority exaggerated the actual results (Ioannidis, Stanley, & Doucouliagos, 2017).
An analysis of some 300 studies published in three prominent political behaviorist journals found that only about half of the authors provided access to the underlying data. Of the accessible datasets, roughly 25% were presented “so poorly that replication was impossible” (Stockemer, Koehler, & Lentz, 2018, p. 799).
There is also an important—though much less discussed—second reason for why the publication of data makes sense. Starting from the idea of academic research as a collective endeavor toward a better understanding of the world, secondary data analysis can provide huge benefits to researchers. Most importantly, it can offer access to unique information if, for instance, historical data can no longer be found or generated. It can also add further evidence to original data, supporting or challenging one’s own results, and thus allowing for triangulation (on triangulation, see, for example, Junk, 2011). Finally, access to secondary data can save costs compared with collecting one’s own data, which is especially important for early-stage researchers who lack funding (e.g., Heaton, 2008). Secondary analyses of research data, however, also bear the risk of reproducing existing mistakes if these have not been discovered or sufficiently communicated.
Data transparency, that is, free availability of the relevant data, constitutes the basis for both quality assessment of related published research results and secondary data analysis. A necessary second step is the competent discussion of the data collections themselves by the academic community.
The challenges related to data transparency and data quality assessments are, of course, dependent on the academic discipline and the object of study. In this article, we elaborate on social science data based on the example of post-Soviet area studies. We start with a typology of the problems of data quality and data interpretation. We go on to give an overview of current attempts to improve data transparency and offer a brief sketch of a new project that aims to go beyond current approaches. Finally, we discuss the ethical issues that have to be taken into consideration when discussing data transparency.
Problems of data quality
Based on the underlying cause, we differentiate three quality problems concerning quantitative as well as qualitative data. First, there are intentionally falsified data. Second, there can also be unintended mistakes in the data. Third, data can be incomplete and thus misleading. In all cases, the results are data collections of low quality that should only be used after a proper assessment of their shortcomings.
The country context impacts the underlying causes in multiple ways. One relevant difference of the post-Soviet region, compared with the Organisation for Economic Co-operation and Development (OECD) world, is its deficient information infrastructures. Many resources and services provided centrally in the Soviet Union have been disconnected or can no longer be provided by the successor states (Johnson, 2014). Therefore, the availability and quality of statistical data are limited in many regards (Bessonov, 2013; Kryukov & Sokolin, 2010), and access to official documents is often impeded. Moreover, in authoritarian countries, conducting interviews or collecting data on politically relevant topics can lead to legal problems.
The following paragraphs provide a number of examples that illustrate the three types of quality problems. The examples are taken from the post-Soviet region, but this in no way implies that related problems are less pressing or necessarily of a different nature in other parts of the world. The examples from mainstream economics cited in the introduction clearly indicate that the general problem is pervasive globally.
Intentional falsification of data
When individual researchers falsify data so that their conclusions look more convincing, this can only be detected on a case-by-case basis. We are not aware of any such instances in post-Soviet area studies.
In any case, the systematic falsification of data primarily concerns nonacademic sources, namely, official state agencies. Especially in authoritarian regimes, state organs may simply produce the information that is desired, instead of the information that has been collected. This often concerns economic statistics. While Uzbekistan still claimed in 2015 that the country had not been affected by the severe global economic downturn after 2009, Russian mirror statistics, foreign investors, and commodity prices all pointed to the opposite (Focus-Economics, 2016). Discrepancies in Russian trade statistics or regional statistics can also—at least partly—be explained by deliberate misreporting with the aim of obtaining the politically desirable results (Simola, 2012; Zubarevich, 2012).
The detection of falsifications of election results has even developed into a new subdiscipline in political science, the “Election Fraud Forensics” (Alvarez, Ansolabehere, & Stewart, 2005; Alvarez, Hall, & Hyde, 2008; Beber & Scacco, 2012; Deckert, Myagkov, & Ordeshook, 2011; Lehoucq, 2003; Magaloni, 2010; Vickery & Shein, 2012). Various studies address such practices in the post-Soviet region (e.g., Hyde, 2007; Myagkov, Ordeshook, & Shakin, 2009; Senyuva, 2010; Tucker, 2007).
To give one example, according to the final results of the Central Election Commission of the Russian Federation for the 2011 parliamentary election, the party “United Russia” received nearly 50% of the votes and, thus, the absolute majority of deputies in parliament. However, in 60% of the 60,000 voting districts, irregularities were reported; in 3,000 voting districts (especially in Dagestan and North Ossetia in the North Caucasus), “United Russia” received 100% of the votes. This “ballot stuffing” obviously changed the election results. It has been estimated that in the case of a normal statistical distribution, as occurred in other elections, “United Russia”—the party supporting Russian President Vladimir Putin—would more likely have received between 30% and 35% of the votes. Similar irregularities occurred in Russia during the presidential elections of 2012 (Klimek, Yegorov, Hanel, & Thurner, 2012).
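Many election-forensics tests rest on simple statistical regularities. As an illustration only—with simulated, not actual, vote counts—the following sketch implements a last-digit uniformity check in the spirit of Beber and Scacco (2012): under clean counting, the last digits of raw vote totals should be approximately uniform, whereas fabricated figures often over-represent round numbers.

```python
import random
from collections import Counter

def last_digit_chi2(counts):
    """Chi-square statistic for the uniformity of last digits of vote counts.

    Under clean counting, the last digits of raw vote totals should be
    close to uniform over 0-9 (9 degrees of freedom; the 5% critical
    value is about 16.9). Humans inventing figures tend to deviate.
    """
    digits = [c % 10 for c in counts]
    expected = len(digits) / 10
    freq = Counter(digits)
    return sum((freq.get(d, 0) - expected) ** 2 / expected for d in range(10))

random.seed(42)
# Simulated "clean" precinct counts: last digits roughly uniform.
clean = [random.randint(100, 5000) for _ in range(1000)]
# Simulated fabricated counts: round figures ending in 0 or 5.
faked = [random.choice(range(100, 5000, 5)) for _ in range(1000)]

print(last_digit_chi2(clean))  # close to the expectation for uniform digits
print(last_digit_chi2(faked))  # far above the 5% critical value
```

Such tests are, of course, only screening devices: a suspicious digit distribution is an invitation to scrutiny, not proof of fraud.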
However, to present the desired picture, governments manipulate not only numbers, that is, quantitative data, but also all kinds of qualitative data. For example, in the Soviet Union, even before the digital age, official photos of leading party members were manipulated to delete the faces of people who had fallen out of favor (King, 2014). A common strategy not limited to authoritarian regimes is the production of “fake news,” which often means spreading rumors to discredit opponents. This form of falsification primarily concerns text documents such as official statements and media reports.
Unintended mistakes in data
Although ethically not as problematic as intentional falsifications, unintended mistakes are likely at least as numerous in practice. “Unintended” here refers to those in charge of data collection. While, in the above examples of falsifications, those collecting and publishing the data have been the culprits, in the case of unintended mistakes, such actors may, for example, fall victim to false claims from the subjects of their research.
An area where this might be frequent is public opinion polls on political issues in authoritarian regimes, where freedom of opinion is restricted. In a public opinion poll conducted in Russia by the Levada Center in July 2016, only 30% of respondents stated that they would always honestly answer questions related to politics; furthermore, only 12% of them assumed that other people would do so (Levada Center, 2016). Partly related to this uneasiness with talking about politics is the high rejection rate in public opinion surveys. It has been claimed that only a small part of the Russian populace (between 10% and 30%) is willing to take part in such polls and surveys (Napeenko, 2017).
A related case—though not due to authoritarian repression—is the Gini coefficient produced by the “Azerbaijan Household Income and Expenditure Survey,” which has been included in global datasets. The Gini coefficient for Azerbaijan had a low value, which indicated a low degree of social inequality in the country. This finding was unexpected for a country experiencing an oil boom; theory would rather suggest high inequality. In fact, the low value was partly caused by better-off, middle-class households not participating in the survey for fear that their nondeclared income would be detected, resulting in higher taxation (Ersado, 2006).
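The mechanism behind this distortion is easy to reproduce. The following sketch uses entirely hypothetical household incomes to show how the Gini coefficient, computed here via a standard formula for sorted incomes, drops sharply when the better-off households are simply absent from the sample:

```python
def gini(incomes):
    """Gini coefficient from a list of incomes, via the formula
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    for incomes x_1 <= ... <= x_n sorted in ascending order."""
    xs = sorted(incomes)
    n = len(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n

# Entirely hypothetical household incomes: a well-off minority on top.
full_sample = [100] * 70 + [300] * 20 + [2000] * 10
# The same survey if the better-off households refuse to participate.
truncated = [100] * 70 + [300] * 20

print(round(gini(full_sample), 3))  # → 0.591
print(round(gini(truncated), 3))    # → 0.239
```

The measured inequality is cut by more than half, even though nothing about the underlying income distribution has changed—only who answered the survey.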
Another example is participants in large-scale demonstrations. In his codebook for disaggregated event data “Mass demonstrations and mass violent events in the former USSR, 1987–1992,” Beissinger (2003) explains,
The number of participants in a demonstration can often fluctuate drastically over the course of a single event. Crowds of 10 thousand, for instance, may gather on a square in the morning; by evening, the same demonstration may have tens or hundreds of thousands of participants. The variables here all reflect reported information on the peak number of participants mentioned in each description of the event. In all, specific information on the number of participants was available for 68.4 percent of the demonstrations recorded. Since estimating the size of crowds is an art rather than a science, divergent estimates were recorded whenever available. (p. 7)
The problem is even more challenging for complex phenomena such as migration. The International Federation of Human Rights (2016) states in a report,
It should be noted that the lack of reliable statistics pertaining to migratory flows from Kyrgyzstan, and especially lack of disaggregated statistics specifically on the movement of women and children at a national and regional levels, makes it difficult to assess the full impact of migration on women and children. Various experts agree that these data underestimate the number of Kyrgyz migrants working abroad, which could be up to one million. It is challenging to have a real picture of migratory flows mostly because of: 1) the visa-free regime in post-Soviet countries where Kyrgyz migrants tend to work, 2) significant gaps in data recording at border check points, and 3) the majority of Kyrgyz migrant workers are undocumented. As a result, statistics from both the Kyrgyz State Migration Service and the Russian Federal Migration Service (FMS), as well as estimations of experts on migration, do not match. (p. 9)
Incomplete data collection
Incompleteness is easily visible in quantitative datasets, as missing figures are marked as not available in tables. Nevertheless, quantitative analyses often simply ignore missing data, thus potentially introducing a bias. This is a major issue, as an advisory group to the United Nations concluded in 2014 that over the last two decades, the percentage of missing data for basic socioeconomic development indicators in 157 countries was on average 30% to 40%; an improvement was not considered likely (“Data and Development,” 2014; on the case of Russian official statistics, see Baranov, 2013; Khaninym, 2012; Korhonen, 2012).
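The bias introduced by simply ignoring missing observations can be illustrated with a small, entirely hypothetical simulation: if the probability that a country reports an indicator correlates with the value of that indicator, the average over the reported values no longer reflects the full population of countries.

```python
import random

random.seed(0)
# Hypothetical indicator values for 157 countries (all numbers invented).
values = [random.gauss(50, 15) for _ in range(157)]

# Suppose countries with weaker performance (lower scores) are also
# less likely to report, so missingness correlates with the value.
reported = [v for v in values if random.random() < (0.9 if v > 50 else 0.2)]

def mean(xs):
    return sum(xs) / len(xs)

# Ignoring the missing observations biases the average upward.
print(round(mean(values), 1), round(mean(reported), 1))
```

With 30% to 40% of observations missing, as in the UN assessment cited above, such selection effects can easily dominate the substantive differences a study sets out to measure.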
Moreover, existing data collections often stand alone and isolated; they do not refer to comparable or complementary datasets that could be used to check validity or fill in gaps. For instance, concerning public opinion polls, the “Caucasus Barometer” has since 2008 published survey data for Armenia, Azerbaijan, and Georgia covering many questions taken from the “World Values Survey” (WVS). Such complementary data, missing in the WVS, would also invite a discussion about the methodological comparability of the two surveys.
In the case of qualitative data, incompleteness is often harder to identify, and the implications for the conclusions drawn from them are less obvious. A typical example is expert or elite interviews. The researcher starts with a list of ideal candidates for such interviews. However, the final list of interviews conducted usually looks very different, as many candidates decline to give an interview, while other respondents are added based on suggestions from earlier interview partners or from those who themselves declined. This technique has been formalized as “snowball sampling” (e.g., the entry in Lewis-Beck, Bryman, & Futing Liao, 2004). Here, it is far from clear whether the first sample—which may have been smaller—is “more complete” than the actual one, which may include important additional respondents as a result of snowball sampling.
At the same time, at a general level, two biases are likely to result from this approach. First, more important people (in terms of relevant responsibilities and knowledge) are likely to delegate the “task” of the interview to less important people. Second, in the snowball approach, respondents will most likely suggest like-minded people for interviews. In addition, in hybrid and authoritarian regimes, specific respondents may be discouraged from talking to academic researchers or may self-censor their answers (Beisembayeva, Papoutsaki, Kolesova, & Kulikova, 2013; Goode, 2010; Richardson, 2014; Roberts, 2013; Shih, 2015).
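The like-minded referral bias can be made concrete with a toy simulation (all numbers hypothetical): if respondents tend to suggest acquaintances with similar views, a snowball chain explores only a narrow band of the opinion space, no matter how long it runs.

```python
import random

random.seed(1)
# Hypothetical population: each person holds an opinion score in [0, 1]
# and mostly knows people with similar opinions (homophily).
population = [random.random() for _ in range(10_000)]

def refer(person, spread=0.1):
    """A respondent suggests an acquaintance with a similar opinion."""
    candidates = [p for p in population if abs(p - person) < spread]
    return random.choice(candidates)

# Snowball sample: one initial respondent, then a chain of referrals.
sample = [population[0]]
for _ in range(50):
    sample.append(refer(sample[-1]))

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(round(pop_mean, 2), round(sample_mean, 2))
```

By construction, each referred respondent differs from the previous one by less than the homophily threshold, so the sample drifts slowly and remains anchored near the initial respondent’s position rather than representing the full population.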
Another example of incomplete qualitative data is the manual content analysis of media reporting. As such an analysis requires reading all texts, the sample of media included in the analysis is often small. Moreover, print media available in electronic databases are preferred, as full-text search functions immensely simplify the creation of the text corpus. That means that TV, which is by far the most important source of news for the population in all post-Soviet countries, is often not included in the analysis (for an alternative approach including TV reports, see Heinrich & Pleines, 2015, 2018).
Moreover, media reporting itself may lack relevant data (i.e., information). For example, Fredheim (2017) finds that pressure from news owners who are close to the ruling elites had a significant effect on journalistic output at the popular Russian online newspapers “Lenta” and “Gazeta.” Editorial changes in both publications were accompanied by a shift from core news areas (such as domestic and international politics) toward lifestyle and human interest subjects. In a similar vein, the Kazakhstani government has systematically prevented political analysis on the country’s websites (Anceschi, 2015; Lewis, 2016). Those compiling a protest event database (like the one by Beissinger referred to above) may thus miss out on smaller protests because they are no longer being reported.
Problems of data interpretation
Even data that are correct and complete can lead to wrong results. This is, in fact, not a problem of data quality in the narrow sense but a problem of data interpretation. Here, we again distinguish three forms, related to, first, the proper implementation of the method of analysis; second, the over-interpretation of data; and third, the misinterpretation of data.
Problems related to the proper implementation of the method of data analysis can come in many forms, which are specific to the method being used. Regression analyses can obviously suffer from mathematical mistakes, and results can also differ depending on the model chosen for calculations. For content or discourse analysis, proper implementation implies, among many other things, native-speaker command of the respective language and a consistent coding scheme. This issue is the topic of textbooks on the respective methods.
However, problems of data interpretation can also be due to an “over-interpretation” of the data, assigning them more reliability than they actually have. Quantitative data, especially, suggest an accuracy that may not be supported by the underlying information.
An example of over-interpretation is the use of indices based on expert opinions. The organization “Reporters without Borders” explicitly warns that its ranking of media freedom does not fulfill academic standards.1 Until 2012, the “Corruption Perception Index (CPI)” by “Transparency International” used changing data sources and moving averages; thus, the organization itself stated that the index scores could not be compared over time.2 The “Freedom House” ranking on political freedoms, which is often used to identify political regime types, has been criticized for its unsystematic methodology and for a bias in favor of allies of the United States (Giannone, 2010; Steiner, 2016). However, political scientists often use these and other rankings without references to studies critically assessing the validity of these rankings (like, for example, Andersson & Heywood, 2009; Apaza, 2009; Bühlmann, Merkel, Müller, Giebler & Weßels, 2012; Giannone, 2010; Høyland, Moene, & Willumsen, 2012; Knack, 2006; Møller & Skaaning, 2012; Munck, 2011; Muno, 2012; Pickel & Pickel, 2011, 2012; Pleines, 2018; Steiner, 2016; Teorell, 2011).
Finally, problems of data interpretation can take the form of actual misinterpretation. Qualitative studies can be ignorant of relevant context, thus misinterpreting sources due to a lack of information, a lack of cultural knowledge of specific meanings (as in the case of the term “democracy” described below), or a lack of awareness of relevant modes of interpretation, for example, missing out on irony. A similar problem in quantitative studies is the rather common use of the CPI as a proxy for a country’s level of corruption, although a study documented by Transparency International itself has confirmed that there is no systematic relation between expert assessments, such as those used in the CPI, and levels of actual corruption as reported in representative public surveys (Razafindrakoto & Roubaud, 2005).
A standard example of misinterpretation for the post-Soviet region is public survey data about “democracy.” When asked about the desirability of democracy as a regime type, large parts of the populaces in the post-Soviet region do not think of the ideal type “democracy” but of their own experiences with democratically elected governments in the 1990s, a time period characterized by corruption and social disruption (Carnaghan, 2011). Accordingly, answers are to a large degree (but not completely) related to the desirability of a “return to the 1990s.” They have to be interpreted accordingly and cannot simply be included in global comparisons about perceptions of democracy. In general, it is well known that the questionnaire design can strongly influence the answers of the respondents (see, for example, Lyons, 2012, pp. 257–269).
In authoritarian regimes, the possibility of repressions also fosters self-censorship within the mass media and in social media (Alexanyan et al., 2012; Bekmurzaev, Lottholz, & Meyer, 2018; Malthaner, 2014; Roberts, 2013; Shklovski & Valtysson, 2012). Accordingly, all forms of content and discourse analysis might include self-censored and actually censored forms of expression. To take them as honest statements might be a misinterpretation.
From transparency to discussion
The solution to problems of data quality and data interpretation currently promoted in the social sciences is data transparency. The idea behind data transparency is that by archiving research data online, they become publicly available, which offers all researchers the opportunity to assess not just the presentation of research results in academic publications but also the underlying data (Elman & Kapiszewski, 2014; Moravcsik, 2014). Related initiatives such as the “Data Access & Research Transparency Joint Statement” (DA-RT)3 and the “FAIR principles” (Wilkinson et al., 2016) require the underlying research data of published journal articles to be openly accessible, findable, interoperable, and well documented. Recent efforts on national, European, and global levels have led to the creation of discipline-specific infrastructures for the upload, search, and long-term storage of research data.4
Obviously, online availability is an important step toward increased data quality. If mistakes in the data cannot be corrected because the correct information is not available, false data can at least be marked as such. As in the case of incomplete data collection, this allows researchers to become aware of and discuss the impact this has on the research results (Alvarez, Key, & Núñez, 2018). Data transparency also offers great opportunities for secondary analyses, as highlighted in the introduction.
However, the expectation that once data collections are published online, mistakes and problems will be easily spotted and immediately corrected is overoptimistic for at least two reasons. First, often the problem is not a simple figure that has to be corrected—such as the Gini coefficient in one of the examples given above—but a broader issue related to the validity, applicability, and contextuality of research data, such as the reliability of opinion polls or the Freedom House country rankings, which are challenged by a number of complex arguments. Related assessments and debates are not included in the repositories that store the data collections in question. Instead, they are spread over a broad range of academic publications in different disciplines, as demonstrated by the bibliographic references in the section on country rankings above. Accordingly, academic researchers downloading a data collection have no easy and systematic access to related data quality assessments.
Second, many of the problems of data interpretation listed in the respective section above are specific to individual analyses and related academic publications. Accordingly, they are not addressed in the literature, as long as they are not among the very few famous studies that stir broader debates. In most cases, an assessment of the reliability of data collection and data interpretation is only conducted by discussants at academic conferences or peer reviewers in the publication process. Their comments are not available at all to the broader academic community.
Finally, it has been argued that data collections related to qualitative research methods can often not (easily) be prepared for online publication in the context of transparency initiatives. Here, mistakes and misinterpretations are much harder to substantiate than in disciplines that work solely with quantitative methods (Büthe & Jacobs, 2015, p. 2). In a similar vein, Monroe (2018) finds DA-RT insufficiently sensitive to the needs of qualitative data and sensitive environments, such as authoritarian regimes.
In summary, this means that the online availability of data collections is not enough to tackle the problems of data quality and data interpretation. In addition, links to the relevant literature discussing or using the respective data collection and, most importantly, a peer discussion placed next to the actual data collection are needed. In technical terms, virtual research environments, social media, and commercial online services already offer the necessary functionalities.
From a technical and organizational standpoint, it is sensible to establish an online platform as a complementary layer linking already existing services for the long-term storage with services for discipline-specific commenting and curating (Akers & Doty, 2013; Anderson & Blanke, 2012). A close cooperation between infrastructure providers and specific academic communities in the creation of an interactive online discussion platform has the potential to establish the most useful digital services for research data as well as to ingrain the idea of open access to research data and transparency within the academic communities.
As many academic disciplines are split into subdisciplines, sometimes with different research practices and requirements (Quandt & Mauer, 2012, p. 61), the direct involvement of these academic (sub-) communities in the development and operation of such infrastructures seems to be advisable (Pfeiffenberger, 2007). This is especially important in the area of quality assessment of data and interaction with the community (Klump & Ludwig, 2013, p. 261).
This is exactly the aim of the Discuss Data Project, an “Open Platform for the Interactive Discussion of Research Data Quality (on the example of area studies on the post-Soviet region),” created and operated by the Göttingen State and University Library and the Research Centre for East European Studies at the University of Bremen.
Discuss Data aims to create an online platform that combines the publication of research data not only with a documentation of the data collection process but also with an interactive place of communication to discuss, evaluate, and contextualize these research data. The expert community will be enabled to indicate faulty or misleading data, to recommend complementary datasets (in case of gaps in the data collection), and to discuss extensively the validity, applicability, and interpretation of the data. This platform creates the opportunity to gather—in a structured way—the feedback on research data that is currently scattered among journal articles, conference papers, and blog posts or has not been published at all. The evaluation of research data is, therefore, transformed from an individual to a collective endeavor benefiting the whole expert community. Although the Discuss Data Project is organized by academic institutions on the basis of academic concerns about data quality, the online platform is open to users with nonacademic backgrounds as well, as the assessment of data quality is relevant for a much broader audience, including, for example, nonacademic researchers in think tanks or business, journalists, or policy makers.
For the storage and long-term archiving of the content, Discuss Data will use the services of the DARIAH-DE repository5 and (eventually) the Humanities Data Centre.6 Beyond presenting data collections that are so far unpublished (or have not been published in the respective repository infrastructures), Discuss Data will also link to research data published in other interdisciplinary repositories, such as the Harvard Dataverse, to connect users interactively to all available knowledge on the post-Soviet region. The validity and comparability of these data will be discussed; complementary data will be recommended; and all data will be presented in an easily accessible and comprehensible online platform.
The publication of data collections—a broad spectrum of quantitative and qualitative data sources, forms, and formats from the social sciences and the humanities dealing with the post-Soviet region—together with their metadata (i.e., detailed descriptions of the data) and documentation describing the process of data collection, is closely connected with the quality assessment and contextualization of these data in a single place. The quality assessment is best conducted by experts who are familiar with the content, method, and/or context of the dataset (Devarajan, 2013; Jerven, 2013; Seligson, 2004). This discussion will be based on a moderated and gated peer-review process7 as well as the subsequent involvement of the interested academic community through comments and references. For these community discussions, an intuitively usable interface will be developed that will enable a direct response to posts by other users as well as the presentation of complex debates. For posting comments and other annotations, users will have to register on the platform, providing their real names and institutional affiliations, to avoid spamming and trolling.
Important editorial tasks—such as the upload of data collections, user administration, and user support—will be gradually transferred to the user community itself. Thereby, a new model for the sustainable establishment of a permanent online platform through the transfer of responsibilities and tasks to the user community will be tested. An important project goal is to organize and moderate these transfer processes and to make them technically possible in order to gradually establish self-organization by the user community. That such a form of self-organization is possible has been proven (even though outside of academic structures) by the success of the online encyclopedia Wikipedia (for the not always unproblematic relationship of academia toward Wikipedia and its knowledge organization, see Black, 2008; Brandt, 2009; Eijkman, 2010).
With the upload of qualitative data collections, the question of the protection of privacy requires special attention, as will be elaborated in the following section.
Ethical issues
Concerning the role of people as subjects of research in the social sciences, quantitative as well as qualitative data are mainly gathered in two forms. They can either be collected from participants who have been recruited by the researcher or taken from sources where they have been produced authentically (i.e., independently of the research project). An example of the latter is the large amounts of data created directly by users in online and social media, which are called “big data.”8
If the people included in the study (i.e., respondents, participants, or data providers) are recruited by the researcher, ethics guidelines regularly demand “informed consent.” Participants should be informed about the purpose of the study and the specific use of the generated data, that their participation is voluntary, and that they can withdraw from the study at any time. In a final step, data providers should also be informed about the research results (Fossheim & Ingierd, 2015, p. 11; Lüders, 2015, p. 79). Consequently, Elgesem (2015) argues that research should not entail a risk of harm or discomfort for the data provider.
In our view, the idea of “informed consent” poses three challenges to the researcher. First, this consent may not be granted due to a general feeling of mistrust that is not related to any specific concerns about the respective research project. Moreover, this mistrust may be directed not so much toward the research as such (e.g., giving an interview) but toward the specific formal requirements of “informed consent” (e.g., signing a consent form).
Second, even if "informed consent" is given, neither the respondent nor the researcher can be sure about the consequences that participation in a research project may have. For example, the researcher may later become "undesirable" to the state authorities, which reflects badly on their "informers," or a respondent's statement may only later become "subversive" after official attitudes have changed. This aspect is especially relevant if the data (e.g., interviews) are to be published in full to ensure data transparency. This can require careful and time-intensive preparations by the researchers (cf., Monroe, 2018, pp. 144–145).
Third, in some cases, such as concerning corrupt state officials, militant separatists, or political radicals, research based on direct interaction with the subjects of research is hardly feasible if based on “informed consent.”
The situation is different if the researcher is not in direct contact with the "data provider," as in the case of online data. Social media platforms may, for instance, handle the privacy of data and the anonymity of data providers or users in ways that fall short of the standards of "informed consent." The ubiquity of publicly available social media data creates enormous possibilities for privacy violations (Albright, 2011; Elgesem, 2015, pp. 24–25; Hoser & Nitschke, 2010, p. 184; Utaaker Segadal, 2015, p. 36). It often seems impossible (or at least impractical) to obtain the informed consent of data providers, who also cannot be informed about the research results due to their large number and the lack of contact details.
McKee and Porter (2009, p. 88) identify four factors that affect the need to obtain consent when conducting research with online data:
the degree of accessibility in the public sphere (public vs. private),
the sensitivity of the information,
the degree of interaction between the researcher and the research subjects, and
the vulnerability of the research subjects.
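Purely as a hypothetical illustration, and not something McKee and Porter themselves propose, such a checklist could be operationalized as a simple decision aid; the scoring scale, the equal weighting, and the threshold below are entirely our own assumptions:

```python
def consent_required(public, sensitive, interaction, vulnerable):
    """Toy decision aid loosely based on McKee and Porter's four factors.

    Each argument is rated from 0 (low) to 2 (high); `public` is rated
    0 for a fully public venue and 2 for a private one. The equal
    weighting and the threshold are illustrative assumptions only;
    McKee and Porter name the factors, not a scoring formula.
    """
    score = public + sensitive + interaction + vulnerable
    return score >= 4  # illustrative threshold, not from the source

# A public, non-sensitive venue studied without researcher interaction:
print(consent_required(public=0, sensitive=0, interaction=0, vulnerable=1))  # → False
```

In practice such factors call for qualitative judgment rather than a numeric score; the sketch only shows that the four criteria can be assessed jointly rather than in isolation.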
Based on the first criterion, many scholars have proposed that data should not be used without consent if “the people being studied do not have an expectation that the information will be used in research” (Elgesem, 2015, p. 23, emphasis in the original). However,
assessing the acknowledged publicity of an online venue is not always straightforward, at least not as seen from the point of view of the participants. A personal blog might be publicly available for all to read, though very often it can be regarded as a personal and private space by the author. (Lüders, 2015, p. 80)
An additional aspect is the legal regulation in the country where the researcher is based. Research activities are not considered incompatible with the original purpose of data generation in the European Union (and Norway), as science is afforded a special position in the respective legal framework: “This provision might be seen as a fundamental principle guaranteeing further use of data for research purposes regardless of the original reason for their production. This leaves open the possibility to conduct research on information obtained online without consent” (Utaaker Segadal, 2015, p. 42).
To achieve the overarching goal of privacy protection, which is also required by law in the European Union, quantitative research widely relies on the anonymization of data to protect individual research subjects. Anonymization can involve one or several of the following techniques (Albright, 2011, p. 779):
(micro)-aggregation (e.g., unspecified gender or age),
alteration of the data,
suppression of certain variables (that might identify the data provider),
data swap (data from one research subject are ascribed to another research subject and vice versa),
random noise (to distort the original data to some degree).
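As an illustration only, several of the listed techniques can be sketched on a toy dataset; the record fields and parameters below are hypothetical, and real anonymization requires formal guarantees (e.g., k-anonymity) rather than ad hoc transformations:

```python
import random

def anonymize(records, seed=0):
    """Apply simple anonymization techniques to a list of person records.

    Illustrates (micro)-aggregation, suppression of identifying
    variables, random noise, and a data swap on toy data. Purely
    illustrative, not a privacy-preserving production method.
    """
    rng = random.Random(seed)
    out = []
    for rec in records:
        r = dict(rec)
        # (Micro)-aggregation: replace the exact age with a coarse age band.
        decade = (r["age"] // 10) * 10
        r["age"] = f"{decade}-{decade + 9}"
        # Suppression: drop a variable that directly identifies the provider.
        r.pop("name", None)
        # Random noise: distort income by up to +/-10%.
        r["income"] = round(r["income"] * rng.uniform(0.9, 1.1))
        out.append(r)
    # Data swap: exchange the income values of two randomly chosen records.
    i, j = rng.sample(range(len(out)), 2)
    out[i]["income"], out[j]["income"] = out[j]["income"], out[i]["income"]
    return out

records = [
    {"name": "A", "age": 34, "income": 52000},
    {"name": "B", "age": 47, "income": 61000},
    {"name": "C", "age": 29, "income": 43000},
]
print(anonymize(records))
```

Alteration of the data (the remaining technique on the list) follows the same pattern: values are systematically replaced before publication while aggregate properties are approximately preserved.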
However, Aiden and Michel (2014) claim that big data necessarily cast "big shadows." A shadow is a projection of the real object, a "visual transformation that preserves some aspects of the original object while filtering out others" (Aiden & Michel, 2014, p. 60). It has been shown that the anonymization of quantitative datasets can be broken, revealing personal and sensitive information about individual data providers (Aiden & Michel, 2014, pp. 61–62; Albright, 2011, p. 778; Zimmer, 2010).
Whereas quantitative research often has no need to identify individual data providers, qualitative research is often based on rather detailed profiles of individual data providers. It is already a challenge to publish the research results anonymously
if one wishes to publish direct quotes, as these will be searchable on the Internet. It is also important to note that pseudonyms or nicknames may be identifiable because they may be used in various contexts online and hence function as a digital identity. (Utaaker Segadal, 2015, p. 43; see also Elgesem, 2015, p. 29)
Anonymity and privacy protection are of special importance for data providers in politically sensitive regions such as the former Soviet Union (Côté, 2013). Authoritarian governments increasingly use the Internet and social media to identify users and possibly harass them in order to suppress opposing views and criticism. In the case of Azerbaijan, for example, punitive measures by state agencies are triggered not only by outright political opposition to the ruling regime but also, for instance, by insults to members of the presidential family, a vague and broadly stretchable offense. Thus, the increased visibility and surveillance of opinions shared on social media has made it easier for the Azerbaijani government to swiftly and severely punish online activities (Alexanyan et al., 2012; Pearce, 2015; Pearce & Guliyev, 2016). As Roberts (2013, p. 344) points out, "there can be no limit on the provision of anonymity and care in handling data; even in cases when the respondent does not ask for that provision" (see also Richardson, 2014, p. 185).
A second legal issue related to the online publication of data collections is copyright. In the case of content or discourse analyses, event databases based on media reporting, or audio and video collections, copyright regulations may prohibit the publication of the underlying data collection. However, the results of data analysis or interpretation (e.g., coding results in the case of content analysis, or descriptions of pictures, audio, or video sources) can be published online.
To protect privacy and copyright on the Discuss Data platform, every data provider specifies the extent to which a data collection becomes available online. Every data upload has to pass a multi-level review process to ensure that technical as well as legal requirements are fulfilled.
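The article does not spell out the platform's internal workflow, so purely as a hypothetical sketch, a provider-chosen access level combined with a multi-stage review gate might be modeled as follows; all stage names and access levels are our own assumptions, not Discuss Data's actual design:

```python
from enum import Enum

class Access(Enum):
    """Hypothetical provider-chosen visibility levels for a collection."""
    PUBLIC = "public"            # openly downloadable
    REGISTERED = "registered"    # available to registered users only
    RESTRICTED = "restricted"    # metadata visible, data on request

# Hypothetical review stages an upload must pass before publication.
REVIEW_STAGES = ("technical_check", "legal_check", "editorial_approval")

class Upload:
    def __init__(self, title, access):
        self.title = title
        self.access = access
        self.passed = set()

    def approve(self, stage):
        """Mark one review stage as passed."""
        if stage not in REVIEW_STAGES:
            raise ValueError(f"unknown review stage: {stage}")
        self.passed.add(stage)

    @property
    def published(self):
        # Only a fully reviewed upload becomes visible at its chosen level.
        return self.passed == set(REVIEW_STAGES)

upload = Upload("Protest event data 1991-1999", Access.RESTRICTED)
for stage in REVIEW_STAGES:
    upload.approve(stage)
print(upload.published, upload.access.value)
```

The point of the gate is that the provider's chosen access level only takes effect after every review stage has signed off, so an incompletely reviewed collection can never appear online.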
Conclusion
The availability and quality of research data are limited in many regards: the challenges range from deliberate falsification of data, unintended mistakes, and incomplete datasets to the over- or misinterpretation of correct (and complete) datasets. Moreover, the publication of social science research data raises ethical concerns, most importantly regarding privacy protection.
For the often-demanded transparency in academic knowledge production, the careful publication of the underlying research data is a necessary first step. The discussion of these data is the second and, in our view, even more vital step. However, academic fora that enable such a discussion in the digital age have so far been missing.
By addressing these problems with Discuss Data, we want to create a digital infrastructure that serves as a virtual place of communication, enabling the discussion of publicly available research data. Evaluating the validity and reliability of research data is currently a major concern in the social sciences, and Discuss Data aims to offer a simple and effective solution.
Acknowledgements
Discuss Data is jointly conducted by the Göttingen State and University Library and the Research Centre for East European Studies at the University of Bremen. A first version of the Discuss Data platform will be available online in 2020.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This publication has been produced in the context of the Discuss Data project, which is funded by the German Research Foundation (project numbers PL 621/3-1 and HO 3987/26-1). The funding source has no influence on the conduct of the project.
Notes
1.
See https://rsf.org/en/world-press-freedom-index and https://rsf.org/en/detailed-methodology.
4.
Cf., for example, Harvard Dataverse (http://Dataverse.org/), GESIS Datorium (https://datorium.gesis.org), TextGrid Repository (http://www.textgridrep.de/).
5.
See Digitale Forschungsinfrastruktur für die Geistes- und Kulturwissenschaften (https://de.dariah.eu).
7.
On the continued relevance of peer-review procedures in the digital age, see Fitzpatrick (2012) and Nicholas, Watkinson, and Jamali (2015); for the area of datasets, see “Editorial: The Guardian” (2014).
8.
Big data refer to extremely large sets of semi- or unstructured digital data on social transactions that are, deliberately or passively and in various shapes and forms, generated in our daily interactions with technology. These digital traces constitute enormous datasets available to others that may be analyzed to reveal patterns, trends, and associations, especially relating to human behavior and interactions (Prabhu, 2015, p. 158; Steen-Johnsen & Enjolras, 2015, p. 122).
References
Aiden, E., Michel, J.-B. (2014). Uncharted: Big data as a lens on human culture. New York, NY: Riverhead Books.
Akers, K. G., Doty, J. (2013). Disciplinary differences in faculty research data management: Practices and perspectives. International Journal of Digital Curation, 8(2), 5–26.
Albright, J. J. (2011). Privacy protection in social science research: Possibilities and impossibilities. PS: Political Science & Politics, 44, 777–782.
Alexanyan, K., Barash, V., Etling, B., Faris, R., Gasser, U., Kelly, J., . . . Roberts, H. (2012). Exploring Russian cyberspace: Digitally-mediated collective action and the networked public sphere. Cambridge, MA: The Berkman Center for Internet & Society Research at Harvard University. Retrieved from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2014998
Alvarez, R. M., Ansolabehere, S., Stewart, C. (2005). Studying elections: Data quality and pitfalls in measuring the effects of voting technologies. Policy Studies Journal, 33, 15–24.
Alvarez, R. M., Hall, T. E., Hyde, S. D. (2008). Introduction: Studying election fraud. In Alvarez, R. M., Hall, T. E., Hyde, S. D. (Eds.), Election fraud: Detecting and deterring electoral manipulation (pp. 1–17). Washington, DC: The Brookings Institution Press.
Alvarez, R. M., Key, E. M., Núñez, L. (2018). Research replication: Practical considerations. Political Science, 51, 422–426.
Anceschi, L. (2015). The persistence of media control under consolidated authoritarianism: Containing Kazakhstan's digital media. Demokratizatsiya, 23, 277–295.
Anderson, S., Blanke, T. (2012). Taking the long view: From e-science humanities to humanities digital ecosystems. Historical Social Research/Historische Sozialforschung, 37, 147–164.
Andersson, S., Heywood, P. M. (2009). The politics of perception: Use and abuse of Transparency International's approach to measuring corruption. Political Studies, 57, 746–767.
Apaza, C. R. (2009). Measuring governance and corruption through the worldwide governance indicators: Critiques, responses, and ongoing scholarly discussion. PS: Political Science & Politics, 42, 139–143.
Baranov, E. F. (2013). Russian statistics: Achievements and problems. Problems of Economic Transition, 55(11), 24–35.
Beber, B., Scacco, A. (2012). What the numbers say: A digit-based test for election fraud. Political Analysis, 20, 211–234.
Beisembayeva, D., Papoutsaki, E., Kolesova, E., Kulikova, S. (2013, June 25–29). Social media, online activism and government control in Kazakhstan. Paper presented at the IAMCR 2013 Conference "Crises, 'Creative Destruction' and the Global Power and Communication Orders," Dublin.
Beissinger, M. (2003). Codebook for disaggregated event data: "Mass demonstrations and mass violent events in the former USSR, 1987–1992." Retrieved from https://scholar.princeton.edu/mbeissinger/publications/mass-demonstrations-and-mass-violent-events-former-ussr-1987-1992-these
Bekmurzaev, N., Lottholz, P., Meyer, J. (2018). Navigating the safety implications of doing research and being researched in Kyrgyzstan: Cooperation, networks and framing. Central Asian Survey, 37, 100–118.
Bessonov, V. A. (2013). On the problems of Russian statistics. Problems of Economic Transition, 55(11), 36–49.
Black, E. W. (2008). Wikipedia and academic peer review: Wikipedia as a recognised medium for scholarly publication? Online Information Review, 32, 73–88.
Brandt, D. (2009). Postmodern organisation of knowledge or: How subversive is Wikipedia? LIBREAS. Library and Ideas, 14, 4–18.
Bühlmann, M., Merkel, W., Müller, L., Giebler, H., Weßels, B. (2012). Democracy Barometer: A new instrument for comparative political science. Zeitschrift für Vergleichende Politikwissenschaft, 6, 115–159.
Büthe, T., Jacobs, A. M. (2015). Introduction to the symposium. Qualitative & Multi-Method Research, 13(1), 2–8.
Carnaghan, E. (2011). The difficulty of measuring support for democracy in a changing society: Evidence from Russia. Democratization, 18, 682–706.
Cassidy, J. (2013, April 26). The Reinhart and Rogoff controversy: A summing up. The New Yorker. Retrieved from http://www.newyorker.com/news/john-cassidy/the-reinhart-and-rogoff-controversy-a-summing-up
Côté, I. (2013). Fieldwork in the era of social media: Opportunities and challenges. PS: Political Science & Politics, 46, 615–619.
Data and development: Off the map. (2014, November 15). The Economist. Retrieved from http://www.economist.com/news/international/21632520-rich-countries-are-deluged-data-developing-ones-are-suffering-drought
Deckert, J., Myagkov, M., Ordeshook, P. C. (2011). Benford's law and the detection of election fraud. Political Analysis, 19, 245–268.
Devarajan, S. (2013). Africa's statistical tragedy [Special issue]. Review of Income and Wealth, 59, S9–S15.
Editorial: The Guardian view on the end of the peer review. (2014, July 6). The Guardian. Retrieved from https://www.theguardian.com/commentisfree/2014/jul/06/guardian-view-end-peer-review-scientific-journals
Eijkman, H. (2010). Academics and Wikipedia: Reframing Web 2.0+ as a disruptor of traditional academic power-knowledge arrangements. Campus-Wide Information Systems, 27, 173–185.
Elgesem, D. (2015). Consent and information: Ethical considerations when conducting research on social media. In Fossheim, H., Ingierd, H. (Eds.), Internet research ethics (pp. 14–34). Oslo, Norway: Cappelen Damm Akademisk.
Elman, C., Kapiszewski, D. (2014). Data access and research transparency in the qualitative tradition. PS: Political Science & Politics, 47(1), 43–47.
Ersado, L. (2006). Azerbaijan's household survey data: Explaining why inequality is so low (Policy Research Working Paper WPS 4009). Washington, DC: World Bank. Retrieved from http://documents.worldbank.org/curated/en/2006/09/7063026/azerbaijans-household-survey-data-explaining-inequality-so-low
Fitzpatrick, K. (2012). Beyond metrics: Community authorization and open peer review. In Gold, M. (Ed.), Debates in the digital humanities (pp. 452–459). Minneapolis: University of Minnesota Press.
Focus-Economics. (2016). Uzbekistan economic outlook. Retrieved from https://www.focus-economics.com/countries/uzbekistan
Fossheim, H., Ingierd, H. (2015). Introductory remarks. In Fossheim, H., Ingierd, H. (Eds.), Internet research ethics (pp. 9–13). Oslo, Norway: Cappelen Damm Akademisk.
Fredheim, R. (2017). The loyal editor effect: Russian online journalism after independence. Post-Soviet Affairs, 33(1), 34–48.
Giannone, D. (2010). Political and ideological aspects in the measurement of democracy: The Freedom House case. Democratization, 17(1), 68–97.
Goode, J. P. (2010). Redefining Russia: Hybrid regimes, fieldwork, and Russian politics. Perspectives on Politics, 8, 1055–1075.
Høyland, B., Moene, K., Willumsen, F. (2012). The tyranny of international index rankings. Journal of Development Economics, 97, 1–14.
Heaton, J. (2008). Secondary analysis of qualitative data: An overview. Historical Social Research, 33(3), 33–45.
Heinrich, A., Pleines, H. (2015). Mixing geopolitics and business: How ruling elites in the Caspian states justify their choice of export pipelines. Journal of Eurasian Studies, 6(2), 107–113.
Heinrich, A., Pleines, H. (2018). The meaning of "limited pluralism" in media reporting under authoritarian rule. Politics and Governance, 6, 103–111.
Hoser, B., Nitschke, T. (2010). Questions on ethics for research in the virtually connected world. Social Networks, 32, 180–186.
Hyde, S. D. (2007). The observer effect in international politics: Evidence from a natural experiment. World Politics, 60, 37–63.
International Federation of Human Rights. (2016). Women and children from Kyrgyzstan affected by migration. Paris, France: FIDH.
Ioannidis, J. P. A., Stanley, T. D., Doucouliagos, H. (2017). The power of bias in economics research. Economic Journal, 127, F236–F265.
Jerven, M. (2013). Comparability of GDP estimates in sub-Saharan Africa: The effect of revisions in sources and methods since structural adjustment [Special issue]. Review of Income and Wealth, 59, S16–S36.
Johnson, I. M. (2014). The rehabilitation of library and information services and professional education in the post-Soviet republics: Reflections from a development project. Information Development, 30, 130–147.
Junk, J. (2011). Method parallelization and method triangulation: Method combinations in the analysis of humanitarian interventions. German Policy Studies, 7(3), 83–116.
Khaninym, G. I. (2012). Numbers continue to be deceitful. EKO. Vserossiiskii ekonomicheskii zhurnal, 3, 4–13.
King, D. (2014). The commissar vanishes: The falsification of photographs and art in Stalin's Russia (new ed.). London, England: Tate.
Klimek, P., Yegorov, Y., Hanel, R., Thurner, S. (2012). Statistical detection of systematic election irregularities. Proceedings of the National Academy of Sciences of the United States of America, 109, 16469–16473.
Klump, J., Ludwig, J. (2013). Research data management. In Neuroth, H., Lossau, N., Rapp, A. (Eds.), Evolution der Informationsinfrastruktur. Kooperation zwischen Bibliothek und Wissenschaft (pp. 257–275). Glückstadt, Germany: VWH.
Knack, S. (2006). Measuring corruption in Eastern Europe and Central Asia: A critique of the cross-country indicators (World Bank Policy Research Working Paper No. 3968). Washington, DC: World Bank. doi:10.1596/1813-9450-3968
Korhonen, V. (2012). Russian statistics. A view from the sidelines. EKO. Vserossiiskii ekonomicheskii zhurnal, 4, 56–73.
Kryukov, V. A., Sokolin, V. L. (2010). Russian statistics. Gains and losses. EKO. Vserossiiskii ekonomicheskii zhurnal, 8, 5–23.
Lehoucq, F. (2003). Electoral fraud: Causes, types and consequences. Annual Review of Political Science, 6, 233–256.
Levada Center. (2016, August 12). Trust in mass media and readiness to state one's opinion. Author. Retrieved from https://www.levada.ru/2016/08/12/14111/
Lewis, D. (2016). Blogging Zhanaozen: Hegemonic discourse and authoritarian resilience in Kazakhstan. Central Asian Survey, 35, 421–438.
Lewis-Beck, M. S., Bryman, A., Futing Liao, T. (Eds.). (2004). The SAGE encyclopedia of social science research methods. Thousand Oaks, CA: SAGE.
Lüders, M. (2015). Researching social media: Confidentiality, anonymity and reconstructing online practices. In Fossheim, H., Ingierd, H. (Eds.), Internet research ethics (pp. 77–97). Oslo, Norway: Cappelen Damm Akademisk.
Lyons, P. (2012). Theory, data and analysis. Data resources for the study of politics in the Czech Republic. Prague: Institute of Sociology, Academy of Sciences of the Czech Republic.
Møller, J., Skaaning, S.-E. (2012). The inconsistency between concept and measurement in current studies of democracy. Zeitschrift für Vergleichende Politikwissenschaft, 6(1), 233–251.
Magaloni, B. (2010). The game of electoral fraud and the ousting of authoritarian rule. American Journal of Political Science, 54, 751–765.
Malthaner, S. (2014). Fieldwork in the context of violent conflict and authoritarian regimes. In della Porta, D. (Ed.), Methodological practices in social movement research (pp. 173–194). Oxford, UK: Oxford University Press.
McKee, H. A., Porter, J. E. (2009). The ethics of Internet research: A rhetorical, case-based process. New York, NY: Peter Lang.
Monroe, K. R. (2018). The rush to transparency: DA-RT and the potential dangers for qualitative research. Perspectives on Politics, 16, 141–148.
Moravcsik, A. (2014). Transparency: The revolution in qualitative research. PS: Political Science & Politics, 47, 48–53.
Munck, G. L. (2011). Measuring democracy: Framing a needed debate. Comparative Democratization, 9(1), 1–7.
Muno, W. (2012). Measuring the world. An analysis of the World Bank's Worldwide Governance Indicators. Zeitschrift für Vergleichende Politikwissenschaft, 6(1), 87–113.
Myagkov, M., Ordeshook, P. C., Shakin, D. (2009). The forensics of election fraud: Russia and Ukraine. Cambridge, UK: Cambridge University Press.
Napeenko, G. (2017, March 13). If you scratch a domestic liberal, you get an educated conservative: Sociologist Grigory Yudin about deceiving public polls, elites' fear of the people and the political suicide of the intelligentsia. Colta.ru. Retrieved from http://www.colta.ru/articles/raznoglasiya/14158
Nicholas, D., Watkinson, A., Jamali, H. R. (2015). Peer review: Still king in the digital age. Learned Publishing, 28(1), 15–21.
Pearce, K. E. (2015). Democratizing kompromat: The affordances of social media for state-sponsored harassment. Information, Communication & Society, 18, 1158–1174.
Pearce, K. E., Guliyev, F. (2016). Digital knives are still knives: The affordances of social media for a repressed opposition against an entrenched authoritarian regime in Azerbaijan. In Bruns, A., Enli, G., Skogerbo, E., Larsson, A. O., Christensen, C. (Eds.), The Routledge companion to social media and politics (pp. 364–378). Abingdon, UK: Taylor & Francis.
Pfeiffenberger, H. (2007). Open access to primary scientific data. Zeitschrift für Bibliothekswesen und Bibliographie, 54(4), 207–210.
Pickel, G., Pickel, S. (Eds.). (2011). Indices in comparative political science. Wiesbaden, Germany: VS Verlag.
Pickel, S., Pickel, G. (2012). The measurement of indices in comparative political science: Methodological sophistry or substantial need? Zeitschrift für Vergleichende Politikwissenschaft, 6(1), 1–17.
Pleines, H. (2018). Political regime-related country rankings. Caucasus Analytical Digest, 106, 2–19.
Prabhu, R. (2015). Big data—Big trouble? Meanderings in an uncharted ethical landscape. In Fossheim, H., Ingierd, H. (Eds.), Internet research ethics (pp. 157–172). Oslo, Norway: Cappelen Damm Akademisk.
Quandt, M., Mauer, R. (2012). Social sciences. In Neuroth, H., Strathmann, S., Oßwald, A., Scheffel, R., Klump, J., Ludwig, J. (Eds.), Langzeitarchivierung von Forschungsdaten. Eine Bestandsaufnahme (pp. 61–81). Glückstadt, Germany: VWH.
Razafindrakoto, M., Roubaud, F. (2005). How far can we trust expert opinions on corruption? An experiment based on surveys in francophone Africa. In Transparency International (Ed.), Global corruption report (pp. 292–295). London, England: Pluto Press.
Richardson, P. B. (2014). Engaging the Russian elite: Approaches, methods and ethics. Politics, 34, 180–190.
Roberts, S. P. (2013). Research in challenging environments: The case of Russia's "managed democracy." Qualitative Research, 13, 337–351.
Seligson, M. A. (2004). Comparative survey research: Is there a problem? Comparative Politics, 15(2), 11–14.
Senyuva, O. (2010). Parliamentary elections in Moldova, April and July 2009. Electoral Studies, 29, 190–195.
Shih, V. (2015). Research in authoritarian regimes: Transparency tradeoffs and solutions. Qualitative & Multi-Method Research, 13, 20–22.
Shklovski, I., Valtysson, B. (2012). Secretly political: Civic engagement in online publics in Kazakhstan. Journal of Broadcasting & Electronic Media, 56, 417–433.
Simola, H. (2012). The quality of Russian import statistics. EKO. Vserossiiskii ekonomicheskii zhurnal, 3, 95–104.
Steen-Johnsen, K., Enjolras, B. (2015). Social research and Big Data: The tension between opportunities and realities. In Fossheim, H., Ingierd, H. (Eds.), Internet research ethics (pp. 122–140). Oslo, Norway: Cappelen Damm Akademisk.
Steiner, N. D. (2016). Comparing Freedom House democracy scores to alternative indices and testing for political bias: Are US allies rated as more democratic by Freedom House? Journal of Comparative Policy Analysis, 18, 329–349.
Stockemer, D., Koehler, S., Lentz, T. (2018). Data access, transparency, and replication: New insights from the political behavior literature. Political Science, 51, 799–803.
Teorell, J. (2011). Over time, across space: Reflections on the production and usage of democracy and governance data. Comparative Democratization, 9, 7–11.
The 90% question: A seminal analysis of the relationship between debt and growth comes under attack. (2013a, April 20). The Economist. Retrieved from http://www.economist.com/news/finance-and-economics/21576362-seminal-analysis-relationship-between-debt-and-growth-comes-under
Trouble at the lab: Scientists like to think of science as self-correcting. To an alarming degree, it is not. (2013b, October 18). The Economist. Retrieved from http://www.economist.com/news/briefing/21588057-scientists-thinkscience-self-correcting-alarming-degree-it-not-trouble
Tucker, J. A. (2007). Enough! Electoral fraud, collective action problems, and post-communist colored revolutions. Perspectives on Politics, 5, 535–551.
Utaaker Segadal, K. (2015). Possibilities and limitations of Internet research: A legal framework. In Fossheim, H., Ingierd, H. (Eds.), Internet research ethics (pp. 35–47). Oslo, Norway: Cappelen Damm Akademisk.
Vickery, C., Shein, E. (2012). Assessing electoral fraud in new democracies: Refining the vocabulary. Washington, DC: International Foundation for Electoral Systems.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., . . . Mons, B. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, Article 160018. doi:10.1038/sdata.2016.18
Zimmer, M. (2010). "But the data is already public": On the ethics of research in Facebook. Ethics and Information Technology, 12, 313–325.
Zubarevich, N. V. (2012). "Deceitful numbers" on the map of the homeland. EKO. Vserossiiskii ekonomicheskii zhurnal, 4, 74–85.
Author biographies
Andreas Heinrich is a senior researcher in the Department of Politics and Economics, Research Centre for East European Studies at the University of Bremen. His research focuses on the political role of the energy sector in the post-Soviet region.
Felix Herrmann is a research associate for e-research and digital humanities at the Research Centre for East European Studies at the University of Bremen. His research interests include the development of digital infrastructures to support academic research and the history of economy, industry, and technology in the COMECON countries.
Heiko Pleines is head of the Department of Politics and Economics, Research Centre for East European Studies and professor of comparative politics at the University of Bremen. His main research interest is the role of nonstate actors in authoritarian regimes.