Keep discussing evaluation – A personal and appreciative reflection

In an attempt to summarize and draw preliminary conclusions from the many fine responses to my article ‘Stop evaluating science’, this short piece brings some additional reflections on the topic with the primary intent not to close the debate but to keep it open. Discussing, in turn, three main topics of the responses and an additional topic that arguably is of particular interest, the article’s intent is to celebrate the great insights and contributions that surfaced in the debate so far by adding some notes on how to take the issue further in future scholarly inquiry and discussion.

Oh, the joy and honor of having such a distinguished group of colleagues responding, with such eloquent and thoughtful pieces, to a call for debate on the basis of my provocative article 'Stop evaluating science: A historical and sociological argument'. I of course personally never doubted the article's relevance and timeliness, but I was somewhat distrustful of its abilities to make any difference. After all, one alleged main consequence of the current evaluation hysteria in science is that journals are nowadays flooded with manuscripts submitted mostly to advance the career(s) of the author(s), with less attention paid to the actual contributions they make. In that flood of publications, what difference can one article make? This is mostly outside of the author's control. Therefore, I would like to commend and deeply thank the editors of this journal for their decision to let my article be the starting point for further debate. Not only was attention thus drawn to the article itself but, more importantly, it invited colleagues to counter and amend my arguments and analytical assertions, to the greater benefit of the topic and how it is handled in scholarly work.
A slightly more analytical reaction gives rise to two important inferences. Firstly, that colleagues are usually better at summarizing the key arguments of one's work. Secondly, that colleagues are also better suited to spot the gaps, fill them, and build on them to make both supplemental and superior interpretations. Before highlighting the latter capacity of the responses, I will settle on the former and try to recapitulate the (intended) message of my article 'Stop evaluating science', using mostly excerpts from their responses.
The currently ubiquitous practice of evaluating science, meant to assess its usefulness and productivity, is essentially pointless, for several reasons. First, 'evaluation is antagonistic' to many of the key values of science, including innovation (D'Agostino and Malpass, in this issue). Second, it 'inflates bureaucracy in unnecessary and counterproductive ways, wasting and misdirecting precious resources' (Brighenti, in this issue). Third, because the metrics used 'are shallow, over-simplified and inaccurate' (Knaapen, in this issue), they cannot capture the real meaning and benefit of science, since this is unavailable to most of us (Shinn and Marcovich, in this issue). 'Nobody can be really sure about long-term consequences of any seemingly 'useless' research' (Khomyakov, in this issue), and therefore we should hesitate to judge what's good and bad science, other than on the basis of substantial historical evidence. Such evidence clearly shows that science 'has been immensely productive well before performance benchmarks were ever conceived' (Lizotte, in this issue), and therefore, it would be productive also 'without shallow quantitative managerial devices' (Hannud Abdo, in this issue). Another way of making the same argument is to say that 'the basic aim of science is not to be 'productive' – whatever the exact meaning of that term' (Gingras, in this issue). In this capacity, science is evidently successful, although this is a tough sell since 'while reformers can draw on popular rhetoric of how science should operate, critics must wade into the murky waters of real scientific practice' (Peterson and Panofsky, in this issue). Nonetheless, the 'growing consensus, or perhaps implicit consent, in society that science should be further rationalized through exogenous interference' (Lizotte) is fallacious. Hence the injunction in the title of the article: stop evaluating science.
Debate on the basis of this statement, and the arguments behind it, is not only inspiring and useful but probably the most responsible and productive way forward for the scholarly community. Although my article 'Stop evaluating science' was 'deliberately pointed and provocative' (Hallonsten, 2021a) and probably could be (mis)read as a closed statement of sweeping claims, with no interest on the author's part in hearing counterarguments, it also ended with the note that '[s]erious debate [...] should ensue to handle this important matter'. Importantly, it is not only impossible for an author to anticipate all counterarguments to a historical-sociological argument but indeed ill-advised to try to do so. In no small part, the already expressed joy over the turnout of the call for debate is due to a personal conviction that no scientific publication, regardless of how comprehensively presented it is, marks the end of anything, except maybe (paraphrasing Winston Churchill) the end of the beginning of an inquiry.
Science (including social science) deserves to be understood as a process and not a product. Karl Popper (1988) most famously noted that the built-in progress of organized human quest for knowledge makes all results and claims provisional. Robert Merton (1973: 277) showed that organized skepticism 'is both a methodological and an institutional mandate', which means that it is both at the heart of scientific conduct and a defining characteristic of science as an institution. This is of course itself a challenge to evaluators, since it means that scholarly publications are only contributions to a greater whole which is open-ended, evolutionary, and interactive, and whose worth can only be assessed with a holistic view. In practice, further work that builds on a scholarly contribution also amends it, complements it, contrasts with it, and proves it right and wrong in various aspects and from various points of view. In social science, the rational discourse and dialogue between informed and interested scholars, who base logical arguments and conclusions on solid empirical material, is what pushes knowledge forward, to the benefit of society (Habermas, 1971). Therefore, 'Stop evaluating science' perhaps suffered somewhat from its necessary brevity. Indeed, word limits may very well restrict analyses and argumentation of types that befit social science, but they also have upsides, as the brief format makes articles accessible and thus suitably inviting, and hence conducive to the scholarly exchange that is instrumental to progress in any knowledge-building enterprise.
Three topics from the varied collection of responses are, in my view, particularly suitable for further discussion that hopefully can inspire and incentivize further valuable scientific work in this important area of study, in the spirit of the Popperian, Mertonian, and Habermasian theory of scientific knowledge briefly recapitulated above. To them, I have taken the liberty to add a fourth topic, which is not prevalent among the responses but which was part of the purpose of my original article to explore, and that I hope can be of some significance in the continuing debate.

Quantification
A key issue in the debate is whether the blame should be put on numbers and evaluations themselves or on their mere use in irresponsible ways. Some of the collected responses claim that 'numbers are not inherently harmful' (Knaapen), and that 'the problem is not 'evaluation' per se' (Gingras), but I would like to make the point that in the case of evaluation of science, numbers are indeed inherently harmful. This is because they carry with them values that are not representative of science and/or that go squarely against proper conduct in science. Loes Knaapen argues, further, that 'the problem with science's shallow and inaccurate evaluation does not lie with quantification per se, but with the reduction of science's purpose to the single and narrow purpose of economic growth'. To this I both agree and disagree. The main problem is certainly the (erroneous but) hegemonic belief today that the purpose of science is to drive economic growth (economization; see a later section) but also the similarly erroneous but hegemonic belief that science is performing insufficiently in this regard. This is all tied to the current evaluation frenzy, because quantification is inseparably tied to economization. This is both because of the historical connection explicated in my article, and because the only way of 'proving' that science is insufficiently productive and contributive to society is to quantify. So economization and quantification inevitably feed off each other, although the exact causal relationships are difficult to establish, as with many historical processes. However, there is also a deeper sociological argument to be made about the inner logics of science as an institution or sphere of society, and how it differs from other institutions or spheres where quantification is fundamentally apt, most evidently the economy.
While I recognize the limitations of the frame of analysis provided by the functionalist sociology of science, and the conceptual tools it provides, I remain convinced that there is a point to be made about science's inner logic being compromised or corrupted by quantitative performance evaluation, because this means the colonization of the sphere or institution of science by (parts of) the sphere or institution of economy (and bureaucracy). I have argued this in other recent publications (Hallonsten, 2021b), pointing at the evidence that suggests that the incentive structures in science become corrupted by evaluation and the governance and resource distribution schemes tied to it, something that several of the responses seem to agree with. For example, 'overemphasis on these benchmarks can promote cronyism' and 'focusing only on the evaluated criteria will ultimately overshadow the core missions of learning and research' (Lizotte); 'measures replace the object they initially seek to measure' because 'concern for evaluation becomes more important than concern for the real phenomena evaluation is supposed to assess' (Brighenti); and 'auditable mechanisms of science evaluation invite gaming the system in pursuit of individual success and are highly prone to giving rise to distortions in perceptions and behavior' (D'Agostino and Malpass). To this shall perhaps be added my own point that quantification has the consequence that accountability no longer means responsibility for conduct, but instead and only means capable to be counted, and so the only way science can responsibly carry out its mission in society (whatever this is), is by subjecting itself to performance evaluation with quantitative and thus oversimplified metrics. If this belief takes root, and I believe it has already among a significant share of those in power of governing science, it is of little use to try to accomplish a middle way and reclaim control over metrics and evaluation practices.
Lizotte's 'law for this modern age', that 'if you do not quantify what is important to you, rest assured that others will quantify what is important to them' sounds constructive and instructive. Perhaps it is the most responsible course of action to take control over how science is evaluated and with what metrics, although the risk is that this amounts to little more than giving in and accepting that quantification is the only valid way of proving science's worth. Which, I persist in arguing, it is not. Needless to say – or perhaps, absolutely necessary to say – there are other values than efficiency, especially in science (cf. Peterson and Panofsky).

Introspection
Related to the previous topic is the contentiousness of my simplified dichotomization of 'internal' and 'external' evaluation. While most of my critique in 'Stop evaluating science' was directed towards the bureaucratic structures that impose themselves on science and run on the distrust against science's abilities that began to spread in the 1960s and 1970s, it is also clear that science itself, and scientists, should not be spared from blame for this development. My article could perhaps be read as a Foucauldian argument that we are caught up in a vicious circle of surveillance and discipline (cf. Brighenti), but it also highlighted that the internal reward systems of science seem particularly prone to lend themselves to performance evaluation and rankings of the most superficial sort, with the benign support of many scientists. Even highly inappropriate and invalid indicators such as the 'h-index' are carelessly thrown around by scientists who should know better (among other things, because they often have the mathematical or statistical training required to understand the flaws of this index), and cynically used by other scientists when it fits their purpose (advances their career) (Gingras, 2016: 43; cf. Wagner, in this issue). This is just the tip of the iceberg, I would claim, and I therefore join Jesper Schneider, Serge Horbach and Kaare Aagaard in their 'plea to the academic community to introspect on its own practices, which still to an overwhelmingly degree shape the reward and quality assurance system of science' (Schneider, Horbach and Aagaard, in this issue). Instead of blaming politicians, bureaucrats, or more diffuse exogenous forces, we should critically examine our own practices (Hannud Abdo). I can only concur.
More specifically, we should undertake further serious analyses of the alleged flaws of peer review, not least including causal relationships between changes in academic practice and in politics and bureaucracy. There are many problematic developments, for which the division of responsibilities is unclear and that therefore are in great need of investigation and further discussion. For example, on one hand, we have identified 'academic capitalism' as the operation of universities as profit-seeking businesses with not only grant money but also publications and citations as currency and capital, that makes the achievements of individual scientists and groups mere means towards this capital accumulation (e.g. Münch, 2014). Journals and citation indexes provide the techniques for sustaining this system and make up a business of its own (e.g. Macdonald, 2015). On the other hand, we have allegations of a deeply corrupt peer review system, both for journal publication and for grant allocation (to the extent that some suggest that the latter even should be replaced by lotteries, see Roumbanis, in this issue) – it is, they say, riddled with arbitrariness and chance, incompetence and ineffectiveness, conservativeness, nepotism, abuse of power, and discrimination (e.g. Laudel, 2006; Miller, 2006; see also Brighenti; Schneider, Horbach and Aagaard). Who is to blame? What came first? What caused what? Clearly, these are important matters that must not be handled too lightly but deserve serious analyses.
One promising starting point for such work is to hypothesize with the help of new or adapted concepts and conceptualizations. The concept of evaluation is itself very flexible and riddled with ambiguity. As the editors of this journal pointed out in the very first paragraph of their editorial inviting this debate (Jaclin and Wagner, 2021), evaluation is and has always been an integral part of scientific practice, and so it might make little sense to distinguish too harshly between 'internal' and 'external' evaluation. Well-deserved critique towards my own use of this admittedly oversimplified dichotomy was issued in the responses (see especially Schneider, Horbach and Aagaard), and quite evidently, as an analytical tool, the dichotomy is far from sufficient. Maxim Khomyakov is therefore doing us all a favor when suggesting instead a taxonomy made of a 'substantial evaluation', a 'moral evaluation', and a 'utilitarian evaluation', with the first being part and parcel of scientific inquiry (in the Popperian and Mertonian meaning, see above), the second being a means for society or its institutions to evaluate the consequences and risks of scientific activities in a wider meaning, and the third representing the evaluation of the utility of science through some cost/benefit-analysis. While Khomyakov settles with noting that the third is the 'most problematic', I would probably take the argument one step further so that utilitarian evaluation, in this taxonomy, is the form of evaluation that is (in my words) 'essentially pointless and mostly counterproductive' (Hallonsten, 2021a). But how do we distinguish between utilitarian evaluation and the other two? How can we amend and adapt the taxonomy to enhance the explanatory value of our analyses? Once again we find here a useful starting point for further studies, and recognize that giving rise to further questions is an extraordinarily useful feature of any concept or conceptual scheme, or taxonomy.

Democratization
Yet another topic of crucial interest, that goes to the heart of the question of science's role in society, is the relationship between scientists and their audience, or between expertise and laypeople. Max Weber argued that one of the main advances of civilization into modernity was the willful ignorance of us all about 'the conditions of life under which we exist' (Weber, 1946: 139). Leaving aside the question of whether this is generally a good or bad development, we can conclude that it is inseparable from the growing and indispensable role of institutionalized science in modern society, a role it has played well, to say the least. The corollary question, in this context, is then whether it is possible to reverse this development without destabilizing or damaging the chances for further contribution of scientific progress to human and societal progress. Quoting Weber again, since science 'cannot tell anyone what he should do – but rather what he can do' (Weber, 1949: 54), let me make absolutely clear that the question here is not whether this is desirable, only whether it is possible.
Clearly, science has something that could be called an 'internal and lawful autonomy' and is governed by norms for proper behavior and conducive to continued productivity (Weber, 1946; Merton, 1973; Hallonsten, 2021b). Some would perhaps, with a terminology that is more up-to-date, call it 'institutional logics' (e.g. Thornton et al., 2012). I have argued, in 'Stop evaluating science' and elsewhere, that peer review (or organized skepticism) is inherent and central to this self-governance, and that the self-governance is being compromised by exogenous forces that can be summarized as bureaucratization, politicization, marketization, and the like. To this should be added democratization, meaning both public control over knowledge development, dissemination, and knowledge use, and the form of bureaucratic democratization that Peter Wagner (in this issue) highlights. When local sites of knowledge production are weakened in favor of centralized and standardized governance and funding of science (be it within individual countries or across Europe), those in power in this new regime naturally lack 'rich substantive knowledge about research and researchers' (Wagner, in this issue) and need to rely on uniform and allegedly neutral indicators that allow easy comparison. This of course means (as I also pointed out in 'Stop evaluating science') an increased control over science and scientists by non-scientists, which can have devastating consequences. I am aware that I hereby give an elitist impression and perhaps also express a naïve overbelief in the 'internal and lawful autonomy' of science as an institution. But the intent is none other than to problematize and seek new paths for analysis and discussion of this important aspect of governance and evaluation.
Alexandre Hannud Abdo's argument about 'deep democratization' of knowledge – that the knowledge society is now a 'fulfilled promise' because knowledge at great scales has been made available to all and everyone – is fascinating and offers a very optimistic view on what others have called the 'death of expertise' (Nichols, 2017) and the rise of populism in its various shades, including recent and more vulgar varieties of 'knowledge resistance' (Klintman, 2019). I am not questioning either of these views, but once again wish for a sociologically and historically informed discussion over the boundaries between the institution of science and the society it serves and lives on. It seems to me that both Hannud Abdo's 'deep democratization' and my own argument about general distrust in science, seconded by Khomyakov who notes that science 'today is unknown, unintelligible and frightening activity', hold. The impetus should of course be deeper exploration of this very issue, and here I would like to bring the toolbox of classical sociology (of science) into the mix. Knaapen argues that peer review indeed is important but 'unlikely to be enough to challenge the economization of science', and suggests instead that 'more diverse external evaluation of science' be applied in order to 'assure science pursues a much broader range of public values, such as truth, democracy, wellbeing and other forms of social, economic and epistemic justice'. While I am generally sympathetic to this idea, I wonder how and if it can really be done. It was implicit in my argument in 'Stop evaluating science' that the outcomes of science – its products, for lack of better words – should be evaluated with attention to their contributions to society in the widest meaning possible (and certainly not restricted to economic growth), but I nonetheless see great risk in the ambition to open up science to even more scrutiny, even with the sincerest of intentions. David Peterson and Aaron Panofsky offer a rather convincing argument for this when they note that 'too often reformers lack practical knowledge about the domain in which they tinker'.

Economization (again)
'Stop evaluating science' started off with the assumption that the current pressure on the institution of science to demonstrate its money's worth is due to a dual erroneous belief that its key (or even only) purpose is to drive economic growth, and that it is fulfilling this purpose unsatisfactorily. It continued by arguing that neither logic nor evidence lies behind these assumptions and the evaluation frenzy they have created, and presented as its main analytical contribution a sociologically-oriented historical review of the developments that have led to this situation. As part of this, the article also argued that the 'evolutionary, cumulative, serendipitous, recombinant, and interactive processes by which scientific research contribute to technological and social innovation' (Hallonsten, 2021a), often stretching out far in time and leading to unpredictable results that show up in unpredictable places, have been convincingly demonstrated in key works in the history and sociology of science, and that anyone taking the time to ponder these stories and the lessons from them will note the relentless inability of the quantitative and superficial performance metrics currently in use to capture these impacts and thus do justice to the scientific activities behind them.
But performance evaluation is ubiquitous nonetheless, and the reasons for this must therefore be sought elsewhere. Quite evidently, the true understanding of the nature of scientific inquiry and how it is productive and contributory to technical and social innovation has fallen short compared to other dominant narratives, such as the idea that the purpose of any societally mandated or supported activity is to drive economic growth. This is what I have chosen to call economization, a neutral term that should replace the over-used and ill-fit term 'neoliberalism' in these analyses. Aside from the fact that 'neoliberalism' is not an analytical concept but a political and normative (and, most often, pejorative) one, Elizabeth Popp Berman (2014) showed quite convincingly that the deadening focus on economic growth at all costs and as everything's true purpose has been promoted not only by market fundamentalists of the political right, but also by leftists and social democrats. This, by the way, makes perfect sense: Marxism is a materialist philosophy and Marxist politics depend on an analytical framework dominated by economic models and economic thinking, so it is no surprise that also leftist politics in a Marxist tradition would fall prey to economization.
Broadening the analytical frame somewhat, there is also much to suggest that the current ubiquity of evaluation in science, just as in society generally, is part of the same overall development of the 20th century (and beyond) of intensified bureaucratic and political efforts to control and correct things and do away with pluralism, uncertainty, and spontaneous order, in favor of rationality, predictability, and control. Such attempts are part of the universal solution of bureaucratic management that supposedly 'protects us against chaos and inefficiency' and guarantees that 'organizations, people, and machines do what they claim to do' (Parker, 2002: 2). Although such management ideology of course has roots in Taylorism and thus is inseparably tied to capitalism (but not necessarily anything like market fundamentalism or similar), it has become increasingly difficult to separate from public sector bureaucracy and its various attempts to control life. The most recent model is 'New Public Management' (e.g. Hood, 1991), a set of bureaucratic governance tools that notably include both quasi-marketization and quantification. Statism and expansionist welfare state policies are to blame for this just as much as market fundamentalism or 'neoliberalism'. Great sociology is available for those interested in the deeper meaning, causes, and consequences of these developments. The modernization project is both continuous and basically apolitical, because it means the system's invasion of the life world (Habermas, 1984) is based on the inherent expansionist character of the instrumentally rational ('zweckrational') at the expense of other values (Weber, 1957: 115ff.). Peterson and Panofsky describe something similar, in different terms but connected specifically to science vis-à-vis society's other institutions, in their erudite discussion over efficiency: 'Our inability to chart basic scientific progress undermines the ability to measure efficiency. The notion of efficiency only makes sense in the context of established means/ends relationships. The goal is to organize the means in the optimal way to achieve the desired end. The problem is that, in the area of basic science, the end is unknown' (Peterson and Panofsky). Analyses of 'reflexive modernization' as a continuation of modernity demonstrate that the recognition and documentation of risks of all sorts, and the evaluation of the abilities to mitigate them, become a major task of society's institutions (Beck, 1994) which further promote economic thinking, together with bureaucratization, and replace accountability as responsibility for conduct with accountability as capable to be counted.
Here as well, a call for introspection: What did we, as social scientists analyzing the role of science in society, do to hinder or bolster this development? It is quite clear that the structural transformation of (Western) economies from the 1970s onward coincided (or reciprocated) with the renewal and expansion of the explanatory ambitions and reach of the economic sciences to include knowledge and technological development as factors for economic performance (e.g. Freeman and Soete, 1997; Landau and Rosenberg, 1986). The most evident feature of this development was, likely, the coining of the term 'knowledge economy' to denote an economy where knowledge has replaced raw materials and physical labor as the most crucial production factors. As social scientists, we are partly to blame for the proliferation of this policy Leitmotif, which seems harmless and accurate at face value but which has devastating consequences for the governance of (academic) science and education.
Essentially, the idea of the 'knowledge economy' invites further invasion of the institution of science by the systems or spheres of capitalist economy, politics, and bureaucracy, and their logics. The reason is of course that if knowledge is identified as the most crucial resource in today's economy, then obviously the institutions, organizations, and people that produce, maintain, disseminate, and develop knowledge must be governed and evaluated as any other unit of production. There is little to suggest that the intentions on behalf of politicians and bureaucrats enacting university reforms and imposing quantitative performance evaluation schemes on (academic) science have been anything but sincere: the logic by which they organize their domains of production, policymaking, management, and administration, is one of instrumental rationality. This is, by all accounts, apposite in the market economy and the bureaucratic state, and so the attempts of politicians and bureaucrats to reform universities to make them more efficient naturally follow the same logic, in the name of efficiency and proper goal attainment. Whether or not their good intentions make the prospects of breaking or reversing this development better or worse, is another matter for further study and discussion. But broken or reversed it must be, because no matter how we interpret the details and where exactly we look in terms of blame, I maintain that current performance evaluation in science is indeed pointless and mostly counterproductive. The many interesting and highly contributory responses to 'Stop evaluating science' have added much insight and perspective, yet not convinced me that this is in any way an exaggeration. Thus it seems to hold, as a point of departure for discussion, far beyond what has been accomplished here.

Funding
The author received no financial support for the research, authorship, and/or publication of this article.