Stop evaluating science: A historical-sociological argument

Although science has been a formidably successful force of social and technological development in the modern era, and a main reason for the wealth and well-being of current societies compared to previous times, a fundamental distrust characterizes its current status in society. According to prevalent discourse, science is insufficiently productive and in need of stricter governance and bureaucratic management, with performance evaluation by the means of quantitative metrics as a key tool to increase efficiency. The basis of this notion appears to be a belief that the key or only purpose of science is to drive economic growth, or sustainable development in combination with economic growth. In this article, these beliefs are analyzed and deconstructed with the help of a theoretical toolbox from the classic sociology of science and recent conceptualizations of economization, democratization, and commodification of scientific knowledge and the institution of science, connecting these beliefs to broader themes of market fundamentalism and to the metric fixation of current society. With the help of a historical-sociological analysis, this article shows that the current ubiquity of performance evaluation in science for the most part is pointless and counterproductive, and that this state of science policy is in dire need of reevaluation in order to secure science’s continued productivity and contribution to social and technological innovation.


Introduction
Science, the systematic building, organizing and communicating of knowledge about the physical and social world, has been an immensely powerful source of progress throughout modern history, probably even 'the most successful enterprise human beings have ever engaged upon' (Medawar, 1982: 80). Science-based innovation has made itself known in both constructive and destructive ways, and has certainly not benefited humankind equally, but its role in improving living conditions and creating affluence has been monumental. It is indeed impossible to imagine the existence of current society without scientific knowledge, and without the institution of science in a central role. Sociologically speaking, science is an essential part of modernity, and as such, it is a success story.
Nonetheless, the institution of science is currently under enormous pressure to demonstrate its productivity and prove its usefulness to society, which seems to stem from two interconnected assumptions: First, that the main or only function of universities today is to produce innovation and thus drive economic growth, or to accomplish sustainable development in combination with sustained economic growth, and second, that universities are unable to fulfill this function efficiently without exogenous incentivizing and reforms to their governance structures and funding patterns. These two assumptions have a near-axiomatic status in current research and innovation policy (e.g. Geiger and Sá, 2008; Bok, 2003; Mirowski, 2011) and are part of a 'pro-innovation bias' (Godin and Vinck, 2017) that arguably clouds the judgment of both policymakers and scholars and gives rise to all kinds of misunderstandings. Among these misunderstandings are the ideas that innovation is always good, that innovation is the key or only purpose of scientific research, and that innovation should be promoted at any cost. For universities, the consequences have been, inter alia, the broad implementation of performance evaluation based on predominantly quantitative indicators designed to allow for comparison, record-keeping, and the constant production of league tables and performance indexes to keep track of who is the most (and least) excellent and relevant, in a mostly shallow and over-simplified sense (Radder, 2010; Münch, 2014; Hazelkorn, 2011). Only recently has 'impact' been broadened as a concept to encompass outcomes other than the commercialization of scientific results (Blasi et al., 2018: 362), but the grip of quantitatively oriented evaluations on the bureaucratic governance of publicly funded science seems firm enough to prevent this realization from making any real difference (cf. Spaapen and Sivertsen, 2020: 2).
This article argues that the current ubiquitous practice of performance evaluation of science is not only a 'Frankenstein monster' (Martin, 2011) but also essentially pointless and mostly counterproductive, and that the assumptions that form the basis for its implementation and wide acceptance are flawed. The article repeats Linda Butler's (2007) 'plea for sanity' in research evaluation, and adds to it a plea for reflection and consideration of the vast (historical) evidence that science is enormously productive, and was so even in times when quantitative performance evaluation was not practiced as a tool in science policy and university governance; that is, for most of modernity. Therefore, this article argues, the burden of proof should be shifted to those who claim that science is insufficiently productive and that quantitative and shallow performance evaluation, and competitive allocation of resources on the basis of such evaluation, are necessary to improve productivity.
The article is deliberately pointed and provocative, highlighting problematic aspects of the current science policy 'regime' (Elzinga, 2012) and especially the ubiquity of quantitative evaluation practices. The main contribution is a historical examination of the developments that affirmed the assumptions that science most of all is an economic engine, that it is in need of constant fixing and repairing, and that the proper remedy is performance evaluation. The article shows that these assumptions are based neither on evidence nor logic, but came about as a result of historical developments, amounting to a growing distrust in science and democratization of scientific knowledge that certainly brought social progress and a healthy reevaluation of dogmatic beliefs, but also had problematic consequences, whose reevaluation is paramount to a proper understanding of the role of science in society and the consequential readjustment of science policy.
There are many ways of misinterpreting this article and its argument. The deliberately provocative tone, necessary to spur the desired debate, has the drawback of giving voice to some generalizations and simplifications. Some readers might thus assume that the article, between the lines, advocates a return to some 'pure' state of science, some 'good old days' prior to the historical developments accounted for, when science operated in splendid isolation, unaffected by market, state, and other societal institutions. No such claim is made, since this would be severely ahistorical: There have hardly ever been any such times, and the point of the article is also not to advocate the creation of such an idealized state. Others may perhaps interpret the article as claiming that the practitioners of science are innocent victims of political interference and the implementation of governance and evaluative standards alien to their true identities. This, too, would be ahistorical, since the changes discussed in some detail in this article have occurred both with the passive consent of the many, and through active embracing by the (perhaps not very) few. The article is, in this regard, just as much a call for soul-searching as it is a complaint aimed at non-scientist bureaucrats and decision-makers, and it is certainly an appeal to academics of all breeds to reflect upon their working conditions and the institutions surrounding them, so as not to 'be condemned to the fate of the 'boiled frog'' (Martin, 2011: 252). In sum, the article should be read mainly as an attempt to (re)vitalize the scholarly discussion, and possibly a broader debate, on the role of performance evaluation in science, the reasons for its ubiquity and supremacy in university governance, and whether this is at all warranted, apposite and desirable.

Modern science
Fundamental works in the sociological analysis of the historical transformation of society into modernity, through enlightenment, industrial revolutions, and the growth of the liberal democratic welfare state, have highlighted the crucial role of science, and the inseparability of the institutionalization of science from the modernization of society (e.g. Durkheim, 1969; Weber, 2009b; Merton, 1970; Parsons, 1966; Schluchter, 1992; Giddens, 1990). Interestingly, however, the initial institutionalization of science in society was largely detached from universities. Early scientific development of the 17th and 18th centuries was mostly practically oriented, and catered largely to the needs of industrial development, and it was therefore also funded mostly by industrialists and philanthropists (Merton, 1970; Cohen, 1990). Universities remained most of all providers of higher education, organized to supply the state and the church with lawyers, priests, surgeons, and civil servants (De Ridder-Symoens, 1996), and it was not until the Humboldt ideal of combining scientific research and higher education, in a university characterized by free and open pursuit of knowledge by professors and students in interaction, that the purpose and mission was broadened to include scientific research (Östling, 2018). This happened in the late 19th century and the early 20th century, in both Europe and North America, in parallel with the development of other distinctive features of the modern science system, including journals and monograph publications as the key communication channel, peer review as the key selection mechanism, and the doctoral, postdoctoral and professorial stages as central to the scientific career structure, with open competition for the highest academic positions (Baldwin, 2018; Parsons and Platt, 1973; Cole, 2009; Wittrock and Elzinga, 1983).
The research university is therefore an institutional feature of society that dates back two hundred years, at the most. Most scientific achievements before that were made in a different context, much more pragmatic and practically oriented, significantly less regulated, and mostly without high academic ideals. Sociologists have shown that science's proven ability to produce practically useful results that could be turned into innovation for the rationalization of industrial production, the accumulation of wealth, and the improvement of living conditions was what warranted the later institutionalization of academic freedom and the self-governance of science in universities, not the other way around (Merton, 1970; Zuckerman, 1989; Cohen, 1990).
The growth of public and private spending on science in the post-World War II era followed a similar logic. Although ideological arguments for the utilization of science and technology as a central force of progress were in place before the war (e.g. Bernal, 1967), it was quite obviously the rather spectacular (and in part horrendous) demonstrations of the impact of science during World War II that gave the impetus to the postwar expansion, on both sides of the Iron Curtain, of public and private R&D. A rudimentary division of labor was put in place between classified (military) and proprietary (industrial) research on the one hand, and free and open scientific inquiry in academic settings on the other, but hybrids and intersections were also frequent (Agar, 2012; Pestre, 2003). Authors have suggested that the expression and concept 'military-industrial complex' is too narrow and should be expanded to 'military-industrial-academic complex' to reflect the broader and deeper significance of science, including especially its academic branch, for the North American postwar experience of supposed rationalization of society, fueled by superpower competition and material and economic progress (Giroux, 2007; Hallonsten, 2016). There is little to suggest that the situation in Europe was any different: In the thirty years that followed the end of World War II, unprecedented economic growth and dramatic improvements of material standards through innovation and redistribution of wealth reinforced each other, and science and higher education played an enormously important role as engines of this development (Cozzens, 2003; Pestre, 2003; Agar, 2012).
Pre-war achievements, such as the invention and discovery of superconductivity, quantum mechanics, antibiotics, the jet engine, radar, synthetic materials including nylon, and the atomic reactor, were complemented by similarly spectacular scientific advances with similarly spectacular technical and social results (the depth and breadth of which can still not be fully assessed): the transistor, oral contraceptives, the DNA structure, the laser, the GPS, the CT-scan, magnetic resonance imaging, and a vast number of incremental improvements, additions, recombinations, and amendments to these and others, many of which are just as significant but rather difficult to summarize in catchphrases or under simple names or acronyms. This of course pertains also to the social sciences, from which influential schools of thought have emanated, such as Marxism/neo-Marxism, postmodernism and critical theory, behavioral economics and neoinstitutional approaches to organization and management, theories of late/reflexive/liquid modernity and risk society, innovation systems, and so on.
The prevalent research policy doctrine of the first two to three decades after World War II has later been summarized by the use of two concepts: the social contract for science, which is far from a contract in a formal sense (and also different from the 'social contract' in Hobbes' or Locke's meaning) but an arrangement whereby science makes useful contributions to society in return for generous funding by the state and its taxpayers, with little or no strings attached (Guston, 2000), and the linear model for technological innovation, which says that generous funding for free and curiosity-driven research by default will produce scientific results that can be turned into technological advances, which in turn are transformed into societal benefit through innovation-based economic growth (Godin, 2006). Importantly, this policy model assumed that science, if left to govern itself and seek its own paths, would deliver spectacular results and findings that would benefit society in a wide variety of ways. With the benefit of hindsight, it can be concluded that this belief was (and is) essentially correct, although there are of course proven exceptions. The key point, however, is that while scientific research can be expected to have profound technological and social impact on the world, and while such impact is perhaps also reasonable to demand, it is for the most part impossible to trace if the time frame is just a few years, and certainly impossible to do justice to with simple metrics. Examples that undeniably demonstrate the evolutionary, cumulative, serendipitous, recombinant, and interactive processes by which scientific research contributes to technological and social innovation are found in excellent recent works like Matt Ridley (2020) and Mariana Mazzucato (2015), and previous analyses like I. Bernard Cohen (1985) and David Hull (1989).
These show that for the most part, the scientific advances and innovations that have shaped modern society and produced our wealth and well-being (for the sake of simplicity, consider just the use of electricity, the curing of disease, the development of efficient transportation, the preservation of food, and the assembly of highly sophisticated technologies into consumer electronics gadgets including, most recently, smartphones) happened in ways that would never have been possible to trace and assess properly with present-day evaluative metrics, and would likely never have happened at all if the allocation of resources to scientific research in past times had been made in the current short-sighted manner, based on the results of the application of those metrics.

Economization
The conclusion at the end of the previous section is uncontroversial for those who have undertaken qualitative studies of the institutional foundations of innovation (e.g. Westwick, 2003; Mody, 2011, 2016; Gertner, 2012; Gribbe and Hallonsten, 2017). Nonetheless, although such qualitative inquiries offer vast evidence that science cannot be properly and fairly evaluated without a holistic, qualitative and long-sighted perspective, they are customarily eclipsed by another story, backed up by over-simplified quantitative measures of scientific productivity borrowed from, or inspired by, business practices. According to that story, (academic) science is underperforming as a producer of innovation and driver of economic growth, especially considering overall levels of investment in research and development (R&D). Depending on the level of analysis, this has been called the 'Swedish paradox' (Edquist and McKelvey, 1998; Ejermo and Kander, 2009), the 'European paradox' (Andreasen, 1995; Dosi et al., 2006), and the 'innovation problem' of advanced industrialized economies (Guston, 2000: 113). The statistical data and analyses that are used to draw this conclusion (levels of investments in R&D, measures of outputs of publications and patents, and rankings with the help of indexes and weighted measures) have been criticized on empirical grounds (Jacobsson and Rickne, 2004; Granberg and Jacobsson, 2006). As already noted, their validity can also be questioned conceptually and logically, but they do have an easily explained lure that probably stems in part from their kinship with quantitative measures and abstractions of the economy, such as growth, productivity, balance of trade, stock market value, and net worth (cf. Muller, 2018).
The shallowness and inaccuracy of measures of gross R&D investments, aggregated publication and citation counts, and rankings of 'excellence' and 'innovation capability' based on these, mean little in the context of their application, since their purpose is not to do justice to scientific knowledge production and innovation per se, but to make sure that universities perform their claimed prime task of producing innovation and driving economic growth.
Elizabeth Popp Berman (2012) has called this economization: for several decades, science has no longer been viewed as a public good but as a financial good, and is no longer expected to advance civilization or culture in a wider sense but first and foremost to drive economic growth (cf. Mirowski, 2011). Through a series of reforms, unequally implemented in different countries but with a clear common theme, governance structures and evaluative practices have been adapted accordingly. Though the broader ideological superstructure is the 'market fundamentalism' of current society (Bourdieu, 1998; Stiglitz, 2002; Somers and Block, 2005), the profound reforms of university governance that most countries have seen in the past thirty to forty years, largely as a result of economization, also seem to a great extent to have been based on ostensibly sincere ambitions of increasing the accountability and efficiency of university research. These ambitions are both easily motivated considering the vast growth of public expenditure on academic research and education, and understandable from the viewpoint of politicians and bureaucrats. Universities are ancient meritocratic federations of autonomous chair holder professors that have adapted slowly (and some would say, insufficiently) to growth and new demands from society, and their governance and organization may appear both incomprehensible and hopelessly inefficient, perhaps only possible to understand and appreciate for those who are trained as scientists and socialized into academic communities (Hagstrom, 1965; Readings, 1996; Ginsberg, 2011). For everyone else, including especially policymakers, government bureaucrats, and the growing cadres of administrative staff at the universities themselves, academia is therefore considered fair game for radical reform, perhaps even in urgent need of it.
The foundations of economization are therefore non-partisan and, at least in terms of the dominating left-right axis in late 20th century politics in Europe and North America, unideological. Not only 'neo-liberal' reform agendas lay behind the reforms of university governance, but politicians of all established parties, and apparently also their voters, seem united in the expectation that government, all its branches, and all the institutions it funds and keeps in operation, first and foremost work actively to 'affect positively the larger economy' (Berman, 2014: 399). According to this logic, academic science, massively supported by the state, is also viewed as a subcontractor to the economy and its growth, which is seen as an end in itself (Mirowski, 2011).
But science, especially academic science, is an institutional sphere of its own, with 'internal and lawful autonomy' (Weber, 2009a: 328) and fundamental, normatively structured, principles of (self-)organizing (e.g. Merton, 1973a; Hagstrom, 1965; Whitley, 2000). Most conspicuously, like several other vital institutions in a democratic society, science lacks the price mechanism that allows markets to straightforwardly demonstrate differences in value for money, based on the continuous evaluation and reevaluation of the performance of suppliers and vendors by their customers (Münch, 2014: 3). This absence of a functioning proxy measure for performance that can be used to evaluate and rank the equivalents of 'production units' in science (be they individuals, groups, departments, or universities) makes evaluation difficult and inequitable, and evaluation with standardized quantitative indicators essentially impossible and pointless. By extension, this means that such evaluation constitutes a direct threat to the institutional autonomy of science, and the operational and organizational logic it entails. It is well known, and has long been, that any appraisal or even rudimentary data collection on the behavior and performance of individuals and organizations alters those behaviors (Merton, 1936: 903-904; Ravetz, 1971: 295-296; Muller, 2018: 19-20; Espeland and Sauder, 2007). Put differently, evaluation changes what is evaluated, creating artifacts and measuring things that are not relevant to measure. In academic science, this means incentivizing scientists to engage in conformist window-dressing and publication for its own sake, rather than maintaining creativity and diversity and undertaking innovative, exploratory and cross-border studies (Whitley, 2007: 9-10; Martin, 2011: 250; Weingart, 2005; Tourish, 2019).
The proliferating habit of connecting financial rewards to ranking positions and the meeting of specific evaluative standards at the organizational and individual levels (Hicks, 2012; Watermeyer, 2019: 107ff.) quite obviously risks accentuating these effects.
Meanwhile, the autonomy of science and the logics of its self-organization are arguably pivotal to its capability to contribute to society. Economization and the current evaluation frenzy could therefore prove to be devastating to science and the fulfillment of its institutional goal of 'extending certified knowledge' (Merton, 1973b: 270). Many have argued that this is the case, although using a different terminology and a different conceptual framework (e.g. Münch, 2014; Radder, 2019; Rider et al., 2012; Cole, 2015; Gertner, 2012; Mirowski, 2011; Collini, 2012; Watermeyer, 2019).

Distrust
The post-World War II expansion of publicly funded science and higher education had several drivers. It was a period of remarkable economic growth which provided the means for the expansion. The cultural status of technological progress and supposed rationalization, under the banner of modernity, was probably at an all-time high in the immediate postwar decades. The Cold War superpower competition, which in no small part took place in the area of science and technology, contributed through both direct military R&D efforts at an unprecedented scale and in other fields of competition where the link to military superiority was less direct, such as particle physics and spaceflight. In the 1960s, however, several parallel and intertwined processes were set in motion that coproduced the economization of science. Their common denominator was distrust.
The 1960s saw broad and deep questioning of the ruling modernist ideals and the abilities of technological development to improve life, most conspicuously expressed by the social movements that entailed emergent environmentalism, feminism, pacifism, and anti-authoritarian (neo-Marxist) civilization critique. The Cuban Missile Crisis and the Vietnam War produced a (rightful) questioning of the rationality of the contemporary modern world order built on technological strength, also at the political level, and a period of superpower détente and alternative political movements ensued. U.S. President Dwight Eisenhower's worries about the 'military-industrial complex' fueled the debate, and only a few years later, President Lyndon Johnson rhetorically asked 'what science can do for grandma', arguing that it was time to reap the benefits of basic research for the good of society, in practical terms, and not only exploit its destructive forces (Kevles, 1995: 411). As summarized by Ulrich Beck (1992: 169), '[u]ntil the sixties, science could count on an uncontroversial public that believed in science, but today its efforts and progress are followed with mistrust.' But the youth and/or leftist suspicion directed towards authority of all flavors in the 1960s also struck at expertise, including scientific expertise, which was viewed as inseparable from elite dominance and prejudice. Hence anti-authoritarianism also paved the way for the implementation of supposedly transparent and democratic metrics and performance appraisals at the expense of professional judgment and institutional autonomy, both of which became branded as reactionary agents of the establishment (Muller, 2018: 40-41).
The economic downturn, gathering full speed in the early to mid-1970s, fueled a reconsideration of the beliefs in the social contract for science and the linear model for technological innovation, and their gradual replacement by active policies to prioritize and invest strategically by 'picking the winners' ahead of the commencement of funding, to maximize output from the public spending on R&D (Irvine and Martin, 1984). This shift of 'regimes' of science policy (Elzinga, 2012) coincided with a broadening of the knowledge base of the economic sciences, which meant the inclusion of a wider range of factors in the analyses of causes and effects of economic performance on the level of firms and industries, regions and countries, and globally. Not least, knowledge and technology, or research and development (R&D), were acknowledged as key factors for economic performance in both the short and the long term, and attention was therefore increasingly paid also to the institutions of science and education, and their (measurable) contributions to economic development and growth (e.g. Nelson and Winter, 1982). Structural transformation through the decline of traditional manufacturing industries in (Western) Europe and North America, and their partial replacement by high-tech and service industries, gave rise to early proclamations of the dawn of the 'knowledge society' (e.g. Drucker, 1969). Somewhat paradoxically, major corporate R&D divisions, famous for their achievements (e.g. Gertner, 2012; Pithan, 2019), were downsized and their activities outsourced.
As noted in a previous section, R&D had been an inalienable part of private enterprise since the first industrial revolution, but when the growth of publicly funded research turned universities across (Western) Europe and North America into major powerhouses of R&D, many companies began to question why they would invest in R&D at all when the public sector both had the institutional means (universities and institutes) and the financial means (a seemingly ever-increasing tax revenue) to do it for them (Rosenbloom and Spencer, 1996).
On the level of elected officials and civil servants, the questioning of the abilities of science to efficiently contribute to welfare improvements (in the 1960s and early 1970s), and to the performance of the economy (in the late 1970s and on), led to the crafting of reform agendas to remedy the problems. Though policy and decision makers may, at the time, have been aware of (some of) the complexity of innovation processes and the important role of private enterprise in innovation-based economic development, the direct jurisdiction of the state extends only to the public sector. Efforts to influence private industry must be limited to legislation and tax incentives, whereas university governance reforms can reach farther, and certainly have.

Democratization
The 'knowledge society' is usually taken to mean a society where knowledge, and especially scientific knowledge, is the key resource for individual and group achievement, and where consequently capital and physical labor are less pronounced (but still important) production factors (Stehr, 1994; Välimaa and Hoffman, 2008). An alternative interpretation holds that the 'knowledge society' is a society where 'knowledge' has been made a 'key defining aspect' and therefore 'an increasingly politically laden concept and one on which a range of social interests try and make a claim' (Sörlin and Vessuri, 2007: 1-2). The consequence is not only, or not at all, 'a society permeated in all its parts by knowledge' but rather 'a society where the institutions of knowledge production are penetrated by all of society's other interests and institutions, including but not limited to politics' (Hallonsten, 2016: 37-38). The latter of course means that knowledge and knowledge production are to some extent democratized, an effect with presumably positive connotations but potentially problematic consequences. As noted, science has certain fundamental principles of (self-)organizing that it does not share with other institutional spheres of society, which have their own similar fundamental principles of (self-)organizing (e.g. Weber, 2009a; Merton, 1973a, 1973b; Hallonsten, 2021). Simply put, the sphere of the economy upholds a balance between supply and demand by the price mechanism, and stimulates innovation and renewal by the profit motive. Politics allocates power and influence through persuasion and bargaining, and is legitimized by the popular vote and majority rule. State bureaucracies consist of organized hierarchies that apply rules and regulations uniformly and impartially in systematic processes to maximize efficiency and maintain order and continuity.
Science, for its part, establishes consensus around functional descriptions and manipulations of the physical and social world through organized skepticism, meritocratic primacy, and allocation of non-material rewards in the form of credit and acknowledgement (Merton, 1973a, 1973b; Whitley, 2000; Münch, 2014; Hallonsten, 2021).
The basic principles of the spheres of economy, science, politics, and so on make them distinctly different, and while they certainly do not operate in isolation (indeed, their differentiation, interpenetration, and structural coupling have been the topic of extensive works in 20th century sociology; see e.g. Parsons, 1952; Luhmann, 1984; Münch, 2011), the guarding of their autonomy is arguably a key feature of a free and open society, and a matter of efficiency and proper functioning of society as a whole. It follows that science should, as far as possible, be left to scientists to govern. The deep and qualitative appraisal of the content, rigor, meaning and potential contribution of scientific research can only be done by peers, and must therefore be kept internal to science and its disciplinary communities (Hallonsten, 2021). But in the name of democratization, and based on the prevalent belief that science is inefficiently organized, this can hardly be accepted by politicians, bureaucrats, industrialists and the general public. A science that receives generous public support and that, it is believed, is in dire need of incentives and other governance efforts toward greater productivity, efficiency, and accountability, must be evaluated with simple and clear metrics that clearly display who is 'excellent' and who is not, and managed in accordance therewith. The evaluative and managerial tools used for this come from politics, bureaucracy and enterprise, and are essentially alien to scientific practice and scientific self-governance, but they are the only tools in the hands of political and bureaucratic decision makers.
The resulting changes to the governance of (academic) science, and the key role of performance evaluation, form part of a broader complex of similar reforms that have been analyzed with the use of concepts such as 'market fundamentalism' (Somers and Block, 2005), the 'audit society' (Power, 1997), 'metric fixation' (Muller, 2018), and the 'new public management' (Hood, 1995). Several institutions in society that were previously under professional self-control are increasingly being managed, monitored and evaluated by means of general indicators, indexes and league tables, imported from the economic sphere and enforced by bureaucratic governance and regulation in the name of accountability and efficient use of public funds. In broader perspective, this increased use of metrics in the mapping and evaluation of various institutions and processes in society is itself a democratization process, given that it allows the extension of 'social control over reality itself' through ostensibly objective and value-free descriptions and quality assessments of institutions and processes in lay terms, as opposed to professional (elite) rule (Bonaccorsi, 2018: 20-21). Moreover, in the general case, population statistics, social surveys, and the standardization of such data have been instrumental in the creation of the welfare state and thus in the expansion of democracy (Bulmer et al., 1991). Still, current 'metric fixation' tends to go several steps further and conflate accountability in the sense of responsibility for conduct with accountability in the sense of being capable of being counted. The result is a proliferation of the belief that only by subjecting themselves to (externally controlled and quantitative) performance appraisal can institutions really be considered responsible and as satisfactorily carrying out their missions (Muller, 2018: 17-18).
In science, as we have seen, this is contrary to reality, and while in some sense scientific knowledge is democratized in the process, the implementation of this kind of accountability happens at the expense of professional autonomy and the rule of professional judgment and expertise. This raises the question of whether the interpretation of the 'knowledge society' as a society permeated by knowledge in all its parts is in fact hollow: What substantial content can this knowledge have, if it has been selected and promoted through evaluations by non-experts? Therefore, the metrics used to democratize institutions through ostensibly objective and value-free indicators of performance and quality are misdirected and deceitful, since they are used to put professional operations under lay control or, really, bureaucratic control. The continuous expansion and maintenance of evaluation programs and performance-based management, and the compilation, interpretation, and use of the results of these evaluations, require an 'elaborate bureaucracy' with soaring costs that, most likely, far outweigh the benefits (Martin, 2011: 251). Its interferences with professional and collegial governance have long since transcended the level of minor nuisances (Readings, 1996; Ginsberg, 2011). But the creed that universities should be more business-like, and that more performance evaluation is a key means of achieving such a business-like university, is ironic given that most businesses 'have a built-in restraint on devoting too much time and money on measurement -at some point it cuts into profit' (Muller, 2018: 75). Universities and other non-profit organizations have no such limits, and can uninterruptedly continue to channel funding away from the performers of the core functions of the university, education and research, to its ever-growing bureaucracy.

Commodification
The constant use of performance evaluation as part and parcel of university governance has been interpreted as a variety of Foucauldian surveillance (Foucault, 1977), since it compromises and challenges the institutional autonomy of academic science and the academic freedom of individuals and puts science and scientists under exogenous inspection, measurement, and evaluation (Bonaccorsi, 2018: 20). But, as Byung-Chul Han (2015: 8) has pointed out, current society is an 'achievement society' rather than a 'discipline society', and its inhabitants should generally be viewed as 'achievement-subjects' rather than 'obedience-subjects', since they are not only expected to constantly make achievements, but also to be entrepreneurs and to constantly record and demonstrate their achievements. This seems to fit rather well with the institution of science, which is generally inhabited by very credit- and status-seeking individuals. The problems of ubiquitous performance evaluation in science therefore go deeper than the Foucauldian analysis of surveillance as a central feature of current society, since the internal reward system of science appears to be an easy prey for policy and governance reform that enrolls scientists in a game of accumulation of publications, citations, grant money, and their common proxy: prestige. The scientific reward system, analyzed in great detail by Robert K. Merton (1973c), Warren O. Hagstrom (1965), Jonathan R. Cole and Stephen Cole (1973), and others, seems to have built-in features that work rather well with current evaluation frenzy and metric fixation. With one major exception. An essential part of the one-sided application of simplified and quantitative metrics in the evaluation of scientific quality, relevance, and success today is the (erroneous) view of scientific knowledge as a product to be bought and sold, in other words the commodification of science.
This is an important feature of the phenomenon that has been called academic capitalism, namely that universities are managed more like businesses and individual academic achievement is colonized by university managers in their hunt for funding and prestige (Münch, 2014: 10).
But academic capitalism also means business in a very real and direct sense: a whole industry has been built up to provide the means for the evaluations, and conduct them, with advanced metrics, databases, and rankings. Besides grant money, which is nowadays accumulated by universities as if this was an end in itself (Greenberg, 2007: 11ff.), articles in what are known as 'top' academic journals, and citations to these articles, have become a hard currency. An explosive growth of the number of article manuscripts submitted to such 'top' journals has occurred both because of the pressure on individuals to publish, and because of the informal but broadly accepted assertion that articles published in these 'top' journals are of higher quality than other publications (Campbell and Meadows, 2011; Pacchioni, 2018). This rests on the erroneous assumption that the quality of a scientific finding correlates completely with the frequency of citations to the publication where the finding is communicated, an idea which of course is absurd. To begin with, there are many reasons for citing an article, besides appreciation of its quality. But the use of citation counts and indexes as a proxy for quality is a highly attractive option for the non-expert, and nowadays forms the basis for the strategic management of universities and their research activities, which is all about attracting funding and the prestige of high scores on league tables that reflect accumulated numbers of articles in 'top' journals, and citations to these (Greenberg, 2007; Münch, 2014). The commercialization of journals, which has enabled major publishers to reap giant revenues from the subscriptions paid by university libraries for the periodicals that university researchers need in their work, fuels the development (Macdonald, 2015).
Worse still is the status given to the journal article as a product.
As the key answer to the demand for a simple measure of 'excellence', and because of the would-be knowledge society's need for a simple criterion that separates truth from non-truth, the journal article has become not only a commodity but the carrier of truthful or correct scientific claims and results. In the eyes of the lay public and its elected representatives, anything that passes the mythical journal peer review and ends up in a highly-ranked journal is the objective truth (Roberts and Shambrook, 2012; Macdonald, 2015; Hallonsten, 2021). Obviously, this is not an accurate belief. Scientific articles can, in extreme cases, be deliberately misleading, or be written and published in all honesty and still contain smaller or greater mistakes. The only way of validating the knowledge claims of journal articles is to incorporate them into the scientific commons and put them to the continued, systemic, and social procedure of reading, citing, criticizing, confirming/refuting, and use in further scientific work. Journal articles are not the end games of scientific knowledge production and dissemination but minor (or, in rare cases, major) contributions towards it. Any claims to the contrary are erroneous, no matter how convincing the numbers that back up the claims seem to be (for an insightful discussion of this, with many educative examples, see Pacchioni, 2018).
But convincing they are. Scientific 'truth', as a commodity that can be bought and sold and not least packaged and marketed to fit a political agenda or business strategy, has a major appeal. The power of science to legitimate political action, or to back up claims of superiority of a product or service over its competitors, is forceful. Calls for 'evidence-based policy' are ubiquitous (Cairney, 2016), and the convenience of the metrics of journal rankings on the basis of citation indexes, compared to a deep and qualitative assessment of the content and meaning of the scientific research itself, is obviously attractive for policymakers and the general public. The democratic problem of such a shift of responsibility from elected officials to more or less anonymous 'experts' has been the topic of some scholarly analysis (Pestre, 2003; Góra et al., 2018). It should, however, be noted that the demand for 'evidence-based policy' need not mean a general growth of the value of scientific knowledge in society (cf. the 'knowledge society' above), but can also be a function of a need to strengthen the legitimacy of what is considered rational and reasonable decision-making in the face of a growing environmental complexity, new challenges and threats to establishment consensus and assumed rationality by new movements that are difficult to grasp. The commodification of scientific knowledge, and the easy identification of scientific results of high quality and relevance by the labeling of 'top' journals and especially 'excellent' scientists and scientific environments, is crucial for such use of science in the service of politics, public bureaucracy, and mass media.

Conclusions
The conceptualization of science's institutional autonomy, its fundamental principles, and the essential role that various forms of formal and informal stratification and quality assessment have in the (self-)organization of science, formed a theoretical basis for the analyses above. Among other things, it was highlighted that the continuous evaluation of quality and the ranking of scientists, of their findings and results, and also of the organizations that embed them, is an essential feature of scientific work and a cornerstone of scientific knowledge production processes, as a variety of philosophers and sociologists of science have shown, with varying terminologies but all pointing at the essentially cumulative, collective, and interactive nature of science (e.g. Popper, 2002; Polanyi, 2009; Merton, 1973b). This means that the formalized evaluation of quality in science, as part of publishing, appointments, and allocation of competitive funding, is also both necessary and desirable. However, this is something altogether different from the evaluation of 'excellence' and 'relevance' for the sake of 'picking the winners' and maximizing output or increasing efficiency and accountability. While the current evaluation regime is geared entirely to the use of ostensibly neutral, value-free, and thus 'objective' and uniform criteria that also laypeople can understand, the performance evaluation in science that forms an essential part of its self-organization and its knowledge production processes is strongly dependent on the differentiation and specialization of scientific communities and disciplines and their varying social and intellectual organization (Whitley, 2000).
Within these, evaluation tends to 'converge on a core set of discipline-specific quality criteria without violating epistemic pluralism, academic freedom, and the right to dissent' (Bonaccorsi, 2018: 22) and to utilize these in qualitative evaluations that certainly entail rankings, although these are far from the general and shallow rankings of universities and 'top' academic journals based on simplified metrics. Such qualitative evaluation and ranking happen every day in science, in a myriad of forms, on all organizational levels imaginable, and with a great variety of formalization and structure, from the colloquial exchange of ideas between close colleagues over coffee or lunch, to the double-blind peer review of manuscripts submitted for publication in journals, and everything in between.
Evaluation of that kind is both appropriate and sensitive to the evolutionary, cumulative, serendipitous, recombinant, and interactive processes of scientific knowledge production. It is therefore also attuned to the role of science in society and to the nature of scientific knowledge production, in a way that the current evaluation regime certainly is not. The latter is built on several misunderstandings, whose real origins this article has identified and explained, including the view that innovation for the sake of economic growth is inherently good and should be promoted at any cost; that the main or only role for (academic) science in society is to drive economic growth through such innovation, or sustainable development in combination with sustained economic growth; and that universities and their professionals are performing this function inefficiently. The only remedy to these misunderstandings, besides explication of the historical and sociological reasons for them, is to hand over the burden of proof to those who hold them and articulate them. Questions like 'has science been productive enough?' and 'how can it be proven that science has been productive enough?' shall first and foremost be answered with a rhetorical question, namely, 'how else do you suppose that we have achieved this level of wealth and technical standard in Europe and North America?' In spite of the overwhelming logic of this rhetorical counter-question, and the historical evidence that supports it, champions of the view that science is insufficiently productive and must be made productive and held accountable through limitations to its self-governance and the use of quantitative performance appraisals will demand evidence that they can comprehend and, preferably, compare with their own simple and straightforward numbers.
A list of counter-examples will therefore probably not suffice, since it can be discarded as mere 'anecdotal evidence' against which also the shallowest and most oversimplified statistics usually win. The argument should therefore center on the basis of the supposition itself, i.e. aggregate quantitative measures and indicators, and the demonstration that these are hopelessly incapable of capturing the full impact of science through technological innovation. The discussion requires problem reframing: perhaps the articles or books or letters that communicated key results and findings in the evolutionary, cumulative, serendipitous, recombinant, and interactive processes that enabled society to make use of electricity, cure a range of diseases, develop efficient transportation, preserve food, and assemble highly sophisticated technologies into consumer electronics gadgets including, most recently, smartphones, got thousands of citations. Who cares? None of those citations had anything to do with the profound transformation of society that was the true outcome of all that science. From this, the analysis and discussion should of course move on to consider what current citation indexes and similar measures say about the potential impact of current science.
Another important lesson is that innovation never occurs in isolation, and that the productivity and contributions of singular R&D activities never can be fully appreciated without attention paid to other crucial nodes in the networks that embed the activities, including complementary R&D, absorptive capacity, infrastructure, financial and social incentives, and other structural features of economy and society that influence innovation processes in a range of predictable and unpredictable ways (Ridley, 2020; Mazzucato, 2015). Quite evidently, quantitative performance evaluation cannot capture this complexity and can therefore never provide evidence enough to judge whether R&D activities are productive, relevant, or 'excellent' enough. An efficient and productive innovation system requires many different components, of which not all are highly productive in an immediately recognizable sense (e.g. Hallonsten et al., 2020), and of which not all are possible to straightforwardly identify as vital to the system.
The consequence is obvious in its simplicity. The notion that contributions to the improvement of society by technological or social innovation can always be mapped and measured is erroneous. Likewise, the notion that the main or only purpose of universities is to drive economic growth through innovation, in ways that can be measured with quantitative indicators, is flawed. Science has, quite evidently, contributed immensely to the modernization of society and the vast improvements of living standards in Europe and North America in the past two hundred years, including the development of an economy and a society with less harmful impact on health and the environment. It is time to stop evaluating it with metrics that obviously fail to do justice to its success, and most of all time to stop governing it on the basis of what these metrics show. Either Lord Kelvin (or Peter Drucker, or whoever really said it) was wrong in stating that 'if you can't measure it, you can't improve it', or science does not need improving, or alternative and more accurate means of science evaluation need to be developed. Or maybe all three. Serious debate, built on historical evidence, sociological insight, and logic, should ensue to handle this important matter.
School of Economics and Management (LUSEM-CIRCLE research grant) for the work on this article.