A History of Regulatory Animal Testing: What Can We Learn?

The contemporary pharmaceutical industry is voicing growing concerns about the translatability and reproducibility of animal models. In addition, the usefulness of certain of the required regulatory safety tests in animals is being increasingly questioned. It remains difficult, however, to make the move toward alternative testing methods, not least because of legislative demands. A historical analysis was performed, in order to study how the mandatory animal studies in legislative requirements came about. This article reflects on the role that specific public health disasters played in the creation of (more) regulatory requirements for animal testing. It will show how the regulatory changes prompted by the sulfanilamide elixir disaster in the 1930s and the thalidomide disaster in the early 1960s were based on the belief that extensive animal testing would prevent similar future human health tragedies. As scientists increasingly highlight issues with translatability between non-human animals and humans, the belief that current regulatory requirements ensure safety becomes more difficult to maintain. In addition, it means that some of the regulations now in place require animal tests that do not contribute to the safety of a drug, as shown in a third case study of the court case brought by Vanda Pharmaceuticals against the FDA. We finally argue that regulations should be critically examined and altered where necessary, so that they are no longer a barrier in the transition toward animal-free testing and more human-relevant science.


Introduction
Over the past half-century, the advances of medical science and pharmaceutics have grown exponentially, and so has the volume of animal testing. In 2018, the EU witnessed nearly nine million animal experiments, out of which around 18% were performed in order to satisfy safety legislation requirements. 1 Recently, the EU Parliament voted in favour of making concrete plans to phase out animal testing in the EU. This will encompass taking a critical look at the role of animal testing in safety legislation. 2 Regulatory animal experiments are demanded as part of the medicine and vaccine development process. Pharmaceutical companies typically test their medicinal products following a tripartite testing procedure: preclinical, clinical (Phase I, II and III) and post-marketing surveillance. In the preclinical phase, products are tested in vitro as well as in vivo. These tests, performed in cell cultures and juvenile animals, generally fall into seven categories: safety pharmacology, acute toxicity, sub-acute or sub-chronic toxicity, chronic toxicity, carcinogenicity, developmental toxicity and reproductive toxicity. The information from these tests is then used to estimate an initial safe starting dose and dose range for Phase I human trials, and to identify parameters for clinical monitoring of potential adverse effects. As soon as this information is collected, clinical safety studies (i.e. tests on human beings) can be carried out. When the safety and efficacy have been determined, products are ready to be distributed to the market. 3 These requirements have established animal tests as a paradigm, and as a result it is hard to make a move toward alternative testing methods. Yet, in the second half of the 20th century, and especially so far in the 21st century, changes in public opinion have been witnessed due to growing concerns about the use of laboratory animals. 
Since 1 January 2013, a new EU directive, 2010/63/EU, aiming to harmonise the European internal market, has stated that "wherever possible, a scientifically satisfactory method or testing strategy not entailing the use of live animals shall be used", and a growing number of scientists now question the translatability of the results of animal tests to humans and clinical practice. [4][5][6] Reasons for the lack of translatability relate to both internal and external validity, 5,7 with species differences being one important factor, something which cannot be overcome by improving the study design. 5 The translatability of the adverse effects of pharmaceuticals is particularly complicated, since compounds that cause serious adverse effects in preclinical testing do not progress to human trials for safety reasons. This can also imply that substances that would be effective in humans are precluded from further development: in the case of paracetamol, if it had been tested in dogs, it would probably have never been developed further, as it is toxic in that species. 8 Published concordance studies indicate that animal to human translation is problematic when it comes to predicting the adverse effects of drugs [9][10][11]; furthermore, studies also show that in vitro and in silico tests can outperform animal models in predicting certain adverse effects (e.g. 9,10,12,13). To highlight a specific example: the risk of drug-induced cardiac arrhythmia in humans was predicted with 89% accuracy in an in silico model developed on simulations of human heart cells, whereas earlier-conducted animal studies had revealed a 75% accuracy.
Significant advances in the development and implementation of such technologies based on the use of human tissues and cells have enabled a better understanding and treatment of human disease. These new approach methodologies (NAMs) include relatively simple human cell cultures used in regulated tests for skin sensitisation and hormonal effects, as well as panels of human cells to assess the potential risk of adverse drug reactions caused by candidate drugs. [14][15][16] Multiple non-animal testing strategies incorporating in vitro, in chemico and in silico inputs have demonstrated equivalent or superior performance to the murine local lymph node assay when compared to both animal and human data for skin sensitisation. 17 More complex cultures of human cells and tissues on microchips, known as microphysiological systems or organ-on-a-chip (OOC) methods, recapitulate the physiological conditions encountered in vivo and, along with in silico studies, permit modelling and/or interpretation of the data from these in vitro methods. 18 In silico models enable investigations into the relationships between chemical structure, biological activity and toxicity, and are used to support predictions of human drug efficacy and toxicity. 19,20 The advent of routine stem cell culture has also opened up an opportunity to create a limitless supply of specific, homogeneous, human cells in order to create 3-D 'organoids' for organ-on-a-chip systems, where these miniaturised organs can be maintained for long-term studies by using continuous media circulation through microfluidic channels. 21 In combination with the availability of these potential alternatives, a growing number of governments, and even the EU, are gaining the political will to renounce animal tests as the unquestioned gold standard to guarantee the safety and efficacy of medication, and thus might eventually be willing to endorse the phasing out of animal studies. 
[22][23][24] Despite this political will, producers of medicines still struggle to convince regulatory authorities of the safety and efficacy of their products, if they have not carried out animal studies. According to Mary-Jeanne Schiffelers, 25 this is because animal experiments implicitly remain the gold standard. This can be seen, for example, in the preclinical safety evaluations of biopharmaceuticals for regulatory purposes. As shown by Kooijman et al., 26 these safety evaluations were based on classical procedures, despite these procedures being developed for small molecule therapeutics (SMTs), which differ significantly from biopharmaceuticals, and despite the availability of state-of-the-art knowledge. This behaviour by regulators and industry was motivated by risk aversion, and has led to many experiments on animals that have no scientific value. 27 Taking animal experiments implicitly as the gold standard limits the alternatives that can be developed and validated. Moreover, scientists and regulators often do not seek entirely new procedures or human-based models, but look for alternative testing methods that have been validated in direct comparison with the performance of animal tests. This strategy has adverse consequences for the successful acceptance of the alternative methods, due to the low translatability of many animal experiments to humans, and their poor reproducibility. Thus, a direct comparison to human data would be the desired way to go. However, adhering to these animal study standards is considered more straightforward for companies, as large bodies of data using established methods have accumulated over time. This is often the reason why a certain species is chosen for an experiment: not because it is the most appropriate model, but because there are more data available for that species. 28 The use of unknown, new methods brings with it the possibility of failure. There will also be a respective lack of historical data from new methods. 
Furthermore, there may be a financial liability for the company, should a drug cause major adverse effects. Usually, the pharmaceutical industry is absolved of responsibility by following the regulated testing regimes. Therefore, this insecurity associated with the use of 'other' methods, and the perceived safety net that regulated animal testing still provides, hampers progress toward animal-free science. 21,25,29 The current safety regulations requiring animal testing did not follow on from rational deliberation among scientists, but from an often emotional interaction between scientists, politicians, the media and the public. The role of human health tragedies in the development of these regulations is already stated in previous literature, but generally as a 'positive' only in relation to the way in which they subsequently influenced public health policies. 30 To understand the current struggle to move away from animal testing, we need to revisit these historical events and study them through the lens of 'path dependency', to see how they continue to affect current regulations and animal testing practices. Path dependency refers to "the routine whereby the set of solutions is limited by knowledge and experiences gained in the past, even though past circumstances may no longer be relevant". 26 As such, it can be a strong limiting factor in the transition away from animal testing. 26,29,31 In this article we therefore take a historical approach and critically examine existing literature to look at what two health disasters have meant specifically for animal testing: the sulfanilamide elixir tragedy in 1937 in the USA, and the thalidomide tragedy in the Netherlands. Both these disasters, in addition to the devastating human consequences, had a major impact on safety regulations and led to more animal testing. 
To understand how such safety regulations can be a barrier to the transition toward animal testing-free science, we look at a third case study: the lawsuit of biopharmaceutical company Vanda against the FDA. In the conclusion, we reflect on the ideas and attitudes underlying current regulations, and how these would need to change so that regulations are no longer a barrier in the transition toward animal-free testing.

Sulfanilamide elixir and the 1938 Act
In 1937, sulfanilamide elixir was distributed across the USA and was subsequently distributed worldwide as a cure against streptococcal infections. However, in the USA, serious problems associated with its use became evident. An American pharmaceutical manufacturer had mixed the sulfanilamide with diethylene glycol (DEG), in order to make the medicine sweeter and thus more palatable. The mixture of both substances proved to be poisonous, causing kidney failure. For over 100 people, including approximately 30 children, this resulted in death. 3,30,32 This disaster shocked the nation, especially since the drug was manufactured by a trusted producer. 32 The time was ripe for the USA to finally pass the Food, Drugs, and Cosmetics Act in 1938, which had been under discussion since the beginning of the 1900s. This Act was radically different from previous (very minimal) regulations, and it forms the basis of current US safety regulations. 32 The Act made it a requirement that safety data, including data from animal tests, be provided to the FDA before medicines were allowed to be distributed. 3 Internationally, it was one of the first laws that allowed the state to structurally control the production and distribution of medicines. 33 The sulfanilamide elixir event bore hardly any fruit outside the borders of the USA. In the Netherlands, pharmacists were aware of and shocked by the events, but this was not reflected in the regulatory system. The local authorities believed a disaster like this could not happen in a small country like the Netherlands. 34 They were proven wrong a few decades later.

Thalidomide in the Netherlands
Another pivotal event happened two decades later: the thalidomide disaster. Thalidomide was an agent which was used on a large scale between 1956 and 1961. It was known by different names depending on the producer. The biggest producers were the Swiss company Ciba and the West-German company Grünenthal. In the Netherlands it was known as 'Softenon'. Thalidomide was marketed as a panacea, with the power to cure every ailment. Nausea and morning sickness faded away when taking thalidomide and, consequently, many pregnant women took the medicine. 35 Simultaneously, a strange disease was spreading throughout large parts of the world. A high number of children were born with phocomelia, a condition in which the limbs are deformed. These children, with either shortened or absent limbs, shocked the world. Initially, it was unclear what caused the birth defects. Several explanations were put forward, including that it was due to prenatal exposure to impure drinking water, as well as suggestions that the deformities were the result of Soviet chemical warfare, as western Germany in particular witnessed a high number of newborn children with phocomelia. 36 It was not until 1961 that two scientists (McBride in Australia and Lenz in Germany) concluded, independently of one another, that thalidomide had caused these birth defects. 37 It took so long to uncover the cause because pharmacological scientists did not yet know much about 'teratogenicity'. An agent is teratogenic when it affects the embryo or fetus, either by disturbing the pregnancy or causing birth defects. Thalidomide appeared to be safe when it was tested in mice, rats, guinea-pigs and rabbits. However, the pharmaceutical industry did not test its effects on fetuses in utero or in the offspring. 35,38 It has been argued that, prior to the thalidomide tragedy, the general belief was that the placenta served as a barrier to protect the fetus from exposure to any agent. 
3 However, this statement should be slightly tempered because, by that time, it was already known that alcohol consumption by a pregnant woman could negatively affect her fetus, and teratologists were also aware of the fact that chemicals could cause birth defects. However, it is proposed that, despite this knowledge, ineffective communication between teratologists and medical staff on these observations meant that many medical practitioners would still have believed that the placenta could act as a barrier to the fetus. 39 It is often claimed that the Food, Drugs, and Cosmetics Act of 1938 prevented a thalidomide tragedy in the USA, in contrast to what occurred in many other countries. 32,40 The approval of thalidomide was, however, not held back because the required safety testing showed harmful teratogenic effects, since teratogenicity testing was not performed in the USA either. Rather, additional data were required due to a concern about peripheral neuropathy and the effects of biologicals on pregnant women. 3,30 Only after the thalidomide tragedies in other countries did testing for teratogenic effects in animals become mandatory in the USA. So, it was timing rather than stricter legislation that prevented a similar disaster in the USA.
The product was sold in 46 countries, and it has been estimated that between 8000 and 10,000 children were born with phocomelia. Several hundred of those lived in the Netherlands. Although a Drugs Provisions Act was proposed in the Netherlands in 1952 and accepted in 1958, it had not yet been enacted when the effects of thalidomide became visible. The enactment process was slow due to conflicting interests and lack of a sense of urgency. 41 The sense of urgency increased after two smaller incidents with medicinal syrup and margarine in 1959 and 1960, respectively, which caused a stir among Dutch citizens and the media. 41 In the first case, a mistake was made by a Rotterdam pharmacy, when a barbiturate, instead of sodium citrate, was mixed into 4 litres of medicinal syrup, leading to serious clinical symptoms in 26 children. In the second case, a new margarine was brought onto the market in the Netherlands that was claimed to have better baking properties because of a new emulsifier, but this new component led to 'margarine disease' ('Planta exanthema'), characterised by skin vesicles. Notably, this new emulsifier had been tested in animal studies at the National Institute for Public Health and the Environment, and no particularities were found. 41 When these affairs were followed by the thalidomide tragedy, public trust in drug safety and state supervision was lost completely. The Dutch media compared the tragedy to the flood disaster of 1953, asking for the 'dykes to be raised'. Due to this public pressure, politicians quickly implemented the Act of 1958, and established a new authority for the registration and marketing authorisation of drugs, the Medicines Evaluation Board (CBG). 41,42 The new marketing authorisation procedure required extensive animal testing. Similar regulations were established in other countries. Every product must now be tested for teratogenicity before entering the market. 
It is also a requirement that these reproductive toxicity tests are carried out not only in rodents (a mouse or rat), but also in a non-rodent species. 35 These cases of disasters and tightening controls show that the creation of regulatory policies takes into consideration more than objective scientific arguments. The call to implement a tighter, more strictly regulated policy existed for a long time, but it needed an external event, a health crisis, to convince the stakeholders involved in medicine development to embrace a rigid testing system. After being confronted with dangerous effects, society was made aware of the risks of modern medicine. This atmosphere of insecurity put pressure on regulatory authorities to argue for the efficacy of a tightly regulated system under their direct supervision.
Stricter regulations and more animal testing have, however, not been able to prevent new incidents in which drugs cause adverse effects in humans. Multiple FDA-approved drugs have been taken off the market since the thalidomide tragedy. 30 In the Netherlands, the CBG was the first national authority to completely ban Halcion (triazolam) in 1980 for its severe side effects (which included anxiety, paranoia and amnesia) that were not discovered during the marketing authorisation process. This was referred to as 'Dutch hysteria' in the international media and the adverse effects of the drug, especially when used long-term and/or in higher doses, were not acknowledged elsewhere until the 1990s. 41 In the case of the TGN1412 monoclonal antibody, adverse effects in humans became evident in the human clinical trials before the registration of the drug, despite the drug making it through preclinical trials in animals. Six human volunteers faced life-threatening conditions due to multiple organ failure during these human clinical trials. 43 So while the stricter regulations and extensive animal testing are usually presented as measures that will prevent further disasters, issues of translatability between humans and other animal species mean that there is always risk in bringing potentially harmful new drugs onto the market. Even in the case of thalidomide, animal testing in 10 strains of rats, 11 breeds of rabbits, two breeds of dogs, three strains of hamsters, eight species of primates, and various cats, armadillos, guinea-pigs, swine and ferrets had failed to reveal significant teratogenicity. 44 In addition, regulations based on the idea of animal testing as the gold standard can lead to the use of animals in experiments that have no added value for human safety, as we will see in the next case.

A current case study: Vanda
A recent case study that exemplifies some of the aforementioned mechanisms is the court case of the biopharmaceutical company Vanda against the FDA. Vanda was developing Tradipitant, a potential treatment for several human conditions, including gastroparesis, which is a medical disorder of weak muscular contractions of the stomach, and atopic dermatitis, a condition that makes the skin red and itchy. The former afflicts approximately 600,000 US citizens and the latter 17,800,000. 45 In order to distribute the medicine on the market, Vanda had to perform a set of tests, both preclinical and clinical, that were specified by the FDA. When companies do not want to perform one of those tests, they have the right to use alternatives, as long as they can prove the safety and efficacy of the medicine with the alternative test. The FDA must evaluate those alternatives, provide an evaluation report, and make a "case-specific, science-based determination as to whether it agrees". 46 Vanda had taken all the required steps but one. The company conducted several animal experiments, such as a 3-month rat study, a 6-month rat study and a 3-month dog study, with doses up to 300 times the human equivalent dose. None of these tests flagged up significant safety signals, so Vanda went ahead with testing the substance on 15 humans, in an 8-week study. 46 When Vanda wanted to test the substance on humans beyond eight weeks, the FDA asserted that the company would have to perform a chronic 9-month non-rodent study first, as is required before performing human tests that last longer than 90 days. The study, which is usually performed on beagles, tests the chronic toxicity of a substance. At the end of the study, the animals are euthanised in order to analyse their tissues.
Vanda refused to test on beagles for three reasons. Firstly, the company believed that their other experiments had already given sufficient proof of Tradipitant's safety and efficacy. Secondly, Vanda argued that the regulations would lead to unnecessary animal suffering. Thirdly, Vanda had proof of a "large body of published scientific evidence which concludes that 9-month dog studies rarely, if ever, identify toxicities that were not already identified in 3 month studies, and do not yield new information that is important for the purpose of understanding how the drug will impact humans". 46 Because of this body of literature, the ICH Guidance supported the view that a study in canines was unlikely to come up with additional toxicities. 47 The FDA was, however, not convinced, and issued a partial clinical hold and requested Vanda to stop the development of Tradipitant.
Instead of accepting this clinical hold, Vanda decided to start a court case. In February 2019, Vanda published an open letter, repeating their arguments listed above, and emphasising the outdatedness of the FDA's policy. Its rules for toxicity studies had, after all, not changed since 1997. 48 The letter also asked other companies to stand with Vanda in lobbying the FDA to abolish its "one-size-fits-all approach to animal research, including nine-month, non-rodent toxicity studies, which results in the unnecessary sacrifice of too many dogs and other animals". 46 Vanda, as a pharmaceutical company, acted upon a broader call for a further examination and discussion of the scientific evidence behind the regulatory requirements. This call is based on: concerns surrounding the low rate of translational success of animal studies to humans; the fact that approved drugs may cause side effects in humans not previously seen in animals; robust examples that show the more favourable translation of human-relevant/animal-free methods; and the dangerously low return on investment for new drugs within the pharmaceutical industry. 5 The FDA claimed that there were safety signals flagged up in the nonrodent data at 3 months, and they wanted to see whether these would occur at longer dosing periods. Vanda claimed that there were no human safety signals in their 8-week data. This would suggest a lack of correlation between the animal and human data, which also led to the initiation of the court case.
A year later, in February 2020, the court sided with the FDA, declaring that Vanda "has not shown that FDA's interpretation of the study is 'unreasonable,' and so its argument fails". 49 This story, which has not yet ended, shows how present-day pharmaceutical companies experience the tick-box approach when developing medicines in order to fulfil regulatory requirements which, in turn, acts as a barrier to creating potentially safer medicines. It forces them to use and kill animals during experiments, the results of which are not always scientifically robust, and thereby slows down the speed of introducing new medicines to the market. Moreover, it exemplifies the regulatory embeddedness of animal experiments. Animal experiments still underpin the requirements of regulatory agencies, and according to this court ruling, "the statutory and regulatory scheme here [in the US] explicitly contemplates that the results of animal studies are predictive of the results of human trials", despite increasing evidence to the contrary. 4,5,10,50 If a stakeholder wants to move beyond the established practices, it can still lead to distrust and, in the case of Vanda, to a clinical hold and a lost court case. It is also suggested that, even though there can be good scientific evidence for the use of alternative procedures, in order to obtain acceptance from the FDA for their use, it is essential to engage with the regulators from the start of the testing process. 51 This may not have happened in the Vanda case. However, as a pharmaceutical company, Vanda acted upon the low translational success of animal experiments and did appear to have scientific evidence for their claims.
Despite the bureaucratic rigidities in the current regulatory system, significant initiatives are underway in the FDA and the European Medicines Agency (EMA) to stimulate the use and assessment of NAMs (e.g. the FDA webinar series for alternative methods, EMA IRIS platform and EMA Innovation Task Force (ITF)). In addition, working groups within regulatory agencies are trying to work with NAM developers to help push forward the validation of these methods for regulatory use. [52][53][54] Also, collaboration between these agencies is ongoing, including discussions around the use of real world evidence to inform regulatory decisions. 55 It is good to be aware, however, that NAMs are not viewed in the same way by everyone, and supporting NAMs does not automatically mean supporting a paradigm shift away from animal testing and toward human-based testing. For example, the Three Rs is still the guiding paradigm for the EMA, and so, for them, supporting NAMs includes incorporating refinement strategies in animal testing, which can hardly be understood as moving away from animal testing as the gold standard. 56 Even though many alternative methods have so far been validated, it can then take a considerable length of time before a decision is made to phase out the respective animal test: in the case of the pyrogen test, this process took 25 years (personal communication, Thomas Hartung, 2021). This illustrates that a range of scientific disciplines will need to be consulted and integrated, in order to understand how progress can best be made and to ensure that new scientific evidence really does become accepted and implemented in (regulatory) practice.

Conclusions
In the first two historical case studies discussed, we saw how human health disasters that shocked the public were needed for a paradigm shift in drug safety regulation to occur. Part of the new paradigm, in that post-tragedy era, was the belief that animal tests represented the 'gold standard' of safety testing, and that in the future such disasters could be prevented, as long as enough animal testing was performed. Combined with risk-averse attitudes, this created a situation of path dependency, in which moving away from animal testing is limited not by scientific possibilities, but by historical contingencies, beliefs and previous experiences. However, we have seen time and again that limited translatability of experimental results between humans and other animals means that there is always a risk involved in introducing new drugs to the market, no matter how much animal experimentation is performed.
The underlying idea that animal testing is the gold standard thus gives a false sense of safety. This not only leads to human health risks, but also to the use and killing of many animals without a sound scientific basis, as was highlighted in the Vanda case study. Therefore, the development and implementation of NAMs in safety testing is expected to be of great benefit to both humans and animals. The increased attention paid to NAMs by scientists and regulatory bodies is a hopeful development. We have seen, however, that for a radical change in regulations to occur, more is needed than solely scientific arguments. In the cases explored, it is clear that a sense of urgency and public pressure were crucial factors in changing underlying beliefs.
Greater public and political awareness of the risks to public health that result from a continued reliance on animal testing as the 'gold standard' can contribute to a shift in paradigm toward increased reliance on human-based safety testing. Pressure from the public and media will likely also be crucial in creating the political will to change regulations, as we saw in the two historical case studies. In addition, public and media pressure can lead to more funds being allocated to the development of NAMs, as seems to be happening now in the EU after a clear majority vote in the European Parliament toward shaping a roadmap for the phasing out of animal studies. 2 For scientists and regulatory bodies to contribute effectively to the transition away from animal testing, it is important that they continually reflect upon, and highlight how, an (often implicit) belief in animal testing as the gold standard continues to affect research, education, regulations, funding practices, etc. This should include a critical reflection on the Three Rs paradigm, as well as the critical perspectives of scholars from the humanities and social sciences, all of which are needed to break free from the confines of the current paradigm.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was financed by the Radboud Institute for Culture and History and the Radboud University Medical Center.