Reproducibility and Relevance: Vital Components of Chemical Testing

We want to live in a world where the chemicals that improve our quality of life are safe to use. We want to ensure the protection of our families, our most vulnerable communities, and the environment. We cannot do this under the current regulatory system, however, because predicting the toxicity of chemicals relies heavily on decades-old animal tests.

The good news is that there have been huge advancements in the methods we can use to test chemicals and predict their potential toxicity to us and our environment. Now more than ever, we are able to harness reliable and relevant tools—that do not use animals—to better understand chemical effects more quickly. The use of these tools will allow us to keep our world safer than ever before.


Reproducibility is defined as the extent to which consistent results are obtained when an experiment is repeated.

It is essential that the methods we use to test chemicals are high-quality and produce consistent results. If you cannot repeat a result, the science is not sound.
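
To make the definition concrete, here is a minimal sketch of one simple way to express reproducibility as a number: the share of repeated-test pairs that reach the same classification. The function name, the chemical labels, and the data are hypothetical and for illustration only. The published analyses cited below each use their own statistical methods, which may differ from this simple pairwise count.

```python
from itertools import combinations


def reproducibility_rate(results_by_chemical):
    """Return the fraction of repeated-test pairs with the same classification.

    results_by_chemical maps a chemical name to the list of classifications
    obtained each time the same test was run on that chemical.
    """
    agree = 0
    total = 0
    for outcomes in results_by_chemical.values():
        # Compare every pair of repeats for the same chemical.
        for first, second in combinations(outcomes, 2):
            total += 1
            if first == second:
                agree += 1
    if total == 0:
        raise ValueError("need at least one chemical tested two or more times")
    return agree / total


# Hypothetical data, not taken from any cited study.
example = {
    "chemical_A": ["irritant", "irritant"],
    "chemical_B": ["irritant", "non-irritant"],
    "chemical_C": ["non-irritant", "non-irritant", "irritant"],
}

rate = reproducibility_rate(example)
print(f"Reproducible {rate:.0%} of the time")  # prints "Reproducible 40% of the time"
```

In this made-up example, five pairs of repeated tests exist and only two of them agree, so the test would be described as reproducible 40% of the time.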

Test methods developed in the 21st century have undergone rigorous examination to determine if they are reproducible before they are used or accepted by regulatory agencies. This is not the case, however, for animal tests that were developed many decades ago. In recent years, analyses have been conducted to assess the reproducibility of animal tests, and the results are alarming.

Examples of animal tests demonstrating lack of reproducibility

Carcinogenicity tests are conducted to predict whether a chemical will cause cancer in humans. In the animal test, rats and mice are exposed to a chemical every day for 18 to 24 months. They are then killed, and their bodies are dissected to look for cancerous tumors. An analysis of 121 repeated cancer tests in rats and mice showed that results were reproducible only 57% of the time.1

Eye irritation tests are conducted to predict whether a chemical will irritate the human eye. Developed in 1944, the animal test uses rabbits and has many demonstrated flaws.2 An analysis of 491 chemicals assessed at least twice in the rabbit eye test showed that studies predicting mild or moderate irritation were reproducible only 33% and 16% of the time, respectively! In addition, there is a 10% chance that a chemical initially classified as a severe irritant will be reclassified as a non-irritant when tested again.3

Skin sensitization tests are conducted to predict the potential of a chemical, through repeated exposure, to cause an allergic human skin reaction. In the animal test, mice are exposed to chemicals for two or three days before they are killed. A study analyzing almost 90 chemicals, for which more than one test in mice had been conducted, showed that the results are reproducible only 62% of the time (under the most widely used classification system).4

Acute oral toxicity tests are conducted to predict what will happen to humans following short-term exposure to a chemical. The test on rats aims to determine the amount of an ingested chemical that causes 50% of the rats to die. A study analyzing almost 2,500 chemicals showed that tests using rats to assess acute oral toxicity are reproducible only 60% of the time.5

Skin irritation tests are conducted to predict whether a chemical will irritate human skin. The animal test is conducted using rabbits and was developed in 1944. An analysis of almost 1,000 chemicals assessed at least twice in the rabbit skin test showed that studies predicting mild or moderate irritation were reproducible less than 50% of the time.6

Endocrine disruption tests are conducted to predict whether a chemical affects the hormonal, or endocrine, system of humans. Many of the current tests rely on the use of large numbers of animals, and some have been assessed for their reproducibility. An analysis of 235 chemicals in the uterotrophic test, which assesses the estrogenic system in female rats, showed that the test was reproducible only 74% of the time.7 Similarly, a study of 25 chemicals in the Hershberger test, which assesses hormonal effects in male rats, showed that the test was reproducible only 72% of the time.8

What advantages do non-animal tests provide over animal tests?

In contrast to animal tests, non-animal tests are usually reproducible 80% to more than 90% of the time. For example, two human cell–based tests developed to assess eye irritation (EpiOcular™ and SkinEthic) are reproducible 93% and 95% of the time, respectively, while another eye irritation test (LabCyte) is reproducible 87% of the time. Two methods that assess important aspects of skin allergy potential, kDPRA and ADRA, have been shown to be reproducible 88% and 100% of the time, respectively!

  • Non-animal tests can be based on human cells to better reflect human biology and provide information about how a chemical causes toxicity in humans.
  • Non-animal tests can be faster than animal tests and, therefore, more data on more chemicals can be produced more rapidly.
  • The use of non-animal methods allows for the study of mixtures, not only of single chemicals. Studying mixtures gives a more realistic picture of how humans are exposed to chemicals in real life, and it is not feasible with lengthy animal tests.
  • Adverse outcomes in humans can depend on genetic background, physiology, pre-existing disease conditions, lifestyle, life-stage, and co-exposure—none of which is explored when relying on tests on animals. As non-animal methods are further developed, patient-tailored testing could allow for more personalized health studies.


  1. Gottmann E, Kramer S, Pfahringer B, Helma C. Data quality in predictive toxicology: reproducibility of rodent carcinogenicity experiments. Environ Health Perspect. 2001;109(5):509-514.
  2. Clippinger AJ, Raabe HA, Allen DG, et al. Human-relevant approaches to assess eye corrosion/irritation potential of agrochemical formulations. Cutan Ocul Toxicol. 2021;40(2):145-167.
  3. Luechtefeld T, Maertens A, Russo DP, Rovida C, Zhu H, Hartung T. Analysis of Draize eye irritation testing and its prediction by mining publicly available 2008-2014 REACH data. ALTEX. 2016;33(2):123-134.
  4. Dumont C, Barroso J, Matys I, Worth A, Casati S. Analysis of the Local Lymph Node Assay (LLNA) variability for assessing the prediction of skin sensitisation potential and potency of chemicals with non-animal approaches. Toxicol In Vitro. 2016;34:220-228.
  5. Karmaus AL, Mansouri K, To KT, et al. Evaluation of variability across rat acute oral systemic toxicity studies. Toxicol Sci. 2022;188(1):34-47.
  6. Rooney JP, Choksi NY, Ceger P, et al. Analysis of variability in the rabbit skin irritation assay. Regul Toxicol Pharmacol. 2021;122:104920.
  7. Kleinstreuer NC, Ceger PC, Allen DG, et al. A curated database of rodent uterotrophic bioactivity. Environ Health Perspect. 2016;124(5):556-562.
  8. Browne P, Kleinstreuer NC, Ceger P, et al. Development of a curated Hershberger database. Reprod Toxicol. 2018;81:259-271.
