The Preprint Flood Has a Quality Problem. AI Safety Research Is Drowning in It.


There's a particular irony in AI-generated hallucinations contaminating the research literature on AI safety. The field dedicated to making AI systems more reliable is partly built on a foundation of papers that haven't been peer reviewed, some of which may contain fabricated citations, unedited prompt outputs, and nonsensical diagrams — courtesy of the very technology being studied.

This isn't a hypothetical concern. It's now a documented pattern severe enough to prompt institutional responses.

The Infrastructure Is Breaking Under the Weight

The arXiv preprint server — the primary distribution channel for AI research — has been struggling visibly with the volume and quality of AI-generated submissions. The generative AI era has been, in the words of one Science blog post, "hard on these sites," prompting a steady escalation of new submission rules.

The latest escalation is significant. Thomas Dietterich, an emeritus professor at Oregon State University who serves on arXiv's editorial advisory council and moderation team, announced via social media that submitting inappropriate AI-generated content now results in a one-year submission ban — and a permanent requirement that all future submissions undergo journal peer review before arXiv will host them. All listed authors on a manuscript bear responsibility, regardless of who actually generated the content.

For fields where arXiv posting is considered a normal, expected part of the publication process — and AI safety research is exactly such a field — those are severe consequences. The policy exists because the problem is real: fake citations, unedited prompt responses, and incoherent figures have been slipping through, and until now the consequences were murky at best.

Fabricated Citations Are the Specific Failure Mode Worth Watching

The citation problem deserves its own attention. A recent Lancet study reported by STAT News found a steep rise in "fabricated" citations (references to papers that don't exist) spreading through the academic literature. The study frames these as AI hallucinations, and that framing is probably right in most cases, but it obscures something important: a hallucinated citation in a peer-reviewed paper is a failure of the human authors and reviewers who let it through, not just of the AI tool.

This matters especially for AI safety discourse, where the citation network is already unusually concentrated. A handful of preprints — many from a small number of labs and researchers — get cited repeatedly, building an appearance of consensus that may not reflect the actual state of evidence. Add fabricated citations into that network and the epistemic situation gets genuinely difficult to navigate. You can't easily tell which claims rest on real empirical work and which rest on papers that don't exist, or papers that exist but don't say what the citation implies.

The Methodology Question Nobody Is Asking

Here's what the current conversation about preprint quality mostly misses: the problem isn't only fabricated citations or AI slop. It's that preprints in AI safety often make strong empirical claims based on methodology that hasn't survived any external scrutiny.

Peer review is imperfect — I've written about its slowness and inconsistency in previous issues. But it does one thing that preprint posting doesn't: it forces authors to defend their methods to someone with relevant expertise who has no stake in the outcome. When a preprint claiming that a particular alignment technique reduces deceptive behavior by some percentage circulates on social media and gets cited in policy documents, nobody has asked the basic questions. What was the sample? What were the controls? How was "deceptive behavior" operationalized and measured? What are the confidence intervals?
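To make the confidence-interval question concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `reduction_ci` is hypothetical, the counts are made up rather than taken from any paper, and the normal-approximation (Wald) interval is only one of several reasonable choices. It compares the rate of flagged "deceptive" responses in a baseline arm and a treated arm and reports how uncertain the claimed reduction is.

```python
import math


def reduction_ci(k_base, n_base, k_treat, n_treat, z=1.96):
    """Return (drop, low, high) for the fall in a rate between two arms.

    Uses the normal-approximation (Wald) interval for a difference of
    two independent proportions; z=1.96 corresponds to roughly 95%.
    """
    p_base = k_base / n_base      # flagged rate without the technique
    p_treat = k_treat / n_treat   # flagged rate with the technique
    drop = p_base - p_treat
    # Standard error of the difference of two independent proportions.
    se = math.sqrt(
        p_base * (1 - p_base) / n_base
        + p_treat * (1 - p_treat) / n_treat
    )
    return drop, drop - z * se, drop + z * se


# Hypothetical counts: the same 40% -> 20% drop at two sample sizes.
small = reduction_ci(20, 50, 10, 50)
large = reduction_ci(400, 1000, 200, 1000)

for label, (drop, low, high) in [("n=50 per arm", small), ("n=1000 per arm", large)]:
    print(f"{label}: drop={drop:.3f}, interval=({low:.3f}, {high:.3f})")
```

With these made-up counts, both runs show the same 20-point drop, but the interval spans roughly 2 to 38 points at 50 trials per arm and roughly 16 to 24 points at 1,000. A headline percentage without the sample size and interval does not tell a reader which of those two situations they are looking at.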

The arXiv ban policy addresses the most egregious quality failures. It doesn't address the subtler problem of methodologically weak work that looks legitimate, gets amplified before review, and shapes discourse in ways that are hard to walk back once the paper is actually scrutinized.

The Meta-Science Moment: Speed vs. Scrutiny

The preprint model was built on a reasonable premise: share work early, get feedback, improve before formal publication. In physics and mathematics, where the community is relatively small and technically homogeneous, this works reasonably well. In AI safety, where the stakes are high, the community is large and ideologically varied, and the findings get picked up by journalists and policymakers almost immediately, the feedback loop is broken. The "share early" part happens. The "improve before it shapes discourse" part often doesn't.

arXiv's new enforcement posture is a recognition that the honor system has failed. Watch for whether other preprint servers follow with similar policies, and whether AI safety researchers — many of whom have strong incentives to publish quickly and loudly — push back or comply.


Bottom Line: The same AI systems that AI safety researchers are trying to make more reliable are now actively degrading the quality of the research literature those researchers depend on. That's not just ironic. It's a methodological problem that the field hasn't seriously reckoned with yet.