As something of an outsider to the inner workings of academia, I must admit that I do not have a lot of experience with the academic publishing/peer-reviewed literature side of things.
But even so, I know when something is messed up, and clearly what have become known as “Frankencitations”—references in academic papers to nonexistent literature, conjured through LLM use—are messed up. They make a mockery of the entire enterprise and strike me as something of a crisis.
Ben Williamson of the University of Edinburgh has been tracking the phenomenon because of a very personal experience with Frankencitations. I asked him some questions about what’s going on and what he thinks we should do about it.
JW: Let’s begin with an introduction so people understand your bona fides.
BW: I’m a senior lecturer at the Centre for Research in Digital Education at the University of Edinburgh, and the editor of the journal Learning, Media and Technology. I research digital tech and data in the education sector, with recent and ongoing work focused on ed-tech investors, the rise of data-intensive biological sciences of learning, and education policy work related to AI.
Q: What is a “Frankencitation” and why should we be worried about them?
A: Over the last two or three years, many of us working in universities have started noticing these strange references to academic journal articles or books that don’t exist. The references are fabricated by generative artificial intelligence when an author does something like instruct a large language model to help write a paper or add in references to support a literature review. I think we all know by now that LLMs routinely make up material. That’s what’s happening here. LLMs are making up academic references because of academic workers using them to produce research outputs like papers, chapters or books.
I’ve heard lots of different names for these: phantom references, zombie citations, ghost references or Frankencitations. What these terms all get at is that AI-generated fake references are not real, living things, but they do have a kind of half-life. “Frankencitations” works as a name for this because they’re like the monster—stitched together out of data in a language model, then animated when someone uses a chatbot like ChatGPT to do their work for them. And then these fabricated citations can cause all sorts of trouble when they break out into the world by being included in academic manuscripts.
Q: This sounds very bad to me. A literal rampage of fake stuff infecting academic research and citations.
A: The big issue we have now is that these Frankencitations are ending up in manuscripts that are sent for peer review. That’s putting a big strain on editors and reviewers, as we’re now having to police manuscripts for fake stuff in the reference lists. Some of these Frankencitations are even ending up in actual scholarly publications. They’re simply not being spotted at any stage in the academic publishing pipeline—not by editors or reviewers, nor during revision, proofing and final publication production.
Using AI to produce academic publications with these fake insertions is a form of knowledge pollution, like letting toxins seep out into a river to change the whole ecosystem that it nourishes. It’s also interfering with scholarly integrity and breaking the citation chains that signify an author’s relationship to their field. It poses the real danger that individuals could be credited with ideas they never had or have claims attributed to them that they never made. That’s why the idea that these fake references have a kind of half-life seems right to me: They’re shambling around, making their imprint on the academic world and causing huge problems for journal editors, reviewers and readers, despite lacking the substance of an actual publication.
Q: You have an interesting story about how you first encountered the phenomenon. I want you to tell us that story, but I’d also like to know what you were thinking/feeling as it was unfolding. What was the intellectual/emotional journey of learning you’d been so extensively Frankencited?
A: I’m a journal editor, and that involves checking new manuscript submissions to decide if they’re suitable for peer review. Recently, my co-editors and I began seeing references in new manuscripts to papers where one or more of us was credited with authorship, but we knew right away those papers couldn’t be real because we did not write them!
Our assumption here is that some people submitting to our journal are asking LLMs to add in some references to papers by the editors, hoping this makes their manuscripts more likely to get past the initial desk review. The problem is, they’re dropping in these absolute bloopers that we spot right away. So with those papers, the editors become not only their first readers but also their last, as we just have to reject them straight off. This is simply academic misconduct by scholarly publishing standards, since authors absolutely must take responsibility for the accuracy of the references in their own manuscripts.
Recently, though, I decided to do a web search for one of these fake references to me, purely out of interest to see if it appeared elsewhere. It had the title “Education governance and datafication.” That’s quite bland, but I have published on the subjects of “education,” “governance” and “datafication,” so those are plausible keywords to associate with my name. But when I web searched for “Education governance and datafication,” I quickly realized that this particular Frankencitation had a very busy half-life. As far as I was able to make out, this text, “Education governance and datafication,” had been cited 70 or so times. You can look it up yourself—just go to Google Scholar, search “Education governance and datafication,” and you’ll find pages and pages of publications that cite me for something I never wrote.
What’s even weirder to me is that among all of these, the subtitle of “Education governance and datafication” changes constantly. It’s cited as appearing in a range of different journals. Sometimes it’s a whole book. Most of the time it has a named co-author—somebody I have collaborated with in the past—but sometimes not. And this nonpaper is still shambling along now—it’s still picking up fresh citations.
Q: Your scholarly reputation has exploded!
A: A nonexistent publication has fast become one of my most cited. It’s a bit miserable, because I have actually done years of relevant work, but dozens of authors would rather cite something I never worked on.
What’s really concerning about this is not knowing what I’m being credited for. Most of these papers are in my field. Some of them are with really poor-quality predatory publishers and can be safely ignored. But not all. There are some cases where reputable academics publishing in high-ranked journals are citing it. What are they saying that I wrote or claimed? I also heard from a colleague at another institution that my Frankencitation appeared in a student assignment. So the production and reproduction of these false references is also putting students at risk.
It is really a ridiculous situation. I’m more recognized for something I did not write than for the actual papers I did write on the topic of education, data and governance!
Q: Even as I think I understand the phenomenon at a basic level, I’m not sure how and why these things proliferate so quickly. You did some work to see how many times you’d been Frankencited, and then a while later someone else followed up and the number of Frankencitations had grown significantly. What’s the mechanism? How big is the phenomenon?
A: Yes, after I posted online a bit about my discovery, a computer science scholar called Dirk H. R. Spennemann completely independently looked into it. He used it as an example to examine how LLMs produce fabricated references—he just put out a preprint on arXiv that goes into the technical details. So he did a much more forensic hunt through Google Scholar than I was able to. What he then found was that the nonexistent paper had actually been cited nearer to 140 times. And he wanted to understand at a technical level how LLMs generate these references. What he concluded is that there are distinct patterns in how genAI makes these up. Here’s what he wrote in the paper:
“The hallucinated academic references created by ChatGPT are not random errors but predictable, pattern-driven artifacts of how genAI models generate text. These references are systematic reconstructions built from real authors, journals, and topical keywords. Because genAI models rely on pattern recognition rather than factual verification, they produce citations that are structurally correct and contextually plausible, yet nonexistent. These hallucinations can be repeatedly generated, leading to duplication and consistency across different texts, which can increase their perceived legitimacy.”
OK, so “Education governance and datafication” is a combination of a real author (me), real journals (some of which I have published in) and topical keywords (as I said, I work in education and study data and governance). Because it also exists on Google Scholar, which is considered to be an authoritative index of the world’s scholarship, a language model that’s retrieving from the web—rather than just producing text from its training data—ends up just confirming the existence and legitimacy of the paper, despite its nonexistence. Spennemann also ran a little experiment where he prompted ChatGPT to generate essays on the topic of education governance and data. It cited the nonexistent paper, of course.
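To make that mechanism concrete, here is a toy sketch in Python. It is purely illustrative, not Spennemann’s code and not how any real model works internally; it just shows how recombining a real author name, plausible journals and topical keywords yields a reference that is structurally correct and contextually plausible, yet nonexistent.

```python
# Toy illustration only: a fabricated reference is a plausible recombination
# of real components, not a lookup of any real record.
import random

REAL_AUTHOR = "Williamson, B."  # a real researcher
TOPICAL_KEYWORDS = ["education governance", "datafication", "digital policy"]
PLAUSIBLE_JOURNALS = [  # journals this author could plausibly appear in
    "Learning, Media and Technology",
    "Journal of Education Policy",
]

def fabricate_citation() -> str:
    """Stitch together a structurally correct but nonexistent reference."""
    title = " and ".join(random.sample(TOPICAL_KEYWORDS, 2)).capitalize()
    year = random.randint(2016, 2023)
    journal = random.choice(PLAUSIBLE_JOURNALS)
    return f"{REAL_AUTHOR} ({year}). {title}. {journal}."

print(fabricate_citation())
# e.g. Williamson, B. (2019). Education governance and datafication. Journal of Education Policy.
```

Every element of the output is drawn from real data, which is exactly why the whole is so hard to spot as fake.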
As the library systems expert Aaron Tay has explained it, what we have is a dynamic set of processes involving both Google Scholar and generative AI. Google Scholar has established “Education governance and datafication” as a citation that has been referenced by 140 other publications, and LLMs are amplifying its existence by running retrieval-augmented generation (RAG) processes that treat Scholar as an authoritative source of citational truth. Google Scholar treats it as real and chatbots reproduce it as such as soon as anyone writes a prompt for an essay or paper on the same topic.
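A highly simplified sketch of that loop (an assumed flow for illustration, not any vendor’s actual pipeline) shows the core flaw: the index records citation counts, not existence, so a retrieval step that treats “is in the index” as “is real” will happily confirm a much-cited fake.

```python
# Simplified illustration of the amplification loop described above.
FAKE_INDEX = {  # stand-in for a citation index: it counts citations, not existence
    "Education governance and datafication": 140,
}

def index_lookup(title: str) -> int:
    """Return how many publications the index says cite this title."""
    return FAKE_INDEX.get(title, 0)

def rag_treats_as_real(title: str) -> bool:
    # The flaw: "cited somewhere" is read as "exists."
    return index_lookup(title) > 0

assert rag_treats_as_real("Education governance and datafication")  # fake, yet "confirmed"
```

And every new paper that reproduces the fake reference raises its citation count, strengthening the very signal the retrieval step trusts.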
One thing that’s been hard to fully figure out is whether “Education governance and datafication” first appeared as a fully AI-generated reference, or whether it originated from a human mistake, which genAI has just amplified massively ever since.
Q: I trust it’s obvious to folks why proliferating fake citations about a nonexistent source is simply bad and wrong, but what’s beyond that? What are we looking at if we allow these things to continue to proliferate?
A: Obviously we’ve had problems of miscitation and faked research in the past. This current use of generative AI is now industrializing long-standing issues in academia and putting us all under a lot more strain when it comes to academic publishing. This for me is a matter of trust. How can you trust the academic record if it’s polluted with synthetic material that no longer accurately refers to past research? I mean, all research is supposed to be a process of building on past knowledge to generate new insights and to make original contributions to understanding. That’s not the case with Frankencitations. More mundanely, why should I trust an author who has submitted something to our journal that breaches academic integrity?
It’s also a matter of academic labor. One person trying to save some time or be more productive with AI has downstream effects on those stewarding academic manuscripts through review and publication. This is starting to cause intolerable pain for journal editors and reviewers. We’re already having to work much more to keep pace with massively inflated submission rates, and now many people are saying we should be checking every single reference in a paper, too—a pretty much impossible task.
Then the really big-picture problem is what some are now calling “scholarslop.” Frankencitations may be a pretty good tell of an AI-generated paper. We’re talking AI slop in the academy. It’s not only references being faked with AI but whole papers and books. The problem is that much of this is hard to detect. Some researchers think we should be working more with AI anyway and suggest it’s only natural that, over time, more and more of the scholarly record will have been augmented with AI assistance—whether from AI use in data production and analysis or as a co-author and knowledge-production partner. That may be so for some, on specific projects, for specific purposes. But what we’re dealing with here is the massive, machinic production of academic material that may or may not correspond with anything actually in the world. If a paper doesn’t even correspond with the prior literature in its own field, how are we supposed to trust it tells us anything about its own subjects and objects of analysis?
Q: Short term, is there anything we (readers, researchers, institutions, publications, etc.) can do to stop this proliferation?
A: One reason we’ve been quite vocal about this at our journal is just to try to persuade academic authors that it’s a bad idea to demean themselves as scholars and insult us as editors and reviewers by submitting manuscripts containing fake material. I mean, if you cite us and we didn’t write it, we know what you did and that says something about your academic standards. But of course these fake references are not always so easy to spot, and addressing it is going to require much more systemic efforts. Publishers are already trying out technological solutions that are supposed to screen for AI-generated references. But this is apparently much more technically difficult than you would imagine. We already know that AI writing detectors generate a lot of false positives, which is why they’re untrustworthy for use either in academic journals or for screening student assignments. AI citation detectors could just amplify the same problems, leading to waves of false accusations, article rejections and academic acrimony.
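For a sense of what such screening involves, here is a minimal sketch of a naive first-pass check against Crossref’s public REST API (an open scholarly metadata index). This is an illustration, not any publisher’s actual tool, and it hints at why the problem is hard: fuzzy bibliographic matching returns a closest record for almost any query, so neither a near-match nor a miss settles whether a citation is real.

```python
# Minimal sketch: look up a cited title in Crossref's open metadata API.
# Illustration only -- not a publisher's screening tool. Needs the `requests` package.
import requests

def crossref_top_match(cited_title: str) -> str | None:
    """Return the closest-matching title Crossref holds for a cited title."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": cited_title, "rows": 1},
        timeout=10,
    )
    items = resp.json()["message"]["items"]
    return items[0].get("title", [None])[0] if items else None

# A similar-but-different match is not proof a citation is fake, and an exact
# match is not proof the author read it -- human judgment is still required.
print(crossref_top_match("Education governance and datafication"))
```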
I think we need to be clear that academic publishers and tech firms absolutely must stand up and help us out here. After all, the publishers are all in multimillion-dollar partnerships with the big AI companies to rent out our existing papers as training data for new models. They’re part of the problem of AI-manufactured academic content, and they need to be working with editors, authors, libraries and institutions to solve it.
Here in the U.K., a lot of academic libraries are already canceling contracts with publishers due to spiraling costs. What if our libraries and the associations that represent them went to the publishers and the big tech firms they’re partnered with and demanded action? There’s doubtless much more strength in such alliances than a few editors screaming into their social media channels (as I have been lately).
Q: What about long term? What are the structural/system changes that have to happen in order to prevent us from being inundated by this kind of AI slop?
A: We need a set of sectorwide consensus agreements when it comes to academic publishing. Violations of academic integrity did not originate with generative AI, of course. Predatory publishers have been around a while. Paper mills have been able to produce manuscripts on demand for years, too. Some science is bad science and should never be published.
AI is now amplifying all of those existing problems. It’s not ameliorating them. And in the current political context in many countries, science is already under siege. It’s not hard to imagine AI-generated “science” publications that serve more overtly political ends, especially as certain academic journals have already been taken over and turned against liberal academics. AI is also potentially going to accelerate journal article production even further, worsening the flood of manuscripts that journals are already struggling to handle. This is not the AI utopia of academic publishing transformation we were promised when the publishers all got in bed with OpenAI, Google and Microsoft.
So what kinds of rules and norms do we need to reinforce to protect against all of this? Do we need sectorwide agreements about how to sanction violations of academic integrity? In academia and publishing, we have all sorts of sets of principles and standards and rules. So why not for AI, too?
I don’t have full answers or solutions to these problems. Like many other editors, reviewers, authors and university librarians, I’ve been really struggling to maintain a sense of academic hope as AI has simply been let loose across all our knowledge systems. Sure, the system was creaking badly already, but it seems clear to me that the uncontrolled AI experiment of the last few years has been a disaster for academic knowledge production and publishing.
The editors of the journal Organization Science just did a detailed study of this, concluding that academic use of AI has led to more but not better research and caused a crisis in peer review. We’re going to need sectorwide efforts of institutions, publishers and even the tech companies themselves to sort it out. The risk otherwise is that the public further loses trust in the sector as an authoritative source of important knowledge. Or, to spin it more optimistically, maybe we can use this moment to work out what kind of academic publishing system would really work best for the future.
