> The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue.
This is incredibly good for science. arXiv is free, but it's a privilege, not a right!
The frontier LLMs are getting pretty good at checking this sort of thing. You could prompt them not only to verify that the references are real but also that they actually state what the article claims. Some human review will still be needed, but I'll bet this approach could find a lot of academic fraud.
Your approach is good for catching stuff that human reviewers might miss, not as the first-line, default-only check. The whole reason this is happening is that humans are not doing their job. Your solution (humans continuing to not do their job) is just increasing the scope of the problem.
> The frontier LLMs are getting pretty good at checking this sort of thing.
No, this is career-ending, high-stakes stuff. It requires old-school "actually check a record of reality" methods, like a database query or an HTTP GET to one of the many services that hold this info.
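For what it's worth, the existence half of that check really is a few lines against public registries. A minimal sketch, assuming Python with requests (doi.org's handle proxy and Crossref's REST API are real public services; the example DOI and the crude use of the first recorded title are just illustrations):

```python
# Sketch: check a citation against a record of reality instead of a model's
# memory. doi.org resolves any registered DOI with a redirect; Crossref's
# REST API additionally returns metadata for Crossref-registered DOIs.
import requests

def doi_resolves(doi: str) -> bool:
    """True if the DOI handle system knows this DOI (any registrar)."""
    r = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return r.status_code in (301, 302, 303)

def crossref_title(doi: str) -> str | None:
    """Recorded title for a Crossref-registered DOI, else None."""
    r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if r.status_code != 200:
        return None
    titles = r.json()["message"].get("title", [])
    return titles[0] if titles else None

doi = "10.1038/nature14539"  # LeCun, Bengio & Hinton, "Deep learning", Nature 2015
print(doi_resolves(doi), crossref_title(doi))
```

Whether the paper says what the citing text claims is the hard part; whether it exists at all is not.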
I think they're saying that frontier LLMs may be usable to spot citations that are correct by shape (a real citation) but incorrect by usage (unrelated to the text)
I kind of hate the idea, but you probably could do a lazy LLM check of every paper and every citation and have it flag possible wrong (second sense) citations for human review
But you'd need a LOT of tokens and a LOT of human-hours
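To make the lazy check concrete, here is roughly what the per-citation call might look like. This is a sketch, not a recipe: the model name and prompt are placeholder assumptions, and a MISMATCH is only a flag for the human review queue, not a verdict.

```python
# Sketch of the "second sense" check: give a model the claim from the paper
# plus the cited work's actual abstract (fetched separately from a real
# registry) and ask whether the citation plausibly supports the claim.
from openai import OpenAI

client = OpenAI()

def flag_citation(claim: str, cited_abstract: str) -> bool:
    """True if the citation looks mismatched and should go to a human."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed; substitute whatever frontier model you trust
        messages=[{
            "role": "user",
            "content": (
                "A paper makes this claim:\n"
                f"{claim}\n\n"
                "It cites a work with this abstract:\n"
                f"{cited_abstract}\n\n"
                "Does the cited work plausibly support the claim? "
                "Answer with exactly one word: SUPPORTED or MISMATCH."
            ),
        }],
    )
    return "MISMATCH" in (resp.choices[0].message.content or "")
```

Run that over every (claim, citation) pair in every new submission and the token bill adds up fast, which is the point above.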
> have it flag possible wrong (second sense) citations for human review
And then what, we're done? How have we avoided the need for the same exhaustive human review? It only saves human review time if you trust the LLM not to miss things.
ArXiv doesn't even check the submission closely, so how can they know?
They say "errors, mistakes"
They use an automated system to check if the basic requirements were met, and sometimes papers are flagged for further superficial human review, but there is no way they can possibly do this at scale or check every reference. This would be like trying to do peer review, but for a preprint archive that gets easily 100x more volume than any journal.
Second, there is such a huuuuge gap between publishing on arXiv and peer review. I can attest personally that it's not even close. I've gotten probably a dozen rejections from peer review and no problems publishing in arXiv math. This is because peer review checks not just whether something is new or correct, but also whether it's of "interest to the math community," which is inherently subjective, but it also makes peer review orders of magnitude harder than publishing on arXiv.
Even though a well-known professor in number theory praised the paper when I got an endorsement, and a second one emailed me and encouraged me to publish it, it still got rejected 3 times, and I'm still waiting.
Being required to publish in a peer reviewed journal will close off arxiv for many researchers for good. It also defeats the point of it being a pre-print.
This puts the burden to make sure it's right on the submitter, where it should be. Verification can come at any time after that; the submitter understands the consequences of hallucinated references. Verification can be crowd-sourced (and likely will be).
Nothing stops someone from putting a PDF on the internet. I'm fine with ArXiv holding a high standard.
> ArXiv doesn't even check the submission closely, so how can they know?
They can be informed by people who read the papers and check the citations. A zero-tolerance policy provides an incentive to report sloppy papers (namely, that you can be confident something will be done about it), and each time a paper is removed or an author is banned, it incrementally increases the value of the arXiv as a whole.
> Being required to publish in a peer reviewed journal will close off arxiv for many researchers for good.
At the end of the day, demanding that people carefully proofread their LLM-generated papers before sharing them on the arXiv seems like a relatively low bar to clear, and I sort of question whether it's reasonable to call individuals who find it too onerous "researchers" in the first place.
It's more than that. If there are mistakes, then you can also be flagged.
read the whole tweet:
If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s).
If you'd read the whole series of tweets it's obvious that is not their intention and there needs to be "incontrovertible evidence that the authors did not check the results of LLM generation" for the penalty to apply.
It's not hard to divine their intentions: you are entirely responsible for what you submit, and if it's clearly slop(py) you get a ban. In a reply they state that they are seeking to apply this rule fairly and accurately and are mindful of unintended effects.
You don't need to be actively enforcing a rule 100% on everyone. Speed cameras don't cover every stretch of road either.
It's enough for them to place this policy and enforce it when they become aware of violations. Someone reading the slopped paper (or, here, trying to follow a reference) will notice sooner or later.
> Being required to publish in a peer reviewed journal will close off arxiv for many researchers for good. It also defeats the point of it being a pre-print.
You sound like it's impossible for researchers to write papers without slopped references, and inevitable to get hit by this policy.
Even acts that would be criminal in the US occur less in China due to properly enforced fines. Nobody refrains from doing things out of fear of getting caught unless there is a high likelihood of actually getting caught.
Research and practice has shown that the strongest deterrent is certainty.
> This is incredibly good for science.
I disagree. It's just one darn hallucinated citation for heaven's sake, not fraud or something. It doesn't account for the substance or quality of their work at all. A one-year ban seems plenty sufficient for a minor first time mistake like this. People make mistakes and a good fraction of them can learn from those mistakes. There's no need to permanently cripple someone's ability to progress their life or contribute to humanity just because an AI hallucinated a reference one time in their life. That's punitive instead of rehabilitative.
> It's just one darn hallucinated citation for heaven's sake, not fraud or something.
It is fraud.
> It doesn't account for the substance or quality of their work at all.
References are part of the work. If you're making up the references, what else are you making up?
> People make mistakes and a good fraction of them can learn from those mistakes. There's no need to permanently cripple someone's ability to progress their life or contribute to humanity just because an AI hallucinated a reference one time in their life.
A one year ban is not permanent. Having a negative consequence for making poor decisions seems like an inducement to learn from the mistake?
In an ideal world, one would be keeping notes on references used while doing the research that led to writing the paper. Choosing not to do that is one poor decision.
Taking a positive outlook: if asking an AI to provide references that may have been missed, one should at least verify the references exist and are relevant. Choosing not to do that is also a poor decision, even if one did take notes on references used while researching.
> In an ideal world, one would be keeping notes on references used
In a far less than ideal world authors are referencing papers they've at least read the title and abstract of. In an ideal world, authors would be only referencing works they have read in their entirety. I don't think we need to live in the ideal world[0], but let's also not pretend the ideal world is even remotely out of reach. Let's also be honest that in the current setting a lot of citations are being used to encourage a work be accepted more than they are being used because of their utility to the paper. The average ML paper now is 8 pages and has >50 citations. That's crazy
[0] References can be entire textbooks, which is potentially too high of a bar
Even as a human, you can still fuck up references.
I submitted a paper with a reference author as Elisio because I couldn’t read my own handwriting. After submitting, I double checked all the references through an LLM. It pointed out that their name was actually Enrique. Yes, you should probably double check your references before submitting, not after.
Point is, I didn’t even trust the LLM at first. But after verifying the mistake, I was embarrassed af. I resubmitted with the fixes before it went live, but ultimately, what’s the difference between “mistake” and “hallucination”?
I assume they won’t ban anyone automatically without a way to object.
Using your example, I wouldn't assume they would enforce the ban if you object and explain your typo, and if the corrected citation actually says what you cited.
Mistakes like these are explainable; a completely hallucinated citation usually is not.
While fraud does require intention to deceive, I get the sentiment that hallucinated citations shouldn't be dismissed as simply carelessness. It should be something stronger than that: gross negligence or something MUCH stronger! There should absolutely be repercussions for this.
But let's not call it fraud. That word is reserved for something specific.
EDIT: someone else said "reckless disregard" equals intent or something to that effect. So I looked it up.
It appears that is the case: "Reckless Disregard Equals Intent" in legal language.
But I am not sure if this particular clause should apply here. Perhaps it depends on what kind of research is being published? For example, if it is related to medical science and has real consequences for people's health, we could then apply it?
I do believe this policy is appropriate to deal with the reckless disregard of posting hallucinated references.
It's a conscious decision to not take the time to check your AI output, and instead waste a whole bunch of other people's time letting them essentially do that for you in duplicate.
Feels like that should disqualify you from participation for a bit. Intent or no intent.
> Feels like that should disqualify you from participation for a bit. Intent or no intent.
Exactly! For a bit!
Yet this is not for a bit! This is a lifetime disqualification, and that's been my entire gripe the whole time! Is nobody reading this?
"The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue."
> Then how long are they disqualifying them from submitting prior to acceptance, if not lifetime?
Well, "lifetime ban" means "you are not allowed back in". Their ban specifically allows you back in (after a specified period) subject to fulfilling a single constraint.
It's conditional acceptance back in, which is not the same as a lifetime ban which is unconditional.
I think (though might well be misunderstanding) that reckless disregard is taken to be an intentional choice but that it does not imply that the outcome itself was intentional. The difference between intentionally doing something that you know for a fact has a high risk of failure but you can't necessarily predict the outcome versus intentionally seeking a particular legally disallowed outcome.
But what LPisGood was saying is that reckless disregard (as opposed to explicit intent) is sufficient to meet the legal bar for fraud.
The intent to deceive is there. The deception is lying when you submit it that it is a scholarly piece of work in which amongst many other things you know the citations are accurate. This false representation was knowingly and intentionally made at the time of submission.
The citation being incorrect is merely the proof of deception not the (relevant) deception itself.
Fraud is the correct description provided (and this is practically a guarantee) you intended to benefit from the submission of the paper (e.g. by bolstering your resume).
If I violate the letter of the ToS when clicking submit you can correctly argue that I have technically committed fraud! Yet that is almost never what anyone actually means when having discussions like this one.
Fraud in a scientific context generally refers to fabricated research results. At least personally I agree with GP that hallucinated citations are generally something akin to laziness thus not fraud but rather some sort of professional negligence.
Fraud in the scientific world has generally taken the form of fabricated results, but I don't agree that the word has transitioned away from the common and legal meaning of deception in order to get a benefit.
Even if it had though, I'd be perfectly comfortable calling this fraud in this discussion based on the common meaning of the word. Just because we're talking about a scientific context does not mean we need to use the scientific-jargon versions of words - we're not in a scientific context ourselves.
---
And I'd disagree that this is just about the "letter of the ToS". While that is perhaps a necessary component in order to prove the deception, this is really about the cultural expectations of the community that merely happened to have been encoded in the ToS. The fraud would still occur without the ToS, it would merely be next to impossible to show you didn't simply misunderstand the cultural norms and what your actions would lead others to believe.
I disagree with your implicit assertion regarding the common meaning of the term in this context. I believe that the term fraud as commonly used when discussing things in a scientific context has always (for at least my entire life) been taken to refer to knowingly and intentionally falsified research results (also falsified appointments, falsified affiliations, falsified authorships, etc).
> deception in order to get a benefit.
The point being that reckless or negligent conduct is not commonly taken to constitute deception. There's a reason we have different terms for these things.
Sure, you can say "well he exhibited reckless disregard for his professional duties when he opted not to bother reading the citation section that the LLM shat out, and reckless disregard is sufficient to meet the legal bar for fraud, and also the ToS specifically says that you certify that you validated all references manually so bam! two counts of fraud legally speaking" and you wouldn't be wrong but the distinction between "legally fraud" and "fraud as is commonly meant when talking about scientific papers" is essential to effective conversation in this particular instance.
> Just because we're talking about a scientific context does not mean we need to use the scientific-jargon versions of words
The context is essential because (obviously) it affects how people interpret the meaning of your words. A fraudulent submission to a scientific journal has a specific and well understood meaning in common usage.
If you still disagree with me imagine polling a bunch of tenured career researchers about what they would think if they read the statement "X caught submitting a fraudulent paper to journal Y". I can just about guarantee you that none of them are imagining hallucinated citations.
We're not "discussing things in a scientific context" here. We're in the context of a startup/programmer news aggregator discussing scientific news. We are not "a bunch of tenured career researchers" discussing amongst ourselves so the jargon appropriate for that context is not the appropriate jargon - rather we need to use the jargon that the startup/programmer news aggregator crowd would understand.
That said, even in a scientific context I still disagree, and your example at the end is a fine starting point. By comparison, imagine one of the profs told the others that their house was burgled. The others would probably be thinking that things like TVs or computers or money were stolen, and not that the thief simply stole all their spoons. That doesn't make having all your spoons stolen not burglary. Likewise the profs expect that the results or authorships are where the fraud occurred because those are the best places to extract value with fraud, not by avoiding the simple act of writing the paper with correct citations. That doesn't mean fraudulently using an LLM to hallucinate a paper from your (we'll suppose for sake of argument) actual results is any less fraudulent though; it's just an unexpected form of fraud.
Edit: I want to be clear that this is not my argument: "well he exhibited reckless disregard for his professional duties when he opted not to bother reading the citation section". I see other people making that argument, and I'm not sure if they're right or wrong that that's another reason why it is fraud, but I'm certain that we don't even need to reach that question.
My argument is that it is fraud to represent the paper as a scholarly work when you don't know that it is correct. It is not that you are taking a risk it might be wrong, it is that you are actively representing that you know it is correct and if you do not know that you are committing fraud even if it happens to be so. This is a case of intentional deception, the deception being the representation that this is scholarly work, not reckless disregard for the truth as to the accuracy of the citations.
There are actually a surprising (IMO) number of career researchers on this site. Regardless, disregarding the context specific meaning will at absolute best result in a disjointed conversation where people are talking past each other. Worse, in this instance people are debating how arxiv (and other venues) ought to handle these sorts of things at which point you are well and truly into the territory where you need to get the field specific terminology right.
I concede that I was sloppy when I referred to what the researchers would be imagining. I should have phrased it as asking them if they thought that transgression X constituted fraud.
Regardless, hopefully you can see the idea that I was attempting to communicate? The burglary example isn't equivalent because while the spoons are unexpected the end result is still an event that most people would agree constituted burglary and resulted in noticeable harm to the victim.
I'm struggling to adjust your example on the fly but perhaps if it were the contents of the yard waste bin that had been pilfered? That's still technically burglary but I think most people would view it quite differently and might question the wisdom of prosecuting it.
I think the key difference here comes down to motivations as well as impact. Falsifying results (for example) is an active attempt to counterfeit the core value proposition of the endeavor and the end result of that is proportional - personal benefit directly as a result of the falsification and significant damages to anyone sufficiently bamboozled by the fiction long enough to base any decisions on it. Whereas no one using an LLM to generate just the bibliography is doing that to get ahead (at least not on its own) and any damages are limited to the reader wasting a few minutes trying to figure out the extent of the issue and who to contact about it.
> In an ideal world, one would be keeping notes on references used while doing the research that lead to writing the paper. Choosing not to do that is one poor decision.
"Any author citing another paper should be required to provide proof that they a) possess a copy of that paper, b) have read that paper, c) have read the paper carefully."
No, it is emphatically not.
Fraud requires intent to deceive.
> A one year ban is not permanent.
...what text are you reading? Nobody was calling the one-year ban permanent, or even against it. I was literally in favor of it in my comment. I explicitly said it is already plenty sufficient. What I said is there's no need to go beyond that. My entire gripe was that they very much are going beyond that with a permanent penalty. Did you completely miss where they said "...followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue"?
No. One single hallucinated citation on a document with you as an author is not evidence of your reckless disregard for anything. These exaggerations are crazy and you would absolutely deny such accusations if you missed your co-author's AI hallucinating a citation on your manuscript too. At best it would be careless, if you really relish extrapolating from one data point and smearing people's character based on that. Not reckless. It's quite literally the difference between going five miles per hour over the speed limit versus fifty.
If your co-author inserted the fraudulent reference, I agree that you may not have committed fraud. But your co-author did, and you didn't check their work. And knowing that you didn't check their work, you signed off on it.
You didn't pick your co-author very well, but arXiv lacks the investigative powers to determine which co-author did the bad deed, so they all get the consequence.
Do you think every co-author on a 100-author paper checks every citation? It's like saying that every member of a large software team personally reviews every line of code. It's just completely divorced from reality.
I’ve disagreed with some of your other stances in this thread, but I want to acknowledge the validity of your take here.
You’re right that a single hallucinated line is not evidence of reckless disregard - because that could have happened on a final follow-up pass after you had performed due diligence. It’s happened to me. I know how challenging it can be to keep bad patterns out of LLM generated output, because human communication is full of bad patterns. It’s a constant battle, and sometimes I suspect that my hard-line posture actually encourages the LLM to regularly “vibe check” me! E.g. “Are you sure you’re really the guy you’re trying to be? Because if you are you wouldn’t miss this.” LLMs are devious, and that’s why I respect them so much. If you think they’re pumping the brakes then you should check again, because they probably just put the pedal to the metal.
That being said, I regularly insist on doing certain things myself. If I were publishing a paper intended to be taken seriously - citations would be one of the things I checked manually. But I can easily see myself doing a final follow-up pass after everything looks perfect, and missing a last minute change. I would hope that I would catch that, but when you’re approaching the finish line - that’s when you expect your team to come together. That’s when everything is “supposed to” fall into place. It’s the last place you would expect to be sabotaged, and in hindsight, probably the best place to be a saboteur.
You're saying it as if the poor author just had no choice but to let LLM write their bibliography. To avoid hallucinations, maybe just don't let an LLM write any part of your paper?
You can only get in this situation if you let a bullshit generator write your paper, and the fraud is that you are generating bullshit and calling it a paper. No buts. It's impossible to trigger this accidentally, or without reckless disregard for the truth.
Not as much of a lack of seriousness as excusing away hallucinations as not that big of a deal in what's supposed to be a researched, scholarly body of work written by humans.
Not really - a lot of work consists of what David Graeber described as “bullshit jobs”. Now AI and its backers are proposing to automate all that bullshit.
> You’re right that a single hallucinated line is not evidence of reckless disregard
It absolutely is.
> - because that could have happened on a final follow-up pass after you had performed due diligence.
A "final follow-up pass" that lets the LLM make whatever changes it deems appropriate completely negates all the due diligence you did before, unless you very carefully review the diffs. And a new or substantially changed citation should stand out in that diff so much that there's no possible excuse to missing it.
> It’s happened to me.
Then you were guilty of reckless disregard.
> I know how challenging it can be to keep bad patterns out of LLM generated output
If your research paper contains any LLM generated output you did not manually vet, you are a hack and should not get published.
Allowing hallucinated content or citations into your work is an act of carelessness and disregard for the time of people that are going to read your paper and it should be policed as such.
And flatly, if a person can't be bothered to check their damn work before uploading it, why should anyone else invest their time in reading it seriously?
They are still purposely writing a paper, whether that is with the help of an LLM or not. They are instructing the LLM to do the task of finding citations. It's no different from googling for a paper that explains a specific point. You would still double check Google's output.
arXiv is not intended to be your blog. You should be held to a zero-mistake standard when publishing academic work.
The people I worry for are the junior researchers who are going to be splash damage for dishonest PIs. The PIs, though, deserve everything that’s coming for them.
Maybe I'm misunderstanding you, but zero-mistake seems harsh. I would say that AI references are a sign of something that is not simply a mistake.
However, we can have zero tolerance for certain techniques for "writing" a paper. Plagiarism and inventing data are already examples of this, if there is evidence for these techniques being used there is no excuse. We could say the same for AI references - any writing process that could produce these is by definition not a technique we want.
So the mistake isn't not checking a reference the AI gave. The mistake is letting the AI make references for you.
If we agree that academic research is important then I think we can impose certain standards on how you do it. We can disallow certain tools if that means we can't trust the output. Just like an electrician can't use certain techniques, even if they're easy, because we don't trust the final result.
> No, it is emphatically not. Fraud requires intent to deceive.
I'm about as pro AI-as-a-research-and-writing-assistant and anti AI-witchhunt as they come, but I simply cannot parse what I've quoted here.
Posting slop to arxiv is blatant deception. Posting an article is an attestation that the article is a genuine engagement with the literature. If you're posting things to arxiv that are not sincere engagements with the literature, you are attempting to deceive.
No, it emphatically is not just a year! It's perpetual, and that's literally been my entire point this whole time. If it was just one year I would've had no complaints - and I made that clear from the very first comment!
What part of "...followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue..." is everyone here reading and still somehow interpreting to be limited to 1 year?
You are equating cutting corners (ie laziness) with intentional deception and not being genuine. That doesn't seem accurate to me. In most contexts I think cutting corners would be taken to be some form of negligence or recklessness.
Regardless of terminology, I agree that it's certainly punishable and certainly a serious problem.
> followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue"?
This part seemed reasonable too. I'm not in academia, but my understanding is most people writing papers intend for them to be accepted by reputable peer-reviewed venues, but post to arXiv because those venues don't always allow for simple distribution.
If your papers aren't going to be accepted at reputable venues and you posted slop to arXiv before (and they noticed it!), seems reasonable that they only want reputable stuff from you in the future?
It's very silly, but not a big deal. Arxiv is becoming irrelevant these days anyway.
In fact, it would be better if they just banned AI, so we could just get off the luddite platforms.
Automated research is the future, end of story. And really it couldn't have come out at a better time, given the increasingly diminishing returns on human powered research.
A "mistake" would be a typo in a real citation. A hallucinated citation is evidence of just plain laziness and negligence, which taints the entire submission.
No it is not. Seriously. All you need for this to happen is for your lab partner to ask an AI to add a missing citation that they are already familiar with at the last minute before a midnight submission deadline, and for the AI to hallucinate something else, and for them to honestly miss this. It does not even imply any involvement on your part, let alone that either of you were lazy or negligent on the actual research or substance of the paper. The lack of any sympathy or imagination here is astounding.
There are no deadlines for journal submissions. Even if you felt you were running close to your revisions being due, an email to an editor will probably fix this for you. And what you described is still negligence: not verifying that the garbage-output bot did not in fact output garbage.
Kinda. There is a deadline to appear in the next day's posting, and this can be important if e.g. you want to get something on the arxiv before a talk or a proposal deadline.
Your constructed hypothetical makes it even worse. If there are 2+ people in this scenario who have good intentions, this should especially never happen. When you sign your name on a paper, you are nonetheless vouching for everything written in it, including the things you didn't personally write. You should absolutely be checking every single reference your co-author included and verifying that it says what your co-author claims it says. This is something you should have been doing completely independent of LLMs existing. This is something you're publishing publicly, something that may be associated with you and your career for the rest of your life, it is insanely negligent to not even read and verify what your co-author is adding.
In other words: all it needs for your paper to have fraud is for your lab partner to add fraud to your paper.
I'm not seeing the problem here. The only problem is that your lab partner should be banned and not you. But being incentivised to check your co-author's work before submission isn't a bad thing.
You’re confusing the issue here by saying it’s not your fault, it’s your lab partner’s. We’re talking about why your lab partner did something wrong. You can assign blame for the wrong thing separately.
The citation is part of the substance of the paper. If you YOLOed in a citation without checking it, seems justified to suspect that you may have YOLOed in some data, or some analysis, or maybe even the conclusion.
Being suspicious would be reasonable and I think a penalty could perhaps be appropriate but the person you're replying to is objecting that the stated hypothetical would not rise to the bar of negligence.
Do bear in mind the degree of the described scenario. There's quite a difference between having an LLM shit out your entire citation section (and possibly the rest of the paper as well) versus asking the tool to make a targeted edit and overlooking a small piece of nonsense that results.
This is like saying lawyers should be allowed to submit AI-hallucinated case references or quotations in court documents. Because by your logic, that, too, should be perfectly acceptable. Yet is not, for hopefully obvious reasons. Why exactly should scientific research be any different? If your paper contains hallucinated references, we can't verify your assertions in the paper, and therefore must question the paper as a whole.
If you cannot be bothered to check your references when writing academic quality papers then you have no place writing them in the first place. The punishment is not chopping off a finger, it is a polite reminder to do the bare minimum.
What's the difference between a "hallucinated" citation and consciously inserting a reference to a non-existent paper and hoping it goes unnoticed? How do we determine which one was done consciously and which was "a minor first time mistake"?
Your standards are lower than what they would accept at my high-school. Seriously.
And generally, if you are generating papers with LLMs, let other LLMs read them. Why would we waste human hours considering something that was generated? At this point publish your prompt because that's the actual work you're doing.
> It's not the kind of mistake that is possible unless you're engaging in fraud anyway.
Seriously? You can't fathom an honest researcher asking for AI to find a citation they know exists, and the AI inserting or modifying a citation incorrectly without them realizing?
If you find evidence of fraud by all means lay down the hammer. Using a single hallucinated citation like it's some kind of ironclad proxy just because you think they must be committing fraud is insane.
if you're not checking citations in the paper you're publishing AND trusting a non-SOTA, hallucination-prone AI model to come up with sources for it, it's probably for the best of everyone that this paper isn't published.
yes there will be rare exceptions but in general i feel like this is a really good addition.
if an llm does the work, you did not write it or research it, the llm did. you have no business crediting yourself as an author.
if someone writes a paper and an entirely different person takes credit for it without even bothering to check if the actual writer just made shit up, they deserve a lifetime ban. seems like a year is a very light punishment.
>Seriously? You can't fathom an honest researcher asking for AI to find a citation they know exists
Assumptions:
1. The entire document is loaded into an AI editor
2. The researcher is asking an AI editor to work on his references
3. The researcher has not checked his own references.
This could be avoided at 1, 2 or 3. But even just 1 implies that the researcher knows that they have a hot potato and might critically fuck up and lose all credibility. Being in that scenario and committing to 2 and 3 is at least extreme negligence.
If you are citing a work you paste a citation to that work. If you are bullshitting you ask an AI to come up with a citation. Jesus, there is zero reason to ever "generate a citation" if you are not, in fact, committing fraud.
That's like saying that there's zero reason to ever ask an LLM to do basic math for you. Sure you probably shouldn't do that but sometimes it's convenient and so people will inevitably do exactly that regardless of the somewhat frequent wrong answers that are guaranteed to ensue.
How specific are the citations? If it's “Sentence 4 on page 97 supports” or “Paper says ‘___’” then I imagine it would be fairly easy. If it's “(__ page long) paper supports x”, then very difficult?
Verifying that the reference you cite actually exists is the absolute minimum standard for academic work. It is not optional, not something to skip because of a deadline, and not something to outsource blindly to hallucination-prone AI.
If someone cannot meet that bar, they have no business publishing research papers. I have written academic papers myself, and I find it astonishing that people are trying to justify this as if it were some understandable workflow mistake. At that point it is simply slop with academic formatting. Post it on a blog or somewhere else, but do not put it into the scientific record.
A one-year ban is not a lifetime ban. Maybe six months would also have been enough, but the author can use that time to think about whether they should verify references next time — and to manually check every other citation.
> Seriously? You can't fathom an honest researcher asking for AI to find a citation they know exists, and the AI inserting or modifying a citation incorrectly without them realizing?
Indeed I cannot. If you do that, you are not, in fact, an honest researcher. You're a lazy hack.
I would not necessarily go as far as calling it fraud, but if you cannot even verify that the reference you are citing actually exists, you are not ready to publish research papers.
Deadlines are not an excuse here. Checking whether a cited book, paper, or passage exists is the absolute minimum standard for scientific work, not an optional extra. I have written academic papers myself, and I find it astonishing that people are trying to justify this as if it were some understandable workflow mistake. At that point it is simply slop with academic formatting.
A one-year ban is not a lifetime ban. Maybe six months would also have been enough, but the point is that the author gets time to think about whether they should verify references next time. They can also use that time to manually check every other citation.
A citation is where you derived knowledge... If you haven't checked it and you are submitting something that should represent a ton of labour (and which will consume labour to review), you don't understand what you're doing. It is not just crossing T's and dotting I's.
Your being set behind is less important than the fact that your publishing is setting everyone else behind.
Such a banned person is being helped to "step out of the way", and someone more competent will assuredly step forward to consume the limited maintenance labour more thoughtfully
> Your being set behind is less important than the fact that your publishing is setting everyone else behind
One hallucinated citation does not in any way imply anyone is being left behind. All it means is that nobody checked that particular line of the manuscript after it was written. The rest of the paper could still be solid and treated accordingly. If you find evidence of the contrary, of course treat it accordingly, but this is so obviously not that.
> One hallucinated citation does not in any way imply anyone is being left behind.
The parent said “setting” others behind, which refers to lost time.
Being “left” behind implies a degraded trajectory, which is defined not by time lost, but by the final destination.
Different but related things (e.g. lost time can indeed affect your final destination, for instance, after growing old correcting a scourge of hallucinated citations - which should have been table stakes all along).
That was literally just a typo, I was walking and messed up while typing. Pretend I wrote "set behind." It makes no difference to my point and I fully stand behind the comment with that correction.
If all you're genuinely worried about is the collective human time spent on tracing down one stupid hallucinated citation in a paper, may I remind you of the ludicrous amounts of time and effort readers waste trying to wade through the sea of fluff, jargon, and complexity frequently added to papers in a completely deliberate fashion. If wasting even a little bit of readers' time is what you see as the crime here, you have orders of magnitude bigger fish to fry.
The fact is that, for one hallucinated citation to be the noteworthy bit that "sets others behind" in any meaningful way, the actual substance of your paper has to be utterly worthless (or worse); otherwise, you're contributing far more than you're taking away, and thus your paper is very much not setting others behind. OTOH, if your paper really is worthless or harmful enough for this part of it to be a big deal, that would be the basis for punishment, not this. A single hallucinated citation is simply not a blip on that metaphorical radar.
You clearly misunderstand. You cite a work in your paper because you have read that work and build upon it, or want to refer to it to back up a specific claim. Generating references is fraud, period, because you are implying that you have read a work when in fact you just asked an AI "please insert some reference-shaped text here" to make it look like a proper paper. It is sadly not a necessary, but certainly a VERY sufficient, reason to conclude a paper is fraudulent.
> There's no need to permanently cripple someone's ability to progress their life or contribute to humanity
I don't think you need to publish on arXiv to contribute meaningfully to humanity.
> That's punitive instead of rehabilitative.
Unfortunately science is competitive. Yours is a race to the bottom where the people who can afford the most expensive models and who are least concerned with the truth can publish the most papers and benefit financially and professionally by doing so. This is now a zero-sum arena: grant money and opportunities may well be awarded to them, and not to another team who is producing more careful and genuine output.
Seeing the usual LLM hypers angrily replying to this on Twitter is such a tell. Just like the comments on the LLM-poisoning articles, some people just can't accept that some people don't like LLMs, and they get upset when you put any amount of hindrance in the way of their rapid adoption.
It's hard for me to even understand their perspective. Researching references for a published academic paper isn't some incidental busywork task, it's supposed to be a core part of doing research which is the core of the job. If you don't have sympathy for someone who, say, paid a person on Fiverr to cook up a paper rather than writing it themselves and then didn't even bother to check the references, why is using an LLM and not checking any better?
There is a lot of "throw it against the wall, and if it sticks, write it up" empirical work against benchmarks. It leads to post-hoc rationalization of the work and browser plugins using LLMs to find references for work that is already written.
It is a bureaucratic view about "you need a citation for this", where people misunderstand the citation as a checkbox, instead of "you need to substantiate this claim, as I, the reviewer, do not accept this as a fact".
It's also hilarious that they complain about this because, from what I've seen, most LLM hypers will talk about something being irrelevant or taken over by AI with no understanding of what that something really is or involves.
It's not even that they "don't like LLMs". They just don't like academic fraud! If references were fabricated with a Markov chain it would be just as bad!
> Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated (Dietterich, T. G.)
To be a coauthor on a preprint that you have not submitted, you have to actively "claim" it (using a password given to the author who submitted). It's on you to double-check before claiming.
While this is certainly a welcome step, I hope there is more work done to fix the underlying problem: it is surprisingly hard to easily create correct BibTeX entries for cited papers. Citations for any given paper can come from a wide range of journals with various publishers, conferences, and preprints, and the same paper can be available from multiple sources with varying details, e.g. arXiv and the conference website.

Tools like Zotero have certainly made it significantly easier to extract citations from publication webpages, but I still find issues with the extracted BibTeX details. While author names and titles are often extracted correctly, I still have to manually ensure that details like publication venue, year, volume number, page number, URL, etc. are extracted correctly and also rendered correctly in LaTeX format. Different publications can use different citation styles.

This can unfortunately lead to taking shortcuts with AI-generated citation data, due to the lack of an easy and unified way to get consistent citation data. I am not sure whether hallucinated citations are being generated in the main manuscript or in a separate BibTeX file, so I may be a bit off in my understanding.
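For anything that already has a DOI, one partial remedy: ask the DOI system itself for the BibTeX via content negotiation instead of hand-assembling (or generating) the entry. A small sketch, assuming Python with requests; Crossref and DataCite both honor this accept header, though the returned fields still deserve the manual once-over described above:

```python
# Sketch: fetch a canonical BibTeX entry via DOI content negotiation,
# so venue/year/volume/pages come from the registrar's record rather
# than from memory or an AI guess.
import requests

def bibtex_for(doi: str) -> str:
    r = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/x-bibtex"},
        timeout=10,
    )
    r.raise_for_status()
    return r.text

print(bibtex_for("10.1038/nature14539"))  # example DOI for illustration
```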
Well, yeah, 99% of arXiv papers were not written for me or you. They were written for someone who works in a niche within a niche. That's (in my view) the beauty of research.
Agreed. There was already too much human generated slop in academia.
And I’m not talking about good faith research that didn’t pan out, I mean research that is completely useless for any other purpose other than convincing a casual observer that the authors are doing research.
Next, for AI papers, a reproducibility requirement. So much code and so many details are fudged, and papers cannot be reproduced. They ran the training with some other config, or other data, etc., to make their mechanism or intervention seem better.
I just wish that anyone who is against this policy were forced to review a paper that turns out to be unedited AI slop. Reviewers are expert volunteers who do it for free. It is incredibly frustrating to have spent 4 hours reading a paper, trying your best to make sense of what the authors are trying to prove, just to realize that it is hallucinations.
The authors should value the time of the reviewers higher than their own time. So, if you include AI nonsense in your paper, it is insulting.
How will they detect hallucinated refs at scale? Manual spot checks? Automated DOI verification? The policy seems right, but enforcement is the hard part.
Enforcement is secondary and is allowed to take weeks / months / never at all if nobody reads the paper. It's about being able to ban if an issue arises, not about keeping the database strictly clean.
However difficult it might be right now it's only going to get easier. Anyway I don't think proactive enforcement is the point. Rather now they have an official method by which to address incidents that are brought to their attention.
There needs to be careful vetting before such adverse actions. If somebody includes a name and pushes it without express permission, does everyone get the ban? I agree that, implemented the right way, this is good.
To be a coauthor on a preprint that you have not submitted, you have to actively "claim" it (using a password given to the author who submitted). It's on you to double-check before claiming.
I surely hope that only "confirmed" coauthors will get the ban, it's only logical.
This has become such a problem in scholarly publishing that we have a business that provides citation checking, https://groundedai.company/, which we've been building for a couple of years now.
It's not unexpected, but still sad, to see so many comments opposing even the smallest step against low-effort fraud in academic publications. Is this what hacker culture has been reduced to in the age of slop? Open hostility against science and engineering?
Good; academic literature is in crisis because of all of the slop. Forcing some consequences on easily-detectable hallucinations can only be a good thing
That, and mixing reference details from multiple sources and messing it up.
Let's say you read a paper on Arxiv but cite the version that was submitted to a journal or conference, without realizing that the authors made changes to the version they submitted and forgot to upload them to Arxiv.
In physics, references which just didn't exist. That could be that the author made it up, but often it's because they transcribed the reference from another paper without reading it - we know because a few people have deliberately introduced fake references to trace how far they would go. The reasons are not the same as for AI, but the problem they produce is the same.
References which don't accurately reflect the quoted material seem more common in other subjects.
Which is why the angry replies on Twitter from AI hype accounts are so funny. You should get penalised for fake references and profanity in your submissions, even if you wrote your slop longhand. I don't know why anyone would have an issue with this policy.
Had a colleague submit a paper with literal AI slop left in the text, got hit with a nasty revision request. Check your drafts before you submit, people. The reviewers will find it.
Also check your LaTeX comments, Arxiv makes those publicly visible!!!
I'm a screen reader user and usually read papers as raw TeX. I've seen everything: slurs, demeaning comments towards reviewers and professors, admissions of fraud, instructions to coauthors to commit further fraud before paper submission to mask the earlier fraud... it's all there. There's far less of it than I would think, definitely <1% of papers, but it's there.
I think it would be useful to run an LLM anti-fraud pass on the TeX source of all new arxiv papers. It wouldn't catch everything, but it would catch some of the dumbest fraudsters.
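The cheapest version of that pass doesn't even need a model up front. A naive sketch of the extraction step, assuming a plain line scan is good enough (it treats every unescaped % as a comment, so URLs and verbatim blocks will add noise); the output can be eyeballed or handed to an LLM afterwards:

```python
# Sketch: pull the "%" comments out of a .tex source file, i.e. the text
# arXiv publishes but most sighted readers never see.
import sys

def tex_comments(path: str) -> list[str]:
    comments = []
    for line in open(path, encoding="utf-8", errors="replace"):
        cleaned = line.replace(r"\%", "")  # drop escaped percent signs first
        if "%" in cleaned:
            comment = cleaned.split("%", 1)[1].strip()
            if comment:
                comments.append(comment)
    return comments

for c in tex_comments(sys.argv[1]):
    print(c)
```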
On the positive side, you can also find stronger claims that didn't survive review, additional explanations that didn't make the cut due to the conference's page limit, as well as experimental results that the authors felt weren't really worth including. Those need to be approached with an abundance of caution, but are genuinely useful sometimes.
It's been pretty eye opening watching Craig Wright (of bitcoin fakery fame) flooding out LLM generated 'academic' papers and even having some of them accepted.
He's toast if SSRN were to adopt a similar policy.
It seems a good idea to ban cheating, but how hard is it, especially in new reasoning/agent contexts, to validate references?
The deeper question is whether legitimate AI-generated results are allowed or not.
Test, in the extreme: think of a proof of the Riemann Hypothesis autonomously generated (end to end) and formally proven. Is it allowed or not?
In that case, you would just not do a reference. End to end autonomous science might have fewer concrete citations as the contributing knowledge is just the sum of the training data of the model.
There already exist multiple tools for automatically verifying references. This measure will likely only filter out the laziest and most incompetent AI slop submissions. It's a very modest raising of the bar, but it comes at zero cost to honest researchers.
I expect arXiv will still have problems with slop submissions but, at least, their references should actually exist going forward.
It isn't "cheating" they're concerned with, it's sloppiness. This dictum isn't some sort of AI ban, but instead simply that if there is evidence that it was so low effort that the work includes such blatant problems, it's just adding noise.
> think proof of Riemann Hypothesis autonomously generated (end to end) formally proven - is it allowed or not?
Sorry to be rude, but this seems like a dumb question. I want science to progress. A primary purpose of these journals is to progress science. A full proof of the Riemann Hypothesis progresses science. I don't care how it was produced, if Hitler is coauthor, etc, I just care that it is correct. Whether the authors should be rewarded for whatever methods they used can be a separate question.
Terence Tao had a nice talk from the Future of Mathematics conference posted yesterday [0] that shapes a lot of my own feelings on this matter.
The short of it is he argues that first-to-correctness shouldn't be the only goal / isn't a great optimisation incentive. Presentation and digestibility of correct results are the missing third once you've finished generation and verification. I completely agree with him. You don't just need an AI-generated proof of the Riemann Hypothesis. You would really like it to be intentional and structured for others to understand.
A really beautiful quote I learned of in the talk is this:
> "We are not trying to meet some abstract production quota of definitions, theorems, and proofs. The measure of our success is whether what we do enables people to understand and think more clearly and effectively about math." - William Thurston
Ya, I think this totally makes sense. Just to be clear though, I don't think we're actually disagreeing. A proof of the Riemann hypothesis that's obtuse and basically unreadable is a great step on the path to a proof that is enlightening and clear. If AI provides correct-but-annoying results, I'm confident humans can still benefit from that marginal result.
> The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue.
I'm not seeing this clearly listed on https://info.arxiv.org/help/policies/index.html so it's possible this is planned but not live yet - or perhaps I'm not digging deeply enough?
As a certain doctor once said: the whole point of the doomsday machine is lost if you keep it a secret!
I bet, since this has been posted, someone here has already vibe coded a reference checker that they plan to put behind a subscription.
This is good for reference checking, but I doubt this will do much for the most likely shoddy science that accompanies hallucinated references.
> It requires old-school "actually check a record of reality" methods, like a database query or an HTTP GET to one of the many services that hold this info.
LLMs can make tool calls to do database and http queries to search for, buy, and cross reference a citation.
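Concretely, with something like OpenAI's function-calling format, the lookup becomes a tool the model must call, so existence is answered by the registry and not by the model's memory. A rough sketch (the model name and the check_doi helper are assumptions, not anyone's actual pipeline):

```python
# Sketch: an LLM that routes DOI verification through a real HTTP check.
import json
import requests
from openai import OpenAI

def check_doi(doi: str) -> dict:
    """Ground truth: does the DOI handle system resolve this DOI?"""
    r = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return {"doi": doi, "exists": r.status_code in (301, 302, 303)}

tools = [{
    "type": "function",
    "function": {
        "name": "check_doi",
        "description": "Check whether a DOI exists in the DOI registry.",
        "parameters": {
            "type": "object",
            "properties": {"doi": {"type": "string"}},
            "required": ["doi"],
        },
    },
}]

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed
    messages=[{"role": "user",
               "content": "Verify every DOI in this bibliography: ..."}],
    tools=tools,
)

# Execute whatever lookups the model requested against the real registry.
for call in resp.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(check_doi(args["doi"]))
```

The reliability objection below still applies to the model's judgment, but at least the existence bit comes from an HTTP status code.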
why is the standard response to "this tech isn't reliable enough for this" to run its output through the same unreliable tech?
The device-fixer started breaking devices instead of fixing them. Tell it to fix itself!
Yeah...
The amount of people who confidently tell on themselves in these discussions continues to bum me out.
My take: this seems excessive.
> Nothing stops someone from putting a PDF on the internet. I'm fine with ArXiv holding a high standard.
More than fine, let’s encourage it.
We deserve it, it’s one of the ways to differentiate from the Elsevier et al shitboxes!
Not to mention Zenodo, Academia.edu, etc.
> ArXiv doesn't even check the submission closely, so how can they know?
They can be informed by people who read the papers and check the citations. A zero-tolerance policy provides an incentive to report sloppy papers (namely, that you can be confident something will be done about it), and each time a paper is removed or an author is banned, it incrementally increases the value of the arXiv as a whole.
> Being required to publish in a peer reviewed journal will close off arxiv for many researchers for good.
At the end of the day, demanding that people carefully proofread their LLM-generated papers before sharing them on the arXiv seems like a relatively low bar to clear, and I sort of question whether it's reasonable to call individuals who find it too onerous "researchers" in the first place.
You could at least filter out hallucinated references which simply don't exist pretty trivially, I'd imagine.
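Roughly like this, as a sketch (assumes you've already pulled the DOIs out of the bibliography into a file, one per line; refs.txt is hypothetical, and Crossref's public API is the arbiter of "exists"):

    # Flag every DOI that Crossref has no record of (refs.txt is hypothetical)
    while read -r doi; do
      code=$(curl -s -o /dev/null -w "%{http_code}" "https://api.crossref.org/works/$doi")
      [ "$code" = "200" ] || echo "possibly hallucinated: $doi"
    done < refs.txt

It only catches outright fabrications, not real papers cited for things they don't say, but that's the cheap first filter.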
It's more than that. If there are mistakes, then you can also be flagged.
Read the whole tweet:
If generative AI tools generate inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content, and that output is included in scientific works, it is the responsibility of the author(s).
If you'd read the whole series of tweets it's obvious that is not their intention and there needs to be "incontrovertible evidence that the authors did not check the results of LLM generation" for the penalty to apply.
It's not hard to divine their intentions: you are entirely responsible for what you submit, and if it's clearly slop(py) you get a ban. In a reply they state that they are seeking to apply this rule fairly and accurately and are mindful of unintended effects.
You don't need to be actively enforcing a rule 100% on everyone. Speed cameras don't cover every stretch of road either.
It's enough for them to place this policy and enforce it when they become aware of violations. Someone reading the slopped paper (or, here, trying to follow a reference) will notice sooner or later.
> Being required to publish in a peer reviewed journal will close off arxiv for many researchers for good. It also defeats the point of it being a pre-print.
You sound like it's impossible for researchers to write papers without slopped references, and inevitable to get hit by this policy.
Even acts that would be criminal in the US occur less in China due to properly enforced fines. Nobody assumes they will get caught unless there is a high likelihood of getting caught.
Research and practice have shown that the strongest deterrent is certainty.
Impact = risk × probability of occurrence.
If the fine is high enough (risk), then even with a low probability, people will not do the thing because of the expected impact on them.
> This is incredibly good for science.
I disagree. It's just one darn hallucinated citation for heaven's sake, not fraud or something. It doesn't account for the substance or quality of their work at all. A one-year ban seems plenty sufficient for a minor first time mistake like this. People make mistakes and a good fraction of them can learn from those mistakes. There's no need to permanently cripple someone's ability to progress their life or contribute to humanity just because an AI hallucinated a reference one time in their life. That's punitive instead of rehabilitative.
> It's just one darn hallucinated citation for heaven's sake, not fraud or something.
It is fraud.
> It doesn't account for the substance or quality of their work at all.
References are part of the work. If you're making up the references, what else are you making up?
> People make mistakes and a good fraction of them can learn from those mistakes. There's no need to permanently cripple someone's ability to progress their life or contribute to humanity just because an AI hallucinated a reference one time in their life.
A one year ban is not permanent. Having a negative consequence for making poor decisions seems like an inducement to learn from the mistake?
In an ideal world, one would be keeping notes on references used while doing the research that led to writing the paper. Choosing not to do that is one poor decision.
Having a positive outlook, if asking an AI to provide references that may have been missed, one should at least verify the references exist and are relevant. Choosing not to do that is also a poor decision, even if one did take notes on references used while researching.
[0] References can be entire textbooks, which is potentially too high of a bar
Even as a human, you can still fuck up references.
I submitted a paper with a reference author as Elisio because I couldn’t read my own handwriting. After submitting, I double checked all the references through an LLM. It pointed out that their name was actually Enrique. Yes, you should probably double check your references before submitting, not after.
Point is, I didn’t even trust the LLM at first. But after verifying the mistake, I was embarrassed af. I resubmitted with the fixes before it went live, but ultimately, what’s the difference between “mistake” and “hallucination”?
I assume they won’t ban anyone automatically without a way to object. Using your example, I wouldn’t assume they would enforce the ban if you object and explain your typo, and if the corrected citation actually says what you cited. Mistakes like these are explainable; a completely hallucinated citation usually is not.
Sounds like you could use a tool like Zotero.
With proper bibliography management tools, everything (that has one) is centered around the DOI.
In fact, if a DOI is present, it's trivial to verify authors, title, venue, year, pages, etc. (sketch at the end of this comment).
Of course, some older and more obscure papers won't have a DOI, but the vast majority of research work has.
If you write your own paper (mostly) and choose your own references (because you've actually read the papers) you won't have a problem.
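Concretely, DOI content negotiation gives you the registered metadata in one request (a sketch; the DOI is a placeholder, and this works for anything registered with Crossref or DataCite):

    # Fetch the registered metadata for a DOI as CSL JSON:
    # authors, title, container-title (venue), year, pages, etc.
    curl -sLH "Accept: application/vnd.citationstyles.csl+json" \
      "https://doi.org/10.1000/placeholder-doi"   # placeholder DOI

Diff those fields against what the bibliography claims and you've verified the entry.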
> It is fraud.
I think we are talking semantics here.
While fraud does require intention to deceive, I get the sentiment that hallucinated citations shouldn't be dismissed as simply carelessness. It should be something stronger than that: gross negligence or something MUCH stronger! There should absolutely be repercussions for this.
But let's not call it fraud. That word is reserved for something specific.
EDIT: someone else said "reckless disregard" equals intent or something to that effect. So I looked it up.
It appears that is the case: "reckless disregard equals intent" in legal language.
But I am not sure if this particular clause should apply here. Perhaps it depends on what kind of research is being published? For example, if it is related to medical science and has real consequences for people's health, we could then apply this?
I do believe this policy is appropriate to deal with the reckless disregard of posting hallucinated references.
It's a conscious decision to not take the time to check your AI output, and instead waste a whole bunch of other people's time letting them essentially do that for you in duplicate.
Feels like that should disqualify you from participation for a bit. Intent or no intent.
100% agreed.
Doing your job poorly means giving more work to others and, consequently, stealing their time, their most precious asset.
Many here don't agree with this ban because they work in IT, where this immoral and antisocial behavior is normalized.
> Feels like that should disqualify you from participation for a bit. Intent or no intent.
Exactly! For a bit!
Yet this is not for a bit! This is a lifetime disqualification, and that's been my entire gripe the whole time! Is nobody reading this?
"The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue."
Mhm. Okay, honestly, I maybe don't have enough data to judge how much impact that requirement has.
I also haven't seen anything on how this works with multiple authors, which could go anywhere from draconian to weakening the entire thing.
That doesn't sound like a lifetime ban to me.
What? Then how long are they disqualifying them from submitting prior to acceptance, if not lifetime? It certainly doesn't say 1 year or something.
> Then how long are they disqualifying them from submitting prior to acceptance, if not lifetime?
Well, "lifetime ban" means "you are not allowed back in". Their ban specifically allows you back in (after a specified period) subject to fulfilling a single constraint.
It's conditional acceptance back in, which is not the same as a lifetime ban which is unconditional.
I think (though might well be misunderstanding) that reckless disregard is taken to be an intentional choice but that it does not imply that the outcome itself was intentional. The difference between intentionally doing something that you know for a fact has a high risk of failure but you can't necessarily predict the outcome versus intentionally seeking a particular legally disallowed outcome.
But what LPisGood was saying is that reckless disregard (as opposed to explicit intent) is sufficient to meet the legal bar for fraud.
The intent to deceive is there. The deception is representing, when you submit it, that it is a scholarly piece of work in which, amongst many other things, you know the citations are accurate. This false representation was knowingly and intentionally made at the time of submission.
The citation being incorrect is merely the proof of deception not the (relevant) deception itself.
Fraud is the correct description provided (and this is practically a guarantee) you intended to benefit from the submission of the paper (e.g. by bolstering your resume).
If I violate the letter of the ToS when clicking submit you can correctly argue that I have technically committed fraud! Yet that is almost never what anyone actually means when having discussions like this one.
Fraud in a scientific context generally refers to fabricated research results. At least personally I agree with GP that hallucinated citations are generally something akin to laziness thus not fraud but rather some sort of professional negligence.
Fraud in the scientific world has generally taken the form of fabricated results, but I don't agree that the word has transitioned away from the common and legal meaning of deception in order to get a benefit.
Even if it had though, I'd be perfectly comfortable calling this fraud in this discussion based on the common meaning of the word. Just because we're talking about a scientific context does not mean we need to use the scientific-jargon versions of words - we're not in a scientific context ourselves.
---
And I'd disagree that this is just about the "letter of the ToS". While that is perhaps a necessary component in order to prove the deception, this is really about the cultural expectations of the community that merely happened to have been encoded in the ToS. The fraud would still occur without the ToS, it would merely be next to impossible to show you didn't simply misunderstand the cultural norms and what your actions would lead others to believe.
I disagree with your implicit assertion regarding the common meaning of the term in this context. I believe that the term fraud as commonly used when discussing things in a scientific context has always (for at least my entire life) been taken to refer to knowingly and intentionally falsified research results (also falsified appointments, falsified affiliations, falsified authorships, etc).
> deception in order to get a benefit.
The point being that reckless or negligent conduct is not commonly taken to constitute deception. There's a reason we have different terms for these things.
Sure, you can say "well he exhibited reckless disregard for his professional duties when he opted not to bother reading the citation section that the LLM shat out, and reckless disregard is sufficient to meet the legal bar for fraud, and also the ToS specifically says that you certify that you validated all references manually so bam! two counts of fraud legally speaking" and you wouldn't be wrong but the distinction between "legally fraud" and "fraud as is commonly meant when talking about scientific papers" is essential to effective conversation in this particular instance.
> Just because we're talking about a scientific context does not mean we need to use the scientific-jargon versions of words
The context is essential because (obviously) it affects how people interpret the meaning of your words. A fraudulent submission to a scientific journal has a specific and well understood meaning in common usage.
If you still disagree with me imagine polling a bunch of tenured career researchers about what they would think if they read the statement "X caught submitting a fraudulent paper to journal Y". I can just about guarantee you that none of them are imagining hallucinated citations.
We're not "discussing things in a scientific context" here. We're in the context of a startup/programmer news aggregator discussing scientific news. We are not "a bunch of tenured career researchers" discussing amongst ourselves so the jargon appropriate for that context is not the appropriate jargon - rather we need to use the jargon that the startup/programmer news aggregator crowd would understand.
That said, even in a scientific context I still disagree, and your example at the end is a fine starting point. By comparison, imagine one of the profs told the others that their house was burgled. The others would probably be thinking that things like TVs or computers or money were stolen, not that the thief simply stole all their spoons. That doesn't make having all your spoons stolen not burglary. Likewise the profs expect that the results or authorships are where the fraud occurred, because those are the best places to extract value with fraud, not the simple act of writing the paper with correct citations. That doesn't mean fraudulently using an LLM to hallucinate a paper from your (we'll suppose for sake of argument) actual results is any less fraudulent though; it's just an unexpected form of fraud.
Edit: I want to be clear that this is not my argument: "well he exhibited reckless disregard for his professional duties when he opted not to bother reading the citation section". I see other people making that argument, and I'm not sure if they're right or wrong that that's another reason why it is fraud, but I'm certain that we don't even need to reach that question.
My argument is that it is fraud to represent the paper as a scholarly work when you don't know that it is correct. It is not that you are taking a risk it might be wrong; it is that you are actively representing that you know it is correct, and if you do not know that, you are committing fraud even if it happens to be correct. This is a case of intentional deception, the deception being the representation that this is scholarly work, not reckless disregard for the truth as to the accuracy of the citations.
There are actually a surprising (IMO) number of career researchers on this site. Regardless, disregarding the context-specific meaning will at absolute best result in a disjointed conversation where people are talking past each other. Worse, in this instance people are debating how arxiv (and other venues) ought to handle these sorts of things, at which point you are well and truly into the territory where you need to get the field-specific terminology right.
I concede that I was sloppy when I referred to what the researchers would be imagining. I should have phrased it as asking them if they thought that transgression X constituted fraud.
Regardless, hopefully you can see the idea that I was attempting to communicate? The burglary example isn't equivalent because while the spoons are unexpected the end result is still an event that most people would agree constituted burglary and resulted in noticeable harm to the victim.
I'm struggling to adjust your example on the fly but perhaps if it were the contents of the yard waste bin that had been pilfered? That's still technically burglary but I think most people would view it quite differently and might question the wisdom of prosecuting it.
I think the key difference here comes down to motivations as well as impact. Falsifying results (for example) is an active attempt to counterfeit the core value proposition of the endeavor and the end result of that is proportional - personal benefit directly as a result of the falsification and significant damages to anyone sufficiently bamboozled by the fiction long enough to base any decisions on it. Whereas no one using an LLM to generate just the bibliography is doing that to get ahead (at least not on its own) and any damages are limited to the reader wasting a few minutes trying to figure out the extent of the issue and who to contact about it.
> In an ideal world, one would be keeping notes on references used while doing the research that lead to writing the paper. Choosing not to do that is one poor decision.
In this book
https://news.ycombinator.com/item?id=44022957
there is this passage on p. 127:
"Any author citing another paper should be required to provide proof that they a) possess a copy of that paper, b) have read that paper, c) have read the paper carefully."
> It is fraud.
No, it is emphatically not. Fraud requires intent to deceive.
> A one year ban is not permanent.
...what text are you reading? Nobody was calling the one-year ban permanent, or even arguing against it. I was literally in favor of it in my comment. I explicitly said it is already plenty sufficient. What I said is there's no need to go beyond that. My entire gripe was that they very much are going beyond that with a permanent penalty. Did you completely miss where they said "...followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue"?
Fraud requires intent to deceive _or_ reckless disregard, sometimes called, “conscious indifference” for the veracity of the statement asserted.
No. One single hallucinated citation on a document with you as an author is not evidence of your reckless disregard for anything. These exaggerations are crazy and you would absolutely deny such accusations if you missed your co-author's AI hallucinating a citation on your manuscript too. At best it would be careless, if you really relish extrapolating from one data point and smearing people's character based on that. Not reckless. It's quite literally the difference between going five miles per hour over the speed limit versus fifty.
If your co-author inserted the fraudulent reference, I agree that you may not have committed fraud. But your co-author did, and you didn't check their work. And knowing that you didn't check their work, you signed off on it.
You didn't pick your co-author very well, but arXiv lacks investigative powers to determine which co-author did the bad, so they all get the consequence.
Do you think every co-author on a 100-author paper checks every citation? It's like saying that every member of a large software team personally reviews every line of code. It's just completely divorced from reality.
I’ve disagreed with some of your other stances in this thread, but I want to acknowledge the validity of your take here.
You’re right that a single hallucinated line is not evidence of reckless disregard - because that could have happened on a final follow-up pass after you had performed due diligence. It’s happened to me. I know how challenging it can be to keep bad patterns out of LLM generated output, because human communication is full of bad patterns. It’s a constant battle, and sometimes I suspect that my hard-line posture actually encourages the LLM to regularly “vibe check” me! E.g. “Are you sure you’re really the guy you’re trying to be? Because if you are you wouldn’t miss this.” LLMs are devious, and that’s why I respect them so much. If you think they’re pumping the brakes then you should check again, because they probably just put the pedal to the metal.
That being said, I regularly insist on doing certain things myself. If I were publishing a paper intended to be taken seriously - citations would be one of the things I checked manually. But I can easily see myself doing a final follow-up pass after everything looks perfect, and missing a last minute change. I would hope that I would catch that, but when you’re approaching the finish line - that’s when you expect your team to come together. That’s when everything is “supposed to” fall into place. It’s the last place you would expect to be sabotaged, and in hindsight, probably the best place to be a saboteur.
You're saying it as if the poor author just had no choice but to let LLM write their bibliography. To avoid hallucinations, maybe just don't let an LLM write any part of your paper?
You can only get in this situation if you let a bullshit generator write your paper, and the fraud is that you are generating bullshit and calling it a paper. No buts. It's impossible to trigger this accidentally, or without reckless disregard for the truth.
Calling LLMs "bullshit generators" in the year 2026 just shows a lack of seriousness.
Not as much of a lack of seriousness as excusing away hallucinations as not that big of a deal in what's supposed to be a researched, scholarly body of work written by humans.
Not really - much of work consists of what David Graeber described as “bullshit jobs”. Now AI and its backers are proposing to automate all that bullshit.
And yet people are trying to defend LLM-generated made-up bullshit citations in scientific papers.
> You’re right that a single hallucinated line is not evidence of reckless disregard
It absolutely is.
> - because that could have happened on a final follow-up pass after you had performed due diligence.
A "final follow-up pass" that lets the LLM make whatever changes it deems appropriate completely negates all the due diligence you did before, unless you very carefully review the diffs. And a new or substantially changed citation should stand out in that diff so much that there's no possible excuse to missing it.
> It’s happened to me.
Then you were guilty of reckless disregard.
> I know how challenging it can be to keep bad patterns out of LLM generated output
If your research paper contains any LLM generated output you did not manually vet, you are a hack and should not get published.
Allowing hallucinated content or citations into your work is an act of carelessness and disregard for the time of people that are going to read your paper and it should be policed as such.
And flatly, if a person can't be bothered to check their damn work before uploading it, why should anyone else invest their time in reading it seriously?
How are you suggesting the fake citation came about? Why are you writing papers and not having actually read the source you took the material from?
> Why are you writing papers and not having actually read the source you took the material from?
They're explicitly not writing papers. The fake citations are created and inserted by the LLM
They are still purposely writing a paper, whether that is with the help of an LLM or not. They are instructing the LLM to do the task of finding citations. It's no different from googling for a paper that explains a specific point. You would still double-check Google's output.
arXiv is not intended to be your blog. You should be held to a zero-mistake standard when publishing academic work.
The people I worry for are the junior researchers who are going to be splash damage for dishonest PIs. The PIs, though, deserve everything that’s coming for them.
Maybe I'm misunderstanding you, but zero-mistake seems harsh. I would say that AI references are a sign of something that is not simply a mistake.
However, we can have zero tolerance for certain techniques for "writing" a paper. Plagiarism and inventing data are already examples of this, if there is evidence for these techniques being used there is no excuse. We could say the same for AI references - any writing process that could produce these is by definition not a technique we want.
So the mistake isn't not checking a reference the AI gave. The mistake is letting the AI make references for you.
If we agree that academic research is important then I think we can impose certain standards on how you do it. We can disallow certain tools if that means we can't trust the output. Just like an electrician can't use certain techniques, even if they're easy, because we don't trust the final result.
If you are using AI-hallucinated references in scientific papers then there is some obvious intent to deceive there
> No, it is emphatically not. Fraud requires intent to deceive.
I'm about as pro AI-as-a-research-and-writing-assistant and anti AI-witchhunt as they come, but I simply cannot parse what I've quoted here.
Posting slop to arxiv is blatant deception. Posting an article is an attestation that the article is a genuine engagement with the literature. If you're posting things to arxiv that are not sincere engagements with the literature, you are attempting to deceive.
>I'm about as pro AI-as-a-research--and-writing-assistant and anti AI-witchhunt as they come, but I simply cannot parse what I've quoted here.
Ditto. And it's only 1 year. Like it's about the most reasonable thing they could have done.
> And it's only 1 year
No, it emphatically is not just a year! It's perpetual, and that's literally been my entire point this whole time. If it was just one year I would've had no complaints - and I made that clear from the very first comment!
What part of "...followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue..." is everyone here reading and still somehow interpreting to be limited to 1 year?
You are equating cutting corners (ie laziness) with intentional deception and not being genuine. That doesn't seem accurate to me. In most contexts I think cutting corners would be taken to be some form of negligence or recklessness.
Regardless of terminology, I agree that it's certainly punishable and certainly a serious problem.
> followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue"?
This part seemed reasonable too. I'm not in academia, but my understanding is most people writing papers intend for them to be accepted by reputable peer-reviewed venues, but post to arXiv because those venues don't always allow for simple distribution.
If your papers aren't going to be accepted at reputable venues and you posted slop to arXiv before (and they noticed it!), seems reasonable that they only want reputable stuff from you in the future?
It's very silly, but not a big deal. arXiv is becoming irrelevant these days anyways.
In fact would be better if they just banned AI, so we could just get off the luddite platforms.
Automated research is the future, end of story. And really it couldn't have come out at a better time, given the increasingly diminishing returns on human powered research.
Poe's law striking hard.
If automated research is the future, it has to be research, not making stuff up.
Which of those two does "hallucinated references" fit into?
A "mistake" would be a typo in a real citation. A hallucinated citation is evidence of just plain laziness and negligence, which taints the entire submission.
No it is not. Seriously. All you need for this to happen is for your lab partner to ask AI to add a missing citation that they are already familiar with at the last minute before a midnight submission deadline, and for the AI to hallucinate something else, and for them to honestly miss this. It does not even imply any involvement on your part, let alone that either of you were lazy or negligent on the actual research or substance of the paper. The lack of any sympathy or imagination here is astounding.
There are no deadlines for journal submissions. Even if you felt you were running close to your revisions being due, an email to an editor will probably fix this for you. And what you described is still negligent: not verifying that the garbage-output bot did not, in fact, output garbage.
Even more, there are no deadlines for arXiv submissions.
Kinda. There is a deadline to appear in the next day's posting, and this can be important if, e.g., you want to get something on the arXiv before a talk or a proposal deadline.
Your constructed hypothetical makes it even worse. If there are 2+ people in this scenario who have good intentions, this should especially never happen. When you sign your name on a paper, you are nonetheless vouching for everything written in it, including the things you didn't personally write. You should absolutely be checking every single reference your co-author included and verifying that it says what your co-author claims it says. This is something you should have been doing completely independent of LLMs existing. This is something you're publishing publicly, something that may be associated with you and your career for the rest of your life, it is insanely negligent to not even read and verify what your co-author is adding.
In other words: all it needs for your paper to have fraud is for your lab partner to add fraud to your paper.
I'm not seeing the problem here. The only problem is that your lab partner should be banned and not you. But being incentivised to check your co-author's work before submission isn't a bad thing.
> But being incentivised to check your co-author's work before submission isn't a bad thing.
Nobody was arguing against this.
You’re confusing the issue here by saying it’s not your fault, it’s your lab partner’s. We’re talking about why your lab partner did something wrong. You can assign blame for the wrong thing separately.
The citation is part of the substance of the paper. If you YOLOed in a citation without checking it, seems justified to suspect that you may have YOLOed in some data, or some analysis, or maybe even the conclusion.
Being suspicious would be reasonable and I think a penalty could perhaps be appropriate but the person you're replying to is objecting that the stated hypothetical would not rise to the bar of negligence.
Do bear in mind the degree of the described scenario. There's quite a difference between having an LLM shit out your entire citation section (and possibly the rest of the paper as well) versus asking the tool to make a targeted edit and overlooking a small piece of nonsense that results.
This is like saying lawyers should be allowed to submit AI-hallucinated case references or quotations in court documents. Because by your logic, that, too, should be perfectly acceptable. Yet is not, for hopefully obvious reasons. Why exactly should scientific research be any different? If your paper contains hallucinated references, we can't verify your assertions in the paper, and therefore must question the paper as a whole.
You seem incredibly upset you can't get away with fraud and that people are calling it fraud.
> You seem incredibly upset you can't get away with fraud and that people are calling it fraud.
Yeah, that's exactly what I've been upset about. You really nailed it!
I hope you don't do science, because this is how reputations get tainted.
The lack of understanding that you are responsible for the content you create, no matter what tools you use, is what's astounding.
If you cannot be bothered to check your references when writing academic quality papers then you have no place writing them in the first place. The punishment is not chopping off a finger, it is a polite reminder to do the bare minimum.
Well, in the good old days, when we had refereed journals, it would be part of the publishing process.
What's the difference between a "hallucinated" citation and consciously inserting a reference to a non-existent paper and hoping it goes unnoticed? How do we determine which one was done consciously and which was "a minor first time mistake"?
Your standards are lower than what they would accept at my high-school. Seriously.
And generally, if you are generating papers with LLMs, let other LLMs read them. Why would we waste human hours considering something that was generated? At this point publish your prompt because that's the actual work you're doing.
It's not the kind of mistake that is possible unless you're engaging in fraud anyway.
> It's not the kind of mistake that is possible unless you're engaging in fraud anyway.
Seriously? You can't fathom an honest researcher asking for AI to find a citation they know exists, and the AI inserting or modifying a citation incorrectly without them realizing?
If you find evidence of fraud by all means lay down the hammer. Using a single hallucinated citation like it's some kind of ironclad proxy just because you think they must be committing fraud is insane.
If you're not checking citations in the paper you're publishing AND trusting a non-SOTA, hallucination-prone AI model to come up with sources for it, it's probably best for everyone that this paper isn't published.
Yes, there will be rare exceptions, but in general I feel like this is a really good addition.
> non SOTA, hallucination prone ai model
What SOTA models are not hallucination prone?
Why would you ask the ai to find a citation you know exists? Just reach for that citation.
If an LLM does the work, you did not write it or research it; the LLM did. You have no business crediting yourself as an author.
If someone writes a paper and an entirely different person takes credit for it without even bothering to check if the actual writer just made shit up, they deserve a lifetime ban. Seems like a year is a very light punishment.
Yes, having AI write something and not checking it yourself is sure to lead to hallucination, hence, it is a fraudulent way to write.
>Seriously? You can't fathom an honest researcher asking for AI to find a citation they know exists
Assumptions:
1. The entire document is loaded into an AI editor
2. The researcher is asking an AI editor to work on his references
3. The researcher has not checked his own references.
This could be avoided at 1, 2 or 3. But even just 1 implies that the researcher knows that they have a hot potato and might critically fuck up and lose all credibility. Being in that scenario and committing to 2 and 3 is at least extreme negligence.
If you are citing a work you paste a citation to that work. If you are bullshitting you ask an AI to come up with a citation. Jesus, there is zero reason to ever "generate a citation" if you are not, in fact, committing fraud.
That's like saying that there's zero reason to ever ask an LLM to do basic math for you. Sure you probably shouldn't do that but sometimes it's convenient and so people will inevitably do exactly that regardless of the somewhat frequent wrong answers that are guaranteed to ensue.
I much agree. But I wonder: shouldn't the citations all be hyperlinks, and thus easy to verify?
How specific are the citations? If it's “Sentence 4 on page 97 supports” or “Paper says ‘___’” then I imagine it would be fairly easy. If it's “(__ page long) paper supports x”, then very difficult?
Verifying that the reference you cite actually exists is the absolute minimum standard for academic work. It is not optional, not something to skip because of a deadline, and not something to outsource blindly to hallucination-prone AI.
If someone cannot meet that bar, they have no business publishing research papers. I have written academic papers myself, and I find it astonishing that people are trying to justify this as if it were some understandable workflow mistake. At that point it is simply slop with academic formatting. Post it on a blog or somewhere else, but do not put it into the scientific record.
A one-year ban is not a lifetime ban. Maybe six months would also have been enough, but the author can use that time to think about whether they should verify references next time — and to manually check every other citation.
> Seriously? You can't fathom an honest researcher asking for AI to find a citation they know exists, and the AI inserting or modifying a citation incorrectly without them realizing?
Indeed I cannot. If you do that, you are not, in fact, an honest researcher. You're a lazy hack.
I would not necessarily go as far as calling it fraud, but if you cannot even verify that the reference you are citing actually exists, you are not ready to publish research papers. Deadlines are not an excuse here.
A citation is where you derived knowledge... If you haven't checked it and you are submitting something that should represent a ton of labour (and which will consume labour to review), you don't understand what you're doing. It is not just crossing T's and dotting I's.
Your being set behind is less important than the fact that your publishing is setting everyone else behind.
Such a banned person is being helped to "step out of the way", and someone more competent will assuredly step forward to consume the limited maintenance labour more thoughtfully
> Your being set behind is less important than the fact that your publishing is setting everyone else behind
One hallucinated citation does not in any way imply anyone is being left behind. All it means is that nobody checked that particular line of the manuscript after it was written. The rest of the paper could still be solid and treated accordingly. If you find evidence to the contrary, of course treat it accordingly, but this is so obviously not that.
> One hallucinated citation does not in any way imply anyone is being left behind.
The parent said “setting” others behind, which refers to lost time.
Being “left” behind implies a degraded trajectory, which is defined not by time lost, but by the final destination.
Different but related things (e.g. lost time can indeed affect your final destination, for instance, after growing old correcting a scourge of hallucinated citations - which should have been table stakes all along).
That was literally just a typo, I was walking and messed up while typing. Pretend I wrote "set behind." It makes no difference to my point and I fully stand behind the comment with that correction.
If all you're genuinely worried about is the collective human time spent tracing down one stupid hallucinated citation in a paper, may I remind you of the ludicrous amounts of time and effort readers waste trying to wade through the sea of fluff, jargon, and complexity frequently added to papers in a completely deliberate fashion. If wasting even a little bit of readers' time is what you see as the crime here, you have orders of magnitude bigger fish to fry.
The fact is that, for one hallucinated citation to be the noteworthy bit that "sets others behind" in any meaningful way, the actual substance of your paper has to be utterly worthless (or worse); otherwise, you're contributing far more than you're taking away, and thus your paper is very much not setting others behind. OTOH, if your paper really is worthless or harmful enough for this part of it to be a big deal, that would be the basis for punishment, not this. A single hallucinated citation is simply not a blip on that metaphorical radar.
No. It's fraud.
You clearly misunderstand. You cite a work in your paper because you have read that work, and build upon it or want to refer to it to back up a specific claim. Generating references is fraud period, because you are implying that you have read a work when in fact you just asked an AI "please insert some reference-shaped text here" to make it look like a proper paper. It is sadly not a necessary, but certainly a VERY sufficient, reason to conclude a paper is fraudulent.
It’s easy to avoid this whole issue: write the paper yourself.
Yes, it is fraud
Don't use AI? Problem solved?
> There's no need to permanently cripple someone's ability to progress their life or contribute to humanity
I don't think you need to publish on arXiv to contribute meaningfully to humanity.
> That's punitive instead of rehabilitative.
Unfortunately science is competitive. Yours is a race to the bottom where the people who can afford the most expensive models and who are least concerned with the truth can publish the most papers and benefit financially and professionally by doing so. This is not a zero sum arena; grant money and opportunities may well be awarded to them, and not to another team that is producing more careful and genuine output.
You are being ironic right?
In science, one hallucinated reference can corrupt the entire rest of the work. So you're completely wrong.
And every piece of work in future which cites the paper with the hallucinated reference.
Seeing the usual LLM hypers angrily replying to this on Twitter is such a tell. Just like the comments on the LLM poisoning articles, some people just can't accept that some people don't like LLMs, and they get upset when you put any amount of hindrance in the way of their rapid acceptance.
It's hard for me to even understand their perspective. Researching references for a published academic paper isn't some incidental busywork task, it's supposed to be a core part of doing research which is the core of the job. If you don't have sympathy for someone who, say, paid a person on Fiverr to cook up a paper rather than writing it themselves and then didn't even bother to check the references, why is using an LLM and not checking any better?
There is a lot of "throw it against the wall, and if it sticks, write it up" empirical work against benchmarks. It leads to post-hoc rationalization of the work and browser plugins using LLMs to find references for work that is already written. It is a bureaucratic view about "you need a citation for this", where people misunderstand the citation as a checkbox, instead of "you need to substantiate this claim, as I, the reviewer, do not accept this as a fact".
It's also hilarious that they complain about this because, from what I've seen, most LLM hypers will talk about something being irrelevant or taken over by AI with no understanding of what that something really is or involves.
> some people don't like LLMs
It's not even that they "don't like LLMs". They just don't like academic fraud! If references were fabricated with a Markov chain it would be just as bad!
Crazy that this is graytexted. So basically the HN consensus is that we need to be hypers and accelerate LLM adoption everywhere.
Bonkers. At the same time, peak HN.
https://xcancel.com/tdietterich/status/2055000956144935055
> Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated (Dietterich, T. G.)
coauthors about to get roasted
To be a coauthor on a preprint that you have not submitted, you have to actively "claim" it (using a password given to the author who submitted). It's on you to double-check before claiming.
Is that your definition or theirs?
I can't see that in the code of conduct.
While this is certainly a welcome step, I hope more work is done on the underlying problem: there is still no easy way to create correct BibTeX entries for cited papers. Citations for any given paper can come from a wide range of journals with various publishers, conferences, and preprints. The same paper can be available from multiple sources with varying details, e.g. arXiv and the conference website. Tools like Zotero have certainly made it significantly easier to extract citations from publication webpages, but I still find issues with the extracted BibTeX details. While author names and titles are often extracted correctly, I still have to manually ensure that details like publication venue, year, volume number, page number, URL, etc. are extracted correctly and also shown correctly in LaTeX format. Different publications can use different citation styles. This can unfortunately lead to taking shortcuts with AI-generated citation data due to the lack of an easy and unified approach to extract consistent citation data. I am not sure whether hallucinated citations are being generated in the main manuscript or in a separate BibTeX file, so I may be a bit off in my understanding.
Fun fact: if an article has a DOI, you can just use curl to get a BibTeX entry. An example using one of my articles:
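Something like this, with a placeholder standing in for the actual DOI:

    # DOI content negotiation: doi.org returns BibTeX directly (placeholder DOI)
    curl -LH "Accept: application/x-bibtex" "https://doi.org/10.1000/placeholder-doi"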
This is the exact same method that Zotero uses internally, so this won't ever give you better results, but I still find it kinda neat. Note that Zotero also has a free online tool to generate citations in any format or BibTeX files from a URL/DOI/ISBN/...
https://zbib.org/
Good.
If it’s not worth your time to check the output of your LLM carefully, it’s not worth my time to read it.
Unfortunately, it's probably not worth your time to read 99% of arxiv papers, LLM generated or otherwise.
Ever pick a random one and really dive in?
Well, yeah, 99% of arXiv papers were not written for me or you. They were written for someone who works in a niche within a niche. That's (in my view) the beauty of research.
Agreed. There was already too much human generated slop in academia.
And I’m not talking about good faith research that didn’t pan out, I mean research that is completely useless for any other purpose other than convincing a casual observer that the authors are doing research.
Next, for AI papers, a reproducibility requirement. So much code and so many details are fudged, and papers cannot be reproduced. Ran the training with some other config, or other data, etc., to make their mechanism or intervention seem better.
I just wish that anyone who is against this policy would be forced to review a paper that turns out to be unedited AI slop. Reviewers are expert volunteers who do it for free. It is incredibly frustrating to spend 4 hours reading a paper, trying your best to make sense of what the authors are trying to prove, just to realize that it is hallucinated.
The authors should value the time of the reviewers more highly than their own. So, if you include AI nonsense in your paper, it is insulting.
Great. It's so easy to automate checking refs that it's super bad not to check them.
How will they detect hallucinated refs at scale? Manual spot checks? Automated DOI verification? The policy seems right, but enforcement is the hard part.
Enforcement is secondary and is allowed to take weeks / months / never happen at all if nobody reads the paper. It's about being able to ban if an issue arises, not about keeping the database strictly clean.
However difficult it might be right now it's only going to get easier. Anyway I don't think proactive enforcement is the point. Rather now they have an official method by which to address incidents that are brought to their attention.
There needs to be careful vetting before such adverse actions. If somebody includes a name and pushes it without express permission, does everyone get the ban? I agree that, implemented the right way, this is good.
Plus, afaik you can add any co-author you want without validation. So you could get anyone on arXiv banned with one paper, with one sentence.
As I mentioned in another thread:
To be a coauthor on a preprint that you have not submitted, you have to actively "claim" it (using a password given to the author who submitted). It's on you to double-check before claiming.
I surely hope that only "confirmed" coauthors will get the ban, it's only logical.
No mercy to brain slugs.
This has become such a problem in scholarly publishing that we have a business providing citation checking, https://groundedai.company/, which we've been building for a couple of years now.
What’s the hallucination rate of your AI?
It's not unexpected, but still sad, to see so many comments opposing even the smallest step against low-effort fraud in academic publications. Is this what hacker culture has been reduced to in the slop era? Open hostility toward science and engineering?
Good; academic literature is in crisis because of all of the slop. Forcing some consequences on easily-detectable hallucinations can only be a good thing
It's not just AI, though. I did a doctorate in physics about 40 years back, and bad references were a problem back then.
Doesn't matter if it is AI hallucinations or entirely human scientific fraud, the problem is the same, and the solution works fine for both cases.
If you can't validate that your bibliography is full of real articles, you shouldn't get published.
LLMs have just poured gasoline on the fire.
In what way? Surely something like the source not quite saying what was cited, or mixing up citations, rather than inventing them outright?
That, and mixing reference details from multiple sources and messing it up.
Let's say you read a paper on Arxiv but cite the version that was submitted to a journal or conference, without realizing that the authors made changes to the version they submitted and forgot to upload them to Arxiv.
In physics, references which just didn't exist. That could be that the author made it up, but often it's because they transcribed the reference from another paper without reading it - we know because a few people have deliberately introduced fake references to trace how far they would go. The reasons are not the same as for AI, but the problem they produce is the same.
References which don't accurately reflect the quoted material seem more common in other subjects.
"Bad", like, you literally just made them up? I hope that would have been a problem.
Well, I certainly didn't make them up. But it was common to follow a reference and find that there was no paper on the other end.
Which is why the angry replies on Twitter from AI hype accounts are so funny. You should get penalised for fake references and profanity in your submissions, even if you wrote your slop longhand. I don't know why anyone would have an issue with this policy.
Yes and ffs arrows kill people too but we don't bring that up every time we talk about what to do with guns.
Imagine how bad they are now then.
What are reasonable alternatives to arXiv? It has become increasingly slow. TechRxiv?
Had a colleague submit a paper with literal AI slop left in the text, got hit with a nasty revision request. Check your drafts before you submit, people. The reviewers will find it.
Also check your LaTeX comments; arXiv makes those publicly visible!!!
I'm a screen reader user and usually read papers as raw TeX. I've seen everything: slurs, demeaning comments towards reviewers and professors, admissions of fraud, instructions to coauthors to commit further fraud before paper submission to mask the earlier fraud... it's all there. There's far less of it than I would think, definitely <1% of papers, but it's there.
I think it would be useful to run an LLM anti-fraud pass on the TeX source of all new arxiv papers. It wouldn't catch everything, but it would catch some of the dumbest fraudsters.
On the positive side, you can also find stronger claims that didn't survive review, additional explanations that didn't make the cut due to the conference's page limit, as well as experimental results that the authors felt weren't really worth including. Those need to be approached with an abundance of caution, but are genuinely useful sometimes.
https://xcancel.com/leaksph
That's why my forarxiv make target includes a run through latexpand.
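Roughly, the target boils down to one command (a sketch; the file names are made up, and as I recall latexpand inlines \input files and strips comments by default):

    # Flatten the source and drop comments before building the arXiv tarball
    # (paper.tex and arxiv-submission.tex are hypothetical names)
    latexpand paper.tex > arxiv-submission.tex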
Sad the suggestion here is to just disguise the slop to make it harder for reviewers to spot rather than not submitting slop to begin with.
Hurray!
As of yet no comments here seem to address the "reputable" condition. Reputable review is based on what criteria?
It's been pretty eye opening watching Craig Wright (of bitcoin fakery fame) flooding out LLM generated 'academic' papers and even having some of them accepted.
He'd be toast if SSRN were to adopt a similar policy.
Should be more harsh in my opinion.
It seems a good idea to ban cheating, but how hard is it, especially in new reasoning/agent contexts, to validate references?
The deeper question is whether legitimate AI-generated results are allowed or not. As a test, in the extreme: think of a proof of the Riemann Hypothesis, autonomously generated (end to end) and formally proven. Is it allowed or not?
This is not about banning cheating, it’s about banning inaccurate information.
You don’t need to solve everything; catching a few thousand non-existent citations with such a policy is on its own a net benefit.
It is allowed as long as it’s verified.
The thread specifically points out that if authors can’t be arsed to simply proofread their text, the rest cannot be trusted either.
It’s a simple heuristic against low quality submissions, not an anti-ai measure.
If you use AI correctly, nobody should be able to tell that it was used at all.
In that case, you would just not do a reference. End to end autonomous science might have fewer concrete citations as the contributing knowledge is just the sum of the training data of the model.
There already exists multiple tools for automatically verifying references. This measure will likely only filter out the laziest and most incompetent of AI slop submissions. It's a very modest raising of the bar, but comes at zero cost to honest researchers.
I expect arXiv will still have problems with slop submissions but, at least, their references should actually exist going forward.
It isn't "cheating" they're concerned with, it's sloppiness. This dictum isn't some sort of AI ban, but instead simply that if there is evidence that it was so low effort that the work includes such blatant problems, it's just adding noise.
> think proof of Riemann Hypothesis autonomously generated (end to end) formally proven - is it allowed or not?
Sorry to be rude, but this seems like a dumb question. I want science to progress. A primary purpose of these journals is to progress science. A full proof of the Riemann Hypothesis progresses science. I don't care how it was produced, if Hitler is coauthor, etc, I just care that it is correct. Whether the authors should be rewarded for whatever methods they used can be a separate question.
Terence Tao had a nice talk from the Future of Mathematics conference posted yesterday [0] that shapes a lot of my own feelings on this matter.
The short of it is he argues that first-to-correctness shouldn't be the only goal / isn't a great optimisation incentive. Presentation and digestibility of correct results are the missing third when you've finished generation and verification. I completely agree with him. You don't just need an AI-generated proof of the Riemann Hypothesis. You would really like it to be intentional and structured for others to understand.
A really beautiful quote I learned of in the talk is this:
> "We are not trying to meet some abstract production quota of definitions, theorems, and proofs. The measure of our success is whether what we do enables people to understand and think more clearly and effectively about math." - William Thurston
[0] https://www.youtube.com/watch?v=Uc2zt198U_U
He would say that now he’s got tenure though.
Ya, I think this totally makes sense. Just to be clear though, I don’t think we’re actually disagreeing. A proof of the Riemann hypothesis that’s obtuse and basically unreadable is a great step on the path to a proof that is enlightening and clear. If AI provides correct-but-annoying results, I’m confident humans can still benefit from that marginal result.