Should Academic Journals Use Detection Tools and Auto-Reject AI-Written Work?

There are growing calls for journals to prohibit “extensive” AI use in manuscripts and referee reports, sometimes coupled with proposals to run AI-detection software and automatically reject texts flagged as AI-written. I worry this approach will lower scientific quality and productivity, and that it will do so for reasons that are only weakly connected to what journals are supposed to optimize. I propose a simpler principle: journals should evaluate manuscripts by scientific merit and enforce accountability for claims, rather than policing the provenance of prose.

Definition: what counts as “extensive” AI use?

By “extensive AI-tool usage”, I mean cases where most of the text of a manuscript or referee report has been drafted by a generative AI tool, as opposed to narrow uses such as copyediting, proofreading, or language polishing. The latter are functionally similar to human editing and rarely treated as ethically problematic.

Why do some people want categorical rejection?

The impulse behind categorical rejection appears to be an authorship-and-credit view: scholarly credit should attach only when the author personally produces the work, in a strong, almost artisanal sense. If an AI system writes 95 percent of the words, the argument goes, the author did not truly “do the work”, so the output should not count.

That position is not obviously correct, and it is not obviously aligned with the goals of science.

A simple thought experiment

Consider a simple case. A scientist has a novel idea with clear scientific value. Instead of spending weeks turning it into polished prose, the scientist iterates with an AI tool to produce a coherent paper. The scientist supplies the core idea, pushes the tool through multiple rounds of critique and revision, checks the logic, and ultimately submits an excellent manuscript. The journal runs an AI detector, which flags the text as “95% AI-written”, and the paper is automatically rejected. The same happens at other journals. The work is effectively blocked from the scientific record unless the scientist spends substantial time rewriting it purely to satisfy a production norm about who typed the sentences.

If the mission of a journal is to select and disseminate high-quality contributions, this is difficult to defend. It treats “who wrote the prose” as decisive even when the science is strong. That is a mismatch between means and ends.

Tool use is already central to modern research

A common reply is that writing is part of scholarship, so authors should do it themselves. I regard this as a category mistake. Many parts of the research process are already delegated to tools, including highly consequential parts. Empirical economists rely on statistical packages to estimate models, compute standard errors, and implement algorithms that almost nobody could reproduce by hand in practice. We do not reject papers because the author did not personally execute matrix algebra on paper. We instead require that the methods are appropriate, the results are correct, the code is available when needed, and the claims survive scrutiny.

If tool use is acceptable for inference, computation, and simulation, it is hard to see why tool use for exposition should trigger categorical rejection. The relevant question is whether the manuscript is correct, intelligible, and contributes to knowledge, not whether the prose was typed by a human.

Comparative advantage and scientific productivity

There is also an efficiency argument that matters for scientific progress. Researchers differ in strengths. Some have an unusual ability to generate fruitful ideas, identify sharp questions, and design credible empirical or theoretical strategies, while being comparatively weaker writers. If journals enforce a norm that valuable ideas must be expressed in wholly human-written prose to be admissible, they impose a tax on idea production and redirect effort toward mechanical tasks. The predictable result is lower output and slower progress, with little compensating gain in scientific reliability.

The epistemic objection: “Then we do not know who had the idea”

A related objection is epistemic and incentive-based: if extensive AI assistance makes it harder to infer who contributed the underlying ideas, then publication may become a noisier signal for credit allocation, and credit allocation matters because it shapes incentives for effort, originality, and integrity. This is a serious concern.

The institutional question is therefore which rule best aligns credit with contribution while preserving a high-quality scientific record. A blanket prohibition on AI-drafted prose is a blunt instrument because prose provenance is, at best, an imperfect proxy for intellectual contribution. It would encourage superficial compliance behavior (rewriting to satisfy a production norm) rather than the behavior journals ultimately need (truthful claims, accurate citations, and defensible inference).

A more coherent response is to tighten responsibility and signaling where it actually matters. Journals can require functional disclosure of AI use (especially where it could affect factual content), require explicit author contribution statements (for example, who formulated the core question, designed the strategy, conducted the analysis, and took responsibility for the argument), and enforce sanctions for misrepresentation. Under such a regime, even if AI contributed to ideation at the margin, credit remains tied to disclosed, accountable contributions rather than to who typed the sentences. AI-assisted ideation does raise genuine questions about credit and incentives that deserve further thought, but those questions are better addressed through this kind of disclosure-and-enforcement regime than through blanket bans or detector-based auto-rejection.

Why detection-and-auto-reject is a poor mechanism

Even if one endorses stricter norms around disclosure, AI-detection tools are a weak foundation for high-stakes editorial decisions. It seems to me they are opaque, error-prone, and unstable over time, particularly as writing models evolve and as humans adapt their style in response. Rejecting manuscripts or referee reports on the basis of a probabilistic classifier raises fairness concerns because authors can be punished without transparent, contestable evidence.
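To see why probabilistic flags are a shaky basis for automatic rejection, consider a back-of-the-envelope calculation. The numbers below are purely illustrative assumptions (the detector accuracy, false-positive rate, and share of AI-drafted submissions are not estimates from any real tool or journal); the point is the structure of the problem, not the specific figures.

```python
# Illustrative only: how often would an auto-reject rule hit human-written work?
# All numbers below are assumptions chosen for exposition, not measurements.

p_ai_drafted = 0.20        # assumed share of submissions largely drafted with AI
sensitivity = 0.95         # assumed P(flagged | AI-drafted)
false_positive_rate = 0.05 # assumed P(flagged | human-written)

# Overall probability that a submission gets flagged.
p_flag = sensitivity * p_ai_drafted + false_positive_rate * (1 - p_ai_drafted)

# Among flagged submissions, the share that are actually human-written.
share_of_flags_human = false_positive_rate * (1 - p_ai_drafted) / p_flag

print(f"Share of submissions flagged: {p_flag:.2%}")                 # 23.00%
print(f"Flagged submissions that are human-written: {share_of_flags_human:.2%}")  # ~17%
# Under these assumptions, roughly one in six flagged manuscripts is human-written,
# and every one of them would be auto-rejected without contestable evidence.
```

As long as false-positive rates are non-trivial and much of the submission pool is human-written, an auto-reject rule guarantees a steady stream of wrongful rejections, whatever the exact numbers turn out to be.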

Moreover, a detection-and-evasion arms race diverts resources away from research and toward compliance theater. It creates incentives to spend time “beating the detector” rather than improving arguments, verifying citations, or clarifying contributions. That is a poor use of high-skill labor, and it targets the wrong object. Journals should care about validity, novelty, and contribution, not stylometric guesses about authorship.

Suppose, for the sake of argument, that detection were perfect. Even then, an auto-rejection rule remains misguided. Perfect detection would establish only that an AI system produced much of the prose, a comparatively superficial fact if the underlying contribution is sound. It would institutionalize a norm that elevates the mechanics of sentence production over the epistemic content of the work.

A more coherent policy stance

Journals should evaluate manuscripts by scientific merit and enforce accountability for claims. Extensive AI assistance should not, in itself, be a reason for rejection. If AI-assisted drafting produces weak argumentation, mischaracterizes prior work, fabricates citations, or introduces factual errors, then the manuscript should be rejected for the usual reason: it fails to meet scholarly standards.

However, manuscript policy and peer-review policy should probably not be identical. The right design depends on what is being protected.

  1. Manuscripts: quality, transparency, and responsibility

For manuscripts, journals can protect the scientific record without detector-based gatekeeping.

First, I recommend maintaining and strengthening substantive standards. Require clear identification of the contribution, coherent argumentation, appropriate engagement with the literature, and claims that are testable and not overstated. Tool-assisted prose does not relax any of these requirements.

Second, I favor enforcing author responsibility. Authors should remain accountable for all claims, citations, data provenance, and ethical compliance, regardless of what tools were used to draft or edit the text. If an AI tool generated an argument, the author is still responsible for defending it. If an AI tool proposed a reference, the author is responsible for verifying it.

Third, I think disclosure could be used in a limited, functional way. If journals require disclosure of AI use, the purpose should be accountability and reproducibility rather than stigma. A sensible approach is to distinguish between editing and drafting, to note whether AI was used for translation or code assistance, and to clarify whether any tool could plausibly have affected factual content (for example, references, numerical claims, or descriptions of prior work). This is analogous to disclosing statistical software and computational tools: the point is transparency, not moral judgment.

Fourth, journals should focus on verifiability, not style. Where risks are salient, journals can require concrete checks: reference verification, access to data and code when relevant, and a review process that explicitly targets validity rather than rhetorical polish. If AI use increases the frequency of certain failures, then journals should address those failures directly.
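As one concrete illustration of what reference verification could look like in practice, here is a minimal sketch that checks whether cited DOIs resolve to real records via the public Crossref REST API. The DOIs in the example list are placeholders, not references from any actual manuscript, and a production workflow would also compare returned titles and author names against the bibliography.

```python
# Minimal sketch: check whether cited DOIs resolve to real Crossref records.
# Assumes the public Crossref REST API (https://api.crossref.org); the example
# DOIs below are placeholders, not citations from any actual manuscript.
import requests

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return resp.status_code == 200

cited_dois = [
    "10.1000/example-doi-1",  # placeholder
    "10.1000/example-doi-2",  # placeholder
]

for doi in cited_dois:
    status = "found" if doi_exists(doi) else "NOT FOUND - verify manually"
    print(f"{doi}: {status}")
```

A check like this establishes only that a cited record exists, not that the citation is apt; the substantive judgment still rests with authors, referees, and editors.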

  2. Referee reports: confidentiality and data handling

Referee reports raise an additional constraint: confidentiality. Journals are justified in imposing stricter rules here, because reviewers typically have access to unpublished ideas, data, and arguments. A journal can reasonably prohibit reviewers from uploading manuscript text, detailed descriptions of unpublished results, or other confidential materials into external AI systems. This is not about punishing AI use as such; it is about preventing disclosure.

Within that constraint, the relevant enforcement target should be confidentiality compliance and the substantive quality of the report, not stylometric detection. A journal could permit narrow uses that do not implicate confidentiality, such as local grammar correction, formatting assistance, or rewriting of a reviewer’s own notes, provided the reviewer remains intellectually accountable for the content. If an AI tool can credibly guarantee non-training and, crucially, stringent confidentiality safeguards (minimal retention, strong access controls, and auditable compliance), then the privacy concern is substantially mitigated, though not necessarily eliminated.

Conclusion

Science should be organized around producing and selecting true, important, and useful knowledge. A policy of rejecting work because an AI system wrote much of the prose substitutes an authorship norm for epistemic evaluation, encourages wasteful compliance behavior, and risks suppressing valuable contributions. Journals should not implement auto-rejection rules based on AI-detection. They should evaluate what matters: the quality of the science, transparency through functional disclosure where appropriate, and clear human accountability for the claims being advanced.