Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review

Peer review is a critical process for ensuring the integrity of published scientific research. Confidence in this process is predicated on the assumption that experts in the relevant domain give careful consideration to the merits of manuscripts which are submitted for publication. With the recent rapid advancements in the linguistic capabilities of large language models (LLMs), a new potential risk to the peer review process is that negligent reviewers will rely on LLMs to perform the often time-consuming process of reviewing a paper. In this study, we investigate the ability of existing AI text detection algorithms to distinguish between peer reviews written by humans and different state-of-the-art LLMs. Our analysis shows that existing approaches fail to identify many GPT-4o-written reviews without also producing a high number of false positive classifications. To address this deficiency, we propose a new detection approach which surpasses existing methods in the identification of GPT-4o-written peer reviews at low levels of false positive classifications. Our work reveals the difficulty of accurately identifying AI-generated text at the individual review level, highlighting the urgent need for new tools and methods to detect this type of unethical application of generative AI.
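
The abstract's central evaluation criterion is detection performance at low false positive rates. Below is a minimal sketch of how such a metric (true positive rate at a fixed false positive rate, TPR@FPR) might be computed; the detector scores, label convention, and the `tpr_at_fpr` helper are illustrative assumptions for this sketch, not the paper's actual method or data.

```python
# Sketch: evaluating an AI-text detector at a low false positive rate.
# Labels: 1 = LLM-written review, 0 = human-written review.
# Scores: hypothetical detector outputs (higher = more likely AI-written).
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(labels, scores, target_fpr=0.01):
    """True positive rate at the operating point with FPR <= target_fpr."""
    fpr, tpr, _ = roc_curve(labels, scores)
    feasible = fpr <= target_fpr
    return tpr[feasible].max() if feasible.any() else 0.0

# Illustrative synthetic data standing in for detector scores on reviews.
rng = np.random.default_rng(0)
labels = np.concatenate([np.ones(500), np.zeros(500)])
scores = np.concatenate([
    rng.normal(0.6, 0.2, 500),  # scores on LLM-written reviews
    rng.normal(0.4, 0.2, 500),  # scores on human-written reviews
])

print(f"TPR @ 1% FPR: {tpr_at_fpr(labels, scores, 0.01):.3f}")
```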

Bibliographic Details
Main Authors: Yu, Sungduk; Luo, Man; Madasu, Avinash; Lal, Vasudev; Howard, Phillip
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language
Online Access: Order full text (arXiv: https://arxiv.org/abs/2410.03019)
DOI: 10.48550/arxiv.2410.03019
Published: 2024-10-03
Rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0
Source: arXiv.org
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T08%3A47%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Is%20Your%20Paper%20Being%20Reviewed%20by%20an%20LLM?%20Investigating%20AI%20Text%20Detectability%20in%20Peer%20Review&rft.au=Yu,%20Sungduk&rft.date=2024-10-03&rft_id=info:doi/10.48550/arxiv.2410.03019&rft_dat=%3Carxiv_GOX%3E2410_03019%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true