Is Your Paper Being Reviewed by an LLM? Investigating AI Text Detectability in Peer Review

Peer review is a critical process for ensuring the integrity of published scientific research. Confidence in this process is predicated on the assumption that experts in the relevant domain give careful consideration to the merits of manuscripts which are submitted for publication. With the recent rapid advancements in the linguistic capabilities of large language models (LLMs), a new potential risk to the peer review process is that negligent reviewers will rely on LLMs to perform the often time-consuming process of reviewing a paper. In this study, we investigate the ability of existing AI text detection algorithms to distinguish between peer reviews written by humans and different state-of-the-art LLMs. Our analysis shows that existing approaches fail to identify many GPT-4o-written reviews without also producing a high number of false positive classifications. To address this deficiency, we propose a new detection approach which surpasses existing methods in the identification of GPT-4o-written peer reviews at low levels of false positive classifications. Our work reveals the difficulty of accurately identifying AI-generated text at the individual review level, highlighting the urgent need for new tools and methods to detect this type of unethical application of generative AI.
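
The abstract's central evaluation criterion is detection performance at low false positive rates. Below is a minimal sketch of how such a metric (true positive rate at a fixed false positive rate, TPR@FPR) might be computed; the detector scores, label convention, and the `tpr_at_fpr` helper are illustrative assumptions for this sketch, not the paper's actual method or data.

```python
# Sketch: evaluating an AI-text detector at a low false positive rate.
# Labels: 1 = LLM-written review, 0 = human-written review.
# Scores: hypothetical detector outputs (higher = more likely AI-written).
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(labels, scores, target_fpr=0.01):
    """True positive rate at the operating point with FPR <= target_fpr."""
    fpr, tpr, _ = roc_curve(labels, scores)
    feasible = fpr <= target_fpr
    return tpr[feasible].max() if feasible.any() else 0.0

# Illustrative synthetic data standing in for detector scores on reviews.
rng = np.random.default_rng(0)
labels = np.concatenate([np.ones(500), np.zeros(500)])
scores = np.concatenate([
    rng.normal(0.6, 0.2, 500),  # scores on LLM-written reviews
    rng.normal(0.4, 0.2, 500),  # scores on human-written reviews
])

print(f"TPR @ 1% FPR: {tpr_at_fpr(labels, scores, 0.01):.3f}")
```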

Bibliographic Details
Main Authors: Yu, Sungduk; Luo, Man; Madasu, Avinash; Lal, Vasudev; Howard, Phillip
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computation and Language
Online Access: Order full text (arXiv: https://arxiv.org/abs/2410.03019)
DOI: 10.48550/arxiv.2410.03019
Published: 2024-10-03
Rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0
Source: arXiv.org
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T08%3A47%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Is%20Your%20Paper%20Being%20Reviewed%20by%20an%20LLM?%20Investigating%20AI%20Text%20Detectability%20in%20Peer%20Review&rft.au=Yu,%20Sungduk&rft.date=2024-10-03&rft_id=info:doi/10.48550/arxiv.2410.03019&rft_dat=%3Carxiv_GOX%3E2410_03019%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true