Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection


Detailed Description

Bibliographic Details
Main Authors: Chen, Jiaqi; Zhu, Xiaoye; Liu, Tianyang; Chen, Ying; Chen, Xinhui; Yuan, Yiwen; Leong, Chak Tou; Li, Zuchao; Long, Tang; Zhang, Lei; Yan, Chenyu; Mei, Guanghao; Zhang, Jie; Zhang, Lefei
Format: Article
Language: English
description Large Language Models (LLMs) have revolutionized text generation, making detecting machine-generated text increasingly challenging. Although past methods have achieved good performance on detecting pure machine-generated text, those detectors perform poorly at distinguishing machine-revised text (rewriting, expansion, and polishing), which may differ only slightly from its original human prompt. As the content of the text may originate from human prompts, detecting machine-revised text often involves identifying distinctive machine styles, e.g., words favored by LLMs. However, existing methods struggle to detect machine-style phrasing hidden within the content contributed by humans. We propose the "Imitate Before Detect" (ImBD) approach, which first imitates the machine-style token distribution, and then compares the distribution of the text under test with the machine-style distribution to determine whether the text has been machine-revised. To this end, we introduce style preference optimization (SPO), which aligns a scoring LLM to the preference of text styles generated by machines. The aligned scoring model is then used to calculate the style-conditional probability curvature (Style-CPC), quantifying the log-probability difference between the original and conditionally sampled texts for effective detection. We conduct extensive comparisons across various scenarios, encompassing text revisions by six LLMs, four distinct text domains, and three machine revision types. Compared to existing state-of-the-art methods, our method yields a 13% increase in AUC for detecting text revised by open-source LLMs, and improves performance by 5% and 19% for detecting GPT-3.5 and GPT-4o revised text, respectively. Notably, our method surpasses the commercially trained GPT-Zero with just 1,000 samples and five minutes of SPO, demonstrating its efficiency and effectiveness.
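The curvature idea sketched in the abstract — comparing a text's log-likelihood under the style-aligned scoring model against what the model would assign to its own conditionally sampled tokens — can be illustrated in a few lines. The following is a minimal NumPy sketch of a conditional probability curvature score, not the authors' implementation: the function name, and the analytic mean/variance form (in place of explicit sampling), are assumptions for illustration, and the per-token log-probability matrix would in practice come from the SPO-aligned scoring LLM.

```python
import numpy as np

def curvature_score(log_probs, observed_ids):
    """Illustrative conditional probability curvature score.

    log_probs:    (T, V) array of per-position log-probabilities over the
                  vocabulary, as produced by a (style-aligned) scoring model.
    observed_ids: length-T array of the observed token ids of the text.

    Returns a scalar; higher values mean the observed tokens are more
    likely than average under the model's own conditional distributions,
    which is the signal used to flag machine-style text.
    """
    probs = np.exp(log_probs)                      # (T, V) probabilities
    T = len(observed_ids)
    # Log-likelihood of the observed tokens under the scoring model.
    ll = log_probs[np.arange(T), observed_ids].sum()
    # Analytic per-position mean and variance of the token log-probability
    # when sampling from the model's conditional distribution (instead of
    # averaging over many explicit samples).
    mean_t = (probs * log_probs).sum(axis=-1)              # E[log p]
    var_t = (probs * log_probs**2).sum(axis=-1) - mean_t**2
    # Normalized deviation of the observed log-likelihood from expectation.
    return (ll - mean_t.sum()) / np.sqrt(var_t.sum())
```

On a toy distribution, a text made of the model's preferred tokens scores positive, while one made of its dispreferred tokens scores negative; a threshold on this score then separates the two classes.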
identifier DOI: 10.48550/arxiv.2412.10432
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Cryptography and Security