Measuring and Modifying the Readability of English Texts with GPT-4

The success of Large Language Models (LLMs) in other domains has raised the question of whether LLMs can reliably assess and manipulate the readability of text. We approach this question empirically. First, using a published corpus of 4,724 English text excerpts, we find that readability estimates produced "zero-shot" from GPT-4 Turbo and GPT-4o mini exhibit relatively high correlation with human judgments (r = 0.76 and r = 0.74, respectively), outperforming estimates derived from traditional readability formulas and various psycholinguistic indices. Then, in a pre-registered human experiment (N = 59), we ask whether Turbo can reliably make text easier or harder to read. We find evidence to support this hypothesis, though considerable variance in human judgments remains unexplained. We conclude by discussing the limitations of this approach, including limited scope, as well as the validity of the "readability" construct and its dependence on context, audience, and goal.

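To make the comparison described in the abstract concrete, below is a minimal Python sketch of the two kinds of estimates being compared: a traditional readability formula (Flesch Reading Ease, one common instance of the formulas the abstract alludes to) and a "zero-shot" rating from a GPT-4-family chat model. The prompt wording, the 1-100 rating scale, and the syllable-counting heuristic are illustrative assumptions, not the authors' actual materials, which are not reproduced in this record.

```python
# Sketch of two readability estimators compared in the paper: a classic
# formula and a zero-shot LLM rating. Prompt and scale are assumptions.
import re

from openai import OpenAI  # assumes the openai>=1.0 Python client


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    Higher scores mean easier text. Syllables are approximated by
    counting vowel groups, a common rough heuristic.
    """
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)


def gpt_readability(text: str, model: str = "gpt-4-turbo") -> float:
    """Zero-shot readability rating from a chat model (hypothetical prompt)."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": (
                    "On a scale from 1 (very hard to read) to 100 (very easy "
                    "to read), rate the readability of the following text. "
                    f"Respond with a single number only.\n\nText: {text}"
                ),
            }
        ],
    )
    return float(response.choices[0].message.content.strip())


if __name__ == "__main__":
    excerpt = "The cat sat on the mat. It was a sunny day."
    print(f"Flesch Reading Ease: {flesch_reading_ease(excerpt):.1f}")
    # Requires an API key; uncomment to query the model:
    # print(f"Zero-shot estimate: {gpt_readability(excerpt):.1f}")
```

In the paper's first study, zero-shot estimates of this kind were correlated with human judgments across the 4,724 excerpts (r = 0.76 for GPT-4 Turbo, r = 0.74 for GPT-4o mini), with formula-based scores such as the one above serving as the traditional baselines they outperformed.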
Bibliographic Details

Authors: Trott, Sean; Rivière, Pamela D
Format: Article
Language: English
Published: 2024-10-17
Subjects: Computer Science - Computation and Language
DOI: 10.48550/arxiv.2410.14028
Online access: https://arxiv.org/abs/2410.14028
Source: arXiv.org