MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations

Math word problems are critical K-8 educational tools, but writing them is time consuming and requires extensive expertise. To be educational, problems must be solvable, have accurate answers, and, most importantly, be educationally appropriate. We propose that language models have potential to supp...

Bibliographic Details
Main Authors: Christ, Bryan R; Kropko, Jonathan; Hartvigsen, Thomas
Format: Article
Language: eng
creator Christ, Bryan R; Kropko, Jonathan; Hartvigsen, Thomas
description Math word problems are critical K-8 educational tools, but writing them is time consuming and requires extensive expertise. To be educational, problems must be solvable, have accurate answers, and, most importantly, be educationally appropriate. We propose that language models have potential to support K-8 math education by automatically generating word problems. However, evaluating educational appropriateness is hard to quantify. We fill this gap by having teachers evaluate problems generated by LLMs, who find existing models and data often fail to be educationally appropriate. We then explore automatically generating educational word problems, ultimately using our expert annotations to finetune a 70B language model. Our model, MATHWELL, is the first K-8 word problem generator targeted at educational appropriateness. Further expert studies find MATHWELL generates problems far more solvable, accurate, and appropriate than public models. MATHWELL also matches GPT-4's problem quality while attaining more appropriate reading levels for K-8 students and avoiding generating harmful questions.
doi_str_mv 10.48550/arxiv.2402.15861
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2402_15861</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2402_15861</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2402_158613</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw0jM0tTAz5GTw9XUM8Qh39fGxUnBPzUstSizJzEtXcE0pTQay8vMScxR8E0syFMLzi1IUAoryk3JSc4sVQotBikJSE5MzUosUHPPy8kvAqot5GFjTEnOKU3mhNDeDvJtriLOHLtji-IKizNzEosp4kAPiwQ4wJqwCAJ_AOpc</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations</title><source>arXiv.org</source><creator>Christ, Bryan R ; Kropko, Jonathan ; Hartvigsen, Thomas</creator><creatorcontrib>Christ, Bryan R ; Kropko, Jonathan ; Hartvigsen, Thomas</creatorcontrib><description>Math word problems are critical K-8 educational tools, but writing them is time consuming and requires extensive expertise. To be educational, problems must be solvable, have accurate answers, and, most importantly, be educationally appropriate. We propose that language models have potential to support K-8 math education by automatically generating word problems. However, evaluating educational appropriateness is hard to quantify. We fill this gap by having teachers evaluate problems generated by LLMs, who find existing models and data often fail to be educationally appropriate. We then explore automatically generating educational word problems, ultimately using our expert annotations to finetune a 70B language model. Our model, MATHWELL, is the first K-8 word problem generator targeted at educational appropriateness. Further expert studies find MATHWELL generates problems far more solvable, accurate, and appropriate than public models. 
MATHWELL also matches GPT-4's problem quality while attaining more appropriate reading levels for K-8 students and avoiding generating harmful questions.</description><identifier>DOI: 10.48550/arxiv.2402.15861</identifier><language>eng</language><subject>Computer Science - Computation and Language</subject><creationdate>2024-02</creationdate><rights>http://creativecommons.org/licenses/by-sa/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2402.15861$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2402.15861$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Christ, Bryan R</creatorcontrib><creatorcontrib>Kropko, Jonathan</creatorcontrib><creatorcontrib>Hartvigsen, Thomas</creatorcontrib><title>MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations</title><description>Math word problems are critical K-8 educational tools, but writing them is time consuming and requires extensive expertise. To be educational, problems must be solvable, have accurate answers, and, most importantly, be educationally appropriate. We propose that language models have potential to support K-8 math education by automatically generating word problems. However, evaluating educational appropriateness is hard to quantify. We fill this gap by having teachers evaluate problems generated by LLMs, who find existing models and data often fail to be educationally appropriate. We then explore automatically generating educational word problems, ultimately using our expert annotations to finetune a 70B language model. 
Our model, MATHWELL, is the first K-8 word problem generator targeted at educational appropriateness. Further expert studies find MATHWELL generates problems far more solvable, accurate, and appropriate than public models. MATHWELL also matches GPT-4's problem quality while attaining more appropriate reading levels for K-8 students and avoiding generating harmful questions.</description><subject>Computer Science - Computation and Language</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjEw0jM0tTAz5GTw9XUM8Qh39fGxUnBPzUstSizJzEtXcE0pTQay8vMScxR8E0syFMLzi1IUAoryk3JSc4sVQotBikJSE5MzUosUHPPy8kvAqot5GFjTEnOKU3mhNDeDvJtriLOHLtji-IKizNzEosp4kAPiwQ4wJqwCAJ_AOpc</recordid><startdate>20240224</startdate><enddate>20240224</enddate><creator>Christ, Bryan R</creator><creator>Kropko, Jonathan</creator><creator>Hartvigsen, Thomas</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20240224</creationdate><title>MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations</title><author>Christ, Bryan R ; Kropko, Jonathan ; Hartvigsen, Thomas</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2402_158613</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Computation and Language</topic><toplevel>online_resources</toplevel><creatorcontrib>Christ, Bryan R</creatorcontrib><creatorcontrib>Kropko, Jonathan</creatorcontrib><creatorcontrib>Hartvigsen, Thomas</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Christ, Bryan R</au><au>Kropko, Jonathan</au><au>Hartvigsen, 
Thomas</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations</atitle><date>2024-02-24</date><risdate>2024</risdate><abstract>Math word problems are critical K-8 educational tools, but writing them is time consuming and requires extensive expertise. To be educational, problems must be solvable, have accurate answers, and, most importantly, be educationally appropriate. We propose that language models have potential to support K-8 math education by automatically generating word problems. However, evaluating educational appropriateness is hard to quantify. We fill this gap by having teachers evaluate problems generated by LLMs, who find existing models and data often fail to be educationally appropriate. We then explore automatically generating educational word problems, ultimately using our expert annotations to finetune a 70B language model. Our model, MATHWELL, is the first K-8 word problem generator targeted at educational appropriateness. Further expert studies find MATHWELL generates problems far more solvable, accurate, and appropriate than public models. MATHWELL also matches GPT-4's problem quality while attaining more appropriate reading levels for K-8 students and avoiding generating harmful questions.</abstract><doi>10.48550/arxiv.2402.15861</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2402.15861
language eng
recordid cdi_arxiv_primary_2402_15861
source arXiv.org
subjects Computer Science - Computation and Language
title MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T20%3A13%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MATHWELL:%20Generating%20Educational%20Math%20Word%20Problems%20Using%20Teacher%20Annotations&rft.au=Christ,%20Bryan%20R&rft.date=2024-02-24&rft_id=info:doi/10.48550/arxiv.2402.15861&rft_dat=%3Carxiv_GOX%3E2402_15861%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true