Recovering Word Forms by Context for Morphologically Rich Languages

Bibliographic details
Published in: Journal of mathematical sciences (New York, N.Y.), 2023-07, Vol. 273 (4), pp. 527-532
Main authors: Alekseev, A. M.; Nikolenko, S. I.
Format: Article
Language: English
container_end_page 532
container_issue 4
container_start_page 527
container_title Journal of mathematical sciences (New York, N.Y.)
container_volume 273
creator Alekseev, A. M.
Nikolenko, S. I.
description In this work, we focus on “sentence-level unlemmatization,” the task of generating a grammatical sentence given a lemmatized one; this task is usually easy for humans but may present problems for machine learning models. We treat this setting as a machine translation problem and, as a first try, apply a sequence-to-sequence model to the texts of Russian Wikipedia articles, quantitatively evaluate the effect of different training set sizes, and achieve a BLEU score of 67.3 using the largest training set available. We discuss preliminary results and flaws of traditional machine translation evaluation methods for this task and suggest directions for future research.
doi_str_mv 10.1007/s10958-023-06518-7
format Article
publisher Springer International Publishing, Cham
rights Springer Nature Switzerland AG 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
COPYRIGHT 2023 Springer
date 2023-07-01
abbreviated_title J Math Sci
peer_reviewed true
oa free_for_read
collection CrossRef
Gale In Context: Science
linktopdf https://link.springer.com/content/pdf/10.1007/s10958-023-06518-7
linktohtml https://link.springer.com/10.1007/s10958-023-06518-7
fulltext fulltext
identifier ISSN: 1072-3374
ispartof Journal of mathematical sciences (New York, N.Y.), 2023-07, Vol.273 (4), p.527-532
issn 1072-3374
1573-8795
language eng
recordid cdi_proquest_journals_2833678177
source SpringerLink Journals - AutoHoldings
subjects Computational linguistics
Language processing
Machine learning
Machine translation
Mathematics
Mathematics and Statistics
Natural language interfaces
Training
title Recovering Word Forms by Context for Morphologically Rich Languages
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T12%3A56%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Recovering%20Word%20Forms%20by%20Context%20for%20Morphologically%20Rich%20Languages&rft.jtitle=Journal%20of%20mathematical%20sciences%20(New%20York,%20N.Y.)&rft.au=Alekseev,%20A.%20M.&rft.date=2023-07-01&rft.volume=273&rft.issue=4&rft.spage=527&rft.epage=532&rft.pages=527-532&rft.issn=1072-3374&rft.eissn=1573-8795&rft_id=info:doi/10.1007/s10958-023-06518-7&rft_dat=%3Cgale_proqu%3EA757804657%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2833678177&rft_id=info:pmid/&rft_galeid=A757804657&rfr_iscdi=true
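The description reports its result as a BLEU score. As a rough illustration of how that metric treats unlemmatization output, here is a minimal sketch of unsmoothed corpus-level BLEU in Python (one reference per sentence, n-grams up to order 4, standard brevity penalty). It is not the authors' evaluation code: the function names `ngrams` and `corpus_bleu` and the hand-made Russian sentence are hypothetical and do not come from the paper's Wikipedia data.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every n-gram of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed corpus-level BLEU with one reference per sentence, on a 0-100 scale."""
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # n-grams produced by the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by how often it occurs in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # some order has no matches, so the geometric mean is zero
    # Geometric mean of the modified n-gram precisions.
    log_prec = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)


# Hypothetical toy example (not from the paper): source = lemmas, target = original sentence.
reference = "студенты читали новые книги в библиотеке".split()
lemmas = "студент читать новый книга в библиотека".split()
one_error = "студенты читали новые книги в библиотека".split()  # last word left uninflected

print(round(corpus_bleu([reference], [reference]), 2))  # 100.0
print(round(corpus_bleu([one_error], [reference]), 2))  # 75.98
print(round(corpus_bleu([lemmas], [reference]), 2))     # 0.0
```

In this toy case a single wrong case ending costs about 24 points, because every n-gram that touches the error stops matching, while copying the lemmas unchanged scores 0 since no bigram matches. This hints at why whole-sentence n-gram overlap can be a blunt measure for unlemmatization; it does not reproduce the paper's own analysis, and standard toolkits such as sacreBLEU apply their own tokenization, so their numbers will differ from this sketch.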