MRL Parsing Without Tears: The Case of Hebrew
Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking. Yet in morphologically rich languages (MRLs), where parsers need to identify multiple lexical units in each token, existing systems suffer in...
Gespeichert in:
Veröffentlicht in: | arXiv.org 2024-03 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Shaltiel Shmidman Shmidman, Avi Koppel, Moshe Tsarfaty, Reut |
description | Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking. Yet in morphologically rich languages (MRLs), where parsers need to identify multiple lexical units in each token, existing systems suffer in latency and setup complexity. Some use a pipeline to peel away the layers: first segmentation, then morphology tagging, and then syntax parsing; however, errors in earlier layers are then propagated forward. Others use a joint architecture to evaluate all permutations at once; while this improves accuracy, it is notoriously slow. In contrast, and taking Hebrew as a test case, we present a new "flipped pipeline": decisions are made directly on the whole-token units by expert classifiers, each one dedicated to one specific task. The classifiers are independent of one another, and only at the end do we synthesize their predictions. This blazingly fast approach sets a new SOTA in Hebrew POS tagging and dependency parsing, while also reaching near-SOTA performance on other Hebrew NLP tasks. Because our architecture does not rely on any language-specific resources, it can serve as a model to develop similar parsers for other MRLs. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2955956010</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2955956010</sourcerecordid><originalsourceid>FETCH-proquest_journals_29559560103</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQ9Q3yUQhILCrOzEtXCM8sycgvLVEISQUKWCmEZKQqOCcWpyrkpyl4pCYVpZbzMLCmJeYUp_JCaW4GZTfXEGcP3YKi_MLS1OKS-Kz80qI8oFS8kaWpqaWpmYGhgTFxqgAikzCh</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2955956010</pqid></control><display><type>article</type><title>MRL Parsing Without Tears: The Case of Hebrew</title><source>Open Access: Freely Accessible Journals by multiple vendors</source><creator>Shaltiel Shmidman ; Shmidman, Avi ; Koppel, Moshe ; Tsarfaty, Reut</creator><creatorcontrib>Shaltiel Shmidman ; Shmidman, Avi ; Koppel, Moshe ; Tsarfaty, Reut</creatorcontrib><description>Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking. Yet in morphologically rich languages (MRLs), where parsers need to identify multiple lexical units in each token, existing systems suffer in latency and setup complexity. Some use a pipeline to peel away the layers: first segmentation, then morphology tagging, and then syntax parsing; however, errors in earlier layers are then propagated forward. Others use a joint architecture to evaluate all permutations at once; while this improves accuracy, it is notoriously slow. In contrast, and taking Hebrew as a test case, we present a new "flipped pipeline": decisions are made directly on the whole-token units by expert classifiers, each one dedicated to one specific task. The classifiers are independent of one another, and only at the end do we synthesize their predictions. This blazingly fast approach sets a new SOTA in Hebrew POS tagging and dependency parsing, while also reaching near-SOTA performance on other Hebrew NLP tasks. Because our architecture does not rely on any language-specific resources, it can serve as a model to develop similar parsers for other MRLs.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Classifiers ; Information retrieval ; Languages ; Morphology ; Parsers ; Permutations</subject><ispartof>arXiv.org, 2024-03</ispartof><rights>2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Shaltiel Shmidman</creatorcontrib><creatorcontrib>Shmidman, Avi</creatorcontrib><creatorcontrib>Koppel, Moshe</creatorcontrib><creatorcontrib>Tsarfaty, Reut</creatorcontrib><title>MRL Parsing Without Tears: The Case of Hebrew</title><title>arXiv.org</title><description>Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking. Yet in morphologically rich languages (MRLs), where parsers need to identify multiple lexical units in each token, existing systems suffer in latency and setup complexity. Some use a pipeline to peel away the layers: first segmentation, then morphology tagging, and then syntax parsing; however, errors in earlier layers are then propagated forward. Others use a joint architecture to evaluate all permutations at once; while this improves accuracy, it is notoriously slow. In contrast, and taking Hebrew as a test case, we present a new "flipped pipeline": decisions are made directly on the whole-token units by expert classifiers, each one dedicated to one specific task. The classifiers are independent of one another, and only at the end do we synthesize their predictions. This blazingly fast approach sets a new SOTA in Hebrew POS tagging and dependency parsing, while also reaching near-SOTA performance on other Hebrew NLP tasks. Because our architecture does not rely on any language-specific resources, it can serve as a model to develop similar parsers for other MRLs.</description><subject>Classifiers</subject><subject>Information retrieval</subject><subject>Languages</subject><subject>Morphology</subject><subject>Parsers</subject><subject>Permutations</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mTQ9Q3yUQhILCrOzEtXCM8sycgvLVEISQUKWCmEZKQqOCcWpyrkpyl4pCYVpZbzMLCmJeYUp_JCaW4GZTfXEGcP3YKi_MLS1OKS-Kz80qI8oFS8kaWpqaWpmYGhgTFxqgAikzCh</recordid><startdate>20240311</startdate><enddate>20240311</enddate><creator>Shaltiel Shmidman</creator><creator>Shmidman, Avi</creator><creator>Koppel, Moshe</creator><creator>Tsarfaty, Reut</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240311</creationdate><title>MRL Parsing Without Tears: The Case of Hebrew</title><author>Shaltiel Shmidman ; Shmidman, Avi ; Koppel, Moshe ; Tsarfaty, Reut</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_29559560103</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Classifiers</topic><topic>Information retrieval</topic><topic>Languages</topic><topic>Morphology</topic><topic>Parsers</topic><topic>Permutations</topic><toplevel>online_resources</toplevel><creatorcontrib>Shaltiel Shmidman</creatorcontrib><creatorcontrib>Shmidman, Avi</creatorcontrib><creatorcontrib>Koppel, Moshe</creatorcontrib><creatorcontrib>Tsarfaty, Reut</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shaltiel Shmidman</au><au>Shmidman, Avi</au><au>Koppel, Moshe</au><au>Tsarfaty, Reut</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>MRL Parsing Without Tears: The Case of Hebrew</atitle><jtitle>arXiv.org</jtitle><date>2024-03-11</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking. Yet in morphologically rich languages (MRLs), where parsers need to identify multiple lexical units in each token, existing systems suffer in latency and setup complexity. Some use a pipeline to peel away the layers: first segmentation, then morphology tagging, and then syntax parsing; however, errors in earlier layers are then propagated forward. Others use a joint architecture to evaluate all permutations at once; while this improves accuracy, it is notoriously slow. In contrast, and taking Hebrew as a test case, we present a new "flipped pipeline": decisions are made directly on the whole-token units by expert classifiers, each one dedicated to one specific task. The classifiers are independent of one another, and only at the end do we synthesize their predictions. This blazingly fast approach sets a new SOTA in Hebrew POS tagging and dependency parsing, while also reaching near-SOTA performance on other Hebrew NLP tasks. Because our architecture does not rely on any language-specific resources, it can serve as a model to develop similar parsers for other MRLs.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-03 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2955956010 |
source | Open Access: Freely Accessible Journals by multiple vendors |
subjects | Classifiers Information retrieval Languages Morphology Parsers Permutations |
title | MRL Parsing Without Tears: The Case of Hebrew |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T19%3A24%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=MRL%20Parsing%20Without%20Tears:%20The%20Case%20of%20Hebrew&rft.jtitle=arXiv.org&rft.au=Shaltiel%20Shmidman&rft.date=2024-03-11&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2955956010%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2955956010&rft_id=info:pmid/&rfr_iscdi=true |