The Best of Both Worlds: Lexical Resources To Improve Low-Resource Part-of-Speech Tagging

In natural language processing, the deep learning revolution has shifted the focus from conventional hand-crafted symbolic representations to dense inputs, which are adequate representations learned automatically from corpora. However, particularly when working with low-resource languages, small amo...

Bibliographic Details
Main authors: Plank, Barbara; Klerke, Sigrid; Agic, Zeljko
Format: Article
Language: eng
container_end_page
container_issue
container_start_page
container_title
container_volume
creator Plank, Barbara
Klerke, Sigrid
Agic, Zeljko
description In natural language processing, the deep learning revolution has shifted the focus from conventional hand-crafted symbolic representations to dense inputs, which are adequate representations learned automatically from corpora. However, particularly when working with low-resource languages, small amounts of symbolic lexical resources such as user-generated lexicons are often available even when gold-standard corpora are not. Such additional linguistic information is, however, often neglected, and recent neural approaches to cross-lingual tagging typically rely only on word and subword embeddings. While these representations are effective, our recent work has shown clear benefits of combining the best of both worlds: integrating conventional lexical information improves neural cross-lingual part-of-speech (PoS) tagging. However, little is known about how complementary such additional information is, and to what extent improvements depend on the coverage and quality of these external resources. This paper seeks to fill this gap by providing the first thorough analysis of the contributions of lexical resources for cross-lingual PoS tagging in neural times.
doi_str_mv 10.48550/arxiv.1811.08757
format Article
date 2018-11-21
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
linktorsrc https://arxiv.org/abs/1811.08757
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1811.08757
ispartof
issn
language eng
recordid cdi_arxiv_primary_1811_08757
source arXiv.org
subjects Computer Science - Computation and Language
title The Best of Both Worlds: Lexical Resources To Improve Low-Resource Part-of-Speech Tagging
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T05%3A52%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20Best%20of%20Both%20Worlds:%20Lexical%20Resources%20To%20Improve%20Low-Resource%20Part-of-Speech%20Tagging&rft.au=Plank,%20Barbara&rft.date=2018-11-21&rft_id=info:doi/10.48550/arxiv.1811.08757&rft_dat=%3Carxiv_GOX%3E1811_08757%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true