A universal multilingual weightless neural network tagger via quantitative linguistics

In the last decade, given the availability of corpora in several distinct languages, research on multilingual part-of-speech tagging started to grow. Amongst the novelties there is mWANN-Tagger (multilingual weightless artificial neural network tagger), a weightless neural part-of-speech tagger capa...

Bibliographic details
Published in: Neural networks 2017-07, Vol.91, p.85-101
Main authors: Carneiro, Hugo C.C., Pedreira, Carlos E., França, Felipe M.G., Lima, Priscila M.V.
Format: Article
Language: eng
Online access: Full text
container_end_page 101
container_issue
container_start_page 85
container_title Neural networks
container_volume 91
creator Carneiro, Hugo C.C.
Pedreira, Carlos E.
França, Felipe M.G.
Lima, Priscila M.V.
description In the last decade, given the availability of corpora in several distinct languages, research on multilingual part-of-speech tagging started to grow. Amongst the novelties there is mWANN-Tagger (multilingual weightless artificial neural network tagger), a weightless neural part-of-speech tagger capable of being used for mostly-suffix-oriented languages. The tagger was subjected to corpora in eight languages of quite distinct natures and had a remarkable accuracy with very low sample deviation in every one of them, indicating the robustness of weightless neural systems for part-of-speech tagging tasks. However, mWANN-Tagger needed to be tuned for every new corpus, since each one required a different parameter configuration. For mWANN-Tagger to be truly multilingual, it should be usable for any new language with no need of parameter tuning. This article proposes a study that aims to find a relation between the lexical diversity of a language and the parameter configuration that would produce the best performing mWANN-Tagger instance. Preliminary analyses suggested that a single parameter configuration may be applied to the eight aforementioned languages. The mWANN-Tagger instance produced by this configuration was as accurate as the language-dependent ones obtained through tuning. Afterwards, the weightless neural tagger was further subjected to new corpora in languages that range from very isolating to polysynthetic ones. The best performing instances of mWANN-Tagger are again the ones produced by the universal parameter configuration. Hence, mWANN-Tagger can be applied to new corpora with no need of parameter tuning, making it a universal multilingual part-of-speech tagger. Further experiments with Universal Dependencies treebanks reveal that mWANN-Tagger may be extended and that it has potential to outperform most state-of-the-art part-of-speech taggers if better word representations are provided.
doi_str_mv 10.1016/j.neunet.2017.04.011
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1899118871</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0893608017300916</els_id><sourcerecordid>1899118871</sourcerecordid><originalsourceid>FETCH-LOGICAL-c362t-b59ce9e2c1010b0d21fd36e09b185a86c8654719ebb8cded7ee2fcb4cc0bc1153</originalsourceid><addsrcrecordid>eNp9kE1PwzAMhiMEYmPwDxDqkUuL3c_0gjRNfElIXIBrlKbeyOjaLUk38e_J6ODIybL9vn7lh7FLhAgB85tl1FLfkotiwCKCNALEIzZGXpRhXPD4mI2Bl0mYA4cRO7N2CQA5T5NTNop5Bn6Zjdn7NOhbvSVjZROs-sbpRreL3jc70osP15C1gQ8yfuLDdp35DJxcLMgEWy2DTS9bp510_kTw49TWaWXP2clcNpYuDnXC3u7vXmeP4fPLw9Ns-hyqJI9dWGWlopJi5T-CCuoY53WSE5QV8kzyXPE8Swssqaq4qqkuiOK5qlKloFKIWTJh18Pdtek2PVknVtoqahrZUtdbgbwsETkv0EvTQapMZ62huVgbvZLmSyCIPVGxFANRsScqIBWeqLddHRL6akX1n-kXoRfcDgLyf241GWGVplZRrQ0pJ-pO_5_wDd34jA4</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1899118871</pqid></control><display><type>article</type><title>A universal multilingual weightless neural network tagger via quantitative linguistics</title><source>MEDLINE</source><source>ScienceDirect Journals (5 years ago - present)</source><creator>Carneiro, Hugo C.C. ; Pedreira, Carlos E. ; França, Felipe M.G. ; Lima, Priscila M.V.</creator><creatorcontrib>Carneiro, Hugo C.C. ; Pedreira, Carlos E. ; França, Felipe M.G. ; Lima, Priscila M.V.</creatorcontrib><description>In the last decade, given the availability of corpora in several distinct languages, research on multilingual part-of-speech tagging started to grow. Amongst the novelties there is mWANN-Tagger (multilingual weightless artificial neural network tagger), a weightless neural part-of-speech tagger capable of being used for mostly-suffix-oriented languages. 
The tagger was subjected to corpora in eight languages of quite distinct natures and had a remarkable accuracy with very low sample deviation in every one of them, indicating the robustness of weightless neural systems for part-of-speech tagging tasks. However, mWANN-Tagger needed to be tuned for every new corpus, since each one required a different parameter configuration. For mWANN-Tagger to be truly multilingual, it should be usable for any new language with no need of parameter tuning. This article proposes a study that aims to find a relation between the lexical diversity of a language and the parameter configuration that would produce the best performing mWANN-Tagger instance. Preliminary analyses suggested that a single parameter configuration may be applied to the eight aforementioned languages. The mWANN-Tagger instance produced by this configuration was as accurate as the language-dependent ones obtained through tuning. Afterwards, the weightless neural tagger was further subjected to new corpora in languages that range from very isolating to polysynthetic ones. The best performing instances of mWANN-Tagger are again the ones produced by the universal parameter configuration. Hence, mWANN-Tagger can be applied to new corpora with no need of parameter tuning, making it a universal multilingual part-of-speech tagger. 
Further experiments with Universal Dependencies treebanks reveal that mWANN-Tagger may be extended and that it has potential to outperform most state-of-the-art part-of-speech taggers if better word representations are provided.</description><identifier>ISSN: 0893-6080</identifier><identifier>EISSN: 1879-2782</identifier><identifier>DOI: 10.1016/j.neunet.2017.04.011</identifier><identifier>PMID: 28500895</identifier><language>eng</language><publisher>United States: Elsevier Ltd</publisher><subject>Lexical diversity ; Linguistics - methods ; Natural Language Processing ; Neural Networks (Computer) ; Part-of-speech tagging ; Weightless neural networks ; Zipf’s law</subject><ispartof>Neural networks, 2017-07, Vol.91, p.85-101</ispartof><rights>2017 Elsevier Ltd</rights><rights>Copyright © 2017 Elsevier Ltd. All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c362t-b59ce9e2c1010b0d21fd36e09b185a86c8654719ebb8cded7ee2fcb4cc0bc1153</citedby><cites>FETCH-LOGICAL-c362t-b59ce9e2c1010b0d21fd36e09b185a86c8654719ebb8cded7ee2fcb4cc0bc1153</cites><orcidid>0000-0001-5094-5908</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://dx.doi.org/10.1016/j.neunet.2017.04.011$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,780,784,3550,27924,27925,45995</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28500895$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Carneiro, Hugo C.C.</creatorcontrib><creatorcontrib>Pedreira, Carlos E.</creatorcontrib><creatorcontrib>França, Felipe M.G.</creatorcontrib><creatorcontrib>Lima, Priscila M.V.</creatorcontrib><title>A universal multilingual weightless neural network tagger via quantitative linguistics</title><title>Neural networks</title><addtitle>Neural 
Netw</addtitle><description>In the last decade, given the availability of corpora in several distinct languages, research on multilingual part-of-speech tagging started to grow. Amongst the novelties there is mWANN-Tagger (multilingual weightless artificial neural network tagger), a weightless neural part-of-speech tagger capable of being used for mostly-suffix-oriented languages. The tagger was subjected to corpora in eight languages of quite distinct natures and had a remarkable accuracy with very low sample deviation in every one of them, indicating the robustness of weightless neural systems for part-of-speech tagging tasks. However, mWANN-Tagger needed to be tuned for every new corpus, since each one required a different parameter configuration. For mWANN-Tagger to be truly multilingual, it should be usable for any new language with no need of parameter tuning. This article proposes a study that aims to find a relation between the lexical diversity of a language and the parameter configuration that would produce the best performing mWANN-Tagger instance. Preliminary analyses suggested that a single parameter configuration may be applied to the eight aforementioned languages. The mWANN-Tagger instance produced by this configuration was as accurate as the language-dependent ones obtained through tuning. Afterwards, the weightless neural tagger was further subjected to new corpora in languages that range from very isolating to polysynthetic ones. The best performing instances of mWANN-Tagger are again the ones produced by the universal parameter configuration. Hence, mWANN-Tagger can be applied to new corpora with no need of parameter tuning, making it a universal multilingual part-of-speech tagger. 
Further experiments with Universal Dependencies treebanks reveal that mWANN-Tagger may be extended and that it has potential to outperform most state-of-the-art part-of-speech taggers if better word representations are provided.</description><subject>Lexical diversity</subject><subject>Linguistics - methods</subject><subject>Natural Language Processing</subject><subject>Neural Networks (Computer)</subject><subject>Part-of-speech tagging</subject><subject>Weightless neural networks</subject><subject>Zipf’s law</subject><issn>0893-6080</issn><issn>1879-2782</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kE1PwzAMhiMEYmPwDxDqkUuL3c_0gjRNfElIXIBrlKbeyOjaLUk38e_J6ODIybL9vn7lh7FLhAgB85tl1FLfkotiwCKCNALEIzZGXpRhXPD4mI2Bl0mYA4cRO7N2CQA5T5NTNop5Bn6Zjdn7NOhbvSVjZROs-sbpRreL3jc70osP15C1gQ8yfuLDdp35DJxcLMgEWy2DTS9bp510_kTw49TWaWXP2clcNpYuDnXC3u7vXmeP4fPLw9Ns-hyqJI9dWGWlopJi5T-CCuoY53WSE5QV8kzyXPE8Swssqaq4qqkuiOK5qlKloFKIWTJh18Pdtek2PVknVtoqahrZUtdbgbwsETkv0EvTQapMZ62huVgbvZLmSyCIPVGxFANRsScqIBWeqLddHRL6akX1n-kXoRfcDgLyf241GWGVplZRrQ0pJ-pO_5_wDd34jA4</recordid><startdate>201707</startdate><enddate>201707</enddate><creator>Carneiro, Hugo C.C.</creator><creator>Pedreira, Carlos E.</creator><creator>França, Felipe M.G.</creator><creator>Lima, Priscila M.V.</creator><general>Elsevier Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-5094-5908</orcidid></search><sort><creationdate>201707</creationdate><title>A universal multilingual weightless neural network tagger via quantitative linguistics</title><author>Carneiro, Hugo C.C. ; Pedreira, Carlos E. ; França, Felipe M.G. 
; Lima, Priscila M.V.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c362t-b59ce9e2c1010b0d21fd36e09b185a86c8654719ebb8cded7ee2fcb4cc0bc1153</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Lexical diversity</topic><topic>Linguistics - methods</topic><topic>Natural Language Processing</topic><topic>Neural Networks (Computer)</topic><topic>Part-of-speech tagging</topic><topic>Weightless neural networks</topic><topic>Zipf’s law</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Carneiro, Hugo C.C.</creatorcontrib><creatorcontrib>Pedreira, Carlos E.</creatorcontrib><creatorcontrib>França, Felipe M.G.</creatorcontrib><creatorcontrib>Lima, Priscila M.V.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Neural networks</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Carneiro, Hugo C.C.</au><au>Pedreira, Carlos E.</au><au>França, Felipe M.G.</au><au>Lima, Priscila M.V.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A universal multilingual weightless neural network tagger via quantitative linguistics</atitle><jtitle>Neural networks</jtitle><addtitle>Neural Netw</addtitle><date>2017-07</date><risdate>2017</risdate><volume>91</volume><spage>85</spage><epage>101</epage><pages>85-101</pages><issn>0893-6080</issn><eissn>1879-2782</eissn><abstract>In the last decade, given the availability of corpora in several distinct languages, research on multilingual part-of-speech tagging started to grow. 
Amongst the novelties there is mWANN-Tagger (multilingual weightless artificial neural network tagger), a weightless neural part-of-speech tagger capable of being used for mostly-suffix-oriented languages. The tagger was subjected to corpora in eight languages of quite distinct natures and had a remarkable accuracy with very low sample deviation in every one of them, indicating the robustness of weightless neural systems for part-of-speech tagging tasks. However, mWANN-Tagger needed to be tuned for every new corpus, since each one required a different parameter configuration. For mWANN-Tagger to be truly multilingual, it should be usable for any new language with no need of parameter tuning. This article proposes a study that aims to find a relation between the lexical diversity of a language and the parameter configuration that would produce the best performing mWANN-Tagger instance. Preliminary analyses suggested that a single parameter configuration may be applied to the eight aforementioned languages. The mWANN-Tagger instance produced by this configuration was as accurate as the language-dependent ones obtained through tuning. Afterwards, the weightless neural tagger was further subjected to new corpora in languages that range from very isolating to polysynthetic ones. The best performing instances of mWANN-Tagger are again the ones produced by the universal parameter configuration. Hence, mWANN-Tagger can be applied to new corpora with no need of parameter tuning, making it a universal multilingual part-of-speech tagger. Further experiments with Universal Dependencies treebanks reveal that mWANN-Tagger may be extended and that it has potential to outperform most state-of-the-art part-of-speech taggers if better word representations are provided.</abstract><cop>United States</cop><pub>Elsevier Ltd</pub><pmid>28500895</pmid><doi>10.1016/j.neunet.2017.04.011</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0001-5094-5908</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0893-6080
ispartof Neural networks, 2017-07, Vol.91, p.85-101
issn 0893-6080
1879-2782
language eng
recordid cdi_proquest_miscellaneous_1899118871
source MEDLINE; ScienceDirect Journals (5 years ago - present)
subjects Lexical diversity
Linguistics - methods
Natural Language Processing
Neural Networks (Computer)
Part-of-speech tagging
Weightless neural networks
Zipf’s law
title A universal multilingual weightless neural network tagger via quantitative linguistics
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T21%3A16%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20universal%20multilingual%20weightless%20neural%20network%20tagger%20via%20quantitative%20linguistics&rft.jtitle=Neural%20networks&rft.au=Carneiro,%20Hugo%20C.C.&rft.date=2017-07&rft.volume=91&rft.spage=85&rft.epage=101&rft.pages=85-101&rft.issn=0893-6080&rft.eissn=1879-2782&rft_id=info:doi/10.1016/j.neunet.2017.04.011&rft_dat=%3Cproquest_cross%3E1899118871%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1899118871&rft_id=info:pmid/28500895&rft_els_id=S0893608017300916&rfr_iscdi=true