A multilabel approach to morphosyntactic probing
We introduce a multilabel probing task to assess the morphosyntactic representations of word embeddings from multilingual language models. We demonstrate this task with multilingual BERT (Devlin et al., 2018), training probes for seven typologically diverse languages of varying morphological complex...
Gespeichert in:
Veröffentlicht in: | arXiv.org 2021-04 |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Naomi Tachikawa Shapiro Paullada, Amandalynne Steinert-Threlkeld, Shane |
description | We introduce a multilabel probing task to assess the morphosyntactic representations of word embeddings from multilingual language models. We demonstrate this task with multilingual BERT (Devlin et al., 2018), training probes for seven typologically diverse languages of varying morphological complexity: Afrikaans, Croatian, Finnish, Hebrew, Korean, Spanish, and Turkish. Through this simple but robust paradigm, we show that multilingual BERT renders many morphosyntactic features easily and simultaneously extractable (e.g., gender, grammatical case, pronominal type). We further evaluate the probes on six "held-out" languages in a zero-shot transfer setting: Arabic, Chinese, Marathi, Slovenian, Tagalog, and Yoruba. This style of probing has the added benefit of revealing the linguistic properties that language models recognize as being shared across languages. For instance, the probes performed well on recognizing nouns in the held-out languages, suggesting that multilingual BERT has a conception of noun-hood that transcends individual languages; yet, the same was not true of adjectives. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2515482516</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2515482516</sourcerecordid><originalsourceid>FETCH-proquest_journals_25154825163</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mQwcFTILc0pycxJTErNUUgsKCjKT0zOUCjJV8jNLyrIyC-uzCtJTC7JTFYAyiRl5qXzMLCmJeYUp_JCaW4GZTfXEGcPXaB8YWlqcUl8Vn5pUR5QKt7I1NDUxAJImhkTpwoAnTQzmA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2515482516</pqid></control><display><type>article</type><title>A multilabel approach to morphosyntactic probing</title><source>Free E- Journals</source><creator>Naomi Tachikawa Shapiro ; Paullada, Amandalynne ; Steinert-Threlkeld, Shane</creator><creatorcontrib>Naomi Tachikawa Shapiro ; Paullada, Amandalynne ; Steinert-Threlkeld, Shane</creatorcontrib><description>We introduce a multilabel probing task to assess the morphosyntactic representations of word embeddings from multilingual language models. We demonstrate this task with multilingual BERT (Devlin et al., 2018), training probes for seven typologically diverse languages of varying morphological complexity: Afrikaans, Croatian, Finnish, Hebrew, Korean, Spanish, and Turkish. Through this simple but robust paradigm, we show that multilingual BERT renders many morphosyntactic features easily and simultaneously extractable (e.g., gender, grammatical case, pronominal type). We further evaluate the probes on six "held-out" languages in a zero-shot transfer setting: Arabic, Chinese, Marathi, Slovenian, Tagalog, and Yoruba. This style of probing has the added benefit of revealing the linguistic properties that language models recognize as being shared across languages. For instance, the probes performed well on recognizing nouns in the held-out languages, suggesting that multilingual BERT has a conception of noun-hood that transcends individual languages; yet, the same was not true of adjectives.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Feature extraction ; Languages ; Model testing ; Multilingualism</subject><ispartof>arXiv.org, 2021-04</ispartof><rights>2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Naomi Tachikawa Shapiro</creatorcontrib><creatorcontrib>Paullada, Amandalynne</creatorcontrib><creatorcontrib>Steinert-Threlkeld, Shane</creatorcontrib><title>A multilabel approach to morphosyntactic probing</title><title>arXiv.org</title><description>We introduce a multilabel probing task to assess the morphosyntactic representations of word embeddings from multilingual language models. We demonstrate this task with multilingual BERT (Devlin et al., 2018), training probes for seven typologically diverse languages of varying morphological complexity: Afrikaans, Croatian, Finnish, Hebrew, Korean, Spanish, and Turkish. Through this simple but robust paradigm, we show that multilingual BERT renders many morphosyntactic features easily and simultaneously extractable (e.g., gender, grammatical case, pronominal type). We further evaluate the probes on six "held-out" languages in a zero-shot transfer setting: Arabic, Chinese, Marathi, Slovenian, Tagalog, and Yoruba. This style of probing has the added benefit of revealing the linguistic properties that language models recognize as being shared across languages. For instance, the probes performed well on recognizing nouns in the held-out languages, suggesting that multilingual BERT has a conception of noun-hood that transcends individual languages; yet, the same was not true of adjectives.</description><subject>Feature extraction</subject><subject>Languages</subject><subject>Model testing</subject><subject>Multilingualism</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mQwcFTILc0pycxJTErNUUgsKCjKT0zOUCjJV8jNLyrIyC-uzCtJTC7JTFYAyiRl5qXzMLCmJeYUp_JCaW4GZTfXEGcPXaB8YWlqcUl8Vn5pUR5QKt7I1NDUxAJImhkTpwoAnTQzmA</recordid><startdate>20210417</startdate><enddate>20210417</enddate><creator>Naomi Tachikawa Shapiro</creator><creator>Paullada, Amandalynne</creator><creator>Steinert-Threlkeld, Shane</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20210417</creationdate><title>A multilabel approach to morphosyntactic probing</title><author>Naomi Tachikawa Shapiro ; Paullada, Amandalynne ; Steinert-Threlkeld, Shane</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_25154825163</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Feature extraction</topic><topic>Languages</topic><topic>Model testing</topic><topic>Multilingualism</topic><toplevel>online_resources</toplevel><creatorcontrib>Naomi Tachikawa Shapiro</creatorcontrib><creatorcontrib>Paullada, Amandalynne</creatorcontrib><creatorcontrib>Steinert-Threlkeld, Shane</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Naomi Tachikawa Shapiro</au><au>Paullada, Amandalynne</au><au>Steinert-Threlkeld, Shane</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>A multilabel approach to morphosyntactic probing</atitle><jtitle>arXiv.org</jtitle><date>2021-04-17</date><risdate>2021</risdate><eissn>2331-8422</eissn><abstract>We introduce a multilabel probing task to assess the morphosyntactic representations of word embeddings from multilingual language models. We demonstrate this task with multilingual BERT (Devlin et al., 2018), training probes for seven typologically diverse languages of varying morphological complexity: Afrikaans, Croatian, Finnish, Hebrew, Korean, Spanish, and Turkish. Through this simple but robust paradigm, we show that multilingual BERT renders many morphosyntactic features easily and simultaneously extractable (e.g., gender, grammatical case, pronominal type). We further evaluate the probes on six "held-out" languages in a zero-shot transfer setting: Arabic, Chinese, Marathi, Slovenian, Tagalog, and Yoruba. This style of probing has the added benefit of revealing the linguistic properties that language models recognize as being shared across languages. For instance, the probes performed well on recognizing nouns in the held-out languages, suggesting that multilingual BERT has a conception of noun-hood that transcends individual languages; yet, the same was not true of adjectives.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2021-04 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2515482516 |
source | Free E- Journals |
subjects | Feature extraction Languages Model testing Multilingualism |
title | A multilabel approach to morphosyntactic probing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T17%3A27%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=A%20multilabel%20approach%20to%20morphosyntactic%20probing&rft.jtitle=arXiv.org&rft.au=Naomi%20Tachikawa%20Shapiro&rft.date=2021-04-17&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2515482516%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2515482516&rft_id=info:pmid/&rfr_iscdi=true |