UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer

Building real-world complex Named Entity Recognition (NER) systems is a challenging task. This is due to the complexity and ambiguity of named entities that appear in various contexts such as short input sentences, emerging entities, and complex entities. Besides, real-world queries are mostly malfo...

Detailed description

Bibliographic details
Published in:	arXiv.org 2022-04
Main authors:	Abdellah El Mekki; Abdelkader El Mahdaouy; Akallouch, Mohammed; Berrada, Ismail; Khoumsi, Ahmed
Format:	Article
Language:	eng
Subjects:	Classification; Complexity; Multilingualism; Queries; Recognition; Transformers
Online access:	Full text

container_title	arXiv.org
creator	Abdellah El Mekki; Abdelkader El Mahdaouy; Akallouch, Mohammed; Berrada, Ismail; Khoumsi, Ahmed
description	Building real-world complex Named Entity Recognition (NER) systems is a challenging task. This is due to the complexity and ambiguity of named entities that appear in various contexts such as short input sentences, emerging entities, and complex entities. Besides, real-world queries are mostly malformed, as they can be code-mixed or multilingual, among other scenarios. In this paper, we introduce the system we submitted to the Multilingual Complex Named Entity Recognition (MultiCoNER) shared task. We approach complex NER for multilingual and code-mixed queries by relying on the contextualized representations provided by the multilingual Transformer XLM-RoBERTa. In addition to the CRF-based token classification layer, we incorporate a span classification loss to recognize named entity spans. Furthermore, we use a self-training mechanism to generate weakly annotated data from a large unlabeled dataset. Our proposed system ranked 6th and 8th in the multilingual and code-mixed tracks of MultiCoNER, respectively.
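The self-training step mentioned in the description can be illustrated with a minimal sketch. The abstract does not say how pseudo-labeled sentences are selected, so everything here is an assumption made for illustration: the function name `select_pseudo_labels`, the per-token confidence scores, the sentence-level minimum-confidence rule, the 0.9 threshold, and the example tags are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# One predicted token: (token, predicted tag, model confidence in [0, 1]).
# The confidence score is an assumption; the record does not describe it.
ScoredToken = Tuple[str, str, float]
LabeledToken = Tuple[str, str]


def select_pseudo_labels(
    predictions: List[List[ScoredToken]],
    threshold: float = 0.9,  # hypothetical cut-off, not from the paper
) -> List[List[LabeledToken]]:
    """Keep only sentences whose least confident token meets the threshold."""
    selected: List[List[LabeledToken]] = []
    for sentence in predictions:
        if not sentence:
            continue  # skip empty sentences
        if min(conf for _, _, conf in sentence) >= threshold:
            # Drop the scores: the kept sentence becomes weakly annotated data.
            selected.append([(token, tag) for token, tag, _ in sentence])
    return selected


# Illustrative model output on two unlabeled queries (made-up values).
unlabeled_predictions = [
    [("paris", "B-LOC", 0.97), ("hotels", "O", 0.95)],
    [("apple", "B-CORP", 0.55), ("pie", "O", 0.91)],
]
pseudo_labeled = select_pseudo_labels(unlabeled_predictions)
```

In a self-training loop, sentences kept this way would be added to the labeled training set before the model is trained again; a stricter or looser threshold trades the amount of weak data against its noise.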
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2656970615</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2656970615</sourcerecordid><originalsourceid>FETCH-proquest_journals_26569706153</originalsourceid><addsrcrecordid>eNqNjM1Kw0AUhQdBaNG-wwXXA5MbM61uQ8SFkWLjulyb2zp1MlPnp9Q38XGN4K4bV-c7fIdzIaZYloVc3CJOxCzGvVIK9RyrqpyK79dWL2W9Akqw4qE5kpWoEKGj-AFFcQ-Neye3MW4HbbbJ2JEyWSDXQ-17lq058S8OB8sneKZhbI1LJn3BC2_8zplkvIOjIVhGzr2HJ3pjGyHHs9MukItbHwYO1-JySzby7C-vxM1D09WP8hD8Z-aY1nufgxvVGnWl7-ZKF1X5v9UPOsNXDA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2656970615</pqid></control><display><type>article</type><title>UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer</title><source>Free E- Journals</source><creator>Abdellah El Mekki ; Abdelkader El Mahdaouy ; Akallouch, Mohammed ; Berrada, Ismail ; Khoumsi, Ahmed</creator><creatorcontrib>Abdellah El Mekki ; Abdelkader El Mahdaouy ; Akallouch, Mohammed ; Berrada, Ismail ; Khoumsi, Ahmed</creatorcontrib><description>Building real-world complex Named Entity Recognition (NER) systems is a challenging task. This is due to the complexity and ambiguity of named entities that appear in various contexts such as short input sentences, emerging entities, and complex entities. Besides, real-world queries are mostly malformed, as they can be code-mixed or multilingual, among other scenarios. In this paper, we introduce our submitted system to the Multilingual Complex Named Entity Recognition (MultiCoNER) shared task. We approach the complex NER for multilingual and code-mixed queries, by relying on the contextualized representation provided by the multilingual Transformer XLM-RoBERTa. 
In addition to the CRF-based token classification layer, we incorporate a span classification loss to recognize named entities spans. Furthermore, we use a self-training mechanism to generate weakly-annotated data from a large unlabeled dataset. Our proposed system is ranked 6th and 8th in the multilingual and code-mixed MultiCoNER's tracks respectively.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Classification ; Complexity ; Multilingualism ; Queries ; Recognition ; Transformers</subject><ispartof>arXiv.org, 2022-04</ispartof><rights>2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Abdellah El Mekki</creatorcontrib><creatorcontrib>Abdelkader El Mahdaouy</creatorcontrib><creatorcontrib>Akallouch, Mohammed</creatorcontrib><creatorcontrib>Berrada, Ismail</creatorcontrib><creatorcontrib>Khoumsi, Ahmed</creatorcontrib><title>UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer</title><title>arXiv.org</title><description>Building real-world complex Named Entity Recognition (NER) systems is a challenging task. This is due to the complexity and ambiguity of named entities that appear in various contexts such as short input sentences, emerging entities, and complex entities. 
Besides, real-world queries are mostly malformed, as they can be code-mixed or multilingual, among other scenarios. In this paper, we introduce our submitted system to the Multilingual Complex Named Entity Recognition (MultiCoNER) shared task. We approach the complex NER for multilingual and code-mixed queries, by relying on the contextualized representation provided by the multilingual Transformer XLM-RoBERTa. In addition to the CRF-based token classification layer, we incorporate a span classification loss to recognize named entities spans. Furthermore, we use a self-training mechanism to generate weakly-annotated data from a large unlabeled dataset. Our proposed system is ranked 6th and 8th in the multilingual and code-mixed MultiCoNER's tracks respectively.</description><subject>Classification</subject><subject>Complexity</subject><subject>Multilingualism</subject><subject>Queries</subject><subject>Recognition</subject><subject>Transformers</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNjM1Kw0AUhQdBaNG-wwXXA5MbM61uQ8SFkWLjulyb2zp1MlPnp9Q38XGN4K4bV-c7fIdzIaZYloVc3CJOxCzGvVIK9RyrqpyK79dWL2W9Akqw4qE5kpWoEKGj-AFFcQ-Neye3MW4HbbbJ2JEyWSDXQ-17lq058S8OB8sneKZhbI1LJn3BC2_8zplkvIOjIVhGzr2HJ3pjGyHHs9MukItbHwYO1-JySzby7C-vxM1D09WP8hD8Z-aY1nufgxvVGnWl7-ZKF1X5v9UPOsNXDA</recordid><startdate>20220428</startdate><enddate>20220428</enddate><creator>Abdellah El Mekki</creator><creator>Abdelkader El Mahdaouy</creator><creator>Akallouch, Mohammed</creator><creator>Berrada, Ismail</creator><creator>Khoumsi, Ahmed</creator><general>Cornell University Library, 
arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20220428</creationdate><title>UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer</title><author>Abdellah El Mekki ; Abdelkader El Mahdaouy ; Akallouch, Mohammed ; Berrada, Ismail ; Khoumsi, Ahmed</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_26569706153</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Classification</topic><topic>Complexity</topic><topic>Multilingualism</topic><topic>Queries</topic><topic>Recognition</topic><topic>Transformers</topic><toplevel>online_resources</toplevel><creatorcontrib>Abdellah El Mekki</creatorcontrib><creatorcontrib>Abdelkader El Mahdaouy</creatorcontrib><creatorcontrib>Akallouch, Mohammed</creatorcontrib><creatorcontrib>Berrada, Ismail</creatorcontrib><creatorcontrib>Khoumsi, Ahmed</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest 
Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Abdellah El Mekki</au><au>Abdelkader El Mahdaouy</au><au>Akallouch, Mohammed</au><au>Berrada, Ismail</au><au>Khoumsi, Ahmed</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer</atitle><jtitle>arXiv.org</jtitle><date>2022-04-28</date><risdate>2022</risdate><eissn>2331-8422</eissn><abstract>Building real-world complex Named Entity Recognition (NER) systems is a challenging task. This is due to the complexity and ambiguity of named entities that appear in various contexts such as short input sentences, emerging entities, and complex entities. Besides, real-world queries are mostly malformed, as they can be code-mixed or multilingual, among other scenarios. In this paper, we introduce our submitted system to the Multilingual Complex Named Entity Recognition (MultiCoNER) shared task. We approach the complex NER for multilingual and code-mixed queries, by relying on the contextualized representation provided by the multilingual Transformer XLM-RoBERTa. In addition to the CRF-based token classification layer, we incorporate a span classification loss to recognize named entities spans. Furthermore, we use a self-training mechanism to generate weakly-annotated data from a large unlabeled dataset. 
Our proposed system is ranked 6th and 8th in the multilingual and code-mixed MultiCoNER's tracks respectively.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2022-04
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2656970615
source	Free E- Journals
subjects	Classification; Complexity; Multilingualism; Queries; Recognition; Transformers
title	UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T13%3A23%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=UM6P-CS%20at%20SemEval-2022%20Task%2011:%20Enhancing%20Multilingual%20and%20Code-Mixed%20Complex%20Named%20Entity%20Recognition%20via%20Pseudo%20Labels%20using%20Multilingual%20Transformer&rft.jtitle=arXiv.org&rft.au=Abdellah%20El%20Mekki&rft.date=2022-04-28&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2656970615%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2656970615&rft_id=info:pmid/&rfr_iscdi=true