SocialNER2.0: A comprehensive dataset for enhancing named entity recognition in short human-produced text



Bibliographic Details
Published in: Intelligent data analysis, 2024-01, Vol. 28 (3), p. 841-865
Main authors: Belbekri, Adel; Benchikha, Fouzia; Slimani, Yahya; Marir, Naila
Format: Article
Language: English
Online access: Full text
Abstract: Named Entity Recognition (NER) is an essential task in Natural Language Processing (NLP), and deep learning-based models have shown outstanding performance. However, the effectiveness of deep learning models in NER relies heavily on the quality and quantity of labeled training datasets available. A novel and comprehensive training dataset called SocialNER2.0 is proposed to address this challenge. Based on selected datasets dedicated to different tasks related to NER, the SocialNER2.0 construction process involves data selection, extraction, enrichment, conversion, and balancing steps. The pre-trained BERT (Bidirectional Encoder Representations from Transformers) model is fine-tuned using the proposed dataset. Experimental results highlight the superior performance of the fine-tuned BERT in accurately identifying named entities, demonstrating the SocialNER2.0 dataset's capacity to provide valuable training data for performing NER in human-produced texts.
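The abstract's "conversion" step can be illustrated with a minimal sketch: mapping a sentence with character-level entity spans to CoNLL-style BIO token labels, the usual input format for fine-tuning BERT on NER. The function name, tokenization, and span layout below are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of converting entity spans to BIO labels.
# tokens: list of (start, end, text); spans: list of (start, end, type).
def to_bio(tokens, spans):
    labels = ["O"] * len(tokens)
    for s_start, s_end, etype in spans:
        inside = False  # first token in a span gets B-, the rest I-
        for i, (t_start, t_end, _) in enumerate(tokens):
            if t_start >= s_start and t_end <= s_end:
                labels[i] = ("I-" if inside else "B-") + etype
                inside = True
    return labels

sentence = "Adel works at IOS Press"

# Simple whitespace tokenization with character offsets.
tokens, pos = [], 0
for word in sentence.split():
    start = sentence.index(word, pos)
    tokens.append((start, start + len(word), word))
    pos = start + len(word)

spans = [(0, 4, "PER"), (14, 23, "ORG")]  # "Adel", "IOS Press"
print(to_bio(tokens, spans))  # → ['B-PER', 'O', 'O', 'B-ORG', 'I-ORG']
```

In practice the BIO labels would then be aligned with BERT's subword tokens before fine-tuning; that alignment step is omitted here.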
DOI: 10.3233/IDA-230588
Publisher: SAGE Publications, London, England
ISSN: 1088-467X
EISSN: 1571-4128
Subjects: Datasets; Deep learning; Natural language processing; Recognition