SocialNER2.0: A comprehensive dataset for enhancing named entity recognition in short human-produced text



Bibliographic Details
Published in: Intelligent data analysis, 2024-01, Vol. 28 (3), p. 841-865
Main authors: Belbekri, Adel; Benchikha, Fouzia; Slimani, Yahya; Marir, Naila
Format: Article
Language: English
Online access: Full text
Abstract: Named Entity Recognition (NER) is an essential task in Natural Language Processing (NLP), and deep learning-based models have shown outstanding performance. However, the effectiveness of deep learning models in NER relies heavily on the quality and quantity of labeled training datasets available. A novel and comprehensive training dataset called SocialNER2.0 is proposed to address this challenge. Based on selected datasets dedicated to different tasks related to NER, the SocialNER2.0 construction process involves data selection, extraction, enrichment, conversion, and balancing steps. The pre-trained BERT (Bidirectional Encoder Representations from Transformers) model is fine-tuned using the proposed dataset. Experimental results highlight the superior performance of the fine-tuned BERT in accurately identifying named entities, demonstrating the SocialNER2.0 dataset's capacity to provide valuable training data for performing NER in human-produced texts.
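The abstract's "conversion" step can be illustrated with a minimal sketch: mapping a sentence with character-level entity spans to CoNLL-style BIO token labels, the usual input format for fine-tuning BERT on NER. The function name, tokenization, and span layout below are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of converting entity spans to BIO labels.
# tokens: list of (start, end, text); spans: list of (start, end, type).
def to_bio(tokens, spans):
    labels = ["O"] * len(tokens)
    for s_start, s_end, etype in spans:
        inside = False  # first token in a span gets B-, the rest I-
        for i, (t_start, t_end, _) in enumerate(tokens):
            if t_start >= s_start and t_end <= s_end:
                labels[i] = ("I-" if inside else "B-") + etype
                inside = True
    return labels

sentence = "Adel works at IOS Press"

# Simple whitespace tokenization with character offsets.
tokens, pos = [], 0
for word in sentence.split():
    start = sentence.index(word, pos)
    tokens.append((start, start + len(word), word))
    pos = start + len(word)

spans = [(0, 4, "PER"), (14, 23, "ORG")]  # "Adel", "IOS Press"
print(to_bio(tokens, spans))  # → ['B-PER', 'O', 'O', 'B-ORG', 'I-ORG']
```

In practice the BIO labels would then be aligned with BERT's subword tokens before fine-tuning; that alignment step is omitted here.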
DOI: 10.3233/IDA-230588
Publisher: SAGE Publications, London, England
ISSN: 1088-467X
EISSN: 1571-4128
Subjects: Datasets; Deep learning; Natural language processing; Recognition