SocialNER2.0: A comprehensive dataset for enhancing named entity recognition in short human-produced text
Named Entity Recognition (NER) is an essential task in Natural Language Processing (NLP), and deep learning-based models have shown outstanding performance. However, the effectiveness of deep learning models in NER relies heavily on the quality and quantity of labeled training datasets available. A novel and comprehensive training dataset called SocialNER2.0 is proposed to address this challenge. Based on selected datasets dedicated to different tasks related to NER, the SocialNER2.0 construction process involves data selection, extraction, enrichment, conversion, and balancing steps. The pre-trained BERT (Bidirectional Encoder Representations from Transformers) model is fine-tuned using the proposed dataset. Experimental results highlight the superior performance of the fine-tuned BERT in accurately identifying named entities, demonstrating the SocialNER2.0 dataset’s capacity to provide valuable training data for performing NER in human-produced texts.
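The abstract describes fine-tuning a pre-trained BERT model on SocialNER2.0 for token-level entity tagging. The paper's own training setup is not reproduced here; the sketch below only illustrates what that kind of fine-tuning looks like with the Hugging Face transformers token-classification API. The label set, the toy training examples, and the hyperparameters are assumptions made purely for illustration and are not taken from the article.

```python
# Minimal sketch of fine-tuning BERT for NER with Hugging Face transformers.
# Not the authors' code: the label set, toy data, and hyperparameters below
# are illustrative assumptions, not the actual SocialNER2.0 setup.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

# Hypothetical BIO label scheme; the real SocialNER2.0 tag set may differ.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(labels), id2label=id2label, label2id=label2id
)

def encode(example):
    # Tokenize pre-split words and align word-level tags with sub-word tokens;
    # continuation pieces and special tokens get -100 so the loss ignores them.
    enc = tokenizer(example["tokens"], is_split_into_words=True,
                    truncation=True, max_length=128)
    prev, aligned = None, []
    for wid in enc.word_ids():
        aligned.append(-100 if wid is None or wid == prev
                       else label2id[example["tags"][wid]])
        prev = wid
    enc["labels"] = aligned
    return enc

# Toy examples standing in for the real training split.
train = Dataset.from_dict({
    "tokens": [["Adel", "visited", "Tunis", "yesterday"],
               ["IOS", "Press", "is", "a", "publisher"]],
    "tags":   [["B-PER", "O", "B-LOC", "O"],
               ["B-ORG", "I-ORG", "O", "O", "O"]],
}).map(encode, remove_columns=["tokens", "tags"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-ner", num_train_epochs=3,
                           per_device_train_batch_size=16, learning_rate=3e-5),
    train_dataset=train,
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```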
Saved in:
Published in: | Intelligent data analysis 2024-01, Vol.28 (3), p.841-865 |
---|---|
Main authors: | Belbekri, Adel; Benchikha, Fouzia; Slimani, Yahya; Marir, Naila |
Format: | Article |
Language: | English |
Subjects: | Datasets; Deep learning; Natural language processing; Recognition |
Online access: | Full text |
container_end_page | 865 |
---|---|
container_issue | 3 |
container_start_page | 841 |
container_title | Intelligent data analysis |
container_volume | 28 |
creator | Belbekri, Adel; Benchikha, Fouzia; Slimani, Yahya; Marir, Naila |
description | Named Entity Recognition (NER) is an essential task in Natural Language Processing (NLP), and deep learning-based models have shown outstanding performance. However, the effectiveness of deep learning models in NER relies heavily on the quality and quantity of labeled training datasets available. A novel and comprehensive training dataset called SocialNER2.0 is proposed to address this challenge. Based on selected datasets dedicated to different tasks related to NER, the SocialNER2.0 construction process involves data selection, extraction, enrichment, conversion, and balancing steps. The pre-trained BERT (Bidirectional Encoder Representations from Transformers) model is fine-tuned using the proposed dataset. Experimental results highlight the superior performance of the fine-tuned BERT in accurately identifying named entities, demonstrating the SocialNER2.0 dataset’s capacity to provide valuable training data for performing NER in human-produced texts. |
doi_str_mv | 10.3233/IDA-230588 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 1088-467X |
ispartof | Intelligent data analysis, 2024-01, Vol.28 (3), p.841-865 |
issn | 1088-467X; 1571-4128 |
language | eng |
recordid | cdi_proquest_journals_3062684914 |
source | Business Source Complete |
subjects | Datasets; Deep learning; Natural language processing; Recognition |
title | SocialNER2.0: A comprehensive dataset for enhancing named entity recognition in short human-produced text |