A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields

Bibliographic details
Published in:	Knowledge-based systems 2017-09, Vol.132, p.179-187
Main authors:	Tran, Van Cuong ; Nguyen, Ngoc Thanh ; Fujita, Hamido ; Hoang, Dinh Tuyen ; Hwang, Dosam
Format:	Article
Language:	eng
Subjects:	Active learning ; Artificial intelligence ; Classifiers ; Conditional random fields ; Labeling ; Machine learning ; Named entity recognition ; Natural language processing ; Performance enhancement ; Queries ; Recognition ; Self-learning ; Social networks ; Tweet streams
Online access:	Full text

container_end_page	187
container_issue
container_start_page	179
container_title	Knowledge-based systems
container_volume	132
creator	Tran, Van Cuong ; Nguyen, Ngoc Thanh ; Fujita, Hamido ; Hoang, Dinh Tuyen ; Hwang, Dosam
description	In recent years, many applications in natural language processing (NLP) have been developed using the machine learning approach. Annotating data is an important task in applying machine learning to NLP applications. A common approach to improve the system performance is to train on a large and high-quality set of training data that is annotated by experts. Besides, active learning (AL) and self-learning can be utilized to reduce the annotation costs. The self-learning method discovers highly reliable instances based on a trained classifier, while AL queries the most informative instances based on active query algorithms. This paper proposes a method that combines AL and self-learning to reduce the labeling effort for the named entity recognition task from tweet streams by using both machine-labeled and manually-labeled data. We employ AL queries based on the diversity of the context and content of instances to select the most informative instances. The conditional random fields are also chosen as an underlying model to train a classifier for selecting highly reliable instances. The experiments using Twitter data show that the proposed method achieves good results in reducing the human labeling effort, and it can significantly improve the performance of the systems.
doi_str_mv	10.1016/j.knosys.2017.06.023
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_1941698769</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0950705117303040</els_id><sourcerecordid>1941698769</sourcerecordid><originalsourceid>FETCH-LOGICAL-c334t-aff6c6321db27a8cd75ca87a9d30df4a15cb07a7dc825119b02c67d107c9bcac3</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouK7-Aw8Bz62T9CPtRRDxCwQveg7pJJGs3USTrLL-ertWPHoamHnnYeYh5JRByYC156vy1Ye0TSUHJkpoS-DVHlmwTvBC1NDvkwX0DRQCGnZIjlJaAQDnrFuQr0uKYT04r7ILngZLFWb3YehoVPTOv1DlNU1mtMVfx4ZIvVobTY3PLm9pNBhevJsJnj59upxNpJu0S2Pw-mekRhonWFhT68yo0zE5sGpM5uS3LsnzzfXT1V3x8Hh7f3X5UGBV1blQ1rbYVpzpgQvVoRYNqk6oXlegba1YgwMIJTR2vGGsH4BjKzQDgf2ACqslOZu5bzG8b0zKchU2cTonSdbXrO070fZTqp5TGENK0Vj5Ft1axa1kIHeW5UrOluXOsoRWTpantYt5zUwffDgTZUJnPBrtJitZ6uD-B3wDAEGLEQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1941698769</pqid></control><display><type>article</type><title>A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields</title><source>Elsevier ScienceDirect Journals</source><creator>Tran, Van Cuong ; Nguyen, Ngoc Thanh ; Fujita, Hamido ; Hoang, Dinh Tuyen ; Hwang, Dosam</creator><creatorcontrib>Tran, Van Cuong ; Nguyen, Ngoc Thanh ; Fujita, Hamido ; Hoang, Dinh Tuyen ; Hwang, Dosam</creatorcontrib><description>In recent years, many applications in natural language processing (NLP) have been developed using the machine learning approach. Annotating data is an important task in applying machine learning to NLP applications. A common approach to improve the system performance is to train on a large and high-quality set of training data that is annotated by experts. Besides, active learning (AL) and self-learning can be utilized to reduce the annotation costs. 
The self-learning method discovers highly reliable instances based on a trained classifier, while AL queries the most informative instances based on active query algorithms. This paper proposes a method that combines AL and self-learning to reduce the labeling effort for the named entity recognition task from tweet streams by using both machine-labeled and manually-labeled data. We employ AL queries based on the diversity of the context and content of instances to select the most informative instances. The conditional random fields are also chosen as an underlying model to train a classifier for selecting highly reliable instances. The experiments using Twitter data show that the proposed method achieves good results in reducing the human labeling effort, and it can significantly improve the performance of the systems.</description><identifier>ISSN: 0950-7051</identifier><identifier>EISSN: 1872-7409</identifier><identifier>DOI: 10.1016/j.knosys.2017.06.023</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Active learning ; Artificial intelligence ; Classifiers ; Conditional random fields ; Labeling ; Machine learning ; Named entity recognition ; Natural language processing ; Performance enhancement ; Queries ; Recognition ; Self-learning ; Social networks ; Tweet streams</subject><ispartof>Knowledge-based systems, 2017-09, Vol.132, p.179-187</ispartof><rights>2017</rights><rights>Copyright Elsevier Science Ltd. 
Sep 15, 2017</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c334t-aff6c6321db27a8cd75ca87a9d30df4a15cb07a7dc825119b02c67d107c9bcac3</citedby><cites>FETCH-LOGICAL-c334t-aff6c6321db27a8cd75ca87a9d30df4a15cb07a7dc825119b02c67d107c9bcac3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0950705117303040$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Tran, Van Cuong</creatorcontrib><creatorcontrib>Nguyen, Ngoc Thanh</creatorcontrib><creatorcontrib>Fujita, Hamido</creatorcontrib><creatorcontrib>Hoang, Dinh Tuyen</creatorcontrib><creatorcontrib>Hwang, Dosam</creatorcontrib><title>A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields</title><title>Knowledge-based systems</title><description>In recent years, many applications in natural language processing (NLP) have been developed using the machine learning approach. Annotating data is an important task in applying machine learning to NLP applications. A common approach to improve the system performance is to train on a large and high-quality set of training data that is annotated by experts. Besides, active learning (AL) and self-learning can be utilized to reduce the annotation costs. The self-learning method discovers highly reliable instances based on a trained classifier, while AL queries the most informative instances based on active query algorithms. This paper proposes a method that combines AL and self-learning to reduce the labeling effort for the named entity recognition task from tweet streams by using both machine-labeled and manually-labeled data. 
We employ AL queries based on the diversity of the context and content of instances to select the most informative instances. The conditional random fields are also chosen as an underlying model to train a classifier for selecting highly reliable instances. The experiments using Twitter data show that the proposed method achieves good results in reducing the human labeling effort, and it can significantly improve the performance of the systems.</description><subject>Active learning</subject><subject>Artificial intelligence</subject><subject>Classifiers</subject><subject>Conditional random fields</subject><subject>Labeling</subject><subject>Machine learning</subject><subject>Named entity recognition</subject><subject>Natural language processing</subject><subject>Performance enhancement</subject><subject>Queries</subject><subject>Recognition</subject><subject>Self-learning</subject><subject>Social networks</subject><subject>Tweet streams</subject><issn>0950-7051</issn><issn>1872-7409</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMouK7-Aw8Bz62T9CPtRRDxCwQveg7pJJGs3USTrLL-ertWPHoamHnnYeYh5JRByYC156vy1Ye0TSUHJkpoS-DVHlmwTvBC1NDvkwX0DRQCGnZIjlJaAQDnrFuQr0uKYT04r7ILngZLFWb3YehoVPTOv1DlNU1mtMVfx4ZIvVobTY3PLm9pNBhevJsJnj59upxNpJu0S2Pw-mekRhonWFhT68yo0zE5sGpM5uS3LsnzzfXT1V3x8Hh7f3X5UGBV1blQ1rbYVpzpgQvVoRYNqk6oXlegba1YgwMIJTR2vGGsH4BjKzQDgf2ACqslOZu5bzG8b0zKchU2cTonSdbXrO070fZTqp5TGENK0Vj5Ft1axa1kIHeW5UrOluXOsoRWTpantYt5zUwffDgTZUJnPBrtJitZ6uD-B3wDAEGLEQ</recordid><startdate>20170915</startdate><enddate>20170915</enddate><creator>Tran, Van Cuong</creator><creator>Nguyen, Ngoc Thanh</creator><creator>Fujita, Hamido</creator><creator>Hoang, Dinh Tuyen</creator><creator>Hwang, Dosam</creator><general>Elsevier B.V</general><general>Elsevier Science 
Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>E3H</scope><scope>F2A</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20170915</creationdate><title>A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields</title><author>Tran, Van Cuong ; Nguyen, Ngoc Thanh ; Fujita, Hamido ; Hoang, Dinh Tuyen ; Hwang, Dosam</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c334t-aff6c6321db27a8cd75ca87a9d30df4a15cb07a7dc825119b02c67d107c9bcac3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Active learning</topic><topic>Artificial intelligence</topic><topic>Classifiers</topic><topic>Conditional random fields</topic><topic>Labeling</topic><topic>Machine learning</topic><topic>Named entity recognition</topic><topic>Natural language processing</topic><topic>Performance enhancement</topic><topic>Queries</topic><topic>Recognition</topic><topic>Self-learning</topic><topic>Social networks</topic><topic>Tweet streams</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Tran, Van Cuong</creatorcontrib><creatorcontrib>Nguyen, Ngoc Thanh</creatorcontrib><creatorcontrib>Fujita, Hamido</creatorcontrib><creatorcontrib>Hoang, Dinh Tuyen</creatorcontrib><creatorcontrib>Hwang, Dosam</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Library &amp; Information Sciences Abstracts (LISA)</collection><collection>Library &amp; Information Science Abstracts (LISA)</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Knowledge-based systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Tran, Van Cuong</au><au>Nguyen, Ngoc Thanh</au><au>Fujita, Hamido</au><au>Hoang, Dinh Tuyen</au><au>Hwang, Dosam</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields</atitle><jtitle>Knowledge-based systems</jtitle><date>2017-09-15</date><risdate>2017</risdate><volume>132</volume><spage>179</spage><epage>187</epage><pages>179-187</pages><issn>0950-7051</issn><eissn>1872-7409</eissn><abstract>In recent years, many applications in natural language processing (NLP) have been developed using the machine learning approach. Annotating data is an important task in applying machine learning to NLP applications. A common approach to improve the system performance is to train on a large and high-quality set of training data that is annotated by experts. Besides, active learning (AL) and self-learning can be utilized to reduce the annotation costs. The self-learning method discovers highly reliable instances based on a trained classifier, while AL queries the most informative instances based on active query algorithms. This paper proposes a method that combines AL and self-learning to reduce the labeling effort for the named entity recognition task from tweet streams by using both machine-labeled and manually-labeled data. We employ AL queries based on the diversity of the context and content of instances to select the most informative instances. The conditional random fields are also chosen as an underlying model to train a classifier for selecting highly reliable instances. 
The experiments using Twitter data show that the proposed method achieves good results in reducing the human labeling effort, and it can significantly improve the performance of the systems.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.knosys.2017.06.023</doi><tpages>9</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0950-7051
ispartof	Knowledge-based systems, 2017-09, Vol.132, p.179-187
issn	0950-7051 1872-7409
language	eng
recordid	cdi_proquest_journals_1941698769
source	Elsevier ScienceDirect Journals
subjects	Active learning ; Artificial intelligence ; Classifiers ; Conditional random fields ; Labeling ; Machine learning ; Named entity recognition ; Natural language processing ; Performance enhancement ; Queries ; Recognition ; Self-learning ; Social networks ; Tweet streams
title	A combination of active learning and self-learning for named entity recognition on Twitter using conditional random fields
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-08T19%3A16%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20combination%20of%20active%20learning%20and%20self-learning%20for%20named%20entity%20recognition%20on%20Twitter%20using%20conditional%20random%20fields&rft.jtitle=Knowledge-based%20systems&rft.au=Tran,%20Van%20Cuong&rft.date=2017-09-15&rft.volume=132&rft.spage=179&rft.epage=187&rft.pages=179-187&rft.issn=0950-7051&rft.eissn=1872-7409&rft_id=info:doi/10.1016/j.knosys.2017.06.023&rft_dat=%3Cproquest_cross%3E1941698769%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1941698769&rft_id=info:pmid/&rft_els_id=S0950705117303040&rfr_iscdi=true
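The description field above summarizes the method only in prose. As an illustration, the Python sketch below shows one selection round of a combined self-learning and active-learning loop of the kind described, under explicit assumptions. The model is treated as any callable returning predicted tags plus a confidence score; the paper uses conditional random fields, whereas `toy_predict` here is a hypothetical gazetteer lookup standing in for a trained CRF. Tweets whose confidence reaches a threshold are machine-labeled (self-learning). The rest are ranked for human annotation by a token-overlap (Jaccard) diversity score, which is a simple stand-in for the paper's context- and content-based diversity queries, not their formulation. All function names, the 0.9 threshold and the query budget of 2 are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = Tuple[str, ...]
Tags = Tuple[str, ...]
# Assumption: a model maps a tokenized tweet to (predicted BIO tags, confidence in [0, 1]).
# In the paper this role is played by a CRF; here any callable with this shape fits.
Predictor = Callable[[Tokens], Tuple[Tags, float]]


def jaccard_distance(a: Tokens, b: Tokens) -> float:
    """1 - |A & B| / |A | B| over lower-cased token sets."""
    sa = {t.lower() for t in a}
    sb = {t.lower() for t in b}
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def diversity(candidate: Tokens, reference: Sequence[Tokens]) -> float:
    """Distance to the most similar reference tweet (1.0 if there is none)."""
    if not reference:
        return 1.0
    return min(jaccard_distance(candidate, r) for r in reference)


def select_round(
    pool: Sequence[Tokens],
    labeled: Sequence[Tokens],
    predict: Predictor,
    conf_threshold: float = 0.9,  # hypothetical self-learning cutoff
    query_budget: int = 2,        # hypothetical number of human queries per round
) -> Tuple[List[Tuple[Tokens, Tags]], List[Tokens]]:
    """Split an unlabeled pool into machine-labeled tweets and human queries."""
    auto_labeled: List[Tuple[Tokens, Tags]] = []
    uncertain: List[Tokens] = []
    for tokens in pool:
        tags, conf = predict(tokens)
        if conf >= conf_threshold:
            # Self-learning: keep the model's own labels for confident predictions.
            auto_labeled.append((tokens, tags))
        else:
            uncertain.append(tokens)

    # Active learning: greedily pick the uncertain tweet least similar to
    # everything already labeled or already queried in this round.
    reference = list(labeled)
    candidates = list(uncertain)
    queries: List[Tokens] = []
    for _ in range(min(query_budget, len(candidates))):
        best = max(candidates, key=lambda t: diversity(t, reference))
        queries.append(best)
        reference.append(best)
        candidates.remove(best)
    return auto_labeled, queries


# Hypothetical stand-in model (NOT a CRF): a tiny gazetteer plus known words.
GAZETTEER = {"paris": "B-LOC", "obama": "B-PER"}
KNOWN_WORDS = {"i", "love", "met", "today"}


def toy_predict(tokens: Tokens) -> Tuple[Tags, float]:
    """Tag gazetteer hits; confidence = share of tokens the toy model recognizes."""
    tags = tuple(GAZETTEER.get(t.lower(), "O") for t in tokens)
    known = sum(1 for t in tokens if t.lower() in GAZETTEER or t.lower() in KNOWN_WORDS)
    return tags, (known / len(tokens) if tokens else 0.0)


if __name__ == "__main__":
    pool = [
        ("I", "love", "Paris"),
        ("met", "Obama", "today"),
        ("lol", "ur", "gr8"),
        ("new", "phone", "who", "dis"),
        ("lol", "ur", "gr8", "m8"),
    ]
    auto, queries = select_round(pool, labeled=[("good", "morning")], predict=toy_predict)
    print("machine-labeled:", auto)
    print("ask a human:", queries)
```

In a full loop, both the machine-labeled and the newly hand-labeled tweets would be added to the training data and the model retrained before the next round. Because each query is appended to the reference set, later queries in the same round are also kept dissimilar to earlier ones: in the toy pool, the near-duplicate ("lol", "ur", "gr8", "m8") is skipped in favour of ("new", "phone", "who", "dis").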