A Bi-Criteria Active Learning Algorithm for Dynamic Data Streams

Detailed description

Bibliographic details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2018-01, Vol. 29 (1), p. 74-86
Main authors: Mohamad, Saad; Bouchachia, Abdelhamid; Sayed-Mouchaweh, Moamar
Format: Article
Language: eng
Online access: Order full text
container_end_page 86
container_issue 1
container_start_page 74
container_title IEEE Transactions on Neural Networks and Learning Systems
container_volume 29
creator Mohamad, Saad
Bouchachia, Abdelhamid
Sayed-Mouchaweh, Moamar
description Active learning (AL) is a promising way to efficiently build up training sets with minimal supervision. A learner deliberately queries specific instances to tune the classifier's model using as few labels as possible. The challenge for streaming is that the data distribution may evolve over time, and therefore the model must adapt. Another challenge is the sampling bias where the sampled training set does not reflect the underlying data distribution. In the presence of concept drift, sampling bias is more likely to occur as the training set needs to represent the whole evolving data. To tackle these challenges, we propose a novel bi-criteria AL (BAL) approach that relies on two selection criteria, namely, label uncertainty criterion and density-based criterion. While the first criterion selects instances that are the most uncertain in terms of class membership, the latter dynamically curbs the sampling bias by weighting the samples to reflect on the true underlying distribution. To design and implement these two criteria for learning from streams, BAL adopts a Bayesian online learning approach and combines online classification and online clustering through the use of online logistic regression and online growing Gaussian mixture models, respectively. Empirical results obtained on standard synthetic and real-world benchmarks show the high performance of the proposed BAL method compared with the state-of-the-art AL methods.
doi_str_mv 10.1109/TNNLS.2016.2614393
format Article
eissn 2162-2388
pmid 27775910
coden ITNNAL
ieee_id 7605500
publisher United States: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Distributed under a Creative Commons Attribution 4.0 International License
orcid https://orcid.org/0000-0002-6929-986X
https://orcid.org/0000-0002-1980-5517
tpages 13
linktohtml https://ieeexplore.ieee.org/document/7605500
backlink https://www.ncbi.nlm.nih.gov/pubmed/27775910
https://hal.science/hal-03319533
fulltext fulltext_linktorsrc
identifier ISSN: 2162-237X
ispartof IEEE Transactions on Neural Networks and Learning Systems, 2018-01, Vol.29 (1), p.74-86
issn 2162-237X
2162-2388
language eng
recordid cdi_crossref_primary_10_1109_TNNLS_2016_2614393
source IEEE Electronic Library (IEL)
subjects Active learning
Active learning (AL)
Adaptation models
Bayesian analysis
Bayesian online learning
Benchmarks
Bias
Biological evolution
Clustering
Clustering algorithms
concept drift
Criteria
data streams
Data transmission
Distance learning
Engineering Sciences
Engines
Heuristic algorithms
Internet
Labeling
Learning
Machine learning
Mathematical models
Regression analysis
Sampling
Training
Uncertainty
title A Bi-Criteria Active Learning Algorithm for Dynamic Data Streams
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T21%3A36%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Bi-Criteria%20Active%20Learning%20Algorithm%20for%20Dynamic%20Data%20Streams&rft.jtitle=IEEE%20transaction%20on%20neural%20networks%20and%20learning%20systems&rft.au=Mohamad,%20Saad&rft.date=2018-01&rft.volume=29&rft.issue=1&rft.spage=74&rft.epage=86&rft.pages=74-86&rft.issn=2162-237X&rft.eissn=2162-2388&rft.coden=ITNNAL&rft_id=info:doi/10.1109/TNNLS.2016.2614393&rft_dat=%3Cproquest_RIE%3E1984556432%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1984556432&rft_id=info:pmid/27775910&rft_ieee_id=7605500&rfr_iscdi=true