Local learning for dynamic ensemble selection
Dynamic selection techniques are based on the idea that the classifiers from an ensemble are experts in different areas of the feature space. As such, they attempt to single out only the most competent one(s) to label a given query sample, generally based on the locality assumption, i.e., assuming that similar instances share a similar set of classifiers able to correctly label them.
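The locality assumption above can be made concrete with a minimal dynamic classifier selection sketch. The rule shown here (Overall Local Accuracy: pick the pool member most accurate on the query's k nearest validation samples) is a classical baseline, not necessarily this thesis' exact method, and the pool, data, and k are illustrative assumptions:

```python
import numpy as np

# Toy pool of decision stumps; each is an "expert" in part of the 1-D space.
def stump_a(x):  # predicts class 1 to the right of 0.0
    return (x[:, 0] > 0.0).astype(int)

def stump_b(x):  # predicts class 1 to the right of 0.5 (the true border)
    return (x[:, 0] > 0.5).astype(int)

pool = [stump_a, stump_b]

# Validation (selection) set: the true class border sits at x = 0.5.
X_val = np.array([[-1.0], [-0.5], [0.2], [0.4], [0.6], [1.0]])
y_val = np.array([0, 0, 0, 0, 1, 1])

def select_and_predict(x_query, k=3):
    """Select the locally most accurate classifier for one query point."""
    dists = np.abs(X_val[:, 0] - x_query[0])
    neigh = np.argsort(dists)[:k]              # region of competence
    local_acc = [np.mean(clf(X_val[neigh]) == y_val[neigh]) for clf in pool]
    best = int(np.argmax(local_acc))           # ties resolved to first member
    return best, int(pool[best](x_query[None, :])[0])

chosen, pred = select_and_predict(np.array([0.3]))
```

Near the class border, the stump aligned with the true boundary wins the local competition, which is exactly the behaviour the locality assumption predicts; when the neighbourhood is overlapped or sparse, the local accuracies become unreliable, which is the failure mode the abstract discusses.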
Saved in:
Author: | De Araujo Souza, Mariana |
---|---|
Format: | Dissertation |
Language: | eng |
Online access: | Order full text |
creator | De Araujo Souza, Mariana |
description | Dynamic selection techniques are based on the idea that the classifiers from an ensemble are experts in different areas of the feature space. As such, they attempt to single out only the most competent one(s) to label a given query sample, generally based on the locality assumption, i.e., assuming that similar instances share a similar set of classifiers able to correctly label them. Therefore, the success of the dynamic selection task is strongly linked to the local data distribution, as it establishes the quality of the defined region and may affect how the classifiers’ local expertise is perceived. Consequently, characteristics such as local class overlap and data sparsity may lead to a poorly defined local region presenting a weak locality assumption, thus hindering the search for a local expert.
In this thesis, several techniques that integrate the local context into the multiple classifier system are proposed to improve the dynamic selection of classifiers over challenging scenarios. To that end, the definition of an adequate local region is addressed by characterizing the local data and defining the regions using different methods, to tackle complex distributions, and with multiple scales, to provide ample context to the system. The presence of local experts is also addressed by producing the pool over the local border to yield more specialized classifiers, and by learning the dynamic selection task in an end-to-end manner from the classifiers’ interactions and the local data relations to boost the search for local experts. Thus, by leveraging the information from the local data distribution, the dynamic selection techniques’ ability to find local experts may be enhanced, improving their robustness and performance over complex problems.
In Chapter 2, the Online Local Pool (OLP) technique is proposed to tackle the difficulty the dynamic selection techniques present in searching for local experts in overlap areas. To that end, the OLP technique generates several linear models in the vicinity of the query instance with different locality degrees to produce classifiers able to recognize the local border. To identify the class overlap areas, an instance hardness measure is computed in the memorization phase for all the available samples, and the classifiers are generated so that they fully “cover” the target region. Experimental results demonstrate that using the generated local pool provided an improvement to the evaluated dynamic classifier selection techniques compared to a globally generated pool, suggesting an advantage in having locally specialized classifiers in the pool for the dynamic selection task. The proposed approach also performs similarly to several state-of-the-art learning methods.
In Chapter 3, the OLP++, a local ensemble method based on the OLP, is proposed to address the limitations the former presents over high dimensional data, as its local region definition is susceptible to the effects of the curse of dimensionality. To that end, the OLP++ approach leverages the data partitions obtained from tree-based algorithms for the locality definition, and then produces the local experts over the different impure nodes from the decision path that a given query instance traverses in the tree(s), therefore introducing an increasingly wider local context to the local ensemble. Experimental results show that the OLP++’s recursive partition-based region definition successfully identified borderline instances more often than the OLP’s nearest neighbors-based region definition, suggesting an improvement in the data distribution used to learn the local linear rules. The OLP++’s region definition also leads to a more diverse local ensemble and a statistically superior performance compared to the OLP over the high dimensional data. The OLP++ also outperforms the random forest baseline and several local-based dynamic selection techniques, further suggesting the advantages of the proposed approach for dealing with high dimensional data in the context of dynamic selection.
Lastly, in Chapter 4, a novel dynamic multiple classifier system is proposed to deal with sparse and overlapped data, since the OLP++ relies on pre-defined partitions that were not optimized for the dynamic selection task. The proposed Graph Neural Network Dynamic Ensemble Selection (GNN-DES) technique addresses this issue by learning the dynamic selection task in an end-to-end manner using a multi-label Graph Neural Network (GNN), which is responsible for the selection of the local experts. By learning from the samples’ local relationships, represented in a graph, and the classifiers’ inter-dependencies, modeled in the meta-labels, the GNN may implicitly learn an embedded space where the locality assumption is stronger without requiring an explicit local region definition. Experimental results demonstrate that the classical dynamic selection techniques generally struggle over sparse and overlapped data, and that the GNN-DES outperforms the static selection baseline and several techniques based on similarities in the feature space. Further analysis also shows that the GNN-DES performed better than the contending techniques over the problems where the locality assumption is weaker in the presence of class overlap, suggesting that leveraging the local data distribution and the classifiers’ interactions can aid the dynamic selection task in challenging scenarios. |
format | Dissertation |
advisor | Sabourin, Robert |
publisher | École de technologie supérieure |
creationdate | 2023-07-28 |
language | eng |
recordid | cdi_etsmtl_espace_oai_espace_etsmtl_ca_3297 |
source | Espace ETS |
subjects | local learning; class overlap; instance hardness; meta-learning; data sparsity; graph neural networks; multiple classifier systems; dynamic selection |
title | Local learning for dynamic ensemble selection |
url | https://espace.etsmtl.ca/3297 |
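The instance hardness measure the abstract's Chapter 2 summary mentions can be illustrated with k-Disagreeing Neighbors (kDN), a common hardness measure: the hardness of a sample is the fraction of its k nearest neighbours carrying a different label. Whether the thesis uses exactly kDN is an assumption here, and the data and k are illustrative:

```python
import numpy as np

def kdn_hardness(X, y, k=3):
    """kDN per sample: fraction of its k nearest neighbours (itself
    excluded) whose label disagrees with the sample's own label."""
    n = len(X)
    hardness = np.empty(n)
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                      # exclude the sample itself
        neigh = np.argsort(dists)[:k]
        hardness[i] = np.mean(y[neigh] != y[i])
    return hardness

# Two 1-D clusters whose borders approach each other around x = 0.
X = np.array([[-2.0], [-1.5], [-0.1], [0.1], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

h = kdn_hardness(X, y, k=3)
# The borderline samples (indices 2 and 3) receive the highest hardness,
# marking the overlap region where local classifiers would be generated.
```

Samples with high kDN sit in class-overlap areas, which is where an OLP-style method would concentrate its locally generated linear classifiers.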
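The graph input described for the GNN-DES (samples as nodes linked by local relationships, with multi-label meta-labels recording which pool members classify each sample correctly) can be sketched as below. The k-nearest-neighbour graph construction, the toy pool, and the data are illustrative assumptions, not the thesis' exact pipeline:

```python
import numpy as np

def knn_adjacency(X, k=2):
    """Symmetric 0/1 adjacency matrix linking each sample to its k NNs."""
    n = len(X)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                          # no self-loops
        for j in np.argsort(d)[:k]:
            A[i, j] = A[j, i] = 1              # undirected edge
    return A

X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
y = np.array([0, 0, 1, 1, 1])

# Meta-labels: one column per (hypothetical) classifier, 1 where it is
# correct; a multi-label GNN would be trained to predict these columns.
pool_preds = np.array([(X[:, 0] > 0.15).astype(int),
                       (X[:, 0] > 0.55).astype(int)])
meta = (pool_preds == y).astype(int).T          # shape (n_samples, n_pool)

A = knn_adjacency(X, k=2)
```

At inference time, a query node attached to this graph would receive per-classifier competence scores from the trained network, replacing an explicit nearest-neighbour region of competence.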