Ant-TD: Ant colony optimization plus temporal difference reinforcement learning for multi-label feature selection
Published in: Swarm and Evolutionary Computation, 2021-07, Vol. 64, p. 100892, Article 100892
Main authors: , ,
Format: Article
Language: English
Online access: Full text
Abstract: In recent years, multi-label learning has become a trending topic in machine learning and data mining. This type of learning deals with data in which each instance is associated with more than one label. Feature selection is a pre-processing step that can significantly improve the performance of multi-label classification. In this paper, we propose a new multi-label feature selection method based on Ant Colony Optimization (ACO). The proposed method differs from all previous ACO-based feature selection methods in that, instead of using a static heuristic function, it learns the heuristic. Because the heuristic function influences the decision-making of the ACO throughout the search, learning it helps the algorithm explore the search space more effectively. We learn the ACO heuristic function directly from experience using the Temporal Difference (TD) reinforcement learning algorithm. To enable heuristic learning, we cast the ACO search as a reinforcement learning problem: the feature selection search space is modeled as a Markov Decision Process (MDP) in which the features are the states (S) and an ant's moves to unvisited features are the actions (A). The reward signal (R), received whenever an ant takes an action, combines two criteria. The ACO state transition rule, a combination of probabilistic and greedy rules, forms the MDP transition function (T). In addition to the pheromone, which is updated by the ACO "global updating rule", the state-value function (V) is updated directly by the temporal difference formulation to form a learned heuristic function. We conduct experiments on nine benchmark datasets and compare the classification performance, over three multi-label evaluation measures, against nine state-of-the-art multi-label feature selection methods. The results show that the proposed method significantly outperforms the competing methods.
ISSN: 2210-6502
DOI: 10.1016/j.swevo.2021.100892
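
The abstract describes two mechanisms concretely enough to sketch: the ACO state transition rule (greedy with some probability, probabilistic otherwise) and the TD update of the state-value function V that serves as the learned heuristic. The Python sketch below is illustrative only, not the authors' implementation: the class name AntTDSketch, the parameters q0, beta, alpha_td, gamma, and rho, the node-based pheromone vector, and the scalar reward are all assumptions; the paper's actual reward combines two criteria that the abstract does not specify.

```python
import numpy as np

class AntTDSketch:
    """Minimal sketch of ACO feature selection with a TD-learned heuristic.

    States are features; an ant's action is moving to an unvisited feature.
    The static heuristic eta is replaced by a state-value function V that is
    updated by TD(0), as the abstract describes. All names and defaults here
    are assumptions, not the paper's settings.
    """

    def __init__(self, n_features, q0=0.8, beta=1.0, alpha_td=0.1,
                 gamma=0.9, rho=0.1, seed=0):
        self.n = n_features
        self.tau = np.ones(n_features)    # pheromone per feature (node-based, assumed)
        self.V = np.zeros(n_features)     # learned heuristic: state values
        self.q0, self.beta = q0, beta
        self.alpha_td, self.gamma, self.rho = alpha_td, gamma, rho
        self.rng = np.random.default_rng(seed)

    def next_feature(self, visited):
        """State transition rule: greedy with prob. q0, else roulette-wheel."""
        mask = np.ones(self.n, dtype=bool)
        if visited:
            mask[list(visited)] = False
        candidates = np.flatnonzero(mask)
        # Desirability combines pheromone and the learned heuristic V.
        desirability = (self.tau[candidates]
                        * np.maximum(self.V[candidates], 1e-12) ** self.beta)
        if self.rng.random() < self.q0:            # exploitation (greedy rule)
            return candidates[np.argmax(desirability)]
        p = desirability / desirability.sum()      # exploration (probabilistic rule)
        return self.rng.choice(candidates, p=p)

    def td_update(self, s, s_next, reward):
        """TD(0) update of the state-value function used as the heuristic."""
        self.V[s] += self.alpha_td * (reward + self.gamma * self.V[s_next] - self.V[s])

    def global_pheromone_update(self, best_subset, best_quality):
        """Global updating rule: evaporate, then deposit on the best subset."""
        self.tau *= (1.0 - self.rho)
        self.tau[list(best_subset)] += self.rho * best_quality

# Hypothetical usage: one ant builds a subset of five features.
aco = AntTDSketch(n_features=20)
s = 0
visited = {s}
for _ in range(4):
    s_next = aco.next_feature(visited)
    aco.td_update(s, s_next, reward=0.5)  # stand-in for the paper's two-criteria reward
    visited.add(s_next)
    s = s_next
aco.global_pheromone_update(visited, best_quality=0.7)
```

In this sketch the pseudo-random-proportional rule plays the MDP transition function (T): with probability q0 the ant moves greedily to the most desirable unvisited feature, otherwise it samples proportionally, matching the abstract's "combination of both probabilistic and greedy rules". The pheromone and V are updated separately, reflecting the paper's distinction between the global updating rule and the TD update.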