A Variable Selection Method Based on Fast Nondominated Sorting Genetic Algorithm for Qualitative Discrimination of Near Infrared Spectroscopy

A reliable and effective qualitative near-infrared (NIR) spectroscopy discrimination method is critical for excellent model building, yet the performance of models built by these methods is highly dependent on valid feature extraction. The goal of feature selection is to associate the selected varia...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Journal of spectroscopy (Hindawi) 2022-06, Vol.2022, p.1-8
Hauptverfasser: Liu, Hubin, Liu, Na, Yuan, Yuhui, Zhang, Cihai, Zhao, Longlian, Li, Junhui
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A reliable and effective qualitative near-infrared (NIR) spectroscopy discrimination method is critical for excellent model building, yet the performance of models built by these methods is highly dependent on valid feature extraction. The goal of feature selection is to associate the selected variables with the property of interest, which many have done successfully. However, many of selection methods focus only on strong association with the analytes or properties of interest, neglecting correlations between variables. A variable selection method based on a fast nondominated-ranking genetic algorithm (NSGA-II) was proposed in this paper for qualitative discrimination of NIR spectra. The method had two objective functions: (1) maximizing the sum of ratios of interclass variance to intraclass variance, (2) minimizing the sum of correlation coefficients between the selected variables. FT-NIR spectra of a total of 124 tobacco samples from different origins and parts in Guizhou Province, China, were used as the experimental objects, and the part-grade discrimination models of tobacco leaves were established by combining this method with partial least squares-based discriminant analysis (PLS-DA), and compared with PLS-DA model based on the full spectrum. The results showed that the performance of PLS-DA model with the NSGA-II was improved, with a comparable or better correct discrimination rate and reasonable discrimination rate, and could discriminate different parts of the tobacco leaves well. It indicates that the NSGA-II can select a few and effective feature variables to build a high-performance qualitative discrimination model and is proved to be a promising algorithm. In addition, the method is not designed exclusively for spectral data. It is a general strategy that could be used for variable selection for other types of data.
ISSN:2314-4920
2314-4939
DOI:10.1155/2022/2141872