Combining wavelength importance ranking to the random forest classifier to analyze multiclass spectral data

Bibliographic details
Published in:	Forensic science international 2021-11, Vol.328, p.110998-110998, Article 110998
Main authors:	de Abreu Fontes, Juliana; Anzanello, Michel José; Brito, João B.G.; Bucco, Guilherme Brandelli; Fogliatto, Flavio Sanson; Puglia, Fábio do Prado
Format:	Article
Language:	English
Subjects:	Accuracy; Chi-square test; Chi-Squared; Classification; Classifiers; Cocaine; Counterfeiting; Data Analysis; Datasets; Feature selection; Food; Forensic sciences; Fruits; Humans; Hypotheses; Infrared analysis; Laboratories; Near infrared radiation; Olive oil; Pharmaceuticals; Random Forest classifier; Ranking; Ratings & rankings; Spectroscopy; Spectrum analysis; Statistical analysis; Statistical methods; Variables; Wavelength; Wavelength selection; Wavelengths
Online access:	Full text

Description
Abstract:
• The integration of wavelength importance rankings with the Random Forest classifier is proposed to analyze spectral data.
• Six different wavelength importance rankings are assessed and the best-performing one is recommended.
• Propositions are validated on six binary and multiclass datasets.
• The method outperformed competing approaches in terms of percentage of retained wavelengths.

Near-infrared (NIR) spectroscopy is a type of vibrational spectroscopy widely used in different areas to characterize substances. NIR datasets comprise absorbance measurements over a range of wavelengths (λ). Because such datasets are typically noisy and correlated, their use tends to compromise the performance of several statistical techniques; one way to overcome this is to select the portions of the spectra in which wavelengths are more informative. In this paper we investigate the performance of the Random Forest (RF) classifier combined with several wavelength importance ranking approaches on the task of classifying product samples into categories, such as quality levels or authenticity. Our propositions are tested using six NIR datasets comprising two or more classes of food and pharmaceutical products, as well as illegal drugs. Our proposed classification model, an integration of the χ² ranking score and the RF classifier, substantially reduced the number of wavelengths in the dataset while increasing classification accuracy compared with the use of complete datasets. Our propositions also performed well when compared with competing methods available in the literature.
ISSN:	0379-0738 1872-6283
DOI:	10.1016/j.forsciint.2021.110998
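
The ranking step described in the abstract can be illustrated with a minimal sketch. This record does not say how the authors compute the χ² score, so the version below is an assumption: it treats non-negative absorbances as frequencies and compares each wavelength's per-class sum with the value expected if the wavelength carried no class information. The function names (`chi2_scores`, `rank_wavelengths`, `retain_top_k`) and the toy spectra are hypothetical. The Random Forest itself is not implemented here; any RF implementation would be fit on the retained columns.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Return one chi-square score per wavelength (column of X).

    Assumed formulation (not taken from the paper):
      observed = sum of a wavelength's absorbance within each class
      expected = class share of samples * the wavelength's total absorbance
    """
    n_samples = len(X)
    n_features = len(X[0])
    class_counts = defaultdict(int)
    observed = defaultdict(lambda: [0.0] * n_features)

    for row, label in zip(X, y):
        if min(row) < 0:
            raise ValueError("this score assumes non-negative absorbances")
        class_counts[label] += 1
        sums = observed[label]
        for j, value in enumerate(row):
            sums[j] += value

    scores = []
    for j in range(n_features):
        total = sum(observed[c][j] for c in class_counts)
        score = 0.0
        for c, count in class_counts.items():
            expected = total * count / n_samples
            if expected > 0:
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def rank_wavelengths(X, y):
    """Column indices ordered from most to least informative."""
    scores = chi2_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def retain_top_k(X, ranking, k):
    """Keep only the k best-ranked wavelengths, in original spectral order."""
    keep = sorted(ranking[:k])
    return [[row[j] for j in keep] for row in X], keep


# Toy spectra (invented for illustration): 4 samples x 3 wavelengths.
X = [
    [0.9, 0.5, 0.4],
    [0.8, 0.5, 0.5],
    [0.1, 0.5, 0.6],
    [0.2, 0.5, 0.5],
]
y = ["authentic", "authentic", "counterfeit", "counterfeit"]

ranking = rank_wavelengths(X, y)
X_reduced, kept = retain_top_k(X, ranking, k=2)
print("ranking:", ranking)
print("retained wavelength indices:", kept)
```

In a full workflow, the number of retained wavelengths `k` would be chosen by refitting the classifier on progressively smaller subsets and comparing held-out accuracy. To avoid an optimistic bias, the ranking should be computed on training samples only.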