Combining wavelength importance ranking to the random forest classifier to analyze multiclass spectral data
Published in: | Forensic Science International, 2021-11, Vol. 328, Article 110998 |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | English |
Abstract: |
• The integration of wavelength importance rankings with the Random Forest classifier is proposed to analyze spectral data.
• Six different wavelength importance rankings are assessed, and the best-performing one is recommended.
• The propositions are validated on six binary and multiclass datasets.
• The method outperformed competing approaches in terms of the percentage of retained wavelengths.
Near-infrared (NIR) spectroscopy is a type of vibrational spectroscopy widely used in many fields to characterize substances. NIR datasets consist of absorbance measurements over a range of wavelengths (λ). Because such data are typically noisy and correlated, they tend to compromise the performance of several statistical techniques; one way to overcome this is to select the portions of the spectrum whose wavelengths are most informative. In this paper, we investigate the performance of the Random Forest (RF) classifier combined with several wavelength importance ranking approaches for classifying product samples into categories such as quality level or authenticity. Our propositions are tested on six NIR datasets containing two or more classes of food and pharmaceutical products, as well as illegal drugs. Our proposed classification model, which integrates the χ² ranking score with the RF classifier, substantially reduced the number of wavelengths while increasing classification accuracy compared with using the complete datasets. Our propositions also compared favorably with competing methods available in the literature. |
---|---|
ISSN: | 0379-0738, 1872-6283 |
DOI: | 10.1016/j.forsciint.2021.110998 |
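The abstract names a χ² ranking score as the wavelength ranking integrated with the RF classifier but does not define it. The Python sketch below assumes the common chi-square feature score computed by scikit-learn's `sklearn.feature_selection.chi2`: each wavelength's non-negative absorbances are summed per class and compared with the sums expected if absorbance were independent of class. The function names `chi2_scores` and `select_top_fraction`, the retained-fraction parameter, and the toy spectra are all hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Chi-square score for each wavelength (column of X).

    Assumed scikit-learn-style statistic: absorbances are treated as
    non-negative "counts", summed per class, and compared with the sums
    expected if absorbance were independent of class. Columns whose
    total is zero get a score of 0.
    """
    n_samples, n_features = len(X), len(X[0])
    class_sizes = defaultdict(int)
    observed = defaultdict(lambda: [0.0] * n_features)  # class -> per-wavelength sums
    for row, label in zip(X, y):
        if any(v < 0 for v in row):
            raise ValueError("chi-square scoring assumes non-negative absorbances")
        class_sizes[label] += 1
        for j, v in enumerate(row):
            observed[label][j] += v
    totals = [sum(row[j] for row in X) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for label, n_c in class_sizes.items():
            expected = totals[j] * n_c / n_samples
            if expected > 0:
                score += (observed[label][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_fraction(scores, fraction):
    """Sorted indices of the best-scoring wavelengths, keeping `fraction` of them."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy spectra (made up): wavelength 0 separates the classes, 1 is flat, 2 barely differs.
X = [[0.9, 0.5, 0.4], [0.8, 0.5, 0.5], [0.1, 0.5, 0.6], [0.2, 0.5, 0.5]]
y = ["a", "a", "b", "b"]
scores = chi2_scores(X, y)                 # approx. [0.98, 0.0, 0.02]
kept = select_top_fraction(scores, 2 / 3)  # [0, 2]
```

Wavelengths with higher scores differ more between classes, so keeping only the top-ranked fraction is one simple way to reduce the number of retained wavelengths; how the paper chooses its cut-off is not stated in this record.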
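In the proposed model, the RF classifier works only with the wavelengths kept by the ranking step. The pure-Python sketch below assumes a depth-limited variant of a standard random forest (bootstrap samples, Gini splits, a random subset of about √p wavelengths drawn at every node, majority vote) and takes the selected wavelength indices as given. The class `RandomForest`, its parameters (`n_trees`, `max_depth`, `seed`), the helpers, and the toy data are hypothetical; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """(wavelength index, threshold) with the lowest weighted Gini impurity, or None."""
    best, best_score, n = None, gini(y), len(y)
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(X, y, rng, max_features, depth, max_depth):
    """Grow one tree; a fresh random subset of wavelengths is drawn at every node."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority  # leaf: a bare class label
    features = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, features)
    if split is None:
        return majority
    j, t = split

    def grow(rows):
        return build_tree([X[i] for i in rows], [y[i] for i in rows],
                          rng, max_features, depth + 1, max_depth)

    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t, grow(left), grow(right))  # internal node


def predict_tree(node, row):
    """Walk from the root to a leaf label."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees=25, max_depth=5, seed=0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        max_features = max(1, int(math.sqrt(len(X[0]))))  # about sqrt(p) wavelengths per split
        self.trees = []
        for _ in range(self.n_trees):
            boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
            self.trees.append(build_tree([X[i] for i in boot], [y[i] for i in boot],
                                         rng, max_features, 0, self.max_depth))
        return self

    def predict(self, X):
        # Majority vote over all trees for each spectrum.
        return [Counter(predict_tree(tree, row) for tree in self.trees).most_common(1)[0][0]
                for row in X]


def keep_columns(X, indices):
    """Restrict each spectrum to the retained wavelength indices."""
    return [[row[j] for j in indices] for row in X]


# Toy spectra (made up): wavelengths 0 and 2 separate the classes, 1 and 3 are flat.
X_train = [[0.9, 0.5, 0.8, 0.3], [0.8, 0.5, 0.9, 0.3], [0.85, 0.5, 0.7, 0.3],
           [0.1, 0.5, 0.2, 0.3], [0.2, 0.5, 0.1, 0.3], [0.15, 0.5, 0.3, 0.3]]
y_train = ["a", "a", "a", "b", "b", "b"]
selected = [0, 2]  # assumed output of a wavelength ranking step

forest = RandomForest(n_trees=25, max_depth=3, seed=42).fit(keep_columns(X_train, selected), y_train)
X_new = [[0.95, 0.5, 0.95, 0.3], [0.05, 0.5, 0.05, 0.3]]
predictions = forest.predict(keep_columns(X_new, selected))
```

Training on the reduced columns is what lets the number of wavelengths shrink without changing the classifier itself; with an odd number of trees and two classes, the majority vote cannot tie.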