Evaluating the performance of machine learning and variable selection methods to identify document paper using infrared spectral data

Bibliographic details
Published in:	Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy, 2025-02, Vol.327, p.125299, Article 125299
Main authors:	Lee, Yong Ju; Kweon, Soon Wan; Jeong, Chang Woo; Kim, Hyoung Jin
Format:	Article
Language:	English
Subjects:	Feature importance; Feed-forward neural network (FNN); Questioned document; Random forest (RF); Support vector machine (SVM)
Online access:	Full text

Description
Summary:
• Developed robust machine learning models (SVM, FNN, RF) to classify document paper manufacturers using IR spectral data.
• Applied second-derivative transformation of IR spectra to enhance model performance and interpretability.
• Variable importance analysis identified the 1500–800 cm⁻¹ range as critical for model optimization.
• RF models achieved the highest classification accuracy, with F1-scores of 1.000 using key spectral regions.
• Demonstrated that spectral preprocessing and feature selection significantly reduce computational costs while maintaining high model accuracy.

Infrared spectroscopy is a valuable tool for forensic examinations because it enables nondestructive, rapid analysis. Recent advances in machine learning have facilitated the development of chemometrics, extending to applications in questioned document examination. In this study, support vector machine (SVM), feedforward neural network (FNN), and random forest (RF) models were constructed from the infrared (IR) spectral data of document paper samples to identify the manufacturer of document paper products. For model training, IR spectral regions were selected based on their variable importance as determined by the RF models. Narrowing the IR spectral data to the 1500–800 cm⁻¹ range (selected according to variable importance measures) improved model performance while minimizing computational cost. The FNN and RF models trained on the second-derivative IR spectra in this range obtained F1-scores of 0.978 and 1.000, respectively. The findings confirm the potential of machine learning methods for extracting and examining forensic features in document paper, yielding robust models with low computational overhead.
ISSN:	1386-1425 1873-3557
DOI:	10.1016/j.saa.2024.125299
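
The summary names three computational steps that are easy to picture in code: a second-derivative transform of each spectrum, restriction to the 1500–800 cm⁻¹ window, and scoring by F1. The sketch below illustrates those steps only, and only under stated assumptions. It is not the authors' implementation: the record does not say which derivative filter or which F1 averaging was used, so plain central differences and a macro average are stand-ins, and every function name here is hypothetical. The classifiers themselves (SVM, FNN, RF) and the RF variable-importance ranking are out of scope and would come from a machine learning library.

```python
from typing import Hashable, List, Sequence, Tuple


def second_derivative(
    wavenumbers: Sequence[float], absorbance: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Central-difference second derivative on a uniformly spaced grid.

    Assumption: finite differences stand in for whatever derivative
    filter the study used. The two end points are dropped.
    """
    if len(wavenumbers) != len(absorbance) or len(wavenumbers) < 3:
        raise ValueError("need two equal-length sequences of at least 3 points")
    step = wavenumbers[1] - wavenumbers[0]
    d2 = [
        (absorbance[i - 1] - 2.0 * absorbance[i] + absorbance[i + 1]) / (step * step)
        for i in range(1, len(absorbance) - 1)
    ]
    return list(wavenumbers[1:-1]), d2


def select_region(
    wavenumbers: Sequence[float],
    values: Sequence[float],
    low: float = 800.0,
    high: float = 1500.0,
) -> Tuple[List[float], List[float]]:
    """Keep only the points whose wavenumber lies in [low, high] cm^-1."""
    kept = [(w, v) for w, v in zip(wavenumbers, values) if low <= w <= high]
    return [w for w, _ in kept], [v for _, v in kept]


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 scores.

    Assumption: macro averaging; the record does not state the averaging.
    """
    labels = sorted(set(y_true) | set(y_pred), key=str)
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic, made-up spectrum on a 4 cm^-1 grid (not data from the study).
    grid = [float(w) for w in range(400, 4001, 4)]
    spectrum = [1e-6 * w * w for w in grid]

    d2_grid, d2 = second_derivative(grid, spectrum)
    region_w, region_d2 = select_region(d2_grid, d2)
    print(len(region_w), "points kept between", region_w[0], "and", region_w[-1])

    # Hypothetical manufacturer labels, only to exercise the metric.
    truth = ["A", "A", "B", "B", "C", "C"]
    guess = ["A", "A", "B", "C", "C", "C"]
    print("macro F1:", round(macro_f1(truth, guess), 3))
```

In a fuller pipeline, the vectors returned by `select_region` for each sample would form the feature matrix passed to a classifier. Cutting the spectrum to one window shrinks that matrix, which is the source of the reduced computational cost the summary describes.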