Comparison of preprocessing and machine learning methods for identifying writing inks using mid-infrared hyperspectral imaging and machine learning



Bibliographic details
Published in:	Infrared physics & technology 2022-12, Vol.127, p.104357, Article 104357
Main author:	Sugawara, Shigeru
Format:	Article
Language:	English
Keywords:	Crossed lines; Document examination; FTIR imaging; Ink identification; Machine learning; Preprocessing
Online access:	Full text

Description
Summary:	We compared preprocessing and machine learning methods for identifying writing inks on paper using mid-infrared hyperspectral imaging and machine learning. We collected training data by making a blackened circle on paper with a pen and measuring it using hyperspectral imaging. We attempted to identify five types of black ink that are difficult to identify using visible and near-infrared spectroscopy and can only be identified using mid-infrared spectroscopy. We initially analyzed the spectra using principal component analysis, and used the scores as a substitute for the spectrum. As an overall trend, standardization of the data for each variable had little effect on improving the discrimination rate. By contrast, using the difference spectrum from the average spectrum of the paper was effective for improving the discrimination rate. The discrimination rate was higher for the second-order derivative than for the spectrum itself, and for the first-order derivative than for the second-order derivative. Furthermore, the combination of the three had the highest discrimination rate. We tested three supervised machine learning methods: decision trees, discriminant analysis, and k-nearest neighbors. The highest classification accuracy (97.6%) was obtained for second-order discriminant analysis. Considering the discrimination rate and learning time, second-order discriminant analysis was the best method. For the measurement data of a sample with blackened circles using all the writing inks, this method identified the different inks quite well. For the measurement data of crossed line samples, using the proposed method, we clearly identified each ink. At the intersection of two lines, the line written later was detected more strongly.
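The preprocessing chain described in the summary can be illustrated with a minimal sketch. It is not the article's implementation, and several things here are assumptions: "the three" is read as the difference spectrum plus its first and second derivatives, the derivatives are plain central finite differences (the record does not say how they were computed), the principal component analysis step is omitted for brevity, and the classifier is a simple k-nearest neighbors vote rather than the second-order discriminant analysis that the summary reports as best. All function names and the synthetic spectra are hypothetical.

```python
import math
from collections import Counter


def difference_from_paper(spectrum, paper_mean):
    """Subtract the average paper spectrum from an ink-on-paper spectrum."""
    return [s - p for s, p in zip(spectrum, paper_mean)]


def derivative(values):
    """Central finite difference with unit spacing; the two endpoints are dropped."""
    return [(values[i + 1] - values[i - 1]) / 2.0 for i in range(1, len(values) - 1)]


def features(spectrum, paper_mean):
    """Concatenate the difference spectrum with its first and second derivatives."""
    diff = difference_from_paper(spectrum, paper_mean)
    d1 = derivative(diff)
    d2 = derivative(d1)
    return diff + d1 + d2


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def toy_spectrum(center, offset, n=20):
    """Hypothetical pixel spectrum: sloping paper baseline plus one Gaussian ink band."""
    return [0.1 * i + offset + math.exp(-((i - center) ** 2) / 4.0) for i in range(n)]


# Synthetic stand-ins for measured data (assumption: two inks with bands at different positions).
paper_mean = [0.1 * i for i in range(20)]
train_x = [features(toy_spectrum(c, o), paper_mean)
           for c in (5, 12) for o in (0.00, 0.02, -0.02)]
train_y = ["ink_A"] * 3 + ["ink_B"] * 3

# Classify an unseen pixel whose band sits where ink_B absorbs.
query = features(toy_spectrum(12, 0.01), paper_mean)
predicted = knn_predict(train_x, train_y, query)
print(predicted)
```

In this toy setup the constant baseline offset between pixels vanishes in the derivative features, which is one plausible reason derivatives help discrimination; whether that explains the article's results is not stated in this record.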
ISSN:	1350-4495 1879-0275
DOI:	10.1016/j.infrared.2022.104357