The effects of column-wise manipulations on accuracy of classical classifiers with high-dimensional spectral data

Bibliographic details
Main authors: Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz
Format: Conference proceeding
Language: English
Description
Summary: Column-wise manipulations (CWM), a group of data pre-processing (DP) techniques comprising mean-centering, Pareto scaling (PS), variance scaling and auto-scaling, are often applied individually or in combination. They are frequently applied as a matter of routine without careful consideration, partly because they are simple and easy to apply. Theoretically, all variables in an infrared (IR) spectrum are measured on the same scale and seldom have different means, and as such rarely require CWM, in contrast to normalization. This preliminary paper investigates the actual need for each of the aforementioned CWM in an IR spectroscopic dataset derived from white copy paper. The untreated and pre-processed IR data are then modelled with Principal Component Analysis combined with Linear Discriminant Analysis (PCA-DA). The impact of CWM on the test accuracy of the different PCA-DA models is compared across different IR wavenumber intervals. The error of the predictive models is estimated via nonparametric bootstrap. Results show that an informative (i.e. highly discriminatory) spectrum can, even in its raw form, achieve high classification accuracy if an optimum number of principal components is included. It is concluded that the selection of CWM for an IR spectrum depends on its inherent quality, such that a discriminatory IR spectrum might not need any CWM at all.
ISSN: 0094-243X, 1551-7616
DOI: 10.1063/1.4980992
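
The four column-wise manipulations named in the summary can be illustrated with a minimal, dependency-free sketch. It is not taken from the paper: the function names and the toy matrix are hypothetical, and the definitions follow common chemometrics conventions that the record itself does not spell out (variance scaling assumed to divide each column by its sample standard deviation without centering; Pareto scaling assumed to center and then divide by the square root of the standard deviation). Rows are taken to be spectra and columns to be wavenumbers.

```python
from math import sqrt
from statistics import mean, stdev


def column_stats(X):
    """Return per-column means and sample standard deviations (n - 1)."""
    cols = list(zip(*X))  # transpose: one tuple per wavenumber
    return [mean(c) for c in cols], [stdev(c) for c in cols]


def mean_center(X):
    """Subtract each column's mean."""
    means, _ = column_stats(X)
    return [[x - m for x, m in zip(row, means)] for row in X]


def variance_scale(X):
    """Divide each column by its standard deviation (assumed: no centering)."""
    _, sds = column_stats(X)
    return [[x / s for x, s in zip(row, sds)] for row in X]


def pareto_scale(X):
    """Center, then divide by the square root of the standard deviation."""
    means, sds = column_stats(X)
    return [[(x - m) / sqrt(s) for x, m, s in zip(row, means, sds)] for row in X]


def auto_scale(X):
    """Center, then divide by the standard deviation (unit variance)."""
    means, sds = column_stats(X)
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]


# Hypothetical toy data: 3 spectra x 2 wavenumbers.
# Constant columns (standard deviation 0) are not handled in this sketch.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

print(auto_scale(X))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

In this toy case the two columns differ only by a factor of ten, so auto-scaling makes them identical while mean-centering preserves the difference in spread. In a real workflow the column statistics would presumably be estimated on the training spectra only and then reused for the test spectra.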