Letter to the Editor: Ending the Use of Obsolete Data Analysis Methods

Published in: Aerosol and Air Quality Research, 2020-04, Vol. 20 (4), pp. 688-689
Authors: Philip K. Hopke, Daniel A. Jaffe
ISSN: 1680-8584, 2071-1409
DOI: 10.4209/aaqr.2020.01.0001

Aerosol and Air Quality Research is taking an editorial stand with regard to outdated data analysis methods, specifically principal component analysis (PCA) and related techniques, and enrichment factors (EFs). In both cases, they have been replaced by more quantitative data analysis tools that provide much greater information on the sources of variation in the data.

Enrichment factors were first used in the 1960s, when computers were essentially unavailable. They offered a simple way of using double ratios to see whether an element was substantially enriched over the crustal abundances that had been reported by one of several authors. However, the information is quite crude: it says only that the element is higher than typical crustal values, and it does not account for local variations in elemental abundances. Now that we have the capability to examine correlations, statistically assess differences in means or medians, and so on, we should provide appropriate quantitative estimates of the significance of differences among samples.

In the case of PCA and other eigenvector-based methods, Lawson and Hanson (1974) and Malinowski (2002) have shown that an eigenvector analysis is an unweighted least-squares fit to the data. Such fits create problems with heteroscedastic data, which are commonly encountered in atmospheric measurements: the measurement uncertainties are typically proportional to the measured values rather than fixed across all of the measurements (homoscedastic data). Thus, unweighted least-squares fits will not provide the best estimators of the parameters of interest. PCA also typically subtracts the mean value from the data points by default and scales each variable to unit variance, so that it apportions the variance rather than the variation of the actual measured concentrations. Although methods like Target-Transformation Factor Analysis (TTFA) avoided subtracting the mean, TTFA still suffers from the absence of proper data-point weighting.

In the 1970s, when a mainframe had less computing power than the telephones we now carry in our pockets, it was necessary to use simplifying methods like eigenvector decompositions to obtain results in a reasonable time. However, personal computers have long since gained sufficient power to perform full, explicit least-squares fits with proper data weighting. Factor analysis tools like non-negatively constrained alternating least squares (Tauler et al., ...)
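
To make these points concrete, the sketches below illustrate each method in turn; all data values, variable names, and parameters in them are invented for illustration. First, the EF double ratio. This minimal Python sketch uses hypothetical concentrations and crustal abundances, with Al as the crustal reference element; real applications would take abundances from a published crustal composition.

    # Enrichment factor as a double ratio:
    #   EF(X) = (C_X / C_ref)_sample / (C_X / C_ref)_crust
    # Minimal sketch; all values are hypothetical. Al serves as the
    # crustal reference element.
    sample = {"Al": 1200.0, "Pb": 15.0}   # aerosol concentrations, ng m^-3
    crust = {"Al": 80400.0, "Pb": 17.0}   # crustal abundances, mg kg^-1

    def enrichment_factor(element, ref="Al"):
        return (sample[element] / sample[ref]) / (crust[element] / crust[ref])

    # EF near 1 is consistent with a crustal origin; EF well above ~10 is
    # conventionally read as enrichment from non-crustal sources. That single
    # unweighted number is all the method can say.
    print(f"EF(Pb) = {enrichment_factor('Pb'):.1f}")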
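The point about heteroscedastic data can be shown with a small simulation, sketched below with NumPy. The proportional-error model, the 10% relative uncertainty, and the single-slope fit are assumptions chosen for illustration: when uncertainties scale with the measured values, a weighted least-squares fit recovers the parameter with visibly less scatter than the unweighted fit implicit in eigenvector methods.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(1.0, 100.0, 200)   # "true" concentrations
    true_slope = 2.0
    sigma = 0.10 * true_slope * x      # uncertainty proportional to value
    w = 1.0 / sigma**2                 # weights for the weighted fit

    ols, wls = [], []
    for _ in range(1000):
        y = true_slope * x + rng.normal(0.0, sigma)   # heteroscedastic noise
        ols.append((x @ y) / (x @ x))                 # unweighted slope
        wls.append(((w * x) @ y) / ((w * x) @ x))     # weighted slope

    # Both estimators are centered on the true slope, but the unweighted one
    # is dominated by the noisiest (largest) values and scatters more.
    print(f"spread (std) of unweighted fits: {np.std(ols):.4f}")
    print(f"spread (std) of weighted fits:   {np.std(wls):.4f}")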
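The default PCA preprocessing described above, mean-centering plus scaling to unit variance (i.e., working from the correlation matrix), can also be written out directly. The synthetic lognormal data here are an assumption standing in for measured concentrations.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.lognormal(mean=2.0, sigma=1.0, size=(500, 6))  # synthetic data

    # Autoscaling: subtract each column's mean, scale to unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Eigendecomposition of the covariance of the scaled data, which is the
    # correlation matrix of the original data.
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False, ddof=0))[::-1]

    # The eigenvalues sum to the number of variables: PCA apportions the
    # variance of the scaled data, not the variation of the measured
    # concentrations themselves.
    print(f"sum of eigenvalues: {eigvals.sum():.2f}")   # = 6.00
    print("variance fraction per component:", np.round(eigvals / 6.0, 3))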
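Finally, a bare-bones sketch of non-negatively constrained alternating least squares, the family of factor analysis tools the letter points to. This toy version is unweighted and enforces non-negativity by clipping, a crude surrogate for a true non-negative least-squares solve; production tools such as MCR-ALS or positive matrix factorization additionally weight each data point by its measurement uncertainty. The factor count, data shapes, and iteration budget are illustrative assumptions.

    import numpy as np

    def nn_als(X, k, n_iter=500, seed=0):
        """Factor X (samples x species) as X ~ G @ F with G, F >= 0.
        Unweighted toy version; clipping after each unconstrained
        least-squares solve stands in for a proper NNLS step."""
        rng = np.random.default_rng(seed)
        n, m = X.shape
        G = rng.random((n, k))
        for _ in range(n_iter):
            # Solve for F with G fixed, then clip to enforce non-negativity.
            F = np.clip(np.linalg.lstsq(G, X, rcond=None)[0], 0.0, None)
            # Solve for G with F fixed (via transposes), then clip.
            G = np.clip(np.linalg.lstsq(F.T, X.T, rcond=None)[0].T, 0.0, None)
        return G, F

    # Toy data: two hidden "sources" mixed into six measured species.
    rng = np.random.default_rng(2)
    G_true = rng.random((100, 2))
    F_true = rng.random((2, 6))
    X = G_true @ F_true + rng.normal(0.0, 0.01, size=(100, 6))

    G, F = nn_als(X, k=2)
    resid = np.linalg.norm(X - G @ F) / np.linalg.norm(X)
    print(f"relative residual: {resid:.4f}")   # small if the fit succeeded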