Efficient data preprocessing, episode classification, and source apportionment of particle number concentrations

Bibliographic details
Published in: The Science of the total environment 2020-11, Vol.744, p.140923-140923, Article 140923
Main authors: Liang, Chun-Sheng, Wu, Hao, Li, Hai-Yan, Zhang, Qiang, Li, Zhanqing, He, Ke-Bin
Format: Article
Language: English
Online access: Full text
Description
Summary: Number concentration is an important index for measuring atmospheric particle pollution. However, tailored methods for data preprocessing and for characteristic and source analyses of particle number concentrations (PNC) are rare, and interpreting the data is time-consuming and inefficient. In this method-oriented study, we develop and investigate techniques based on flexible conditions, C++-optimized algorithms, and parallel computing in R (open-source software for statistics and graphics) to tackle these challenges. The data preprocessing methods include deletion of variables and observations, outlier removal, and interpolation of missing values (NA). Compared with previous methods, they clean the data better, retain more samples, and generate no new outliers after interpolation. In addition, automatic division of PNC pollution events based on relative values suits PNC properties and highlights the pollution characteristics related to sources and mechanisms. Furthermore, basic functions of k-means clustering, Principal Component Analysis (PCA), Factor Analysis (FA), Positive Matrix Factorization (PMF), and a newly introduced model, Non-negative Matrix Factorization (NMF), were tested and compared in analyzing PNC sources. Only PMF and NMF can identify coal heating and produce more explicable results, while NMF apportions sources more distinctly and runs 11–28 times faster than PMF. Traffic is interannually stable in non-heating periods and always dominant. Coal heating's contribution has decreased by 40%–86% over the 5 most recent heating periods, reflecting the effectiveness of coal burning control.
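To illustrate the kind of factorization NMF performs, the following is a minimal sketch in plain Python, assuming a non-negative matrix whose rows are time samples and whose columns are particle size bins. It uses the classic multiplicative update rules for the Frobenius-norm objective, which may differ from the R implementation tested in the study. The function names (`nmf`, `frobenius_error`), the toy matrix `toy_pnc`, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m).

    W holds per-sample factor contributions and H holds factor profiles.
    Multiplicative updates keep every entry non-negative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: 4 time samples x 3 size bins (not from the study).
toy_pnc = [
    [5.0, 1.0, 0.5],
    [4.0, 2.0, 1.0],
    [1.0, 6.0, 3.0],
    [0.5, 5.0, 2.5],
]
contributions, profiles = nmf(toy_pnc, rank=2)
print("residual:", frobenius_error(toy_pnc, contributions, profiles))
```

In a source apportionment setting, each row of `profiles` would be read as a source's size distribution and each column of `contributions` as that source's strength over time. Assigning a factor to a real source such as traffic or coal heating still requires expert interpretation.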
• Auto-identification of consecutive NA via moving averages benefits interpolation.
• Point-by-point weighted outlier removal by conditional extremum saves non-outliers.
• Auto-division of episodes via threshold windows, durations, and trend constraints.
• Performance rank of source apportionment models is NMF > PMF > FA ≈ PCA > k-means.
• Traffic remained dominant while coal heating decreased by 40%–86% over the 5 most recent years.
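The idea of locating runs of consecutive NA before interpolating can be sketched as follows. This is a simplified illustration, not the study's method: it finds runs by direct scanning rather than by moving averages, and it fills only short interior gaps by linear interpolation. The function names and the `max_gap` parameter are hypothetical, and missing values are represented by `None`.

```python
def find_na_runs(values):
    """Return (start, end) index pairs, inclusive, for each run of consecutive None."""
    runs, start = [], None
    for i, x in enumerate(values):
        if x is None and start is None:
            start = i
        elif x is not None and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(values) - 1))
    return runs


def interpolate_short_gaps(values, max_gap=2):
    """Linearly fill interior NA runs of at most max_gap points; leave others as None."""
    out = list(values)
    for start, end in find_na_runs(values):
        length = end - start + 1
        # Skip long gaps and gaps touching either end of the series (assumed policy).
        if length > max_gap or start == 0 or end == len(values) - 1:
            continue
        left, right = values[start - 1], values[end + 1]
        step = (right - left) / (length + 1)
        for k in range(length):
            out[start + k] = left + step * (k + 1)
    return out


# Hypothetical series with one short gap, one long gap, and a trailing gap.
series = [10.0, None, 14.0, None, None, None, 22.0, None]
print(find_na_runs(series))            # [(1, 1), (3, 5), (7, 7)]
print(interpolate_short_gaps(series))  # [10.0, 12.0, 14.0, None, None, None, 22.0, None]
```

Because a linearly interpolated value always lies between its two neighbors, a fill of this kind cannot create a new extreme value, which is one simple way to avoid introducing outliers during interpolation.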
ISSN: 0048-9697, 1879-1026
DOI: 10.1016/j.scitotenv.2020.140923