A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data

Accurate classification of soil types and contamination is crucial for crop productivity. Among soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine l...

Bibliographic details
Published in: Spectrochimica acta. Part B: Atomic spectroscopy 2022-07, Vol.193, p.106451, Article 106451
Main authors: Huang, Yingchao; Bais, Abdul
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page 106451
container_title Spectrochimica acta. Part B: Atomic spectroscopy
container_volume 193
creator Huang, Yingchao
Bais, Abdul
description Accurate classification of soil types and contamination is crucial for crop productivity. Among soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine learning and chemometric methods (e.g., partial least squares discriminant analysis (PLS-DA) and principal component analysis (PCA)) has demonstrated great capability for soil classification. However, when the training and test spectra have different distributions and are not representative of each other, generalization issues arise: a model trained on the training spectra adapts poorly to the test spectra. In this work, we propose a method that calibrates the test spectra using the median of the principal components (PCs). PCA is used to analyze the distribution of the spectra. We compute the medians of the training and test PCs independently, and then adjust the test PCs according to the difference between the test median and the training median. The test spectra are then reconstructed from the calibrated PCs. To evaluate the proposed calibration algorithm, we conduct experiments on a publicly available, challenging LIBS dataset. We compare our calibration algorithm with the current best-performing calibration method on the same test set, using the same machine learning (ML) algorithm, PLS-DA, trained on the same training set. Our method improves the test accuracy by 1.2%. PLS-DA is used for the performance comparison because it is currently the best-performing ML algorithm. To further improve the test accuracy, other ML algorithms are investigated. Convolutional neural networks (CNNs) have recently achieved good accuracy in lithological classification with LIBS, so we extend them to soil classification. We use a CNN both as an end-to-end classifier and as a feature extractor, pairing the CNN-based feature extraction with other classifiers such as a support vector machine (SVM) and a random forest (RF). Comparing the CNN models on the calibrated test spectra shows that the CNN combined with an SVM achieves the best accuracy, improving the test accuracy by 3.1% over PLS-DA, the best-performing ML algorithm.
•When training and test spectra are not representative of each other, generalization issues arise.
•Test set calibration helps the model generalize.
•The PCA-based calibration algorithm adjusts the test principal components (PCs) by their median difference from the training PCs.
•The PCA-based calibration algorithm reduces the differences between the training and test sets, which improves the test accuracy.
•A CNN used for feature extraction, combined with an SVM, improves the test performance.
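The median-based calibration described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how PCA is fitted or how many components are kept, so the sketch assumes that PCA is fitted on the training spectra only, that a fixed number of components is retained, and that the test spectra are rebuilt from those components alone. The function names `fit_pca` and `calibrate_test_spectra` and the parameter `n_components` are hypothetical.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the feature-wise mean and the top principal axes of X (rows = spectra)."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def calibrate_test_spectra(X_train, X_test, n_components=5):
    """Shift test PC scores so each component's median matches the training median.

    Assumptions (not stated in the abstract): PCA is fitted on the training
    spectra only, and the spectra are rebuilt from the retained PCs alone.
    """
    mean, axes = fit_pca(X_train, n_components)

    # Project both sets onto the same principal axes.
    train_scores = (X_train - mean) @ axes.T
    test_scores = (X_test - mean) @ axes.T

    # Per-component difference between the training and test medians.
    shift = np.median(train_scores, axis=0) - np.median(test_scores, axis=0)

    # Adjust the test scores, then map them back to spectral space.
    calibrated_scores = test_scores + shift
    return calibrated_scores @ axes + mean
```

Under these assumptions, the calibrated test spectra have the same per-component median as the training spectra when projected onto the training principal axes. A classifier such as PLS-DA would then be trained on the training spectra and evaluated on the output of `calibrate_test_spectra`.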
doi_str_mv 10.1016/j.sab.2022.106451
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2688585080</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0584854722000957</els_id><sourcerecordid>2688585080</sourcerecordid><originalsourceid>FETCH-LOGICAL-c255t-69f2e0ee70ef20599a32360ddcfcd08aaadd6daa9a6efa81f098edcb6bea9cf53</originalsourceid><addsrcrecordid>eNp9kEFv2zAMhYWhA5Zm-wG7CejZmSxHioyegmDtBgRoD91ZoCUqkadYnuS0yGl_fQq8c08E8fgeyY-QrzVb1ayW3_pVhm7FGeell2tRfyCLWm2aqhFS3JAFE2pdKbHefCK3OfeMMS64WJC_WzrEVwz0ebetOshoqYHguwSTjwOFcIjJT8cTdTFREyBn77yZxeioOUIIOBz8cKBFxFT5wZ5NSekSwm8b3waaRzRTitnE8UJz9IFmOI0BqYUJPpOPDkLGL__rkvx6-P6y-1Htnx5_7rb7ynAhpkq2jiND3DB0nIm2hYY3kllrnLFMAYC10gK0INGBqh1rFVrTyQ6hNU40S3I3544p_jljnnQfz2koKzWXSgklmGJlqp6nTLk3J3R6TP4E6aJrpq-cda8LZ33lrGfOxXM_e7Cc_-ox6Ww8DoWBT-VxbaN_x_0P3RmJfA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2688585080</pqid></control><display><type>article</type><title>A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data</title><source>Elsevier ScienceDirect Journals</source><creator>Huang, Yingchao ; Bais, Abdul</creator><creatorcontrib>Huang, Yingchao ; Bais, Abdul</creatorcontrib><description>Accurate classification of soil types and contamination is crucial for crops' productivity. Among the soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine learning and chemometrics methods (e.g., partial least squares discriminate analysis (PLS-DA), principal component analysis (PCA)) has demonstrated great capabilities for soils classification. 
However, when the training and test spectra have different distribution and not representative of each other, there are generalization issues, which make the model trained on training spectra hard to adapt to test spectra. In this work we propose a method to calibrate the test spectra using the median of principal components (PCs). PCA is used to analyze the spectra distribution. We independently compute the median of both training's and test's PCs, and then the test's median is adjusted based on its differences with training's. With the calibrated PCs, the test spectra is reconstructed accordingly. To test the performance of the proposed calibration algorithm, we conduct experiments on a publicly available challenging LIBS dataset. We compare our calibration algorithm with the current best performing calibration method on the same test set, using the same machine learning (ML) algorithm, PLS-DA, trained with the same training set. Our method improves the test accuracy by 1.2%. The reason using PLS-DA for performance comparison is that it is currently the best performing ML algorithm. To further improve the test accuracy, other ML algorithms are investigated. Convolutional neural networks (CNN) have achieved good accuracy in lithological classification with LIBS recently. Therefore, it is extended in this work to soil classification. We use CNN as a tool for feature extraction and as an end-to-end classifier. We use the CNN based extraction mechanism with other classifiers, such as support vector machine (SVM) and random forest (RF), for soil classification. The performance of CNN models on the calibrated test spectra is compared, which concludes that CNN combined with SVM achieves the best accuracy and improves the test accuracy by 3.1% compared to the best performing ML algorithm PLS-DA. 
[Display omitted] •When training and test spectra are not representative of each other, there will be generalization issues.•Test set calibration helps with the model generalization.•PCA-based calibration algorithm adjusts the test principal components (PCs) by its median difference with training's PCs.•PCA-based calibration algorithm reduces the differences between training and test set, which improves the test accuracy.•CNN, working for feature extraction, combined with SVM, improves the test performance.</description><identifier>ISSN: 0584-8547</identifier><identifier>EISSN: 1873-3565</identifier><identifier>DOI: 10.1016/j.sab.2022.106451</identifier><language>eng</language><publisher>Oxford: Elsevier B.V</publisher><subject>Accuracy ; Algorithms ; Analysis ; Analytical methods ; Artificial neural networks ; Calibration ; Classification ; Classifiers ; CNN ; Contamination ; Data calibration ; Discriminant analysis ; Distribution ; Feature extraction ; Laser induced breakdown spectroscopy ; Lasers ; Learning algorithms ; LIBS ; Lithology ; Machine learning ; Neural networks ; PCA median ; Principal components analysis ; Soil analysis ; Soil classification ; Soil contamination ; Soil pollution ; Soil properties ; Soil types ; Spectra ; Spectroscopy ; Spectrum analysis ; Support vector machines ; Training</subject><ispartof>Spectrochimica acta. 
Part B: Atomic spectroscopy, 2022-07, Vol.193, p.106451, Article 106451</ispartof><rights>2022 Elsevier B.V.</rights><rights>Copyright Elsevier BV Jul 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c255t-69f2e0ee70ef20599a32360ddcfcd08aaadd6daa9a6efa81f098edcb6bea9cf53</citedby><cites>FETCH-LOGICAL-c255t-69f2e0ee70ef20599a32360ddcfcd08aaadd6daa9a6efa81f098edcb6bea9cf53</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0584854722000957$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Huang, Yingchao</creatorcontrib><creatorcontrib>Bais, Abdul</creatorcontrib><title>A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data</title><title>Spectrochimica acta. Part B: Atomic spectroscopy</title><description>Accurate classification of soil types and contamination is crucial for crops' productivity. Among the soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine learning and chemometrics methods (e.g., partial least squares discriminate analysis (PLS-DA), principal component analysis (PCA)) has demonstrated great capabilities for soils classification. However, when the training and test spectra have different distribution and not representative of each other, there are generalization issues, which make the model trained on training spectra hard to adapt to test spectra. In this work we propose a method to calibrate the test spectra using the median of principal components (PCs). PCA is used to analyze the spectra distribution. 
We independently compute the median of both training's and test's PCs, and then the test's median is adjusted based on its differences with training's. With the calibrated PCs, the test spectra is reconstructed accordingly. To test the performance of the proposed calibration algorithm, we conduct experiments on a publicly available challenging LIBS dataset. We compare our calibration algorithm with the current best performing calibration method on the same test set, using the same machine learning (ML) algorithm, PLS-DA, trained with the same training set. Our method improves the test accuracy by 1.2%. The reason using PLS-DA for performance comparison is that it is currently the best performing ML algorithm. To further improve the test accuracy, other ML algorithms are investigated. Convolutional neural networks (CNN) have achieved good accuracy in lithological classification with LIBS recently. Therefore, it is extended in this work to soil classification. We use CNN as a tool for feature extraction and as an end-to-end classifier. We use the CNN based extraction mechanism with other classifiers, such as support vector machine (SVM) and random forest (RF), for soil classification. The performance of CNN models on the calibrated test spectra is compared, which concludes that CNN combined with SVM achieves the best accuracy and improves the test accuracy by 3.1% compared to the best performing ML algorithm PLS-DA. 
[Display omitted] •When training and test spectra are not representative of each other, there will be generalization issues.•Test set calibration helps with the model generalization.•PCA-based calibration algorithm adjusts the test principal components (PCs) by its median difference with training's PCs.•PCA-based calibration algorithm reduces the differences between training and test set, which improves the test accuracy.•CNN, working for feature extraction, combined with SVM, improves the test performance.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Analysis</subject><subject>Analytical methods</subject><subject>Artificial neural networks</subject><subject>Calibration</subject><subject>Classification</subject><subject>Classifiers</subject><subject>CNN</subject><subject>Contamination</subject><subject>Data calibration</subject><subject>Discriminant analysis</subject><subject>Distribution</subject><subject>Feature extraction</subject><subject>Laser induced breakdown spectroscopy</subject><subject>Lasers</subject><subject>Learning algorithms</subject><subject>LIBS</subject><subject>Lithology</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>PCA median</subject><subject>Principal components analysis</subject><subject>Soil analysis</subject><subject>Soil classification</subject><subject>Soil contamination</subject><subject>Soil pollution</subject><subject>Soil properties</subject><subject>Soil types</subject><subject>Spectra</subject><subject>Spectroscopy</subject><subject>Spectrum analysis</subject><subject>Support vector 
machines</subject><subject>Training</subject><issn>0584-8547</issn><issn>1873-3565</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kEFv2zAMhYWhA5Zm-wG7CejZmSxHioyegmDtBgRoD91ZoCUqkadYnuS0yGl_fQq8c08E8fgeyY-QrzVb1ayW3_pVhm7FGeell2tRfyCLWm2aqhFS3JAFE2pdKbHefCK3OfeMMS64WJC_WzrEVwz0ebetOshoqYHguwSTjwOFcIjJT8cTdTFREyBn77yZxeioOUIIOBz8cKBFxFT5wZ5NSekSwm8b3waaRzRTitnE8UJz9IFmOI0BqYUJPpOPDkLGL__rkvx6-P6y-1Htnx5_7rb7ynAhpkq2jiND3DB0nIm2hYY3kllrnLFMAYC10gK0INGBqh1rFVrTyQ6hNU40S3I3544p_jljnnQfz2koKzWXSgklmGJlqp6nTLk3J3R6TP4E6aJrpq-cda8LZ33lrGfOxXM_e7Cc_-ox6Ww8DoWBT-VxbaN_x_0P3RmJfA</recordid><startdate>202207</startdate><enddate>202207</enddate><creator>Huang, Yingchao</creator><creator>Bais, Abdul</creator><general>Elsevier B.V</general><general>Elsevier BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7QH</scope><scope>7SR</scope><scope>7U5</scope><scope>7UA</scope><scope>8FD</scope><scope>C1K</scope><scope>F1W</scope><scope>H97</scope><scope>JG9</scope><scope>L.G</scope><scope>L7M</scope></search><sort><creationdate>202207</creationdate><title>A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data</title><author>Huang, Yingchao ; Bais, Abdul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c255t-69f2e0ee70ef20599a32360ddcfcd08aaadd6daa9a6efa81f098edcb6bea9cf53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Analysis</topic><topic>Analytical methods</topic><topic>Artificial neural networks</topic><topic>Calibration</topic><topic>Classification</topic><topic>Classifiers</topic><topic>CNN</topic><topic>Contamination</topic><topic>Data calibration</topic><topic>Discriminant analysis</topic><topic>Distribution</topic><topic>Feature 
extraction</topic><topic>Laser induced breakdown spectroscopy</topic><topic>Lasers</topic><topic>Learning algorithms</topic><topic>LIBS</topic><topic>Lithology</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>PCA median</topic><topic>Principal components analysis</topic><topic>Soil analysis</topic><topic>Soil classification</topic><topic>Soil contamination</topic><topic>Soil pollution</topic><topic>Soil properties</topic><topic>Soil types</topic><topic>Spectra</topic><topic>Spectroscopy</topic><topic>Spectrum analysis</topic><topic>Support vector machines</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, Yingchao</creatorcontrib><creatorcontrib>Bais, Abdul</creatorcontrib><collection>CrossRef</collection><collection>Aqualine</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Water Resources Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ASFA: Aquatic Sciences and Fisheries Abstracts</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) 3: Aquatic Pollution &amp; Environmental Quality</collection><collection>Materials Research Database</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) Professional</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>Spectrochimica acta. 
Part B: Atomic spectroscopy</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Yingchao</au><au>Bais, Abdul</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data</atitle><jtitle>Spectrochimica acta. Part B: Atomic spectroscopy</jtitle><date>2022-07</date><risdate>2022</risdate><volume>193</volume><spage>106451</spage><pages>106451-</pages><artnum>106451</artnum><issn>0584-8547</issn><eissn>1873-3565</eissn><abstract>Accurate classification of soil types and contamination is crucial for crops' productivity. Among the soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine learning and chemometrics methods (e.g., partial least squares discriminate analysis (PLS-DA), principal component analysis (PCA)) has demonstrated great capabilities for soils classification. However, when the training and test spectra have different distribution and not representative of each other, there are generalization issues, which make the model trained on training spectra hard to adapt to test spectra. In this work we propose a method to calibrate the test spectra using the median of principal components (PCs). PCA is used to analyze the spectra distribution. We independently compute the median of both training's and test's PCs, and then the test's median is adjusted based on its differences with training's. With the calibrated PCs, the test spectra is reconstructed accordingly. To test the performance of the proposed calibration algorithm, we conduct experiments on a publicly available challenging LIBS dataset. 
We compare our calibration algorithm with the current best performing calibration method on the same test set, using the same machine learning (ML) algorithm, PLS-DA, trained with the same training set. Our method improves the test accuracy by 1.2%. The reason using PLS-DA for performance comparison is that it is currently the best performing ML algorithm. To further improve the test accuracy, other ML algorithms are investigated. Convolutional neural networks (CNN) have achieved good accuracy in lithological classification with LIBS recently. Therefore, it is extended in this work to soil classification. We use CNN as a tool for feature extraction and as an end-to-end classifier. We use the CNN based extraction mechanism with other classifiers, such as support vector machine (SVM) and random forest (RF), for soil classification. The performance of CNN models on the calibrated test spectra is compared, which concludes that CNN combined with SVM achieves the best accuracy and improves the test accuracy by 3.1% compared to the best performing ML algorithm PLS-DA. [Display omitted] •When training and test spectra are not representative of each other, there will be generalization issues.•Test set calibration helps with the model generalization.•PCA-based calibration algorithm adjusts the test principal components (PCs) by its median difference with training's PCs.•PCA-based calibration algorithm reduces the differences between training and test set, which improves the test accuracy.•CNN, working for feature extraction, combined with SVM, improves the test performance.</abstract><cop>Oxford</cop><pub>Elsevier B.V</pub><doi>10.1016/j.sab.2022.106451</doi></addata></record>
fulltext fulltext
identifier ISSN: 0584-8547
ispartof Spectrochimica acta. Part B: Atomic spectroscopy, 2022-07, Vol.193, p.106451, Article 106451
issn 0584-8547
1873-3565
language eng
recordid cdi_proquest_journals_2688585080
source Elsevier ScienceDirect Journals
subjects Accuracy
Algorithms
Analysis
Analytical methods
Artificial neural networks
Calibration
Classification
Classifiers
CNN
Contamination
Data calibration
Discriminant analysis
Distribution
Feature extraction
Laser induced breakdown spectroscopy
Lasers
Learning algorithms
LIBS
Lithology
Machine learning
Neural networks
PCA median
Principal components analysis
Soil analysis
Soil classification
Soil contamination
Soil pollution
Soil properties
Soil types
Spectra
Spectroscopy
Spectrum analysis
Support vector machines
Training
title A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T07%3A15%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20novel%20PCA-based%20calibration%20algorithm%20for%20classification%20of%20challenging%20laser-induced%20breakdown%20spectroscopy%20soil%20sample%20data&rft.jtitle=Spectrochimica%20acta.%20Part%20B:%20Atomic%20spectroscopy&rft.au=Huang,%20Yingchao&rft.date=2022-07&rft.volume=193&rft.spage=106451&rft.pages=106451-&rft.artnum=106451&rft.issn=0584-8547&rft.eissn=1873-3565&rft_id=info:doi/10.1016/j.sab.2022.106451&rft_dat=%3Cproquest_cross%3E2688585080%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2688585080&rft_id=info:pmid/&rft_els_id=S0584854722000957&rfr_iscdi=true