A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data
Accurate classification of soil types and contamination is crucial for crops' productivity. Among the soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine l...
Published in: | Spectrochimica acta. Part B: Atomic spectroscopy 2022-07, Vol.193, p.106451, Article 106451 |
---|---|
Main authors: | Huang, Yingchao ; Bais, Abdul |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | 106451 |
container_title | Spectrochimica acta. Part B: Atomic spectroscopy |
container_volume | 193 |
creator | Huang, Yingchao ; Bais, Abdul |
description | Accurate classification of soil types and contamination is crucial for crop productivity. Among soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine learning and chemometrics methods (e.g., partial least squares discriminant analysis (PLS-DA), principal component analysis (PCA)) has demonstrated great capabilities for soil classification. However, when the training and test spectra have different distributions and are not representative of each other, generalization issues arise, which make it hard for a model trained on the training spectra to adapt to the test spectra. In this work we propose a method to calibrate the test spectra using the median of principal components (PCs). PCA is used to analyze the distribution of the spectra. We independently compute the medians of the training and test PCs, and then adjust the test median based on its difference from the training median. With the calibrated PCs, the test spectra are reconstructed accordingly. To test the performance of the proposed calibration algorithm, we conduct experiments on a publicly available, challenging LIBS dataset. We compare our calibration algorithm with the current best-performing calibration method on the same test set, using the same machine learning (ML) algorithm, PLS-DA, trained with the same training set. Our method improves the test accuracy by 1.2%. PLS-DA is used for the performance comparison because it is currently the best-performing ML algorithm. To further improve the test accuracy, other ML algorithms are investigated. Convolutional neural networks (CNNs) have recently achieved good accuracy in lithological classification with LIBS. Therefore, they are extended in this work to soil classification. We use a CNN as a tool for feature extraction and as an end-to-end classifier.
We also use the CNN-based feature extraction mechanism with other classifiers, such as support vector machine (SVM) and random forest (RF), for soil classification. The performance of the CNN models on the calibrated test spectra is compared, and we conclude that the CNN combined with SVM achieves the best accuracy, improving the test accuracy by 3.1% compared to the best-performing ML algorithm, PLS-DA.
•When training and test spectra are not representative of each other, there will be generalization issues.
•Test set calibration helps with the model generalization.
•PCA-based |
doi_str_mv | 10.1016/j.sab.2022.106451 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2688585080</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0584854722000957</els_id><sourcerecordid>2688585080</sourcerecordid><originalsourceid>FETCH-LOGICAL-c255t-69f2e0ee70ef20599a32360ddcfcd08aaadd6daa9a6efa81f098edcb6bea9cf53</originalsourceid><addsrcrecordid>eNp9kEFv2zAMhYWhA5Zm-wG7CejZmSxHioyegmDtBgRoD91ZoCUqkadYnuS0yGl_fQq8c08E8fgeyY-QrzVb1ayW3_pVhm7FGeell2tRfyCLWm2aqhFS3JAFE2pdKbHefCK3OfeMMS64WJC_WzrEVwz0ebetOshoqYHguwSTjwOFcIjJT8cTdTFREyBn77yZxeioOUIIOBz8cKBFxFT5wZ5NSekSwm8b3waaRzRTitnE8UJz9IFmOI0BqYUJPpOPDkLGL__rkvx6-P6y-1Htnx5_7rb7ynAhpkq2jiND3DB0nIm2hYY3kllrnLFMAYC10gK0INGBqh1rFVrTyQ6hNU40S3I3544p_jljnnQfz2koKzWXSgklmGJlqp6nTLk3J3R6TP4E6aJrpq-cda8LZ33lrGfOxXM_e7Cc_-ox6Ww8DoWBT-VxbaN_x_0P3RmJfA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2688585080</pqid></control><display><type>article</type><title>A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data</title><source>Elsevier ScienceDirect Journals</source><creator>Huang, Yingchao ; Bais, Abdul</creator><creatorcontrib>Huang, Yingchao ; Bais, Abdul</creatorcontrib><description>Accurate classification of soil types and contamination is crucial for crops' productivity. Among the soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine learning and chemometrics methods (e.g., partial least squares discriminate analysis (PLS-DA), principal component analysis (PCA)) has demonstrated great capabilities for soils classification. 
However, when the training and test spectra have different distributions and are not representative of each other, generalization issues arise, which make it hard for a model trained on the training spectra to adapt to the test spectra. In this work we propose a method to calibrate the test spectra using the median of principal components (PCs). PCA is used to analyze the distribution of the spectra. We independently compute the medians of the training and test PCs, and then adjust the test median based on its difference from the training median. With the calibrated PCs, the test spectra are reconstructed accordingly. To test the performance of the proposed calibration algorithm, we conduct experiments on a publicly available, challenging LIBS dataset. We compare our calibration algorithm with the current best-performing calibration method on the same test set, using the same machine learning (ML) algorithm, PLS-DA, trained with the same training set. Our method improves the test accuracy by 1.2%. PLS-DA is used for the performance comparison because it is currently the best-performing ML algorithm. To further improve the test accuracy, other ML algorithms are investigated. Convolutional neural networks (CNNs) have recently achieved good accuracy in lithological classification with LIBS. Therefore, they are extended in this work to soil classification. We use a CNN as a tool for feature extraction and as an end-to-end classifier. We also use the CNN-based feature extraction mechanism with other classifiers, such as support vector machine (SVM) and random forest (RF), for soil classification. The performance of the CNN models on the calibrated test spectra is compared, and we conclude that the CNN combined with SVM achieves the best accuracy, improving the test accuracy by 3.1% compared to the best-performing ML algorithm, PLS-DA.
•When training and test spectra are not representative of each other, there will be generalization issues.•Test set calibration helps with the model generalization.•PCA-based calibration algorithm adjusts the test principal components (PCs) by its median difference with training's PCs.•PCA-based calibration algorithm reduces the differences between training and test set, which improves the test accuracy.•CNN, working for feature extraction, combined with SVM, improves the test performance.</description><identifier>ISSN: 0584-8547</identifier><identifier>EISSN: 1873-3565</identifier><identifier>DOI: 10.1016/j.sab.2022.106451</identifier><language>eng</language><publisher>Oxford: Elsevier B.V</publisher><subject>Accuracy ; Algorithms ; Analysis ; Analytical methods ; Artificial neural networks ; Calibration ; Classification ; Classifiers ; CNN ; Contamination ; Data calibration ; Discriminant analysis ; Distribution ; Feature extraction ; Laser induced breakdown spectroscopy ; Lasers ; Learning algorithms ; LIBS ; Lithology ; Machine learning ; Neural networks ; PCA median ; Principal components analysis ; Soil analysis ; Soil classification ; Soil contamination ; Soil pollution ; Soil properties ; Soil types ; Spectra ; Spectroscopy ; Spectrum analysis ; Support vector machines ; Training</subject><ispartof>Spectrochimica acta. 
Part B: Atomic spectroscopy, 2022-07, Vol.193, p.106451, Article 106451</ispartof><rights>2022 Elsevier B.V.</rights><rights>Copyright Elsevier BV Jul 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c255t-69f2e0ee70ef20599a32360ddcfcd08aaadd6daa9a6efa81f098edcb6bea9cf53</citedby><cites>FETCH-LOGICAL-c255t-69f2e0ee70ef20599a32360ddcfcd08aaadd6daa9a6efa81f098edcb6bea9cf53</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S0584854722000957$$EHTML$$P50$$Gelsevier$$H</linktohtml><link.rule.ids>314,776,780,3537,27901,27902,65306</link.rule.ids></links><search><creatorcontrib>Huang, Yingchao</creatorcontrib><creatorcontrib>Bais, Abdul</creatorcontrib><title>A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data</title><title>Spectrochimica acta. Part B: Atomic spectroscopy</title><description>Accurate classification of soil types and contamination is crucial for crops' productivity. Among the soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine learning and chemometrics methods (e.g., partial least squares discriminate analysis (PLS-DA), principal component analysis (PCA)) has demonstrated great capabilities for soils classification. However, when the training and test spectra have different distribution and not representative of each other, there are generalization issues, which make the model trained on training spectra hard to adapt to test spectra. In this work we propose a method to calibrate the test spectra using the median of principal components (PCs). PCA is used to analyze the spectra distribution. 
We independently compute the medians of the training and test PCs, and then adjust the test median based on its difference from the training median. With the calibrated PCs, the test spectra are reconstructed accordingly. To test the performance of the proposed calibration algorithm, we conduct experiments on a publicly available, challenging LIBS dataset. We compare our calibration algorithm with the current best-performing calibration method on the same test set, using the same machine learning (ML) algorithm, PLS-DA, trained with the same training set. Our method improves the test accuracy by 1.2%. PLS-DA is used for the performance comparison because it is currently the best-performing ML algorithm. To further improve the test accuracy, other ML algorithms are investigated. Convolutional neural networks (CNNs) have recently achieved good accuracy in lithological classification with LIBS. Therefore, they are extended in this work to soil classification. We use a CNN as a tool for feature extraction and as an end-to-end classifier. We also use the CNN-based feature extraction mechanism with other classifiers, such as support vector machine (SVM) and random forest (RF), for soil classification. The performance of the CNN models on the calibrated test spectra is compared, and we conclude that the CNN combined with SVM achieves the best accuracy, improving the test accuracy by 3.1% compared to the best-performing ML algorithm, PLS-DA.
•When training and test spectra are not representative of each other, there will be generalization issues.•Test set calibration helps with the model generalization.•PCA-based calibration algorithm adjusts the test principal components (PCs) by its median difference with training's PCs.•PCA-based calibration algorithm reduces the differences between training and test set, which improves the test accuracy.•CNN, working for feature extraction, combined with SVM, improves the test performance.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Analysis</subject><subject>Analytical methods</subject><subject>Artificial neural networks</subject><subject>Calibration</subject><subject>Classification</subject><subject>Classifiers</subject><subject>CNN</subject><subject>Contamination</subject><subject>Data calibration</subject><subject>Discriminant analysis</subject><subject>Distribution</subject><subject>Feature extraction</subject><subject>Laser induced breakdown spectroscopy</subject><subject>Lasers</subject><subject>Learning algorithms</subject><subject>LIBS</subject><subject>Lithology</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>PCA median</subject><subject>Principal components analysis</subject><subject>Soil analysis</subject><subject>Soil classification</subject><subject>Soil contamination</subject><subject>Soil pollution</subject><subject>Soil properties</subject><subject>Soil types</subject><subject>Spectra</subject><subject>Spectroscopy</subject><subject>Spectrum analysis</subject><subject>Support vector 
machines</subject><subject>Training</subject><issn>0584-8547</issn><issn>1873-3565</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kEFv2zAMhYWhA5Zm-wG7CejZmSxHioyegmDtBgRoD91ZoCUqkadYnuS0yGl_fQq8c08E8fgeyY-QrzVb1ayW3_pVhm7FGeell2tRfyCLWm2aqhFS3JAFE2pdKbHefCK3OfeMMS64WJC_WzrEVwz0ebetOshoqYHguwSTjwOFcIjJT8cTdTFREyBn77yZxeioOUIIOBz8cKBFxFT5wZ5NSekSwm8b3waaRzRTitnE8UJz9IFmOI0BqYUJPpOPDkLGL__rkvx6-P6y-1Htnx5_7rb7ynAhpkq2jiND3DB0nIm2hYY3kllrnLFMAYC10gK0INGBqh1rFVrTyQ6hNU40S3I3544p_jljnnQfz2koKzWXSgklmGJlqp6nTLk3J3R6TP4E6aJrpq-cda8LZ33lrGfOxXM_e7Cc_-ox6Ww8DoWBT-VxbaN_x_0P3RmJfA</recordid><startdate>202207</startdate><enddate>202207</enddate><creator>Huang, Yingchao</creator><creator>Bais, Abdul</creator><general>Elsevier B.V</general><general>Elsevier BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7QH</scope><scope>7SR</scope><scope>7U5</scope><scope>7UA</scope><scope>8FD</scope><scope>C1K</scope><scope>F1W</scope><scope>H97</scope><scope>JG9</scope><scope>L.G</scope><scope>L7M</scope></search><sort><creationdate>202207</creationdate><title>A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data</title><author>Huang, Yingchao ; Bais, Abdul</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c255t-69f2e0ee70ef20599a32360ddcfcd08aaadd6daa9a6efa81f098edcb6bea9cf53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Analysis</topic><topic>Analytical methods</topic><topic>Artificial neural networks</topic><topic>Calibration</topic><topic>Classification</topic><topic>Classifiers</topic><topic>CNN</topic><topic>Contamination</topic><topic>Data calibration</topic><topic>Discriminant analysis</topic><topic>Distribution</topic><topic>Feature 
extraction</topic><topic>Laser induced breakdown spectroscopy</topic><topic>Lasers</topic><topic>Learning algorithms</topic><topic>LIBS</topic><topic>Lithology</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>PCA median</topic><topic>Principal components analysis</topic><topic>Soil analysis</topic><topic>Soil classification</topic><topic>Soil contamination</topic><topic>Soil pollution</topic><topic>Soil properties</topic><topic>Soil types</topic><topic>Spectra</topic><topic>Spectroscopy</topic><topic>Spectrum analysis</topic><topic>Support vector machines</topic><topic>Training</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, Yingchao</creatorcontrib><creatorcontrib>Bais, Abdul</creatorcontrib><collection>CrossRef</collection><collection>Aqualine</collection><collection>Engineered Materials Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>Water Resources Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ASFA: Aquatic Sciences and Fisheries Abstracts</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) 3: Aquatic Pollution & Environmental Quality</collection><collection>Materials Research Database</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) Professional</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>Spectrochimica acta. 
Part B: Atomic spectroscopy</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Yingchao</au><au>Bais, Abdul</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data</atitle><jtitle>Spectrochimica acta. Part B: Atomic spectroscopy</jtitle><date>2022-07</date><risdate>2022</risdate><volume>193</volume><spage>106451</spage><pages>106451-</pages><artnum>106451</artnum><issn>0584-8547</issn><eissn>1873-3565</eissn><abstract>Accurate classification of soil types and contamination is crucial for crops' productivity. Among the soil analysis techniques, laser-induced breakdown spectroscopy (LIBS) has become a prominent technology for real-time characterization of soil properties. LIBS coupled with supervised machine learning and chemometrics methods (e.g., partial least squares discriminate analysis (PLS-DA), principal component analysis (PCA)) has demonstrated great capabilities for soils classification. However, when the training and test spectra have different distribution and not representative of each other, there are generalization issues, which make the model trained on training spectra hard to adapt to test spectra. In this work we propose a method to calibrate the test spectra using the median of principal components (PCs). PCA is used to analyze the spectra distribution. We independently compute the median of both training's and test's PCs, and then the test's median is adjusted based on its differences with training's. With the calibrated PCs, the test spectra is reconstructed accordingly. To test the performance of the proposed calibration algorithm, we conduct experiments on a publicly available challenging LIBS dataset. 
We compare our calibration algorithm with the current best-performing calibration method on the same test set, using the same machine learning (ML) algorithm, PLS-DA, trained with the same training set. Our method improves the test accuracy by 1.2%. PLS-DA is used for the performance comparison because it is currently the best-performing ML algorithm. To further improve the test accuracy, other ML algorithms are investigated. Convolutional neural networks (CNNs) have recently achieved good accuracy in lithological classification with LIBS. Therefore, they are extended in this work to soil classification. We use a CNN as a tool for feature extraction and as an end-to-end classifier. We also use the CNN-based feature extraction mechanism with other classifiers, such as support vector machine (SVM) and random forest (RF), for soil classification. The performance of the CNN models on the calibrated test spectra is compared, and we conclude that the CNN combined with SVM achieves the best accuracy, improving the test accuracy by 3.1% compared to the best-performing ML algorithm, PLS-DA.
•When training and test spectra are not representative of each other, there will be generalization issues.•Test set calibration helps with the model generalization.•PCA-based calibration algorithm adjusts the test principal components (PCs) by its median difference with training's PCs.•PCA-based calibration algorithm reduces the differences between training and test set, which improves the test accuracy.•CNN, working for feature extraction, combined with SVM, improves the test performance.</abstract><cop>Oxford</cop><pub>Elsevier B.V</pub><doi>10.1016/j.sab.2022.106451</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0584-8547 |
ispartof | Spectrochimica acta. Part B: Atomic spectroscopy, 2022-07, Vol.193, p.106451, Article 106451 |
issn | 0584-8547 1873-3565 |
language | eng |
recordid | cdi_proquest_journals_2688585080 |
source | Elsevier ScienceDirect Journals |
subjects | Accuracy ; Algorithms ; Analysis ; Analytical methods ; Artificial neural networks ; Calibration ; Classification ; Classifiers ; CNN ; Contamination ; Data calibration ; Discriminant analysis ; Distribution ; Feature extraction ; Laser induced breakdown spectroscopy ; Lasers ; Learning algorithms ; LIBS ; Lithology ; Machine learning ; Neural networks ; PCA median ; Principal components analysis ; Soil analysis ; Soil classification ; Soil contamination ; Soil pollution ; Soil properties ; Soil types ; Spectra ; Spectroscopy ; Spectrum analysis ; Support vector machines ; Training |
title | A novel PCA-based calibration algorithm for classification of challenging laser-induced breakdown spectroscopy soil sample data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T07%3A15%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20novel%20PCA-based%20calibration%20algorithm%20for%20classification%20of%20challenging%20laser-induced%20breakdown%20spectroscopy%20soil%20sample%20data&rft.jtitle=Spectrochimica%20acta.%20Part%20B:%20Atomic%20spectroscopy&rft.au=Huang,%20Yingchao&rft.date=2022-07&rft.volume=193&rft.spage=106451&rft.pages=106451-&rft.artnum=106451&rft.issn=0584-8547&rft.eissn=1873-3565&rft_id=info:doi/10.1016/j.sab.2022.106451&rft_dat=%3Cproquest_cross%3E2688585080%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2688585080&rft_id=info:pmid/&rft_els_id=S0584854722000957&rfr_iscdi=true |
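
The calibration idea summarized in the description field (project spectra onto principal components, compare the training and test medians, adjust the test side, then reconstruct the test spectra) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the function names `fit_pca` and `calibrate_test_spectra`, the choice to fit PCA on the training spectra only, the additive per-component shift, and the default `n_components=10` are all assumptions, since the record does not specify these details.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA on X (rows are spectra) via SVD; return the feature mean and top components."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def calibrate_test_spectra(X_train, X_test, n_components=10):
    """Shift test PC scores so their per-component median matches the training median."""
    # Assumption: PCA is fitted on the training spectra and reused for the test spectra.
    mean, components = fit_pca(X_train, n_components)
    train_scores = (X_train - mean) @ components.T
    test_scores = (X_test - mean) @ components.T
    # Assumption: the adjustment is an additive shift by the median difference.
    shift = np.median(train_scores, axis=0) - np.median(test_scores, axis=0)
    calibrated_scores = test_scores + shift
    # Reconstruct spectra from the calibrated scores (rank-limited reconstruction).
    return mean + calibrated_scores @ components


# Usage on synthetic data (hypothetical shapes: 20 wavelength channels).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 20))
X_test = rng.normal(size=(30, 20)) + 2.0  # test set with a deliberate offset
X_calibrated = calibrate_test_spectra(X_train, X_test, n_components=5)
print(X_calibrated.shape)
```

Because the reconstruction keeps only the retained components, any variation in the test spectra outside that subspace is discarded in this sketch; whether the published method does the same is not stated in the record.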