Data reduction by randomization subsampling for the study of large hyperspectral datasets
The large amount of information in hyperspectral images (HSI) generally makes their analysis (e.g., principal component analysis, PCA) time-consuming and often demands a lot of random access memory (RAM) and high computing power. This is particularly problematic for the analysis of large images, containing...
Published in: | Analytica chimica acta 2022-05, Vol.1209, p.339793-339793, Article 339793 |
---|---|
Main authors: | Cruz-Tirado, J.P. ; Amigo, José Manuel ; Barbin, Douglas Fernandes ; Kucheryavskiy, Sergey |
Format: | Article |
Language: | eng |
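Subjects: | Data reduction ; Hyperspectral imaging ; Principal Component Analysis ; Random Allocation ; Randomization ; Sub-sampling ; Time series |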
Online access: | Full text |
container_end_page | 339793 |
---|---|
container_issue | |
container_start_page | 339793 |
container_title | Analytica chimica acta |
container_volume | 1209 |
creator | Cruz-Tirado, J.P. ; Amigo, José Manuel ; Barbin, Douglas Fernandes ; Kucheryavskiy, Sergey |
description | The large amount of information in hyperspectral images (HSI) generally makes their analysis (e.g., principal component analysis, PCA) time-consuming and often demands a lot of random access memory (RAM) and high computing power. This is particularly problematic for the analysis of large images containing millions of pixels, which can be created by augmenting series of single images (e.g., in time series analysis). This tutorial explores how data reduction can be used to analyze time series hyperspectral images much faster without losing crucial analytical information. Two of the most common data reduction methods have been chosen from recent research. The first uses a simple randomization method called randomized sub-sampling PCA (RSPCA). The second applies a more robust randomization method based on local-rank approximations (rPCA). This manuscript sets out the major benefits and drawbacks of both methods, in the spirit of being as didactic as possible for the reader. A comprehensive comparison is made considering the amount of information retained by the PCA models at different compression degrees and the performance time. Extrapolation is also made to the case where the effect of time and any other factor are to be studied simultaneously.
• Reduced PCA by randomization saves computing time and RAM.
• The numerical accuracy of reduced models is as reliable as the full models.
• Hyperspectral time series analysis studied in a fraction of computing time and effort.
• Two reduced models tested in this manuscript with outstanding results. |
doi_str_mv | 10.1016/j.aca.2022.339793 |
format | Article |
fullrecord | recordid: TN_cdi_proquest_miscellaneous_2665108731; sourceid: proquest_cross; els_id: S0003267022003646; pmid: 35569845; eissn: 1873-4324; date: 2022-05-29; pub: Elsevier B.V; cop: Netherlands; rights: Copyright © 2022 The Author(s). Published by Elsevier B.V. All rights reserved.; oa: free_for_read; toplevel: peer_reviewed, online_resources; orcidid: https://orcid.org/0000-0003-1319-1312; linktohtml: https://www.sciencedirect.com/science/article/pii/S0003267022003646; backlink: https://www.ncbi.nlm.nih.gov/pubmed/35569845 |
fulltext | fulltext |
identifier | ISSN: 0003-2670 |
ispartof | Analytica chimica acta, 2022-05, Vol.1209, p.339793-339793, Article 339793 |
issn | 0003-2670 ; 1873-4324 |
language | eng |
recordid | cdi_proquest_miscellaneous_2665108731 |
source | MEDLINE; Elsevier ScienceDirect Journals |
subjects | Data reduction ; Hyperspectral imaging ; Principal Component Analysis ; Random Allocation ; Randomization ; Sub-sampling ; Time series |
title | Data reduction by randomization subsampling for the study of large hyperspectral datasets |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T23%3A55%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Data%20reduction%20by%20randomization%20subsampling%20for%20the%20study%20of%20large%20hyperspectral%20datasets&rft.jtitle=Analytica%20chimica%20acta&rft.au=Cruz-Tirado,%20J.P.&rft.date=2022-05-29&rft.volume=1209&rft.spage=339793&rft.epage=339793&rft.pages=339793-339793&rft.artnum=339793&rft.issn=0003-2670&rft.eissn=1873-4324&rft_id=info:doi/10.1016/j.aca.2022.339793&rft_dat=%3Cproquest_cross%3E2665108731%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2665108731&rft_id=info:pmid/35569845&rft_els_id=S0003267022003646&rfr_iscdi=true |
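
The abstract in this record describes fitting PCA on a reduced, randomly chosen part of a large hyperspectral dataset and comparing how much information the reduced model retains. The record itself gives no algorithmic details, so the following is only a minimal sketch of the general idea of random pixel subsampling before PCA, not the authors' RSPCA or rPCA implementation. The function names (`pca_loadings`, `subsampled_pca`, `explained_variance`), the 5 % sampling fraction, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pca_loadings(X, n_components):
    """Return the column mean and the first n_components loadings of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the mean-centred data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def subsampled_pca(X, n_components, fraction, seed=0):
    """Fit PCA on a random subset of rows (pixels), then score every row."""
    rng = np.random.default_rng(seed)
    n_keep = max(n_components + 1, int(fraction * X.shape[0]))
    rows = rng.choice(X.shape[0], size=n_keep, replace=False)
    mean, P = pca_loadings(X[rows], n_components)
    scores = (X - mean) @ P  # project all pixels onto the reduced model
    return scores, P, mean


def explained_variance(X, P, mean):
    """Fraction of the variation in X (around mean) captured by loadings P."""
    Xc = X - mean
    residual = Xc - (Xc @ P) @ P.T
    return 1.0 - (residual ** 2).sum() / (Xc ** 2).sum()


# Synthetic stand-in for an unfolded hyperspectral image (hypothetical data):
# 20000 pixels x 50 spectral bands, three underlying components plus noise.
rng = np.random.default_rng(42)
n_pixels, n_bands, rank = 20000, 50, 3
X = rng.normal(size=(n_pixels, rank)) @ rng.normal(size=(rank, n_bands))
X += 0.01 * rng.normal(size=X.shape)

mean_full, P_full = pca_loadings(X, rank)
scores_sub, P_sub, mean_sub = subsampled_pca(X, rank, fraction=0.05)

ev_full = explained_variance(X, P_full, mean_full)
ev_sub = explained_variance(X, P_sub, mean_sub)
print(f"explained variance, full model:       {ev_full:.6f}")
print(f"explained variance, subsampled model: {ev_sub:.6f}")
```

Under these assumptions the decomposition runs on 1000 rows instead of 20000, and comparing `ev_full` with `ev_sub` is one simple way to check how much information the reduced model keeps. Real images with rare or spatially localized components may need a larger fraction or a different sampling scheme.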