LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification
Recently, both large language models and large vision models (LVM) have gained significant attention. Trained on large-scale datasets, these large models have showcased remarkable capabilities across various research domains. To enhance the accuracy of remote sensing (RS) scene classification, LVM-b...
Published in: | IEEE geoscience and remote sensing letters 2024-01, Vol.21, p.1-1 |
---|---|
Main authors: | Yang, Bohan ; Chen, Yushi ; Ghamisi, Pedram |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE geoscience and remote sensing letters |
container_volume | 21 |
creator | Yang, Bohan ; Chen, Yushi ; Ghamisi, Pedram |
description | Recently, both large language models and large vision models (LVMs) have gained significant attention. Trained on large-scale datasets, these models have shown remarkable capabilities across various research domains. This letter explores LVM-based methods to improve the accuracy of remote sensing (RS) scene classification. Because RS images differ from natural images, simply transferring LVMs to RS tasks is impractical. We therefore append learnable prompt tokens to the input tokens while freezing the backbone weights, which reduces the number of trainable parameters and makes the LVM weights easier to harness and transfer. To address the catastrophic forgetting that ordinary finetuning can induce, as well as the inherent complexity and redundancy of RS images, we introduce soft adaption mechanisms between backbone layers, built on the prompt tuning technique, and implement the first LVM tuning methods, namely LVM-StARS-Deep and LVM-StARS-Shallow, to make LVMs more suitable for RS scene classification. Evaluated on two popular RS scene classification datasets, the proposed method outperforms other state-of-the-art methods: it improves overall accuracy by 1.71% to 3.94% while updating only 0.1% to 0.5% of the parameters compared to full finetuning. |
doi_str_mv | 10.1109/LGRS.2024.3432069 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_3094515217</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10633741</ieee_id><sourcerecordid>3094515217</sourcerecordid><originalsourceid>FETCH-LOGICAL-c176t-99daf0e504953c66e785131da95b5dcf1d530931d5657d94a9d36c0026ece62f3</originalsourceid><addsrcrecordid>eNpNkE1LAzEQhoMoWD9-gOAh4HlrstnJbryVolXYInS1eAsxmZQtdVOT7cF_7y714GmG4XlnhoeQG86mnDN1Xy9WzTRneTEVhciZVCdkwgGqjEHJT8e-gAxU9XFOLlLasoGsqnJCmnq9zJp-tmoeaG3iBum6TW3o6DI43NEm-J7OnNn348yHSFf4FXqkDXap7Ta0sdghne9MSq1vrRm5K3LmzS7h9V-9JO9Pj2_z56x-XbzMZ3VmeSn7TClnPENghQJhpcSyAi64Mwo-wVnPHQimhgFIKJ0qjHJC2uFziRZl7sUluTvu3cfwfcDU6204xG44qYdgARxyXg4UP1I2hpQier2P7ZeJP5ozPbrTozs9utN_7obM7THTIuI_XgpRFlz8Am6KaS0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3094515217</pqid></control><display><type>article</type><title>LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification</title><source>IEEE Electronic Library (IEL)</source><creator>Yang, Bohan ; Chen, Yushi ; Ghamisi, Pedram</creator><creatorcontrib>Yang, Bohan ; Chen, Yushi ; Ghamisi, Pedram</creatorcontrib><description>Recently, both large language models and large vision models (LVM) have gained significant attention. Trained on large-scale datasets, these large models have showcased remarkable capabilities across various research domains. To enhance the accuracy of remote sensing (RS) scene classification, LVM-based methods are explored in this letter. Due to the differences between RS images and natural images, simply transferring LVMs to RS tasks is impractical. Therefore, we conducted research on relevant techniques and appended learnable prompt tokens to the input tokens while freezing the backbone weights, reducing the parameter scale and making the LVM weights easier to harness and to transfer. 
In consideration of latent catastrophic forgetting issues induced by ordinary finetuning techniques and the inherent complexity and redundancy of RS images, we introduced soft adaption mechanisms between backbone layers based on prompt tuning technique, and implemented the first LVM tuning method, namely the LVM-StARS-Deep and the LVM-StARS-Shallow to make LVMs more suitable for RS scene classification tasks. The proposed methods are evaluated on two popular RS scene classification datasets and the experimental results indicate that the proposed method outperforms other state-of-the-art methods. The experimental results demonstrate that our proposed method enhances overall accuracy by 1.71% to 3.94%, while updating only 0.1% to 0.5% of the parameters compared to full finetuning. Furthermore, our method outperforms existing methods.</description><identifier>ISSN: 1545-598X</identifier><identifier>EISSN: 1558-0571</identifier><identifier>DOI: 10.1109/LGRS.2024.3432069</identifier><identifier>CODEN: IGRSBY</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Accuracy ; Adaptation models ; Classification ; Datasets ; Freezing ; Image enhancement ; Large language models ; large vision models (LVM) ; Methods ; parameter efficient transfer learning (PETL) ; Parameters ; Redundancy ; Remote sensing ; remote sensing (RS) scene classification ; Scene classification ; Stars ; State-of-the-art reviews ; Task analysis ; Task complexity ; Transformers ; Tuning ; Vision Transformers (ViT)</subject><ispartof>IEEE geoscience and remote sensing letters, 2024-01, Vol.21, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c176t-99daf0e504953c66e785131da95b5dcf1d530931d5657d94a9d36c0026ece62f3</cites><orcidid>0000-0003-1203-741X ; 0000-0003-2421-0996</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10633741$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,777,781,793,27905,27906,54739</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10633741$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Yang, Bohan</creatorcontrib><creatorcontrib>Chen, Yushi</creatorcontrib><creatorcontrib>Ghamisi, Pedram</creatorcontrib><title>LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification</title><title>IEEE geoscience and remote sensing letters</title><addtitle>LGRS</addtitle><description>Recently, both large language models and large vision models (LVM) have gained significant attention. Trained on large-scale datasets, these large models have showcased remarkable capabilities across various research domains. To enhance the accuracy of remote sensing (RS) scene classification, LVM-based methods are explored in this letter. Due to the differences between RS images and natural images, simply transferring LVMs to RS tasks is impractical. Therefore, we conducted research on relevant techniques and appended learnable prompt tokens to the input tokens while freezing the backbone weights, reducing the parameter scale and making the LVM weights easier to harness and to transfer. 
In consideration of latent catastrophic forgetting issues induced by ordinary finetuning techniques and the inherent complexity and redundancy of RS images, we introduced soft adaption mechanisms between backbone layers based on prompt tuning technique, and implemented the first LVM tuning method, namely the LVM-StARS-Deep and the LVM-StARS-Shallow to make LVMs more suitable for RS scene classification tasks. The proposed methods are evaluated on two popular RS scene classification datasets and the experimental results indicate that the proposed method outperforms other state-of-the-art methods. The experimental results demonstrate that our proposed method enhances overall accuracy by 1.71% to 3.94%, while updating only 0.1% to 0.5% of the parameters compared to full finetuning. Furthermore, our method outperforms existing methods.</description><subject>Accuracy</subject><subject>Adaptation models</subject><subject>Classification</subject><subject>Datasets</subject><subject>Freezing</subject><subject>Image enhancement</subject><subject>Large language models</subject><subject>large vision models (LVM)</subject><subject>Methods</subject><subject>parameter efficient transfer learning (PETL)</subject><subject>Parameters</subject><subject>Redundancy</subject><subject>Remote sensing</subject><subject>remote sensing (RS) scene classification</subject><subject>Scene classification</subject><subject>Stars</subject><subject>State-of-the-art reviews</subject><subject>Task analysis</subject><subject>Task complexity</subject><subject>Transformers</subject><subject>Tuning</subject><subject>Vision Transformers 
(ViT)</subject><issn>1545-598X</issn><issn>1558-0571</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkE1LAzEQhoMoWD9-gOAh4HlrstnJbryVolXYInS1eAsxmZQtdVOT7cF_7y714GmG4XlnhoeQG86mnDN1Xy9WzTRneTEVhciZVCdkwgGqjEHJT8e-gAxU9XFOLlLasoGsqnJCmnq9zJp-tmoeaG3iBum6TW3o6DI43NEm-J7OnNn348yHSFf4FXqkDXap7Ta0sdghne9MSq1vrRm5K3LmzS7h9V-9JO9Pj2_z56x-XbzMZ3VmeSn7TClnPENghQJhpcSyAi64Mwo-wVnPHQimhgFIKJ0qjHJC2uFziRZl7sUluTvu3cfwfcDU6204xG44qYdgARxyXg4UP1I2hpQier2P7ZeJP5ozPbrTozs9utN_7obM7THTIuI_XgpRFlz8Am6KaS0</recordid><startdate>20240101</startdate><enddate>20240101</enddate><creator>Yang, Bohan</creator><creator>Chen, Yushi</creator><creator>Ghamisi, Pedram</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7TG</scope><scope>7UA</scope><scope>8FD</scope><scope>C1K</scope><scope>F1W</scope><scope>FR3</scope><scope>H8D</scope><scope>H96</scope><scope>JQ2</scope><scope>KL.</scope><scope>KR7</scope><scope>L.G</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0003-1203-741X</orcidid><orcidid>https://orcid.org/0000-0003-2421-0996</orcidid></search><sort><creationdate>20240101</creationdate><title>LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification</title><author>Yang, Bohan ; Chen, Yushi ; Ghamisi, Pedram</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c176t-99daf0e504953c66e785131da95b5dcf1d530931d5657d94a9d36c0026ece62f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Accuracy</topic><topic>Adaptation 
models</topic><topic>Classification</topic><topic>Datasets</topic><topic>Freezing</topic><topic>Image enhancement</topic><topic>Large language models</topic><topic>large vision models (LVM)</topic><topic>Methods</topic><topic>parameter efficient transfer learning (PETL)</topic><topic>Parameters</topic><topic>Redundancy</topic><topic>Remote sensing</topic><topic>remote sensing (RS) scene classification</topic><topic>Scene classification</topic><topic>Stars</topic><topic>State-of-the-art reviews</topic><topic>Task analysis</topic><topic>Task complexity</topic><topic>Transformers</topic><topic>Tuning</topic><topic>Vision Transformers (ViT)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Bohan</creatorcontrib><creatorcontrib>Chen, Yushi</creatorcontrib><creatorcontrib>Ghamisi, Pedram</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Meteorological & Geoastrophysical Abstracts</collection><collection>Water Resources Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ASFA: Aquatic Sciences and Fisheries Abstracts</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) 2: Ocean Technology, Policy & Non-Living Resources</collection><collection>ProQuest Computer Science Collection</collection><collection>Meteorological & Geoastrophysical Abstracts - Academic</collection><collection>Civil Engineering Abstracts</collection><collection>Aquatic Science & 
Fisheries Abstracts (ASFA) Professional</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE geoscience and remote sensing letters</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yang, Bohan</au><au>Chen, Yushi</au><au>Ghamisi, Pedram</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification</atitle><jtitle>IEEE geoscience and remote sensing letters</jtitle><stitle>LGRS</stitle><date>2024-01-01</date><risdate>2024</risdate><volume>21</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>1545-598X</issn><eissn>1558-0571</eissn><coden>IGRSBY</coden><abstract>Recently, both large language models and large vision models (LVM) have gained significant attention. Trained on large-scale datasets, these large models have showcased remarkable capabilities across various research domains. To enhance the accuracy of remote sensing (RS) scene classification, LVM-based methods are explored in this letter. Due to the differences between RS images and natural images, simply transferring LVMs to RS tasks is impractical. Therefore, we conducted research on relevant techniques and appended learnable prompt tokens to the input tokens while freezing the backbone weights, reducing the parameter scale and making the LVM weights easier to harness and to transfer. 
In consideration of latent catastrophic forgetting issues induced by ordinary finetuning techniques and the inherent complexity and redundancy of RS images, we introduced soft adaption mechanisms between backbone layers based on prompt tuning technique, and implemented the first LVM tuning method, namely the LVM-StARS-Deep and the LVM-StARS-Shallow to make LVMs more suitable for RS scene classification tasks. The proposed methods are evaluated on two popular RS scene classification datasets and the experimental results indicate that the proposed method outperforms other state-of-the-art methods. The experimental results demonstrate that our proposed method enhances overall accuracy by 1.71% to 3.94%, while updating only 0.1% to 0.5% of the parameters compared to full finetuning. Furthermore, our method outperforms existing methods.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/LGRS.2024.3432069</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0003-1203-741X</orcidid><orcidid>https://orcid.org/0000-0003-2421-0996</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1545-598X |
ispartof | IEEE geoscience and remote sensing letters, 2024-01, Vol.21, p.1-1 |
issn | 1545-598X ; 1558-0571 |
language | eng |
recordid | cdi_proquest_journals_3094515217 |
source | IEEE Electronic Library (IEL) |
subjects | Accuracy ; Adaptation models ; Classification ; Datasets ; Freezing ; Image enhancement ; Large language models ; large vision models (LVM) ; Methods ; parameter efficient transfer learning (PETL) ; Parameters ; Redundancy ; Remote sensing ; remote sensing (RS) scene classification ; Scene classification ; Stars ; State-of-the-art reviews ; Task analysis ; Task complexity ; Transformers ; Tuning ; Vision Transformers (ViT) |
title | LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T00%3A45%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=LVM-StARS:%20Large%20Vision%20Model%20Soft%20Adaption%20for%20Remote%20Sensing%20Scene%20Classification&rft.jtitle=IEEE%20geoscience%20and%20remote%20sensing%20letters&rft.au=Yang,%20Bohan&rft.date=2024-01-01&rft.volume=21&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1545-598X&rft.eissn=1558-0571&rft.coden=IGRSBY&rft_id=info:doi/10.1109/LGRS.2024.3432069&rft_dat=%3Cproquest_RIE%3E3094515217%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3094515217&rft_id=info:pmid/&rft_ieee_id=10633741&rfr_iscdi=true |
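
The description field in this record mentions learnable prompt tokens that are added to the input tokens while the backbone weights stay frozen. The following is only a minimal sketch of that general prompt-tuning idea and of the parameter arithmetic behind it; it is not the authors' implementation and does not model their soft adaption mechanism. All function names, the token ordering, the reading of "Shallow" as input-only prompts and "Deep" as per-layer prompts, and the ViT-Base-like sizes are assumptions made for illustration.

```python
import random


def make_prompts(num_prompts, dim, seed=0):
    """Return small random vectors standing in for learnable prompt tokens."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_prompts)]


def add_prompts(prompts, patch_tokens):
    """Concatenate prompt tokens with the image patch tokens.

    The backbone that consumes this sequence stays frozen; only the prompt
    vectors would receive gradient updates. Placing the prompts first is an
    assumption of this sketch.
    """
    return prompts + patch_tokens


def trainable_fraction(num_prompts, dim, num_layers, backbone_params, deep):
    """Share of all parameters that is trained when only prompts are updated.

    deep=False: prompts are added at the input only (assumed "shallow" variant).
    deep=True: a separate prompt set is added at every layer (assumed "deep" variant).
    The classification head is ignored to keep the arithmetic simple.
    """
    prompted_layers = num_layers if deep else 1
    trainable = num_prompts * dim * prompted_layers
    return trainable / (backbone_params + trainable)


if __name__ == "__main__":
    # Hypothetical ViT-Base-like sizes; none of these numbers come from the letter.
    dim, layers, backbone = 768, 12, 86_000_000
    patches = [[0.0] * dim for _ in range(196)]
    sequence = add_prompts(make_prompts(10, dim), patches)
    print(len(sequence))  # 10 prompts + 196 patches
    for deep in (False, True):
        print(deep, f"{trainable_fraction(10, dim, layers, backbone, deep):.4%}")
```

With sizes of this order, the trained share stays well below one percent of the backbone, which is the kind of saving over full finetuning that the description refers to; the exact figures depend on the real backbone and prompt configuration.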