LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification

Recently, both large language models and large vision models (LVM) have gained significant attention. Trained on large-scale datasets, these large models have showcased remarkable capabilities across various research domains. To enhance the accuracy of remote sensing (RS) scene classification, LVM-based methods are explored in this letter.

Detailed Description

Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2024-01, Vol. 21, p. 1-1
Main Authors: Yang, Bohan; Chen, Yushi; Ghamisi, Pedram
Format: Article
Language: English
Subjects:
Online Access: Order full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE geoscience and remote sensing letters
container_volume 21
creator Yang, Bohan
Chen, Yushi
Ghamisi, Pedram
description Recently, both large language models and large vision models (LVMs) have gained significant attention. Trained on large-scale datasets, these large models have showcased remarkable capabilities across various research domains. To enhance the accuracy of remote sensing (RS) scene classification, LVM-based methods are explored in this letter. Because RS images differ from natural images, simply transferring LVMs to RS tasks is impractical. Therefore, we studied relevant techniques and appended learnable prompt tokens to the input tokens while freezing the backbone weights, which reduces the number of trainable parameters and makes the LVM weights easier to harness and transfer. Considering the latent catastrophic-forgetting issues induced by ordinary fine-tuning and the inherent complexity and redundancy of RS images, we introduced soft adaption mechanisms between backbone layers based on the prompt tuning technique and implemented the first LVM tuning methods of this kind, namely LVM-StARS-Deep and LVM-StARS-Shallow, to make LVMs more suitable for RS scene classification tasks. The proposed methods are evaluated on two popular RS scene classification datasets. The experimental results demonstrate that they improve overall accuracy by 1.71% to 3.94% while updating only 0.1% to 0.5% of the parameters compared to full fine-tuning, outperforming other state-of-the-art methods.
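The mechanism sketched in the description (learnable prompt tokens prepended to the frozen backbone's input tokens, with only the prompts and a small head trained) can be illustrated with a minimal PyTorch-style example. This is a hedged sketch, not the authors' implementation: the PromptTunedClassifier class, the toy Transformer stand-in for the LVM backbone, and all dimensions (10 prompt tokens, 768-dimensional embeddings, 45 classes) are hypothetical assumptions.

```python
# Minimal sketch of shallow visual prompt tuning (in the spirit of LVM-StARS-Shallow):
# learnable prompt tokens are prepended to the patch-token sequence while the
# backbone stays frozen. The backbone here is a toy Transformer encoder standing
# in for the actual LVM used in the letter (an assumption, not the real model).
import torch
import torch.nn as nn


class PromptTunedClassifier(nn.Module):
    def __init__(self, embed_dim=768, depth=4, num_prompts=10, num_classes=45):
        super().__init__()
        # Frozen "backbone": a small Transformer encoder as a stand-in for the LVM.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, dim_feedforward=4 * embed_dim,
            batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.backbone.parameters():
            p.requires_grad = False          # freeze all backbone weights

        # Learnable prompt tokens appended to the input token sequence.
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

        # Lightweight classification head (also trainable).
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, embed_dim) from a frozen patch embedding.
        b = patch_tokens.shape[0]
        prompts = self.prompts.expand(b, -1, -1)
        tokens = torch.cat([prompts, patch_tokens], dim=1)
        tokens = self.backbone(tokens)
        # Pool over tokens and classify.
        return self.head(tokens.mean(dim=1))


model = PromptTunedClassifier()
x = torch.randn(2, 196, 768)                 # e.g. 14x14 patches of a 224x224 image
logits = model(x)
print(logits.shape)                          # torch.Size([2, 45])
```

In this shallow sketch only the prompt tokens and the head receive gradients; a deep variant along the lines of LVM-StARS-Deep would additionally introduce fresh prompt tokens before each backbone layer.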
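The description also quantifies parameter efficiency (only 0.1% to 0.5% of the parameters are updated relative to full fine-tuning). The following hedged sketch shows how such a trainable-parameter fraction can be checked for any frozen-backbone model; the parameter_fraction helper and the toy modules are illustrative stand-ins, not code or figures from the letter.

```python
import torch.nn as nn


def parameter_fraction(model: nn.Module):
    """Return (trainable, total, percent-trainable) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total, 100.0 * trainable / total


# Toy stand-in: a frozen "backbone" plus a small trainable head, mimicking the
# prompt-tuning setup where the LVM weights are not updated.
backbone = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
for p in backbone.parameters():
    p.requires_grad = False               # frozen, as in prompt tuning
head = nn.Linear(768, 45)                 # trainable classification head
model = nn.ModuleDict({"backbone": backbone, "head": head})

trainable, total, pct = parameter_fraction(model)
print(f"{trainable:,} of {total:,} parameters trainable ({pct:.2f}%)")
```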
doi_str_mv 10.1109/LGRS.2024.3432069
format Article
eissn 1558-0571
coden IGRSBY
publisher Piscataway: IEEE
ieee_id 10633741
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
orcidid 0000-0003-1203-741X
0000-0003-2421-0996
fulltext fulltext_linktorsrc
identifier ISSN: 1545-598X
ispartof IEEE geoscience and remote sensing letters, 2024-01, Vol.21, p.1-1
issn 1545-598X
1558-0571
language eng
recordid cdi_proquest_journals_3094515217
source IEEE Electronic Library (IEL)
subjects Accuracy
Adaptation models
Classification
Datasets
Freezing
Image enhancement
Large language models
large vision models (LVM)
Methods
parameter efficient transfer learning (PETL)
Parameters
Redundancy
Remote sensing
remote sensing (RS) scene classification
Scene classification
Stars
State-of-the-art reviews
Task analysis
Task complexity
Transformers
Tuning
Vision Transformers (ViT)
title LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification