LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification

Recently, both large language models and large vision models (LVM) have gained significant attention. Trained on large-scale datasets, these large models have showcased remarkable capabilities across various research domains. To enhance the accuracy of remote sensing (RS) scene classification, LVM-based methods are explored in this letter.

Detailed Description

Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2024-01, Vol. 21, p. 1-1
Main Authors: Yang, Bohan; Chen, Yushi; Ghamisi, Pedram
Format: Article
Language: English
Subjects:
Online Access: Order full text
container_end_page 1
container_issue
container_start_page 1
container_title IEEE geoscience and remote sensing letters
container_volume 21
creator Yang, Bohan
Chen, Yushi
Ghamisi, Pedram
description Recently, both large language models and large vision models (LVMs) have gained significant attention. Trained on large-scale datasets, these large models have showcased remarkable capabilities across various research domains. To enhance the accuracy of remote sensing (RS) scene classification, LVM-based methods are explored in this letter. Because RS images differ from natural images, simply transferring LVMs to RS tasks is impractical. Therefore, we studied relevant techniques and appended learnable prompt tokens to the input tokens while freezing the backbone weights, which reduces the number of trainable parameters and makes the LVM weights easier to harness and transfer. Considering the latent catastrophic-forgetting issues induced by ordinary fine-tuning and the inherent complexity and redundancy of RS images, we introduced soft adaption mechanisms between backbone layers based on the prompt tuning technique and implemented the first LVM tuning methods of this kind, namely LVM-StARS-Deep and LVM-StARS-Shallow, to make LVMs more suitable for RS scene classification tasks. The proposed methods are evaluated on two popular RS scene classification datasets. The experimental results demonstrate that they improve overall accuracy by 1.71% to 3.94% while updating only 0.1% to 0.5% of the parameters compared to full fine-tuning, outperforming other state-of-the-art methods.
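The mechanism sketched in the description (learnable prompt tokens prepended to the frozen backbone's input tokens, with only the prompts and a small head trained) can be illustrated with a minimal PyTorch-style example. This is a hedged sketch, not the authors' implementation: the PromptTunedClassifier class, the toy Transformer stand-in for the LVM backbone, and all dimensions (10 prompt tokens, 768-dimensional embeddings, 45 classes) are hypothetical assumptions.

```python
# Minimal sketch of shallow visual prompt tuning (in the spirit of LVM-StARS-Shallow):
# learnable prompt tokens are prepended to the patch-token sequence while the
# backbone stays frozen. The backbone here is a toy Transformer encoder standing
# in for the actual LVM used in the letter (an assumption, not the real model).
import torch
import torch.nn as nn


class PromptTunedClassifier(nn.Module):
    def __init__(self, embed_dim=768, depth=4, num_prompts=10, num_classes=45):
        super().__init__()
        # Frozen "backbone": a small Transformer encoder as a stand-in for the LVM.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, dim_feedforward=4 * embed_dim,
            batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.backbone.parameters():
            p.requires_grad = False          # freeze all backbone weights

        # Learnable prompt tokens appended to the input token sequence.
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

        # Lightweight classification head (also trainable).
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, embed_dim) from a frozen patch embedding.
        b = patch_tokens.shape[0]
        prompts = self.prompts.expand(b, -1, -1)
        tokens = torch.cat([prompts, patch_tokens], dim=1)
        tokens = self.backbone(tokens)
        # Pool over tokens and classify.
        return self.head(tokens.mean(dim=1))


model = PromptTunedClassifier()
x = torch.randn(2, 196, 768)                 # e.g. 14x14 patches of a 224x224 image
logits = model(x)
print(logits.shape)                          # torch.Size([2, 45])
```

In this shallow sketch only the prompt tokens and the head receive gradients; a deep variant along the lines of LVM-StARS-Deep would additionally introduce fresh prompt tokens before each backbone layer.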
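The description also quantifies parameter efficiency (only 0.1% to 0.5% of the parameters are updated relative to full fine-tuning). The following hedged sketch shows how such a trainable-parameter fraction can be checked for any frozen-backbone model; the parameter_fraction helper and the toy modules are illustrative stand-ins, not code or figures from the letter.

```python
import torch.nn as nn


def parameter_fraction(model: nn.Module):
    """Return (trainable, total, percent-trainable) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total, 100.0 * trainable / total


# Toy stand-in: a frozen "backbone" plus a small trainable head, mimicking the
# prompt-tuning setup where the LVM weights are not updated.
backbone = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
for p in backbone.parameters():
    p.requires_grad = False               # frozen, as in prompt tuning
head = nn.Linear(768, 45)                 # trainable classification head
model = nn.ModuleDict({"backbone": backbone, "head": head})

trainable, total, pct = parameter_fraction(model)
print(f"{trainable:,} of {total:,} parameters trainable ({pct:.2f}%)")
```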
doi_str_mv 10.1109/LGRS.2024.3432069
format Article
eissn 1558-0571
coden IGRSBY
publisher Piscataway: IEEE
ieee_id 10633741
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
orcidid 0000-0003-1203-741X
0000-0003-2421-0996
fulltext fulltext_linktorsrc
identifier ISSN: 1545-598X
ispartof IEEE geoscience and remote sensing letters, 2024-01, Vol.21, p.1-1
issn 1545-598X
1558-0571
language eng
recordid cdi_proquest_journals_3094515217
source IEEE Electronic Library (IEL)
subjects Accuracy
Adaptation models
Classification
Datasets
Freezing
Image enhancement
Large language models
large vision models (LVM)
Methods
parameter efficient transfer learning (PETL)
Parameters
Redundancy
Remote sensing
remote sensing (RS) scene classification
Scene classification
Stars
State-of-the-art reviews
Task analysis
Task complexity
Transformers
Tuning
Vision Transformers (ViT)
title LVM-StARS: Large Vision Model Soft Adaption for Remote Sensing Scene Classification