Enhancing Parameter Efficiency in Model Inference Using an Ultralight Inter-Transformer Linear Structure



Saved in:
Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, pp. 43734-43746
Main authors: Shi, Haoxiang; Sakai, Tetsuya
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 43746
container_issue
container_start_page 43734
container_title IEEE access
container_volume 12
creator Shi, Haoxiang; Sakai, Tetsuya
description Pre-trained language models are the cornerstone of modern natural language processing and information retrieval. However, fine-tuning all of their parameters reduces the efficiency of models in both training and inference owing to their increasingly heavy structures. Existing methods for parameter efficiency still require approximately 1 MB of storage and approximately 10^{7} operations during model deployment and inference. This strains the storage and processor capacity of end devices such as smartphones and IoT equipment, and slow model inference adversely affects the user experience. To achieve more efficient and storage-friendly inference than mainstream methods such as low-rank adaptation (LoRA) and Adapter, this paper proposes LayerConnect (hyper-network-assisted interlayer connectors). Extensive experiments were conducted to validate the performance of LayerConnect on two essential tasks with completely different learning frameworks and purposes: natural language understanding (using the General Language Understanding Evaluation (GLUE) benchmark) and information retrieval (using the contextualized inverted list (COIL) framework). For both tasks, LayerConnect saves up to 95.31% and 91.18% of the parameters required by LoRA and Adapter, respectively. At the same time, LayerConnect limits performance degradation on GLUE and COIL to less than 8% and 3%, respectively, compared to LoRA; compared to Adapter, the corresponding figures are 5% and 3%. In addition, LayerConnect requires approximately 100 kB of storage per task-specific trained model for both tasks and reduces the number of operations in model inference by four orders of magnitude, to approximately 10^{3}.
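The abstract describes LayerConnect only at a high level, so the following is a minimal, hypothetical PyTorch sketch of what an ultralight interlayer linear connector between frozen transformer layers could look like. The class name, bottleneck size, and residual design are illustrative assumptions, not details taken from the paper, and the hypernetwork-assisted part of the method is omitted.

```python
# Hypothetical sketch of an ultralight interlayer linear connector.
# NOT the authors' released implementation: names, the bottleneck size,
# and the residual design are assumptions for illustration only, and the
# hypernetwork that the paper uses to assist the connectors is omitted.
import torch
import torch.nn as nn


class InterLayerConnector(nn.Module):
    """A tiny bottleneck linear map inserted between two frozen transformer layers.

    With hidden_size=768 and bottleneck=16 this module holds
    768*16 + 16*768 = 24,576 weights, i.e. roughly 96 kB in fp32, which is the
    same order of magnitude as the ~100 kB per-task storage quoted in the
    abstract (assumption: a comparably small number of connectors per task).
    """

    def __init__(self, hidden_size: int = 768, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck, bias=False)
        self.up = nn.Linear(bottleneck, hidden_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual form: the frozen backbone's hidden states pass through
        # unchanged and the connector only adds a small learned correction.
        return hidden_states + self.up(self.down(hidden_states))


if __name__ == "__main__":
    x = torch.randn(2, 128, 768)              # (batch, sequence, hidden)
    connector = InterLayerConnector()
    print(connector(x).shape)                  # torch.Size([2, 128, 768])
    print(sum(p.numel() for p in connector.parameters()))  # 24576
```

In such a setup, only the connector weights would be trained per task while the transformer backbone stays frozen, which is what keeps the per-task artifact small enough to store and run on constrained devices such as smartphones and IoT equipment.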
doi_str_mv 10.1109/ACCESS.2024.3378518
format Article
identifier ISSN: 2169-3536
ispartof IEEE access, 2024, Vol.12, p.43734-43746
issn 2169-3536
2169-3536
language eng
recordid cdi_proquest_journals_3015057797
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Adaptation models
Adapters
Coils
Connectors
Efficiency
green AI
hypernetwork
Inference
Information retrieval
Interlayers
Mathematical models
Microprocessors
model inference
Natural language
Natural language processing
Parameter efficiency
Parameters
Performance degradation
pretrained language model
Task analysis
Training
Transformers
Tuning
User experience
title Enhancing Parameter Efficiency in Model Inference Using an Ultralight Inter-Transformer Linear Structure