Enhancing Parameter Efficiency in Model Inference Using an Ultralight Inter-Transformer Linear Structure
Pre-trained language models are the cornerstone of modern natural language processing and information retrieval. However, fine-tuning all of their parameters reduces the efficiency of models in both training and inference owing to their increasingly heavy structures. Existing methods for parameter efficiency still require approximately 1 MB of storage and roughly 10^7 operations during model deployment and inference...
Saved in:
Published in: | IEEE Access 2024, Vol.12, p.43734-43746 |
---|---|
Main authors: | Shi, Haoxiang; Sakai, Tetsuya |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 43746 |
---|---|
container_issue | |
container_start_page | 43734 |
container_title | IEEE access |
container_volume | 12 |
creator | Shi, Haoxiang; Sakai, Tetsuya |
description | Pre-trained language models are the cornerstone of modern natural language processing and information retrieval. However, fine-tuning all of their parameters reduces the efficiency of models in both training and inference owing to their increasingly heavy structures. Existing methods for parameter efficiency still require approximately 1 MB of storage and roughly 10^7 operations during model deployment and inference. This strains the storage and processor capacity of end devices such as smartphones and IoT equipment, and slow model inference adversely affects the user experience. To achieve more efficient and storage-friendly inference than mainstream methods such as low-rank adaptation (LoRA) and Adapter, this paper proposes LayerConnect (hyper-network-assisted interlayer connectors). Extensive experiments were conducted to validate the performance of LayerConnect on two essential tasks with completely different learning frameworks and purposes: natural language understanding (using the General Language Understanding Evaluation (GLUE) benchmark) and information retrieval (using the contextualized inverted list (COIL) framework). For both tasks, LayerConnect saves up to 95.31% and 91.18% of the parameters used by LoRA and Adapter, respectively. At the same time, LayerConnect keeps the performance degradation on GLUE and COIL below 8% and 3% compared to LoRA, and below 5% and 3% compared to Adapter. In addition, LayerConnect requires approximately 100 kB of storage per task-specific trained model for both tasks and reduces the number of operations in model inference by four orders of magnitude, to approximately 10^3. |
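The abstract characterizes LayerConnect only as "hyper-network-assisted interlayer connectors," so the sketch below is a hypothetical illustration rather than the authors' implementation: a tiny residual linear bottleneck inserted between frozen transformer layers, written in PyTorch to make the parameter-budget arithmetic concrete. The class name, bottleneck width, backbone size, and layer count are all assumptions.

```python
# Hypothetical sketch (not the authors' code): an ultralight linear "connector"
# placed between frozen transformer layers, to show why such a module can stay
# in the ~10^5-parameter / ~100 kB-per-task regime the abstract describes.
# All dimensions (hidden_size, bottleneck, num_layers) are assumptions.
import torch
import torch.nn as nn


class InterlayerConnector(nn.Module):
    """Tiny residual linear bottleneck applied to hidden states between layers."""

    def __init__(self, hidden_size: int, bottleneck: int = 4):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck, bias=False)
        self.up = nn.Linear(bottleneck, hidden_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual form: the frozen backbone's signal passes through unchanged;
        # the connector only learns a small task-specific correction.
        return hidden_states + self.up(self.down(hidden_states))


hidden_size, num_layers = 768, 12  # e.g. a BERT-base-sized backbone (assumption)
connectors = nn.ModuleList(
    [InterlayerConnector(hidden_size) for _ in range(num_layers)]
)

trainable = sum(p.numel() for p in connectors.parameters())
print(f"trainable parameters: {trainable}")             # 12 * 2 * 768 * 4 = 73,728
print(f"fp16 storage: {trainable * 2 / 1024:.0f} kB")   # ~144 kB per task

# Sanity check: a connector preserves the hidden-state shape.
x = torch.randn(2, 16, hidden_size)                      # (batch, seq_len, hidden)
assert connectors[0](x).shape == x.shape
```

In half precision such a module occupies only a couple of hundred kilobytes per task, which is the storage regime the abstract contrasts with the roughly 1 MB footprint of LoRA and Adapter; the exact figures depend on the bottleneck width and on how the hypernetwork generates or shares the connector weights.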
doi_str_mv | 10.1109/ACCESS.2024.3378518 |
format | Article |
publisher | Piscataway: IEEE |
coden | IAECCG |
orcidid | 0009-0002-9204-0351; 0000-0002-6720-963X |
ieee_id | 10474022 |
doaj_id | oai_doaj_org_article_2204231175cb4a8d982a5ffed312c364 |
pqid | 3015057797 |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024, Vol.12, p.43734-43746 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_3015057797 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Adaptation models Adapters Coils Connectors Efficiency green AI hypernetwork Inference Information retrieval Interlayers Mathematical models Microprocessors model inference Natural language Natural language processing Parameter efficiency Parameters Performance degradation pretrained language model Task analysis Training Transformers Tuning User experience |
title | Enhancing Parameter Efficiency in Model Inference Using an Ultralight Inter-Transformer Linear Structure |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T19%3A11%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20Parameter%20Efficiency%20in%20Model%20Inference%20Using%20an%20Ultralight%20Inter-Transformer%20Linear%20Structure&rft.jtitle=IEEE%20access&rft.au=Shi,%20Haoxiang&rft.date=2024&rft.volume=12&rft.spage=43734&rft.epage=43746&rft.pages=43734-43746&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3378518&rft_dat=%3Cproquest_cross%3E3015057797%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3015057797&rft_id=info:pmid/&rft_ieee_id=10474022&rft_doaj_id=oai_doaj_org_article_2204231175cb4a8d982a5ffed312c364&rfr_iscdi=true |