Enhancing Parameter Efficiency in Model Inference Using an Ultralight Inter-Transformer Linear Structure
Pre-trained language models are the cornerstone of modern natural language processing and information retrieval. However, fine-tuning all of their parameters reduces the efficiency of models in both training and inference owing to their increasingly heavy structures. Existing methods for parameter efficiency still require approximately 1 MB of storage and roughly 10^7 operations during model deployment and inference...
Saved in:
Published in: | IEEE Access 2024, Vol.12, p.43734-43746 |
---|---|
Main authors: | Shi, Haoxiang; Sakai, Tetsuya |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | 43746 |
---|---|
container_issue | |
container_start_page | 43734 |
container_title | IEEE access |
container_volume | 12 |
creator | Shi, Haoxiang; Sakai, Tetsuya |
description | Pre-trained language models are the cornerstone of modern natural language processing and information retrieval. However, fine-tuning all of their parameters reduces the efficiency of models in both training and inference owing to their increasingly heavy structures. Existing methods for parameter efficiency still require approximately 1 MB of storage and roughly 10^7 operations during model deployment and inference. This strains the storage and processor capacity of end devices such as smartphones and IoT equipment, and slow model inference adversely affects the user experience. To achieve more efficient and storage-friendly inference than mainstream methods such as low-rank adaptation (LoRA) and Adapter, this paper proposes LayerConnect (hyper-network-assisted interlayer connectors). Extensive experiments were conducted to validate the performance of LayerConnect on two essential tasks with completely different learning frameworks and purposes: natural language understanding (using the General Language Understanding Evaluation (GLUE) benchmark) and information retrieval (using the contextualized inverted list (COIL) framework). For both tasks, LayerConnect saves up to 95.31% and 91.18% of the parameters used by LoRA and Adapter, respectively. At the same time, LayerConnect keeps the performance degradation on GLUE and COIL below 8% and 3% compared to LoRA, and below 5% and 3% compared to Adapter. In addition, LayerConnect requires approximately 100 kB of storage per task-specific trained model for both tasks and reduces the number of operations in model inference by four orders of magnitude, to approximately 10^3. |
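The abstract characterizes LayerConnect only as "hyper-network-assisted interlayer connectors," so the sketch below is a hypothetical illustration rather than the authors' implementation: a tiny residual linear bottleneck inserted between frozen transformer layers, written in PyTorch to make the parameter-budget arithmetic concrete. The class name, bottleneck width, backbone size, and layer count are all assumptions.

```python
# Hypothetical sketch (not the authors' code): an ultralight linear "connector"
# placed between frozen transformer layers, to show why such a module can stay
# in the ~10^5-parameter / ~100 kB-per-task regime the abstract describes.
# All dimensions (hidden_size, bottleneck, num_layers) are assumptions.
import torch
import torch.nn as nn


class InterlayerConnector(nn.Module):
    """Tiny residual linear bottleneck applied to hidden states between layers."""

    def __init__(self, hidden_size: int, bottleneck: int = 4):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck, bias=False)
        self.up = nn.Linear(bottleneck, hidden_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual form: the frozen backbone's signal passes through unchanged;
        # the connector only learns a small task-specific correction.
        return hidden_states + self.up(self.down(hidden_states))


hidden_size, num_layers = 768, 12  # e.g. a BERT-base-sized backbone (assumption)
connectors = nn.ModuleList(
    [InterlayerConnector(hidden_size) for _ in range(num_layers)]
)

trainable = sum(p.numel() for p in connectors.parameters())
print(f"trainable parameters: {trainable}")             # 12 * 2 * 768 * 4 = 73,728
print(f"fp16 storage: {trainable * 2 / 1024:.0f} kB")   # ~144 kB per task

# Sanity check: a connector preserves the hidden-state shape.
x = torch.randn(2, 16, hidden_size)                      # (batch, seq_len, hidden)
assert connectors[0](x).shape == x.shape
```

In half precision such a module occupies only a couple of hundred kilobytes per task, which is the storage regime the abstract contrasts with the roughly 1 MB footprint of LoRA and Adapter; the exact figures depend on the bottleneck width and on how the hypernetwork generates or shares the connector weights.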
doi_str_mv | 10.1109/ACCESS.2024.3378518 |
format | Article |
publisher | Piscataway: IEEE |
coden | IAECCG |
orcidid | 0009-0002-9204-0351; 0000-0002-6720-963X |
ieee_id | 10474022 |
doaj_id | oai_doaj_org_article_2204231175cb4a8d982a5ffed312c364 |
pqid | 3015057797 |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024, Vol.12, p.43734-43746 |
issn | 2169-3536 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_3015057797 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Adaptation models Adapters Coils Connectors Efficiency green AI hypernetwork Inference Information retrieval Interlayers Mathematical models Microprocessors model inference Natural language Natural language processing Parameter efficiency Parameters Performance degradation pretrained language model Task analysis Training Transformers Tuning User experience |
title | Enhancing Parameter Efficiency in Model Inference Using an Ultralight Inter-Transformer Linear Structure |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T19%3A11%3A03IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20Parameter%20Efficiency%20in%20Model%20Inference%20Using%20an%20Ultralight%20Inter-Transformer%20Linear%20Structure&rft.jtitle=IEEE%20access&rft.au=Shi,%20Haoxiang&rft.date=2024&rft.volume=12&rft.spage=43734&rft.epage=43746&rft.pages=43734-43746&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3378518&rft_dat=%3Cproquest_cross%3E3015057797%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3015057797&rft_id=info:pmid/&rft_ieee_id=10474022&rft_doaj_id=oai_doaj_org_article_2204231175cb4a8d982a5ffed312c364&rfr_iscdi=true |