Measure-to-measure interpolation using Transformers

Bibliographic details
Published in: arXiv.org, 2024-11
Main authors: Geshkovski, Borjan; Rigollet, Philippe; Ruiz-Balet, Domènec
Format: Article
Language: English
Subjects: Artificial neural networks; Continuity equation; Large language models
Online access: Full text
container_title arXiv.org
creator Geshkovski, Borjan
Rigollet, Philippe
Ruiz-Balet, Domènec
description Transformers are deep neural network architectures that underpin the recent successes of large language models. Unlike more classical architectures that can be viewed as point-to-point maps, a Transformer acts as a measure-to-measure map implemented as a specific interacting particle system on the unit sphere: the input is the empirical measure of tokens in a prompt and its evolution is governed by the continuity equation. In fact, Transformers are not limited to empirical measures and can in principle process any input measure. As the nature of data processed by Transformers is expanding rapidly, it is important to investigate their expressive power as maps from an arbitrary measure to another arbitrary measure. To that end, we provide an explicit choice of parameters that allows a single Transformer to match \(N\) arbitrary input measures to \(N\) arbitrary target measures, under the minimal assumption that every pair of input-target measures can be matched by some transport map.
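The abstract describes tokens as particles on the unit sphere whose empirical measure \(\mu_t = \frac{1}{n}\sum_i \delta_{x_i(t)}\) evolves by a continuity equation \(\partial_t \mu + \mathrm{div}(\mu\, \mathcal{X}[\mu]) = 0\). A minimal NumPy sketch of such a particle system follows, assuming the simplified self-attention dynamics \(\dot x_i = P^\perp_{x_i}\big(\sum_j \mathrm{softmax}_j(\beta \langle Q x_i, K x_j\rangle)\, V x_j\big)\), where \(P^\perp_{x}\) projects onto the tangent space of the sphere at \(x\), with fixed matrices \(Q, K, V\), an inverse temperature \(\beta\), and explicit Euler steps followed by renormalization. The helper names (`project_tangent`, `attention_velocity`, `simulate`) and all parameter values are hypothetical; this illustrates the kind of dynamics the abstract refers to and is not the explicit parameter construction from the paper.

```python
import numpy as np


def project_tangent(x: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove from each row of v its component along the matching (unit) row of x."""
    return v - np.sum(x * v, axis=1, keepdims=True) * x


def attention_velocity(x: np.ndarray, Q: np.ndarray, K: np.ndarray,
                       V: np.ndarray, beta: float) -> np.ndarray:
    """Hypothetical helper: self-attention velocity of every token x_i (rows of x)."""
    # logits[i, j] = beta * <Q x_i, K x_j>
    logits = beta * (x @ Q.T) @ (x @ K.T).T
    # Subtract the row maximum before exponentiating for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax over j
    drift = weights @ (x @ V.T)  # row i: sum_j weights[i, j] * V x_j
    return project_tangent(x, drift)


def simulate(x0: np.ndarray, Q: np.ndarray, K: np.ndarray, V: np.ndarray,
             beta: float = 1.0, dt: float = 0.01, steps: int = 500) -> np.ndarray:
    """Hypothetical helper: explicit Euler steps, re-projected onto the unit sphere."""
    x = x0 / np.linalg.norm(x0, axis=1, keepdims=True)
    for _ in range(steps):
        x = x + dt * attention_velocity(x, Q, K, V, beta)
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((32, 3))  # 32 illustrative tokens in R^3
    identity = np.eye(3)
    final = simulate(tokens, identity, identity, identity, beta=1.0)
    print(final.shape)  # (32, 3)
```

Each row of the returned array is one token, so the rows represent the evolved empirical measure. Keeping \(Q, K, V\) fixed corresponds to weight sharing across layers; layer-dependent parameters could be emulated by passing a different triple at each step.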
format Article
publisher Cornell University Library, arXiv.org (Ithaca)
date 2024-11-07
rights 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_3126159725
source Freely Accessible Journals
subjects Artificial neural networks
Continuity equation
Large language models
title Measure-to-measure interpolation using Transformers
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T20%3A30%3A41IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Measure-to-measure%20interpolation%20using%20Transformers&rft.jtitle=arXiv.org&rft.au=Geshkovski,%20Borjan&rft.date=2024-11-07&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3126159725%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3126159725&rft_id=info:pmid/&rfr_iscdi=true