Cospeech body motion generation using a transformer


Detailed description

Saved in:
Bibliographic details
Published in: Applied intelligence (Dordrecht, Netherlands), 2024-11, Vol.54 (22), p.11525-11535
Main authors: Lu, Zixiang, He, Zhitong, Hong, Jiale, Gao, Ping
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Body language is a method for communicating across languages and cultures. Making good use of body motions in speech can enhance persuasiveness, improve personal charisma, and make speech more effective. Generating matching body motions for digital avatars and social robots based on content has become an important topic. In this paper, we propose a transformer-based network model to generate body motions from input speech. Our model includes an audio transformer encoder, motion transformer encoder, template variational autoencoder, cross-modal transformer encoder, and motion decoder. Additionally, we propose a novel evaluation metric for describing motion change trends in terms of distance. The experimental results show that the proposed model provides higher-quality motion generation results than state-of-the-art models. As indicated by visual skeleton motions, our results are more natural and realistic than those of other methods. Additionally, the generated motions yield superior results in terms of multiple evaluation metrics.
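The abstract names a "novel evaluation metric for describing motion change trends in terms of distance" but the record does not define it. As a minimal illustrative sketch only — the function name, the per-frame mean-displacement formulation, and the joint-coordinate input format are all assumptions, not the paper's actual metric — a distance-based motion-change measure could look like this:

```python
# Hypothetical stand-in for the paper's metric, which is not specified
# in this record: mean Euclidean joint displacement between consecutive
# skeleton frames, as a proxy for how strongly motion changes over time.

def motion_change_distances(frames):
    """Return one mean per-joint displacement per consecutive frame pair.

    `frames` is a sequence of poses; each pose is a list of joint
    coordinates, e.g. (x, y) or (x, y, z) tuples.
    """
    distances = []
    for prev, curr in zip(frames, frames[1:]):
        per_joint = [
            sum((a - b) ** 2 for a, b in zip(j_prev, j_curr)) ** 0.5
            for j_prev, j_curr in zip(prev, curr)
        ]
        distances.append(sum(per_joint) / len(per_joint))
    return distances
```

For example, two 2-joint frames where only the second joint moves one unit give a mean displacement of 0.5 for that frame pair.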
DOI: 10.1007/s10489-024-05769-4
ISSN: 0924-669X
EISSN: 1573-7497
Source: SpringerLink Journals - AutoHoldings
Subjects:
Artificial Intelligence
Avatars
Coders
Communication
Computer Science
Language
Machines
Manufacturing
Mechanical Engineering
Neural networks
Processes
Robot dynamics
Speech
State-of-the-art reviews
Transformers