Cospeech body motion generation using a transformer
Body language is a method for communicating across languages and cultures. Making good use of body motions in speech can enhance persuasiveness, improve personal charisma, and make speech more effective. Generating matching body motions for digital avatars and social robots based on content has become an important topic.
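The record describes a cross-modal transformer encoder that fuses audio and motion features. The paper's actual architecture is not reproduced here; the following is only an illustrative single-head cross-attention sketch in NumPy, where the function name, random stand-in weights, and feature dimensions are all assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(audio_feats, motion_feats, rng):
    """Illustrative single-head cross-attention: motion frames attend to audio frames.

    audio_feats:  (T_a, d) output of a hypothetical audio encoder
    motion_feats: (T_m, d) output of a hypothetical motion encoder
    The projection weights are random stand-ins for learned parameters.
    """
    d = audio_feats.shape[-1]
    w_q = rng.standard_normal((d, d)) / np.sqrt(d)
    w_k = rng.standard_normal((d, d)) / np.sqrt(d)
    w_v = rng.standard_normal((d, d)) / np.sqrt(d)
    q = motion_feats @ w_q               # queries from the motion stream
    k = audio_feats @ w_k                # keys from the audio stream
    v = audio_feats @ w_v                # values from the audio stream
    attn = softmax(q @ k.T / np.sqrt(d)) # (T_m, T_a) attention weights
    return attn @ v                      # (T_m, d) audio-conditioned motion features

rng = np.random.default_rng(0)
audio = rng.standard_normal((40, 64))   # 40 audio frames, 64-dim features
motion = rng.standard_normal((30, 64))  # 30 motion frames, 64-dim features
fused = cross_modal_attention(audio, motion, rng)
print(fused.shape)
```

The output keeps the motion stream's length while mixing in audio information, which is the basic shape contract any such fusion layer must satisfy before a motion decoder can consume it.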
Saved in:
Published in: | Applied intelligence (Dordrecht, Netherlands), 2024-11, Vol.54 (22), p.11525-11535 |
---|---|
Main authors: | Lu, Zixiang ; He, Zhitong ; Hong, Jiale ; Gao, Ping |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 11535 |
---|---|
container_issue | 22 |
container_start_page | 11525 |
container_title | Applied intelligence (Dordrecht, Netherlands) |
container_volume | 54 |
creator | Lu, Zixiang ; He, Zhitong ; Hong, Jiale ; Gao, Ping |
description | Body language is a method for communicating across languages and cultures. Making good use of body motions in speech can enhance persuasiveness, improve personal charisma, and make speech more effective. Generating matching body motions for digital avatars and social robots based on content has become an important topic. In this paper, we propose a transformer-based network model to generate body motions from input speech. Our model includes an audio transformer encoder, motion transformer encoder, template variational autoencoder, cross-modal transformer encoder, and motion decoder. Additionally, we propose a novel evaluation metric for describing motion change trends in terms of distance. The experimental results show that the proposed model provides higher-quality motion generation results than state-of-the-art models. As indicated by visual skeleton motions, our results are more natural and realistic than those of other methods. Additionally, the generated motions yield superior results in terms of multiple evaluation metrics. |
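The description mentions "a novel evaluation metric for describing motion change trends in terms of distance" but does not define it. The sketch below is a hypothetical stand-in, not the paper's metric: it measures per-frame motion intensity as the mean joint displacement between consecutive frames and compares that trend between a generated and a reference sequence. All names and the pose layout `(frames, joints, xyz)` are assumptions.

```python
import numpy as np

def motion_change_trend(poses):
    """Per-frame motion intensity: mean Euclidean displacement of all
    joints between consecutive frames.  poses: (T, J, 3) joint positions."""
    step = np.linalg.norm(np.diff(poses, axis=0), axis=-1)  # (T-1, J)
    return step.mean(axis=-1)                               # (T-1,)

def trend_distance(generated, reference):
    """Illustrative distance between the motion-change trends of a
    generated and a reference sequence (lower is better)."""
    g = motion_change_trend(generated)
    r = motion_change_trend(reference)
    return float(np.abs(g - r).mean())

rng = np.random.default_rng(1)
ref = rng.standard_normal((50, 15, 3))                  # 50 frames, 15 joints
gen_close = ref + 0.01 * rng.standard_normal(ref.shape) # small perturbation
gen_far = rng.standard_normal(ref.shape)                # unrelated motion
print(trend_distance(gen_close, ref), trend_distance(gen_far, ref))
```

A sequence that tracks the reference's motion closely scores a much smaller trend distance than an unrelated one, which is the qualitative behavior any trend-based motion metric should exhibit.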
doi_str_mv | 10.1007/s10489-024-05769-4 |
format | Article |
publisher | New York: Springer US |
rights | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024 |
orcidid | https://orcid.org/0000-0003-2743-2017 |
fulltext | fulltext |
identifier | ISSN: 0924-669X |
ispartof | Applied intelligence (Dordrecht, Netherlands), 2024-11, Vol.54 (22), p.11525-11535 |
issn | 0924-669X ; 1573-7497 |
language | eng |
recordid | cdi_proquest_journals_3106537064 |
source | SpringerLink Journals - AutoHoldings |
subjects | Artificial Intelligence ; Avatars ; Coders ; Communication ; Computer Science ; Language ; Machines ; Manufacturing ; Mechanical Engineering ; Neural networks ; Processes ; Robot dynamics ; Speech ; State-of-the-art reviews ; Transformers |
title | Cospeech body motion generation using a transformer |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T04%3A38%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cospeech%20body%20motion%20generation%20using%20a%20transformer&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Lu,%20Zixiang&rft.date=2024-11-01&rft.volume=54&rft.issue=22&rft.spage=11525&rft.epage=11535&rft.pages=11525-11535&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-024-05769-4&rft_dat=%3Cproquest_cross%3E3106537064%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3106537064&rft_id=info:pmid/&rfr_iscdi=true |