Automatic Generation of Photorealistic 3D Inner Mouth Animation only from Frontal Images

In this paper, we propose a novel method to generate highly photorealistic three-dimensional (3D) inner mouth animation that is well-fitted to an original ready-made speech animation using only frontal captured images and small-size databases. The algorithms are composed of quasi-3D model reconstruc...

Detailed description

Bibliographic details
Published in:	Journal of Information Processing 2015, Vol.23(5), pp.693-703
Main authors:	Kawai, Masahide; Iwao, Tomoyori; Maejima, Akinobu; Morishima, Shigeo
Format:	Article
Language:	English
Subjects:	Animation; inner mouth; Mouth; Multi-view Detailization; phoneme combination; Photorealistic; skull bone; Speech; speech animation; Three dimensional; Tongue
Online access:	Full text

container_end_page	703
container_issue	5
container_start_page	693
container_title	Journal of Information Processing
container_volume	23
creator	Kawai, Masahide; Iwao, Tomoyori; Maejima, Akinobu; Morishima, Shigeo
description	In this paper, we propose a novel method to generate highly photorealistic three-dimensional (3D) inner mouth animation that is well-fitted to an original ready-made speech animation using only frontal captured images and small-size databases. The algorithms are composed of quasi-3D model reconstruction and motion control of teeth and the tongue, and final compositing of photorealistic speech animation synthesis tailored to the original. In general, producing a satisfactory photorealistic appearance of the inner mouth that is synchronized with mouth movement is a very complicated and time-consuming task. This is because the tongue and mouth are too flexible and delicate to be modeled with the large number of meshes required. Therefore, in some cases, this process is omitted or replaced with a very simple generic model. Our proposed method, on the other hand, can automatically generate 3D inner mouth appearances by improving photorealism with only three inputs: an original tailor-made lip-sync animation, a single image of the speaker's teeth, and a syllabic decomposition of the desired speech. The key idea of our proposed method is to combine 3D reconstruction and simulation with two-dimensional (2D) image processing using only the above three inputs, as well as a tongue database and mouth database. The satisfactory performance of our proposed method is illustrated by the significant improvement in picture quality of several tailor-made animations to a degree nearly equivalent to that of camera-captured videos.
doi_str_mv	10.2197/ipsjjip.23.693
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1762115171</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1762115171</sourcerecordid><originalsourceid>FETCH-LOGICAL-c4073-c274d934054720b8ff5644b50690345e41426ace74ac74661e9b52b75f608da33</originalsourceid><addsrcrecordid>eNpNkL1PwzAQxS0EEqWwMntkSfG3kzEqtFQqggEkNstxHZoosYOdDP3vSdWo6nQnvd873XsAPGK0IDiTz1UX67rqFoQuREavwAynKUmE4OT6Yr8FdzHWCIkMcTQDP_nQ-1b3lYFr62wYN--gL-Hn3vc-WN1U8SjSF7hxow7f_dDvYe6qdkJdc4Bl8C1cBe963cBNq39tvAc3pW6ifZjmHHyvXr-Wb8n2Y71Z5tvEMCRpYohku4wyxJkkqEjLkgvGCn78jzJuGWZEaGMl00YyIbDNCk4KyUuB0p2mdA6eTne74P8GG3vVVtHYptHO-iEqLAXBmGOJR3RxQk3wMQZbqi6MMcJBYaSOFaqpQkWoGiscDfnJUMd-zHTGdRgraewlzifPWTN7HZR19B-9WH1d</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1762115171</pqid></control><display><type>article</type><title>Automatic Generation of Photorealistic 3D Inner Mouth Animation only from Frontal Images</title><source>J-STAGE (Japan Science & Technology Information Aggregator, Electronic) Freely Available Titles - Japanese</source><creator>Kawai, Masahide ; Iwao, Tomoyori ; Maejima, Akinobu ; Morishima, Shigeo</creator><creatorcontrib>Kawai, Masahide ; Iwao, Tomoyori ; Maejima, Akinobu ; Morishima, Shigeo</creatorcontrib><description>In this paper, we propose a novel method to generate highly photorealistic three-dimensional (3D) inner mouth animation that is well-fitted to an original ready-made speech animation using only frontal captured images and small-size databases. The algorithms are composed of quasi-3D model reconstruction and motion control of teeth and the tongue, and final compositing of photorealistic speech animation synthesis tailored to the original. In general, producing a satisfactory photorealistic appearance of the inner mouth that is synchronized with mouth movement is a very complicated and time-consuming task. 
This is because the tongue and mouth are too flexible and delicate to be modeled with the large number of meshes required. Therefore, in some cases, this process is omitted or replaced with a very simple generic model. Our proposed method, on the other hand, can automatically generate 3D inner mouth appearances by improving photorealism with only three inputs: an original tailor-made lip-sync animation, a single image of the speaker's teeth, and a syllabic decomposition of the desired speech. The key idea of our proposed method is to combine 3D reconstruction and simulation with two-dimensional (2D) image processing using only the above three inputs, as well as a tongue database and mouth database. The satisfactory performance of our proposed method is illustrated by the significant improvement in picture quality of several tailor-made animations to a degree nearly equivalent to that of camera-captured videos.</description><identifier>ISSN: 1882-6652</identifier><identifier>EISSN: 1882-6652</identifier><identifier>DOI: 10.2197/ipsjjip.23.693</identifier><language>eng</language><publisher>Information Processing Society of Japan</publisher><subject>Animation ; inner mouth ; Mouth ; Multi-view Detai-lization ; phoneme combination ; Photorealistic ; skull bone ; Speech ; speech animation ; Three dimensional ; Tongue</subject><ispartof>Journal of Information Processing, 2015, Vol.23(5), pp.693-703</ispartof><rights>2015 by the Information Processing Society of Japan</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c4073-c274d934054720b8ff5644b50690345e41426ace74ac74661e9b52b75f608da33</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,1876,4009,27902,27903,27904</link.rule.ids></links><search><creatorcontrib>Kawai, Masahide</creatorcontrib><creatorcontrib>Iwao, 
Tomoyori</creatorcontrib><creatorcontrib>Maejima, Akinobu</creatorcontrib><creatorcontrib>Morishima, Shigeo</creatorcontrib><title>Automatic Generation of Photorealistic 3D Inner Mouth Animation only from Frontal Images</title><title>Journal of Information Processing</title><addtitle>Journal of Information Processing</addtitle><description>In this paper, we propose a novel method to generate highly photorealistic three-dimensional (3D) inner mouth animation that is well-fitted to an original ready-made speech animation using only frontal captured images and small-size databases. The algorithms are composed of quasi-3D model reconstruction and motion control of teeth and the tongue, and final compositing of photorealistic speech animation synthesis tailored to the original. In general, producing a satisfactory photorealistic appearance of the inner mouth that is synchronized with mouth movement is a very complicated and time-consuming task. This is because the tongue and mouth are too flexible and delicate to be modeled with the large number of meshes required. Therefore, in some cases, this process is omitted or replaced with a very simple generic model. Our proposed method, on the other hand, can automatically generate 3D inner mouth appearances by improving photorealism with only three inputs: an original tailor-made lip-sync animation, a single image of the speaker's teeth, and a syllabic decomposition of the desired speech. The key idea of our proposed method is to combine 3D reconstruction and simulation with two-dimensional (2D) image processing using only the above three inputs, as well as a tongue database and mouth database. 
The satisfactory performance of our proposed method is illustrated by the significant improvement in picture quality of several tailor-made animations to a degree nearly equivalent to that of camera-captured videos.</description><subject>Animation</subject><subject>inner mouth</subject><subject>Mouth</subject><subject>Multi-view Detai-lization</subject><subject>phoneme combination</subject><subject>Photorealistic</subject><subject>skull bone</subject><subject>Speech</subject><subject>speech animation</subject><subject>Three dimensional</subject><subject>Tongue</subject><issn>1882-6652</issn><issn>1882-6652</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNpNkL1PwzAQxS0EEqWwMntkSfG3kzEqtFQqggEkNstxHZoosYOdDP3vSdWo6nQnvd873XsAPGK0IDiTz1UX67rqFoQuREavwAynKUmE4OT6Yr8FdzHWCIkMcTQDP_nQ-1b3lYFr62wYN--gL-Hn3vc-WN1U8SjSF7hxow7f_dDvYe6qdkJdc4Bl8C1cBe963cBNq39tvAc3pW6ifZjmHHyvXr-Wb8n2Y71Z5tvEMCRpYohku4wyxJkkqEjLkgvGCn78jzJuGWZEaGMl00YyIbDNCk4KyUuB0p2mdA6eTne74P8GG3vVVtHYptHO-iEqLAXBmGOJR3RxQk3wMQZbqi6MMcJBYaSOFaqpQkWoGiscDfnJUMd-zHTGdRgraewlzifPWTN7HZR19B-9WH1d</recordid><startdate>2015</startdate><enddate>2015</enddate><creator>Kawai, Masahide</creator><creator>Iwao, Tomoyori</creator><creator>Maejima, Akinobu</creator><creator>Morishima, Shigeo</creator><general>Information Processing Society of Japan</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>2015</creationdate><title>Automatic Generation of Photorealistic 3D Inner Mouth Animation only from Frontal Images</title><author>Kawai, Masahide ; Iwao, Tomoyori ; Maejima, Akinobu ; Morishima, 
Shigeo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c4073-c274d934054720b8ff5644b50690345e41426ace74ac74661e9b52b75f608da33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Animation</topic><topic>inner mouth</topic><topic>Mouth</topic><topic>Multi-view Detai-lization</topic><topic>phoneme combination</topic><topic>Photorealistic</topic><topic>skull bone</topic><topic>Speech</topic><topic>speech animation</topic><topic>Three dimensional</topic><topic>Tongue</topic><toplevel>online_resources</toplevel><creatorcontrib>Kawai, Masahide</creatorcontrib><creatorcontrib>Iwao, Tomoyori</creatorcontrib><creatorcontrib>Maejima, Akinobu</creatorcontrib><creatorcontrib>Morishima, Shigeo</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of Information Processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kawai, Masahide</au><au>Iwao, Tomoyori</au><au>Maejima, Akinobu</au><au>Morishima, Shigeo</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Automatic Generation of Photorealistic 3D Inner Mouth Animation only from Frontal Images</atitle><jtitle>Journal of Information Processing</jtitle><addtitle>Journal of Information Processing</addtitle><date>2015</date><risdate>2015</risdate><volume>23</volume><issue>5</issue><spage>693</spage><epage>703</epage><pages>693-703</pages><issn>1882-6652</issn><eissn>1882-6652</eissn><abstract>In this 
paper, we propose a novel method to generate highly photorealistic three-dimensional (3D) inner mouth animation that is well-fitted to an original ready-made speech animation using only frontal captured images and small-size databases. The algorithms are composed of quasi-3D model reconstruction and motion control of teeth and the tongue, and final compositing of photorealistic speech animation synthesis tailored to the original. In general, producing a satisfactory photorealistic appearance of the inner mouth that is synchronized with mouth movement is a very complicated and time-consuming task. This is because the tongue and mouth are too flexible and delicate to be modeled with the large number of meshes required. Therefore, in some cases, this process is omitted or replaced with a very simple generic model. Our proposed method, on the other hand, can automatically generate 3D inner mouth appearances by improving photorealism with only three inputs: an original tailor-made lip-sync animation, a single image of the speaker's teeth, and a syllabic decomposition of the desired speech. The key idea of our proposed method is to combine 3D reconstruction and simulation with two-dimensional (2D) image processing using only the above three inputs, as well as a tongue database and mouth database. The satisfactory performance of our proposed method is illustrated by the significant improvement in picture quality of several tailor-made animations to a degree nearly equivalent to that of camera-captured videos.</abstract><pub>Information Processing Society of Japan</pub><doi>10.2197/ipsjjip.23.693</doi><tpages>11</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1882-6652
ispartof	Journal of Information Processing, 2015, Vol.23(5), pp.693-703
issn	1882-6652 1882-6652
language	eng
recordid	cdi_proquest_miscellaneous_1762115171
source	J-STAGE (Japan Science & Technology Information Aggregator, Electronic) Freely Available Titles - Japanese
subjects	Animation; inner mouth; Mouth; Multi-view Detailization; phoneme combination; Photorealistic; skull bone; Speech; speech animation; Three dimensional; Tongue
title	Automatic Generation of Photorealistic 3D Inner Mouth Animation only from Frontal Images
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T17%3A21%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20Generation%20of%20Photorealistic%203D%20Inner%20Mouth%20Animation%20only%20from%20Frontal%20Images&rft.jtitle=Journal%20of%20Information%20Processing&rft.au=Kawai,%20Masahide&rft.date=2015&rft.volume=23&rft.issue=5&rft.spage=693&rft.epage=703&rft.pages=693-703&rft.issn=1882-6652&rft.eissn=1882-6652&rft_id=info:doi/10.2197/ipsjjip.23.693&rft_dat=%3Cproquest_cross%3E1762115171%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1762115171&rft_id=info:pmid/&rfr_iscdi=true