Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation?

Converting webpage design into functional UI code is a critical step for building websites, which can be labor-intensive and time-consuming. To automate this design-to-code transformation process, various automated methods using learning-based networks and multi-modal large language models (MLLMs) h...

Detailed description

Bibliographic details
Main authors: Xiao, Jingyu; Wan, Yuxuan; Huo, Yintong; Xu, Zhiyao; Lyu, Michael R
Format: Article
Language: eng
creator Xiao, Jingyu
Wan, Yuxuan
Huo, Yintong
Xu, Zhiyao
Lyu, Michael R
description Converting webpage design into functional UI code is a critical step in building websites, and it can be labor-intensive and time-consuming. To automate this design-to-code transformation, various methods using learning-based networks and multi-modal large language models (MLLMs) have been proposed. However, these studies were evaluated only on a narrow range of static web pages and ignored dynamic interaction elements, making them less practical for real-world website deployment. To fill this gap, we present the first systematic investigation of MLLMs in generating interactive webpages. Specifically, we first formulate the Interaction-to-Code task and build the Interaction2Code benchmark, which contains 97 unique web pages and 213 distinct interactions, spanning 15 webpage types and 30 interaction categories. We then conduct comprehensive experiments on three state-of-the-art (SOTA) MLLMs using both automatic metrics and human evaluations, and summarize six findings. Our experimental results highlight the limitations of MLLMs in generating fine-grained interactive features and in managing interactions with complex transformations and subtle visual modifications. We further analyze failure cases and their underlying causes, identifying 10 common failure types and assessing their severity. Additionally, our findings reveal three critical influencing factors, i.e., prompts, visual saliency, and textual descriptions, that can enhance the interaction generation performance of MLLMs. Based on these findings, we derive implications for researchers and developers, providing a foundation for future advancements in this field. Datasets and source code are available at https://github.com/WebPAI/Interaction2Code.
doi_str_mv 10.48550/arxiv.2411.03292
format Article
creationdate 2024-11-05
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
sourcetype Open Access Repository
linktorsrc https://arxiv.org/abs/2411.03292
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2411.03292
language eng
recordid cdi_arxiv_primary_2411_03292
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Human-Computer Interaction
Computer Science - Software Engineering
title Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation?
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T10%3A17%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Interaction2Code:%20How%20Far%20Are%20We%20From%20Automatic%20Interactive%20Webpage%20Generation?&rft.au=Xiao,%20Jingyu&rft.date=2024-11-05&rft_id=info:doi/10.48550/arxiv.2411.03292&rft_dat=%3Carxiv_GOX%3E2411_03292%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true