EvoVis: A Visual Analytics Method to Understand the Labeling Iterations in Data Programming

Obtaining high-quality labeled training data poses a significant bottleneck in the domain of machine learning. Data programming has emerged as a new paradigm to address this issue by converting human knowledge into labeling functions (LFs) to quickly produce low-cost probabilistic labels. To ensure t...

Bibliographic Details
Published in:	IEEE transactions on visualization and computer graphics, 2024-02, Vol.PP, p.1-16
Main authors:	Li, Sisi; Liu, Guanzhong; Wei, Tianxiang; Jia, Shichao; Zhang, Jiawan
Format:	Article
Language:	English
Subjects:	Analytical models; data labeling; Data models; data programming; Data visualization; Labeling; model interpretation; Programming; Task analysis; Visual analytics
Online access:	Order full text

container_end_page	16
container_issue
container_start_page	1
container_title	IEEE transactions on visualization and computer graphics
container_volume	PP
creator	Li, Sisi; Liu, Guanzhong; Wei, Tianxiang; Jia, Shichao; Zhang, Jiawan
description	Obtaining high-quality labeled training data poses a significant bottleneck in the domain of machine learning. Data programming has emerged as a new paradigm to address this issue by converting human knowledge into labeling functions (LFs) to quickly produce low-cost probabilistic labels. To ensure the quality of labeled data, data programmers commonly iterate LFs for many rounds until satisfactory performance is achieved. However, the challenge in understanding the labeling iterations stems from interpreting the intricate relationships between data programming elements, exacerbated by their many-to-many and directed characteristics, inconsistent formats, and the large scale of data typically involved in labeling tasks. These complexities may impede the evaluation of label quality, identification of areas for improvement, and the effective optimization of LFs for acquiring high-quality labeled data. In this paper, we introduce EvoVis, a visual analytics method for multi-class text labeling tasks. It seamlessly integrates relationship analysis and temporal overview to display contextual and historical information on a single screen, aiding in explaining the labeling iterations in data programming. We assessed its utility and effectiveness through case studies and user studies. The results indicate that EvoVis can effectively assist data programmers in understanding labeling iterations and improving the quality of labeled data, as evidenced by an increase of 0.16 in the average F1 score when compared to the default analysis tool.
doi_str_mv	10.1109/TVCG.2024.3370654
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_pubmed_primary_38416617</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10452847</ieee_id><sourcerecordid>2933465904</sourcerecordid><originalsourceid>FETCH-LOGICAL-c189t-975e04f0bbac3a1cac854d2c21f4cbce4b4b8c1bd37db15d02cfb63d764c26ed3</originalsourceid><addsrcrecordid>eNpNkMtOwkAUhidGI3h5ABNjZummOLdOW3cEEUkwugA2Lpq5nEJNLzgzmPD2loDG1Tkn5_v_xYfQDSUDSkn2MF-OJgNGmBhwnhAZixPUp5mgEYmJPO12kiQRk0z20IX3n4RQIdLsHPV4KqiUNOmjj_F3uyz9Ix7ibmxVhYeNqnahNB6_Qli3FocWLxoLzgfVdNca8ExpqMpmhacBnApl23hcNvhJBYXfXbtyqq679xU6K1Tl4fo4L9HieTwfvUSzt8l0NJxFhqZZiLIkBiIKorUyXFGjTBoLywyjhTDagNBCp4ZqyxOraWwJM4WW3CZSGCbB8kt0f-jduPZrCz7kdekNVJVqoN36nGWcCxlnRHQoPaDGtd47KPKNK2vldjkl-d5pvnea753mR6dd5u5Yv9U12L_Er8QOuD0AJQD8KxQxS0XCfwB-jHvr</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2933465904</pqid></control><display><type>article</type><title>EvoVis: A Visual Analytics Method to Understand the Labeling Iterations in Data Programming</title><source>IEEE Electronic Library (IEL)</source><creator>Li, Sisi ; Liu, Guanzhong ; Wei, Tianxiang ; Jia, Shichao ; Zhang, Jiawan</creator><creatorcontrib>Li, Sisi ; Liu, Guanzhong ; Wei, Tianxiang ; Jia, Shichao ; Zhang, Jiawan</creatorcontrib><description>Obtaining high-quality labeled training data poses a significant bottleneck in the domain of machine learning. Data programming has emerged as a new paradigm to address this issue by converting human knowledge into labeling functions(LFs) to quickly produce low-cost probabilistic labels. To ensure the quality of labeled data, data programmers commonly iterate LFs for many rounds until satisfactory performance is achieved. 
However, the challenge in understanding the labeling iterations stems from interpreting the intricate relationships between data programming elements, exacerbated by their many-to-many and directed characteristics, inconsistent formats, and the large scale of data typically involved in labeling tasks. These complexities may impede the evaluation of label quality, identification of areas for improvement, and the effective optimization of LFs for acquiring high-quality labeled data. In this paper, we introduce EvoVis, a visual analytics method for multi-class text labeling tasks. It seamlessly integrates relationship analysis and temporal overview to display contextual and historical information on a single screen, aiding in explaining the labeling iterations in data programming. We assessed its utility and effectiveness through case studies and user studies. The results indicate that EvoVis can effectively assist data programmers in understanding labeling iterations and improving the quality of labeled data, as evidenced by an increase of 0.16 in the average F1 score when compared to the default analysis tool.</description><identifier>ISSN: 1077-2626</identifier><identifier>EISSN: 1941-0506</identifier><identifier>DOI: 10.1109/TVCG.2024.3370654</identifier><identifier>PMID: 38416617</identifier><identifier>CODEN: ITVGEA</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Analytical models ; data labeling ; Data models ; data programming ; Data visualization ; Labeling ; model interpretation ; Programming ; Task analysis ; Visual analytics</subject><ispartof>IEEE transactions on visualization and computer graphics, 2024-02, Vol.PP, p.1-16</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0001-6876-9180 ; 0000-0002-0667-6744 ; 0009-0002-7678-4094 ; 0009-0007-3466-4576 ; 
0000-0001-6399-176X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10452847$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,796,27923,27924,54757</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10452847$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38416617$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Sisi</creatorcontrib><creatorcontrib>Liu, Guanzhong</creatorcontrib><creatorcontrib>Wei, Tianxiang</creatorcontrib><creatorcontrib>Jia, Shichao</creatorcontrib><creatorcontrib>Zhang, Jiawan</creatorcontrib><title>EvoVis: A Visual Analytics Method to Understand the Labeling Iterations in Data Programming</title><title>IEEE transactions on visualization and computer graphics</title><addtitle>TVCG</addtitle><addtitle>IEEE Trans Vis Comput Graph</addtitle><description>Obtaining high-quality labeled training data poses a significant bottleneck in the domain of machine learning. Data programming has emerged as a new paradigm to address this issue by converting human knowledge into labeling functions(LFs) to quickly produce low-cost probabilistic labels. To ensure the quality of labeled data, data programmers commonly iterate LFs for many rounds until satisfactory performance is achieved. However, the challenge in understanding the labeling iterations stems from interpreting the intricate relationships between data programming elements, exacerbated by their many-to-many and directed characteristics, inconsistent formats, and the large scale of data typically involved in labeling tasks. 
These complexities may impede the evaluation of label quality, identification of areas for improvement, and the effective optimization of LFs for acquiring high-quality labeled data. In this paper, we introduce EvoVis, a visual analytics method for multi-class text labeling tasks. It seamlessly integrates relationship analysis and temporal overview to display contextual and historical information on a single screen, aiding in explaining the labeling iterations in data programming. We assessed its utility and effectiveness through case studies and user studies. The results indicate that EvoVis can effectively assist data programmers in understanding labeling iterations and improving the quality of labeled data, as evidenced by an increase of 0.16 in the average F1 score when compared to the default analysis tool.</description><subject>Analytical models</subject><subject>data labeling</subject><subject>Data models</subject><subject>data programming</subject><subject>Data visualization</subject><subject>Labeling</subject><subject>model interpretation</subject><subject>Programming</subject><subject>Task analysis</subject><subject>Visual analytics</subject><issn>1077-2626</issn><issn>1941-0506</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkMtOwkAUhidGI3h5ABNjZummOLdOW3cEEUkwugA2Lpq5nEJNLzgzmPD2loDG1Tkn5_v_xYfQDSUDSkn2MF-OJgNGmBhwnhAZixPUp5mgEYmJPO12kiQRk0z20IX3n4RQIdLsHPV4KqiUNOmjj_F3uyz9Ix7ibmxVhYeNqnahNB6_Qli3FocWLxoLzgfVdNca8ExpqMpmhacBnApl23hcNvhJBYXfXbtyqq679xU6K1Tl4fo4L9HieTwfvUSzt8l0NJxFhqZZiLIkBiIKorUyXFGjTBoLywyjhTDagNBCp4ZqyxOraWwJM4WW3CZSGCbB8kt0f-jduPZrCz7kdekNVJVqoN36nGWcCxlnRHQoPaDGtd47KPKNK2vldjkl-d5pvnea753mR6dd5u5Yv9U12L_Er8QOuD0AJQD8KxQxS0XCfwB-jHvr</recordid><startdate>20240228</startdate><enddate>20240228</enddate><creator>Li, Sisi</creator><creator>Liu, Guanzhong</creator><creator>Wei, Tianxiang</creator><creator>Jia, 
Shichao</creator><creator>Zhang, Jiawan</creator><general>IEEE</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-6876-9180</orcidid><orcidid>https://orcid.org/0000-0002-0667-6744</orcidid><orcidid>https://orcid.org/0009-0002-7678-4094</orcidid><orcidid>https://orcid.org/0009-0007-3466-4576</orcidid><orcidid>https://orcid.org/0000-0001-6399-176X</orcidid></search><sort><creationdate>20240228</creationdate><title>EvoVis: A Visual Analytics Method to Understand the Labeling Iterations in Data Programming</title><author>Li, Sisi ; Liu, Guanzhong ; Wei, Tianxiang ; Jia, Shichao ; Zhang, Jiawan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c189t-975e04f0bbac3a1cac854d2c21f4cbce4b4b8c1bd37db15d02cfb63d764c26ed3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Analytical models</topic><topic>data labeling</topic><topic>Data models</topic><topic>data programming</topic><topic>Data visualization</topic><topic>Labeling</topic><topic>model interpretation</topic><topic>Programming</topic><topic>Task analysis</topic><topic>Visual analytics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Sisi</creatorcontrib><creatorcontrib>Liu, Guanzhong</creatorcontrib><creatorcontrib>Wei, Tianxiang</creatorcontrib><creatorcontrib>Jia, Shichao</creatorcontrib><creatorcontrib>Zhang, Jiawan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on visualization and computer 
graphics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Li, Sisi</au><au>Liu, Guanzhong</au><au>Wei, Tianxiang</au><au>Jia, Shichao</au><au>Zhang, Jiawan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>EvoVis: A Visual Analytics Method to Understand the Labeling Iterations in Data Programming</atitle><jtitle>IEEE transactions on visualization and computer graphics</jtitle><stitle>TVCG</stitle><addtitle>IEEE Trans Vis Comput Graph</addtitle><date>2024-02-28</date><risdate>2024</risdate><volume>PP</volume><spage>1</spage><epage>16</epage><pages>1-16</pages><issn>1077-2626</issn><eissn>1941-0506</eissn><coden>ITVGEA</coden><abstract>Obtaining high-quality labeled training data poses a significant bottleneck in the domain of machine learning. Data programming has emerged as a new paradigm to address this issue by converting human knowledge into labeling functions(LFs) to quickly produce low-cost probabilistic labels. To ensure the quality of labeled data, data programmers commonly iterate LFs for many rounds until satisfactory performance is achieved. However, the challenge in understanding the labeling iterations stems from interpreting the intricate relationships between data programming elements, exacerbated by their many-to-many and directed characteristics, inconsistent formats, and the large scale of data typically involved in labeling tasks. These complexities may impede the evaluation of label quality, identification of areas for improvement, and the effective optimization of LFs for acquiring high-quality labeled data. In this paper, we introduce EvoVis, a visual analytics method for multi-class text labeling tasks. It seamlessly integrates relationship analysis and temporal overview to display contextual and historical information on a single screen, aiding in explaining the labeling iterations in data programming. 
We assessed its utility and effectiveness through case studies and user studies. The results indicate that EvoVis can effectively assist data programmers in understanding labeling iterations and improving the quality of labeled data, as evidenced by an increase of 0.16 in the average F1 score when compared to the default analysis tool.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>38416617</pmid><doi>10.1109/TVCG.2024.3370654</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0001-6876-9180</orcidid><orcidid>https://orcid.org/0000-0002-0667-6744</orcidid><orcidid>https://orcid.org/0009-0002-7678-4094</orcidid><orcidid>https://orcid.org/0009-0007-3466-4576</orcidid><orcidid>https://orcid.org/0000-0001-6399-176X</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1077-2626
ispartof	IEEE transactions on visualization and computer graphics, 2024-02, Vol.PP, p.1-16
issn	1077-2626 1941-0506
language	eng
recordid	cdi_pubmed_primary_38416617
source	IEEE Electronic Library (IEL)
subjects	Analytical models; data labeling; Data models; data programming; Data visualization; Labeling; model interpretation; Programming; Task analysis; Visual analytics
title	EvoVis: A Visual Analytics Method to Understand the Labeling Iterations in Data Programming
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T11%3A09%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=EvoVis:%20A%20Visual%20Analytics%20Method%20to%20Understand%20the%20Labeling%20Iterations%20in%20Data%20Programming&rft.jtitle=IEEE%20transactions%20on%20visualization%20and%20computer%20graphics&rft.au=Li,%20Sisi&rft.date=2024-02-28&rft.volume=PP&rft.spage=1&rft.epage=16&rft.pages=1-16&rft.issn=1077-2626&rft.eissn=1941-0506&rft.coden=ITVGEA&rft_id=info:doi/10.1109/TVCG.2024.3370654&rft_dat=%3Cproquest_RIE%3E2933465904%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2933465904&rft_id=info:pmid/38416617&rft_ieee_id=10452847&rfr_iscdi=true