A Hypothesis Testing-based Framework for Software Cross-modal Retrieval in Heterogeneous Semantic Spaces
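Published in: | ACM transactions on software engineering and methodology 2023-07, Vol.32 (5), p.1-28, Article 123 |
---|---|
Main authors: | Wei, Hongwei; Su, Xiaohong; Gao, Cuiyun; Zheng, Weining; Tao, Wenxin |
Format: | Article |
Language: | eng |
Subject terms: | Search-based software engineering; Software and its engineering |
Online access: | Full text |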
container_end_page | 28 |
---|---|
container_issue | 5 |
container_start_page | 1 |
container_title | ACM transactions on software engineering and methodology |
container_volume | 32 |
creator | Wei, Hongwei; Su, Xiaohong; Gao, Cuiyun; Zheng, Weining; Tao, Wenxin |
description | Software cross-modal retrieval, which covers tasks such as bug localization and code search, is a popular yet challenging research direction. Previous studies generally map natural language text and code into a homogeneous semantic space for similarity measurement. However, the semantic gap between the two modalities makes it difficult to accurately capture their shared semantics in a homogeneous space. We therefore propose to map the multi-modal data into heterogeneous semantic spaces so that the unique semantics of each modality are preserved. Specifically, we propose a novel software cross-modal retrieval framework named Deep Hypothesis Testing (DeepHT). To capture the unique semantics of the code’s control flow structure, DeepHT maps all control flow paths (CFPs) in the control flow graph to a CFP sample set in the sample space. Meanwhile, the text is mapped to a CFP correlation distribution in the distribution space, which models its correlation with different CFPs. The matching score is then computed by using hypothesis testing to measure how well the sample set obeys the distribution. Experimental results on two text-to-code retrieval tasks (i.e., bug localization and code search) and two code-to-text retrieval tasks (i.e., vulnerability knowledge retrieval and historical patch retrieval) show that DeepHT outperforms the baseline methods. |
doi_str_mv | 10.1145/3591868 |
format | Article |
publisher | ACM, New York, NY, USA |
date | 2023-07-21 |
addtitle | ACM TOSEM |
orcidid | 0000-0002-8584-0716; 0000-0002-5607-1065; 0000-0003-4774-2434; 0000-0001-6818-5118; 0000-0003-3668-3600 |
toplevel | peer_reviewed; online_resources |
oa | free_for_read |
linktopdf | https://dl.acm.org/doi/pdf/10.1145/3591868 |
fulltext | fulltext |
identifier | ISSN: 1049-331X |
ispartof | ACM transactions on software engineering and methodology, 2023-07, Vol.32 (5), p.1-28, Article 123 |
issn | 1049-331X 1557-7392 |
language | eng |
recordid | cdi_crossref_primary_10_1145_3591868 |
source | ACM Digital Library |
subjects | Search-based software engineering; Software and its engineering |
title | A Hypothesis Testing-based Framework for Software Cross-modal Retrieval in Heterogeneous Semantic Spaces |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T17%3A54%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Hypothesis%20Testing-based%20Framework%20for%20Software%20Cross-modal%20Retrieval%20in%20Heterogeneous%20Semantic%20Spaces&rft.jtitle=ACM%20transactions%20on%20software%20engineering%20and%20methodology&rft.au=Wei,%20Hongwei&rft.date=2023-07-21&rft.volume=32&rft.issue=5&rft.spage=1&rft.epage=28&rft.pages=1-28&rft.artnum=123&rft.issn=1049-331X&rft.eissn=1557-7392&rft_id=info:doi/10.1145/3591868&rft_dat=%3Cacm_cross%3E3591868%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
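The description field above says that DeepHT maps a code snippet's control flow paths to a sample set, maps a text to a CFP correlation distribution, and scores the pair with a hypothesis test. The record does not say which distribution family or which test is used, so the Python below is only a minimal sketch under stated assumptions: each CFP is assumed to be already embedded as a d-dimensional vector by some encoder (not shown), the text's distribution is assumed to be a diagonal Gaussian `(mean, std)` over that space, and the test is a chi-square test of H0 "every CFP embedding is an independent draw from that Gaussian". The chi-square test is a swapped-in illustration, not the paper's method, and the names `hypothesis_test_score`, `chi2_sf`, and `rank_codes` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a text's "CFP correlation distribution":
# a diagonal Gaussian over a CFP-embedding space, given as (mean, std) per dimension.
TextDistribution = Tuple[Sequence[float], Sequence[float]]
# Hypothetical stand-in for a code's "CFP sample set": one embedding vector per path.
SampleSet = Sequence[Sequence[float]]


def _gamma_p_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series (used when x < a + 1)."""
    term = 1.0 / a
    total = term
    n = a
    for _ in range(1000):
        n += 1.0
        term *= x / n
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_contfrac(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) via Lentz's continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(stat: float, dof: int) -> float:
    """Survival function P(X >= stat) of a chi-square distribution with `dof` degrees of freedom."""
    if stat <= 0.0:
        return 1.0
    a, x = dof / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _gamma_p_series(a, x)
    return _gamma_q_contfrac(a, x)


def hypothesis_test_score(samples: SampleSet, mean: Sequence[float], std: Sequence[float]) -> float:
    """p-value of H0: every CFP embedding in `samples` is an independent draw from N(mean, diag(std**2))."""
    if not samples:
        raise ValueError("sample set must contain at least one CFP embedding")
    if len(std) != len(mean) or any(sd <= 0.0 for sd in std):
        raise ValueError("std must match mean in length and be strictly positive")
    stat = 0.0
    for vec in samples:
        if len(vec) != len(mean):
            raise ValueError("embedding dimension does not match the distribution")
        # Sum of squared standardized residuals; each term is N(0, 1)^2 under H0.
        stat += sum(((v - m) / sd) ** 2 for v, m, sd in zip(vec, mean, std))
    return chi2_sf(stat, len(samples) * len(mean))


def rank_codes(text_dist: TextDistribution, code_sample_sets: Sequence[SampleSet]) -> List[int]:
    """Text-to-code retrieval: candidate indices ordered from best to worst match."""
    mean, std = text_dist
    scores = [hypothesis_test_score(s, mean, std) for s in code_sample_sets]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = ([0.0, 0.0], [1.0, 1.0])
    code_a = [[0.1, -0.2], [0.3, 0.0]]  # paths close to the query's mean
    code_b = [[3.0, 2.5], [-2.0, 4.0]]  # paths far from it
    print(rank_codes(query, [code_a, code_b]))  # expected: [0, 1]
```

Under H0 each standardized residual is standard normal, so the sum of their squares over n paths in d dimensions follows a chi-square distribution with n*d degrees of freedom, and its p-value serves as the matching score: a higher p-value means the code's paths fit the text's distribution better. For large sample sets the p-value can underflow to 0.0 and produce ties, so a log-space score would be a sensible refinement. The same scoring function could also serve code-to-text retrieval by fixing one sample set and ranking candidate text distributions instead.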