Multi-Granularity Detector for Vulnerability Fixes

Bibliographic details
Published in:	IEEE transactions on software engineering 2023-08, Vol.49 (8), p.4035-4057
Main authors:	Nguyen, Truong Giang; Le-Cong, Thanh; Kang, Hong Jin; Widyasari, Ratnadira; Yang, Chengran; Zhao, Zhipeng; Xu, Bowen; Zhou, Jiayuan; Xia, Xin; Hassan, Ahmed E.; Le, Xuan-Bach D.; Lo, David
Format:	Article
Language:	English
Keywords:	Codes; deep learning; Fixing; Java; Libraries; machine learning; Neural networks; Open source software; Predictive models; Security; Software; software security; State of the art; Task analysis; Testing; Vulnerability-fixing commit classification

container_end_page	4057
container_issue	8
container_start_page	4035
container_title	IEEE transactions on software engineering
container_volume	49
description	With the increasing reliance on open-source software, users are exposed to vulnerabilities in third-party libraries. Software Composition Analysis (SCA) tools have been created to alert users to such vulnerabilities, and SCA requires the identification of vulnerability-fixing commits. Prior works have proposed methods that can identify such commits automatically. However, identifying them is highly challenging, as only a very small minority of commits fix vulnerabilities. Moreover, code changes can be noisy and difficult to analyze. We observe that noise can occur at different levels of detail, which makes it challenging to detect vulnerability fixes accurately. To address these challenges and boost the effectiveness of prior works, we propose MiDas (Multi-Granularity Detector for Vulnerability Fixes). Unlike prior works, MiDas constructs a separate neural network for each level of code-change granularity (commit-level, file-level, hunk-level, and line-level), following their natural organization, and then uses an ensemble model that combines all base models to output the final prediction. This design allows MiDas to better cope with the noisy and highly imbalanced nature of vulnerability-fixing commit data. In addition, to reduce the human effort required to inspect code changes, we have designed an effort-aware adjustment of MiDas's outputs based on commit length. The evaluation results demonstrate that MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on the Java and Python datasets, respectively. Furthermore, on two effort-aware metrics, EffortCost@L and Popt@L, MiDas outperforms the state-of-the-art baseline by up to 28.2% and 15.9% on Java and by up to 60% and 51.4% on Python, respectively.
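The abstract describes two mechanisms: base models for each code-change granularity whose outputs an ensemble combines, and an effort-aware adjustment of the output based on commit length. The Python sketch below illustrates one possible reading of that pipeline; it is not MiDas's implementation. The weighted-average ensemble, the "probability divided by lines changed" adjustment, and every name used (`Commit`, `loc_changed`, `ensemble_score`, `effort_adjusted`, `rank_commits`) are hypothetical assumptions, and the per-granularity probabilities are taken as given rather than produced by neural networks.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four granularity levels named in the abstract.
GRANULARITIES = ("commit", "file", "hunk", "line")


@dataclass
class Commit:
    sha: str
    loc_changed: int  # hypothetical commit-length measure: added + deleted lines
    base_scores: Dict[str, float]  # probability of being a fix, one per granularity


def ensemble_score(base_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of base-model probabilities (assumed stand-in for the ensemble model)."""
    total_weight = sum(weights[g] for g in GRANULARITIES)
    weighted = sum(weights[g] * base_scores[g] for g in GRANULARITIES)
    return weighted / total_weight


def effort_adjusted(score: float, loc_changed: int) -> float:
    """Hypothetical effort-aware adjustment: predicted probability per changed line."""
    return score / max(loc_changed, 1)  # guard against empty commits


def rank_commits(commits: List[Commit], weights: Dict[str, float]) -> List[str]:
    """Order commit SHAs from most to least worth inspecting per unit of effort."""
    scored = [
        (effort_adjusted(ensemble_score(c.base_scores, weights), c.loc_changed), c.sha)
        for c in commits
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sha for _, sha in scored]


if __name__ == "__main__":
    equal_weights = {g: 1.0 for g in GRANULARITIES}
    candidates = [
        Commit("a1", 400, {"commit": 0.9, "file": 0.8, "hunk": 0.85, "line": 0.7}),
        Commit("b2", 10, {"commit": 0.6, "file": 0.5, "hunk": 0.7, "line": 0.6}),
        Commit("c3", 500, {"commit": 0.1, "file": 0.2, "hunk": 0.1, "line": 0.2}),
    ]
    print(rank_commits(candidates, equal_weights))  # ['b2', 'a1', 'c3']
```

Dividing by commit length favors short commits that a reviewer can check quickly. The trade-off is that a long commit with a high predicted probability (here `a1`) can fall below a short, only moderately likely one (`b2`), so a practical effort-aware scheme may blend the raw and adjusted scores rather than use the ratio alone.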
doi_str_mv	10.1109/TSE.2023.3281275
format	Article
publisher	New York: IEEE
rights	Copyright IEEE Computer Society 2023
coden	IESEDJ
stitle	TSE
tpages	23
orcidid	0000-0002-1006-8493; 0000-0001-8190-5458; 0000-0001-6100-8127; 0000-0002-6302-3256; 0000-0001-5044-1582; 0000-0001-7335-7295; 0000-0002-5181-3146; 0000-0002-1701-0286; 0000-0002-4367-7201; 0000-0002-9566-324X; 0000-0002-1057-7650
identifier	ISSN: 0098-5589
issn	0098-5589 (print); 1939-3520 (electronic)
recordid	cdi_ieee_primary_10138621
source	IEEE Electronic Library (IEL)
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T17%3A51%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multi-Granularity%20Detector%20for%20Vulnerability%20Fixes&rft.jtitle=IEEE%20transactions%20on%20software%20engineering&rft.au=Nguyen,%20Truong%20Giang&rft.date=2023-08-01&rft.volume=49&rft.issue=8&rft.spage=4035&rft.epage=4057&rft.pages=4035-4057&rft.issn=0098-5589&rft.eissn=1939-3520&rft.coden=IESEDJ&rft_id=info:doi/10.1109/TSE.2023.3281275&rft_dat=%3Cproquest_RIE%3E2851355765%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2851355765&rft_id=info:pmid/&rft_ieee_id=10138621&rfr_iscdi=true