Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries

Bibliographic details
Main authors:	Manuel, Dylan ; Islam, Nafis Tanveer ; Khoury, Joseph ; Nunez, Ana ; Bou-Harb, Elias ; Najafirad, Peyman
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence ; Computer Science - Cryptography and Security
Online access:	Order full text

creator	Manuel, Dylan ; Islam, Nafis Tanveer ; Khoury, Joseph ; Nunez, Ana ; Bou-Harb, Elias ; Najafirad, Peyman
description	Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. Limited access to source code in vital systems, such as firmware, drivers, and proprietary software used in Critical Infrastructures (CI), makes this analysis even more crucial at the binary level. Even when source code is available, a semantic gap persists after compilation between the source and the binary code executed by the processor, and this gap may hinder the detection of vulnerabilities in source code. However, current research on Large Language Models (LLMs) overlooks the significance of decompiled binaries in this area by focusing solely on source code. In this work, we are the first to empirically uncover the substantial semantic limitations of state-of-the-art LLMs in analyzing vulnerabilities in decompiled binaries, largely due to the absence of relevant datasets. To bridge the gap, we introduce DeBinVul, a novel decompiled binary code vulnerability dataset. Our dataset is multi-architecture and multi-optimization, focusing on C/C++ due to their wide usage in CI and association with numerous vulnerabilities. Specifically, we curate 150,872 samples of vulnerable and non-vulnerable decompiled binary code for the tasks of (i) identifying, (ii) classifying, and (iii) describing vulnerabilities, and (iv) recovering function names in decompiled binaries. We then fine-tune state-of-the-art LLMs on DeBinVul and report performance increases of 19%, 24%, and 21% for CodeLlama, Llama3, and CodeGen2, respectively, in detecting binary code vulnerabilities. Additionally, using DeBinVul, we report a high performance of 80-90% on the vulnerability classification task. Furthermore, we report improved performance on the function name recovery and vulnerability description tasks.
doi_str_mv	10.48550/arxiv.2411.04981
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2411_04981</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2411_04981</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2411_049813</originalsourceid><addsrcrecordid>eNqFjs1uwkAMhPfCAQEP0BN-gYakTSTojZ8gKpVLVfUamWAWi8VB3hCRB-h7d4O4c_FYo9HMZ8xLEkfpNMviCeqNm-gtTZIoTmfTpG_-cjmilCwWvqkh9QS5WBYiDd4HfEpDvmaLdRdB2cOCpDyeUU-d8YVqKVyxVwzPttqT83CoFH6vTkhxx47rFuaCrvXsgQVWVFbnCzsKXSyoTH5oegd0nkYPHZjxOv9Zbl7vwMVFOQy2RQde3MHfnyf-AUPbUJk</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries</title><source>arXiv.org</source><creator>Manuel, Dylan ; Islam, Nafis Tanveer ; Khoury, Joseph ; Nunez, Ana ; Bou-Harb, Elias ; Najafirad, Peyman</creator><creatorcontrib>Manuel, Dylan ; Islam, Nafis Tanveer ; Khoury, Joseph ; Nunez, Ana ; Bou-Harb, Elias ; Najafirad, Peyman</creatorcontrib><description>Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. The limited access to source code in vital systems - such as firmware, drivers, and proprietary software used in Critical Infrastructures (CI) - makes this analysis even more crucial on the binary level. Even with available source code, a semantic gap persists after compilation between the source and the binary code executed by the processor. This gap may hinder the detection of vulnerabilities in source code. That being said, current research on Large Language Models (LLMs) overlooks the significance of decompiled binaries in this area by focusing solely on source code. 
In this work, we are the first to empirically uncover the substantial semantic limitations of state-of-the-art LLMs when it comes to analyzing vulnerabilities in decompiled binaries, largely due to the absence of relevant datasets. To bridge the gap, we introduce DeBinVul, a novel decompiled binary code vulnerability dataset. Our dataset is multi-architecture and multi-optimization, focusing on C/C++ due to their wide usage in CI and association with numerous vulnerabilities. Specifically, we curate 150,872 samples of vulnerable and non-vulnerable decompiled binary code for the task of (i) identifying; (ii) classifying; (iii) describing vulnerabilities; and (iv) recovering function names in the domain of decompiled binaries. Subsequently, we fine-tune state-of-the-art LLMs using DeBinVul and report on a performance increase of 19%, 24%, and 21% in the capabilities of CodeLlama, Llama3, and CodeGen2 respectively, in detecting binary code vulnerabilities. Additionally, using DeBinVul, we report a high performance of 80-90% on the vulnerability classification task. 
Furthermore, we report improved performance in function name recovery and vulnerability description tasks.</description><identifier>DOI: 10.48550/arxiv.2411.04981</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Cryptography and Security</subject><creationdate>2024-11</creationdate><rights>http://creativecommons.org/licenses/by/4.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,777,882</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2411.04981$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2411.04981$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Manuel, Dylan</creatorcontrib><creatorcontrib>Islam, Nafis Tanveer</creatorcontrib><creatorcontrib>Khoury, Joseph</creatorcontrib><creatorcontrib>Nunez, Ana</creatorcontrib><creatorcontrib>Bou-Harb, Elias</creatorcontrib><creatorcontrib>Najafirad, Peyman</creatorcontrib><title>Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries</title><description>Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. The limited access to source code in vital systems - such as firmware, drivers, and proprietary software used in Critical Infrastructures (CI) - makes this analysis even more crucial on the binary level. Even with available source code, a semantic gap persists after compilation between the source and the binary code executed by the processor. This gap may hinder the detection of vulnerabilities in source code. 
That being said, current research on Large Language Models (LLMs) overlooks the significance of decompiled binaries in this area by focusing solely on source code. In this work, we are the first to empirically uncover the substantial semantic limitations of state-of-the-art LLMs when it comes to analyzing vulnerabilities in decompiled binaries, largely due to the absence of relevant datasets. To bridge the gap, we introduce DeBinVul, a novel decompiled binary code vulnerability dataset. Our dataset is multi-architecture and multi-optimization, focusing on C/C++ due to their wide usage in CI and association with numerous vulnerabilities. Specifically, we curate 150,872 samples of vulnerable and non-vulnerable decompiled binary code for the task of (i) identifying; (ii) classifying; (iii) describing vulnerabilities; and (iv) recovering function names in the domain of decompiled binaries. Subsequently, we fine-tune state-of-the-art LLMs using DeBinVul and report on a performance increase of 19%, 24%, and 21% in the capabilities of CodeLlama, Llama3, and CodeGen2 respectively, in detecting binary code vulnerabilities. Additionally, using DeBinVul, we report a high performance of 80-90% on the vulnerability classification task. 
Furthermore, we report improved performance in function name recovery and vulnerability description tasks.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Cryptography and Security</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFjs1uwkAMhPfCAQEP0BN-gYakTSTojZ8gKpVLVfUamWAWi8VB3hCRB-h7d4O4c_FYo9HMZ8xLEkfpNMviCeqNm-gtTZIoTmfTpG_-cjmilCwWvqkh9QS5WBYiDd4HfEpDvmaLdRdB2cOCpDyeUU-d8YVqKVyxVwzPttqT83CoFH6vTkhxx47rFuaCrvXsgQVWVFbnCzsKXSyoTH5oegd0nkYPHZjxOv9Zbl7vwMVFOQy2RQde3MHfnyf-AUPbUJk</recordid><startdate>20241107</startdate><enddate>20241107</enddate><creator>Manuel, Dylan</creator><creator>Islam, Nafis Tanveer</creator><creator>Khoury, Joseph</creator><creator>Nunez, Ana</creator><creator>Bou-Harb, Elias</creator><creator>Najafirad, Peyman</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241107</creationdate><title>Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries</title><author>Manuel, Dylan ; Islam, Nafis Tanveer ; Khoury, Joseph ; Nunez, Ana ; Bou-Harb, Elias ; Najafirad, Peyman</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2411_049813</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Cryptography and Security</topic><toplevel>online_resources</toplevel><creatorcontrib>Manuel, Dylan</creatorcontrib><creatorcontrib>Islam, Nafis Tanveer</creatorcontrib><creatorcontrib>Khoury, Joseph</creatorcontrib><creatorcontrib>Nunez, Ana</creatorcontrib><creatorcontrib>Bou-Harb, Elias</creatorcontrib><creatorcontrib>Najafirad, Peyman</creatorcontrib><collection>arXiv Computer 
Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Manuel, Dylan</au><au>Islam, Nafis Tanveer</au><au>Khoury, Joseph</au><au>Nunez, Ana</au><au>Bou-Harb, Elias</au><au>Najafirad, Peyman</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries</atitle><date>2024-11-07</date><risdate>2024</risdate><abstract>Security experts reverse engineer (decompile) binary code to identify critical security vulnerabilities. The limited access to source code in vital systems - such as firmware, drivers, and proprietary software used in Critical Infrastructures (CI) - makes this analysis even more crucial on the binary level. Even with available source code, a semantic gap persists after compilation between the source and the binary code executed by the processor. This gap may hinder the detection of vulnerabilities in source code. That being said, current research on Large Language Models (LLMs) overlooks the significance of decompiled binaries in this area by focusing solely on source code. In this work, we are the first to empirically uncover the substantial semantic limitations of state-of-the-art LLMs when it comes to analyzing vulnerabilities in decompiled binaries, largely due to the absence of relevant datasets. To bridge the gap, we introduce DeBinVul, a novel decompiled binary code vulnerability dataset. Our dataset is multi-architecture and multi-optimization, focusing on C/C++ due to their wide usage in CI and association with numerous vulnerabilities. 
Specifically, we curate 150,872 samples of vulnerable and non-vulnerable decompiled binary code for the task of (i) identifying; (ii) classifying; (iii) describing vulnerabilities; and (iv) recovering function names in the domain of decompiled binaries. Subsequently, we fine-tune state-of-the-art LLMs using DeBinVul and report on a performance increase of 19%, 24%, and 21% in the capabilities of CodeLlama, Llama3, and CodeGen2 respectively, in detecting binary code vulnerabilities. Additionally, using DeBinVul, we report a high performance of 80-90% on the vulnerability classification task. Furthermore, we report improved performance in function name recovery and vulnerability description tasks.</abstract><doi>10.48550/arxiv.2411.04981</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2411.04981
language	eng
recordid	cdi_arxiv_primary_2411_04981
source	arXiv.org
subjects	Computer Science - Artificial Intelligence ; Computer Science - Cryptography and Security
title	Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-18T01%3A16%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Enhancing%20Reverse%20Engineering:%20Investigating%20and%20Benchmarking%20Large%20Language%20Models%20for%20Vulnerability%20Analysis%20in%20Decompiled%20Binaries&rft.au=Manuel,%20Dylan&rft.date=2024-11-07&rft_id=info:doi/10.48550/arxiv.2411.04981&rft_dat=%3Carxiv_GOX%3E2411_04981%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true