PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model

Binary similarity analysis determines if two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more ef...

Detailed description

Bibliographic details
Published in: arXiv.org 2023-08
Main authors: Xu, Xiangzhe, Zhou, Xuan, Feng, Shiwei, Cheng, Siyuan, Ye, Yapeng, Shi, Qingkai, Guanhong Tao, Le, Yu, Zhang, Zhuo, Zhang, Xiangyu
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Xu, Xiangzhe
Zhou, Xuan
Feng, Shiwei
Cheng, Siyuan
Ye, Yapeng
Shi, Qingkai
Guanhong Tao
Le, Yu
Zhang, Zhuo
Zhang, Xiangyu
description Binary similarity analysis determines if two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing the substantial variations of input specifications. Our evaluation on 9 real-world projects with 35k functions, and comparison with 6 state-of-the-art techniques show that PEM can achieve a precision of 96% with common settings, outperforming the baselines by 10-20%.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2859363326</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2859363326</sourcerecordid><originalsourceid>FETCH-proquest_journals_28593633263</originalsourceid><addsrcrecordid>eNqNzMsKwjAQheEgCBbtOwy4Fmpi62WnUnEjiHVfRk1lSppophX79lbwAVydxf9xeiKQSk0ni5mUAxEyl1EUyWQu41gFojimhxWc9MNr1rYme4cNWfQtHL27e6wg0xV24cpQOA8ZVWTQU93C2qJpmRhehIBff8ELGeIOQ_rW16YmZ-HgbtqMRL9Awzr87VCMd-l5u588vHs2muu8dI3vDjmXi3ipEqVkov5TH47URv4</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2859363326</pqid></control><display><type>article</type><title>PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model</title><source>Free E- Journals</source><creator>Xu, Xiangzhe ; Zhou, Xuan ; Feng, Shiwei ; Cheng, Siyuan ; Ye, Yapeng ; Shi, Qingkai ; Guanhong Tao ; Le, Yu ; Zhang, Zhuo ; Zhang, Xiangyu</creator><creatorcontrib>Xu, Xiangzhe ; Zhou, Xuan ; Feng, Shiwei ; Cheng, Siyuan ; Ye, Yapeng ; Shi, Qingkai ; Guanhong Tao ; Le, Yu ; Zhang, Zhuo ; Zhang, Xiangyu</creatorcontrib><description>Binary similarity analysis determines if two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing the substantial variations of input specifications. 
Our evaluation on 9 real-world projects with 35k functions, and comparison with 6 state-of-the-art techniques show that PEM can achieve a precision of 96% with common settings, outperforming the baselines by 10-20%.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Semantics ; Similarity ; Source programs</subject><ispartof>arXiv.org, 2023-08</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Xu, Xiangzhe</creatorcontrib><creatorcontrib>Zhou, Xuan</creatorcontrib><creatorcontrib>Feng, Shiwei</creatorcontrib><creatorcontrib>Cheng, Siyuan</creatorcontrib><creatorcontrib>Ye, Yapeng</creatorcontrib><creatorcontrib>Shi, Qingkai</creatorcontrib><creatorcontrib>Guanhong Tao</creatorcontrib><creatorcontrib>Le, Yu</creatorcontrib><creatorcontrib>Zhang, Zhuo</creatorcontrib><creatorcontrib>Zhang, Xiangyu</creatorcontrib><title>PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model</title><title>arXiv.org</title><description>Binary similarity analysis determines if two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. 
In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing the substantial variations of input specifications. Our evaluation on 9 real-world projects with 35k functions, and comparison with 6 state-of-the-art techniques show that PEM can achieve a precision of 96% with common settings, outperforming the baselines by 10-20%.</description><subject>Semantics</subject><subject>Similarity</subject><subject>Source programs</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNzMsKwjAQheEgCBbtOwy4Fmpi62WnUnEjiHVfRk1lSppophX79lbwAVydxf9xeiKQSk0ni5mUAxEyl1EUyWQu41gFojimhxWc9MNr1rYme4cNWfQtHL27e6wg0xV24cpQOA8ZVWTQU93C2qJpmRhehIBff8ELGeIOQ_rW16YmZ-HgbtqMRL9Awzr87VCMd-l5u588vHs2muu8dI3vDjmXi3ipEqVkov5TH47URv4</recordid><startdate>20230830</startdate><enddate>20230830</enddate><creator>Xu, Xiangzhe</creator><creator>Zhou, Xuan</creator><creator>Feng, Shiwei</creator><creator>Cheng, Siyuan</creator><creator>Ye, Yapeng</creator><creator>Shi, Qingkai</creator><creator>Guanhong Tao</creator><creator>Le, Yu</creator><creator>Zhang, Zhuo</creator><creator>Zhang, Xiangyu</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope></search><sort><creationdate>20230830</creationdate><title>PEM: Representing Binary Program Semantics for 
Similarity Analysis via a Probabilistic Execution Model</title><author>Xu, Xiangzhe ; Zhou, Xuan ; Feng, Shiwei ; Cheng, Siyuan ; Ye, Yapeng ; Shi, Qingkai ; Guanhong Tao ; Le, Yu ; Zhang, Zhuo ; Zhang, Xiangyu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_28593633263</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Semantics</topic><topic>Similarity</topic><topic>Source programs</topic><toplevel>online_resources</toplevel><creatorcontrib>Xu, Xiangzhe</creatorcontrib><creatorcontrib>Zhou, Xuan</creatorcontrib><creatorcontrib>Feng, Shiwei</creatorcontrib><creatorcontrib>Cheng, Siyuan</creatorcontrib><creatorcontrib>Ye, Yapeng</creatorcontrib><creatorcontrib>Shi, Qingkai</creatorcontrib><creatorcontrib>Guanhong Tao</creatorcontrib><creatorcontrib>Le, Yu</creatorcontrib><creatorcontrib>Zhang, Zhuo</creatorcontrib><creatorcontrib>Zhang, Xiangyu</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering 
Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xu, Xiangzhe</au><au>Zhou, Xuan</au><au>Feng, Shiwei</au><au>Cheng, Siyuan</au><au>Ye, Yapeng</au><au>Shi, Qingkai</au><au>Guanhong Tao</au><au>Le, Yu</au><au>Zhang, Zhuo</au><au>Zhang, Xiangyu</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model</atitle><jtitle>arXiv.org</jtitle><date>2023-08-30</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Binary similarity analysis determines if two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing the substantial variations of input specifications. Our evaluation on 9 real-world projects with 35k functions, and comparison with 6 state-of-the-art techniques show that PEM can achieve a precision of 96% with common settings, outperforming the baselines by 10-20%.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-08
issn 2331-8422
language eng
recordid cdi_proquest_journals_2859363326
source Free E- Journals
subjects Semantics
Similarity
Source programs
title PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T03%3A44%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=PEM:%20Representing%20Binary%20Program%20Semantics%20for%20Similarity%20Analysis%20via%20a%20Probabilistic%20Execution%20Model&rft.jtitle=arXiv.org&rft.au=Xu,%20Xiangzhe&rft.date=2023-08-30&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2859363326%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2859363326&rft_id=info:pmid/&rfr_iscdi=true