Next-Generation Intermediate Representations for Binary Code Analysis

Many binary code analysis tools rely on intermediate representation (IR) derived from a binary code, instead of working directly with machine instructions. In this paper, we first consider binary code analysis problems that benefit from IR and compile a list of requirements that the IR suitable for...

Bibliographic details
Published in: Programming and computer software 2019-12, Vol.45 (7), p.424-437
Main authors: Solovev, M. A., Bakulin, M. G., Gorbachev, M. S., Manushin, D. V., Padaryan, V. A., Panasenko, S. S.
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 437
container_issue 7
container_start_page 424
container_title Programming and computer software
container_volume 45
creator Solovev, M. A.
Bakulin, M. G.
Gorbachev, M. S.
Manushin, D. V.
Padaryan, V. A.
Panasenko, S. S.
description Many binary code analysis tools rely on intermediate representation (IR) derived from a binary code, instead of working directly with machine instructions. In this paper, we first consider binary code analysis problems that benefit from IR and compile a list of requirements that the IR suitable for solving these problems should meet. Generally speaking, a universal binary analysis platform requires two principal components. The first component is a retargetable instruction decoder that utilizes external specifications to describe target instruction sets. External specifications facilitate maintenance and allow one to quickly implement support for new instruction sets. We analyze some of the most popular instruction set architectures (ISAs), including those used in microcontrollers, and from that compile a list of requirements for the retargetable decoder. We then overview existing multi-ISA decoders and propose our vision of a more generic approach, based on a multi-layer directed acyclic graph that describes the decoding process in universal terms. The second component of the analysis platform is the actual architecture-neutral IR. In this paper, we describe such IRs and propose Pivot 2, an IR that is low-level enough to be easily constructed from decoded machine instructions, also being easy to analyze. The main features of Pivot 2 are explicit side effects, SSA variables, simpler alternative to phi-functions, and extensible elementary operation set at the core. This IR also supports machines that have multiple memory address spaces. Finally, we propose a way to tie the decoder and the IR together to fit them to most of the binary code analysis tasks through abstract interpretation on top of the IR. 
The proposed scheme takes into account various aspects of target architectures that are overlooked in many other works, including pipeline specifics (handling of delay slots, hardware loop support, etc.), exception and interrupt management, and generic address space model, in which accesses may have arbitrary side effects due to memory-mapped devices or other non-trivial behavior of the memory system.
doi_str_mv 10.1134/S0361768819070107
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2918495808</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2918495808</sourcerecordid><originalsourceid>FETCH-LOGICAL-c316t-ad314662f6d6e4db4b465ebb7f17a0555623e165686fbab09a063fb7fa2a44213</originalsourceid><addsrcrecordid>eNp1kE1LAzEURYMoWKs_wN2A69G8JPMms6yl1kJR8GM9ZDovMqXN1CQF---bWsGFuHqLc-7l8hi7Bn4LINXdK5cIJWoNFS858PKEDQC5zqVAOGWDA84P_JxdhLDkSeFKDdjkib5iPiVH3sSud9nMRfJrajsTKXuhjadALn6zkNneZ_edM36XjfuWspEzq13owiU7s2YV6OrnDtn7w-Rt_JjPn6ez8WieLyRgzE0rQSEKiy2SahvVKCyoaUoLpeFFUaCQBFigRtuYhleGo7QJG2GUEiCH7ObYu_H955ZCrJf91qcRoRYVaFUVmutkwdFa-D4ET7be-G6dRtfA68O36j_fShlxzITkug_yv83_h_Y9Lmsy</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2918495808</pqid></control><display><type>article</type><title>Next-Generation Intermediate Representations for Binary Code Analysis</title><source>SpringerLink Journals</source><source>ProQuest Central</source><creator>Solovev, M. A. ; Bakulin, M. G. ; Gorbachev, M. S. ; Manushin, D. V. ; Padaryan, V. A. ; Panasenko, S. S.</creator><creatorcontrib>Solovev, M. A. ; Bakulin, M. G. ; Gorbachev, M. S. ; Manushin, D. V. ; Padaryan, V. A. ; Panasenko, S. S.</creatorcontrib><description>Many binary code analysis tools rely on intermediate representation (IR) derived from a binary code, instead of working directly with machine instructions. In this paper, we first consider binary code analysis problems that benefit from IR and compile a list of requirements that the IR suitable for solving these problems should meet. Generally speaking, a universal binary analysis platform requires two principal components. The first component is a retargetable instruction decoder that utilizes external specifications to describe target instruction sets. 
External specifications facilitate maintenance and allow one to quickly implement support for new instruction sets. We analyze some of the most popular instruction set architectures (ISAs), including those used in microcontrollers, and from that compile a list of requirements for the retargetable decoder. We then overview existing multi-ISA decoders and propose our vision of a more generic approach, based on a multi-layer directed acyclic graph that describes the decoding process in universal terms. The second component of the analysis platform is the actual architecture-neutral IR. In this paper, we describe such IRs and propose Pivot 2, an IR that is low-level enough to be easily constructed from decoded machine instructions, also being easy to analyze. The main features of Pivot 2 are explicit side effects, SSA variables, simpler alternative to phi-functions, and extensible elementary operation set at the core. This IR also supports machines that have multiple memory address spaces. Finally, we propose a way to tie the decoder and the IR together to fit them to most of the binary code analysis tasks through abstract interpretation on top of the IR. 
The proposed scheme takes into account various aspects of target architectures that are overlooked in many other works, including pipeline specifics (handling of delay slots, hardware loop support, etc.), exception and interrupt management, and generic address space model, in which accesses may have arbitrary side effects due to memory-mapped devices or other non-trivial behavior of the memory system.</description><identifier>ISSN: 0361-7688</identifier><identifier>EISSN: 1608-3261</identifier><identifier>DOI: 10.1134/S0361768819070107</identifier><language>eng</language><publisher>Moscow: Pleiades Publishing</publisher><subject>Artificial Intelligence ; Automation ; Binary codes ; Code reuse ; Computer Science ; Debugging ; Decoders ; Decoding ; Memory devices ; Multilayers ; Operating Systems ; Programming languages ; Representations ; Semantics ; Side effects ; Software Engineering ; Software Engineering/Programming and Operating Systems ; Software utilities ; Specifications</subject><ispartof>Programming and computer software, 2019-12, Vol.45 (7), p.424-437</ispartof><rights>Pleiades Publishing, Ltd. 2019</rights><rights>Pleiades Publishing, Ltd. 2019.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c316t-ad314662f6d6e4db4b465ebb7f17a0555623e165686fbab09a063fb7fa2a44213</citedby><cites>FETCH-LOGICAL-c316t-ad314662f6d6e4db4b465ebb7f17a0555623e165686fbab09a063fb7fa2a44213</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1134/S0361768819070107$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2918495808?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,776,780,21369,27903,27904,33723,41467,42536,43784,51297</link.rule.ids></links><search><creatorcontrib>Solovev, M. 
A.</creatorcontrib><creatorcontrib>Bakulin, M. G.</creatorcontrib><creatorcontrib>Gorbachev, M. S.</creatorcontrib><creatorcontrib>Manushin, D. V.</creatorcontrib><creatorcontrib>Padaryan, V. A.</creatorcontrib><creatorcontrib>Panasenko, S. S.</creatorcontrib><title>Next-Generation Intermediate Representations for Binary Code Analysis</title><title>Programming and computer software</title><addtitle>Program Comput Soft</addtitle><description>Many binary code analysis tools rely on intermediate representation (IR) derived from a binary code, instead of working directly with machine instructions. In this paper, we first consider binary code analysis problems that benefit from IR and compile a list of requirements that the IR suitable for solving these problems should meet. Generally speaking, a universal binary analysis platform requires two principal components. The first component is a retargetable instruction decoder that utilizes external specifications to describe target instruction sets. External specifications facilitate maintenance and allow one to quickly implement support for new instruction sets. We analyze some of the most popular instruction set architectures (ISAs), including those used in microcontrollers, and from that compile a list of requirements for the retargetable decoder. We then overview existing multi-ISA decoders and propose our vision of a more generic approach, based on a multi-layer directed acyclic graph that describes the decoding process in universal terms. The second component of the analysis platform is the actual architecture-neutral IR. In this paper, we describe such IRs and propose Pivot 2, an IR that is low-level enough to be easily constructed from decoded machine instructions, also being easy to analyze. The main features of Pivot 2 are explicit side effects, SSA variables, simpler alternative to phi-functions, and extensible elementary operation set at the core. 
This IR also supports machines that have multiple memory address spaces. Finally, we propose a way to tie the decoder and the IR together to fit them to most of the binary code analysis tasks through abstract interpretation on top of the IR. The proposed scheme takes into account various aspects of target architectures that are overlooked in many other works, including pipeline specifics (handling of delay slots, hardware loop support, etc.), exception and interrupt management, and generic address space model, in which accesses may have arbitrary side effects due to memory-mapped devices or other non-trivial behavior of the memory system.</description><subject>Artificial Intelligence</subject><subject>Automation</subject><subject>Binary codes</subject><subject>Code reuse</subject><subject>Computer Science</subject><subject>Debugging</subject><subject>Decoders</subject><subject>Decoding</subject><subject>Memory devices</subject><subject>Multilayers</subject><subject>Operating Systems</subject><subject>Programming languages</subject><subject>Representations</subject><subject>Semantics</subject><subject>Side effects</subject><subject>Software Engineering</subject><subject>Software Engineering/Programming and Operating Systems</subject><subject>Software utilities</subject><subject>Specifications</subject><issn>0361-7688</issn><issn>1608-3261</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp1kE1LAzEURYMoWKs_wN2A69G8JPMms6yl1kJR8GM9ZDovMqXN1CQF---bWsGFuHqLc-7l8hi7Bn4LINXdK5cIJWoNFS858PKEDQC5zqVAOGWDA84P_JxdhLDkSeFKDdjkib5iPiVH3sSud9nMRfJrajsTKXuhjadALn6zkNneZ_edM36XjfuWspEzq13owiU7s2YV6OrnDtn7w-Rt_JjPn6ez8WieLyRgzE0rQSEKiy2SahvVKCyoaUoLpeFFUaCQBFigRtuYhleGo7QJG2GUEiCH7ObYu_H955ZCrJf91qcRoRYVaFUVmutkwdFa-D4ET7be-G6dRtfA68O36j_fShlxzITkug_yv83_h_Y9Lmsy</recordid><startdate>20191201</startdate><enddate>20191201</enddate><creator>Solovev, M. 
A.</creator><creator>Bakulin, M. G.</creator><creator>Gorbachev, M. S.</creator><creator>Manushin, D. V.</creator><creator>Padaryan, V. A.</creator><creator>Panasenko, S. S.</creator><general>Pleiades Publishing</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope></search><sort><creationdate>20191201</creationdate><title>Next-Generation Intermediate Representations for Binary Code Analysis</title><author>Solovev, M. A. ; Bakulin, M. G. ; Gorbachev, M. S. ; Manushin, D. V. ; Padaryan, V. A. ; Panasenko, S. S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c316t-ad314662f6d6e4db4b465ebb7f17a0555623e165686fbab09a063fb7fa2a44213</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Artificial Intelligence</topic><topic>Automation</topic><topic>Binary codes</topic><topic>Code reuse</topic><topic>Computer Science</topic><topic>Debugging</topic><topic>Decoders</topic><topic>Decoding</topic><topic>Memory devices</topic><topic>Multilayers</topic><topic>Operating Systems</topic><topic>Programming languages</topic><topic>Representations</topic><topic>Semantics</topic><topic>Side effects</topic><topic>Software Engineering</topic><topic>Software Engineering/Programming and Operating Systems</topic><topic>Software utilities</topic><topic>Specifications</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Solovev, M. A.</creatorcontrib><creatorcontrib>Bakulin, M. G.</creatorcontrib><creatorcontrib>Gorbachev, M. 
S.</creatorcontrib><creatorcontrib>Manushin, D. V.</creatorcontrib><creatorcontrib>Padaryan, V. A.</creatorcontrib><creatorcontrib>Panasenko, S. S.</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><jtitle>Programming and computer software</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Solovev, M. A.</au><au>Bakulin, M. G.</au><au>Gorbachev, M. S.</au><au>Manushin, D. V.</au><au>Padaryan, V. A.</au><au>Panasenko, S. 
S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Next-Generation Intermediate Representations for Binary Code Analysis</atitle><jtitle>Programming and computer software</jtitle><stitle>Program Comput Soft</stitle><date>2019-12-01</date><risdate>2019</risdate><volume>45</volume><issue>7</issue><spage>424</spage><epage>437</epage><pages>424-437</pages><issn>0361-7688</issn><eissn>1608-3261</eissn><abstract>Many binary code analysis tools rely on intermediate representation (IR) derived from a binary code, instead of working directly with machine instructions. In this paper, we first consider binary code analysis problems that benefit from IR and compile a list of requirements that the IR suitable for solving these problems should meet. Generally speaking, a universal binary analysis platform requires two principal components. The first component is a retargetable instruction decoder that utilizes external specifications to describe target instruction sets. External specifications facilitate maintenance and allow one to quickly implement support for new instruction sets. We analyze some of the most popular instruction set architectures (ISAs), including those used in microcontrollers, and from that compile a list of requirements for the retargetable decoder. We then overview existing multi-ISA decoders and propose our vision of a more generic approach, based on a multi-layer directed acyclic graph that describes the decoding process in universal terms. The second component of the analysis platform is the actual architecture-neutral IR. In this paper, we describe such IRs and propose Pivot 2, an IR that is low-level enough to be easily constructed from decoded machine instructions, also being easy to analyze. The main features of Pivot 2 are explicit side effects, SSA variables, simpler alternative to phi-functions, and extensible elementary operation set at the core. 
This IR also supports machines that have multiple memory address spaces. Finally, we propose a way to tie the decoder and the IR together to fit them to most of the binary code analysis tasks through abstract interpretation on top of the IR. The proposed scheme takes into account various aspects of target architectures that are overlooked in many other works, including pipeline specifics (handling of delay slots, hardware loop support, etc.), exception and interrupt management, and generic address space model, in which accesses may have arbitrary side effects due to memory-mapped devices or other non-trivial behavior of the memory system.</abstract><cop>Moscow</cop><pub>Pleiades Publishing</pub><doi>10.1134/S0361768819070107</doi><tpages>14</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0361-7688
ispartof Programming and computer software, 2019-12, Vol.45 (7), p.424-437
issn 0361-7688
1608-3261
language eng
recordid cdi_proquest_journals_2918495808
source SpringerLink Journals; ProQuest Central
subjects Artificial Intelligence
Automation
Binary codes
Code reuse
Computer Science
Debugging
Decoders
Decoding
Memory devices
Multilayers
Operating Systems
Programming languages
Representations
Semantics
Side effects
Software Engineering
Software Engineering/Programming and Operating Systems
Software utilities
Specifications
title Next-Generation Intermediate Representations for Binary Code Analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T04%3A48%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Next-Generation%20Intermediate%20Representations%20for%20Binary%20Code%20Analysis&rft.jtitle=Programming%20and%20computer%20software&rft.au=Solovev,%20M.%20A.&rft.date=2019-12-01&rft.volume=45&rft.issue=7&rft.spage=424&rft.epage=437&rft.pages=424-437&rft.issn=0361-7688&rft.eissn=1608-3261&rft_id=info:doi/10.1134/S0361768819070107&rft_dat=%3Cproquest_cross%3E2918495808%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2918495808&rft_id=info:pmid/&rfr_iscdi=true