A semantics aware approach to automated reverse engineering unknown protocols

Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol mes...

Bibliographic details
Main authors: Yipeng Wang, Xiaochun Yun, Shafiq, M. Z., Liyan Wang, Liu, A. X., Zhibin Zhang, Danfeng Yao, Yongzheng Zhang, Li Guo
Format: Conference proceeding
Language: eng
Subjects:
Online access: Order full text
container_end_page 10
container_issue
container_start_page 1
container_title
container_volume
creator Yipeng Wang
Xiaochun Yun
Shafiq, M. Z.
Liyan Wang
Liu, A. X.
Zhibin Zhang
Danfeng Yao
Yongzheng Zhang
Li Guo
description Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network-trace-based protocol message format inference system that exploits the semantics of protocol messages without requiring the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit a highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we discover the latent relationships among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword-based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer the message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers the SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.
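The description rests on the observation that n-grams in protocol traces have a highly skewed frequency distribution. The following is a minimal Python sketch of that counting step only, not ProDecoder's implementation: the function names `byte_ngrams` and `ngram_frequencies`, the choice of n = 3, and the toy SMTP-like trace are all assumptions made for illustration. The record does not describe the later grouping, clustering, and alignment stages in enough detail to sketch them.

```python
from collections import Counter


def byte_ngrams(message: bytes, n: int):
    """Yield every contiguous n-byte window of a message (hypothetical helper)."""
    for i in range(len(message) - n + 1):
        yield message[i:i + n]


def ngram_frequencies(messages, n=3):
    """Count n-gram occurrences across all messages in a trace (hypothetical helper)."""
    counts = Counter()
    for msg in messages:
        counts.update(byte_ngrams(msg, n))
    return counts


# Toy SMTP-like trace; illustrative data, not taken from the paper's evaluation.
trace = [
    b"HELO example.org\r\n",
    b"MAIL FROM:<a@example.org>\r\n",
    b"RCPT TO:<b@example.org>\r\n",
    b"RCPT TO:<c@example.org>\r\n",
]

freqs = ngram_frequencies(trace, n=3)

# Print n-grams from most to least frequent to inspect the frequency ranking.
for gram, count in freqs.most_common(5):
    print(gram, count)
```

In this toy trace, n-grams drawn from command keywords and the shared host name recur across messages, while n-grams containing the per-message mailbox character occur once. A ranking of this kind is the sort of signal that a keyword-identification stage could start from.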
doi_str_mv 10.1109/ICNP.2012.6459963
format Conference Proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6459963</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6459963</ieee_id><sourcerecordid>6459963</sourcerecordid><originalsourceid>FETCH-LOGICAL-c223t-6877ecbca3a82a8b784ebd7de6902610b38ce5a43bb952b7bacdd1ac1a3bb3bb3</originalsourceid><addsrcrecordid>eNpFUFtLwzAYjTewzv0A8SV_oDW3Ju3jGE4H8_Kgz-NL-jmra1KSzuG_t-JAOHDg3B4OIVecFZyz-mY5f3wuBOOi0Kqsay2PyAVX2kihlDHHJBNayVxKJk_-jZKdkmxsi5xrVZ2TaUofjDHOpNK6zMjDjCbswA-tSxT2EJFC38cA7p0OgcJuCB0M2NCIXxgTUvSb1iPG1m_ozn_6sPd0zA_BhW26JGdvsE04PfCEvC5uX-b3-erpbjmfrXInhBxyXRmDzjqQUAmorKkU2sY0qGsmNGdWVg5LUNLauhTWWHBNw8FxGJVfTMj1326LiOs-th3E7_XhFvkDuRtUbA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A semantics aware approach to automated reverse engineering unknown protocols</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Yipeng Wang ; Xiaochun Yun ; Shafiq, M. Z. ; Liyan Wang ; Liu, A. X. ; Zhibin Zhang ; Danfeng Yao ; Yongzheng Zhang ; Li Guo</creator><creatorcontrib>Yipeng Wang ; Xiaochun Yun ; Shafiq, M. Z. ; Liyan Wang ; Liu, A. X. ; Zhibin Zhang ; Danfeng Yao ; Yongzheng Zhang ; Li Guo</creatorcontrib><description>Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. 
In ProDecoder, we first discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.</description><identifier>ISSN: 1092-1648</identifier><identifier>ISBN: 1467324450</identifier><identifier>ISBN: 9781467324458</identifier><identifier>EISSN: 2643-3303</identifier><identifier>EISBN: 1467324477</identifier><identifier>EISBN: 9781467324472</identifier><identifier>EISBN: 9781467324465</identifier><identifier>EISBN: 1467324469</identifier><identifier>DOI: 10.1109/ICNP.2012.6459963</identifier><language>eng</language><publisher>IEEE</publisher><subject>Electronic mail ; Natural language processing ; Postal services ; Protocols ; Reverse engineering ; Semantics ; Vectors</subject><ispartof>2012 20th IEEE International Conference on Network Protocols (ICNP), 2012, p.1-10</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c223t-6877ecbca3a82a8b784ebd7de6902610b38ce5a43bb952b7bacdd1ac1a3bb3bb3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6459963$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,776,780,785,786,2051,27904,54898</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6459963$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Yipeng Wang</creatorcontrib><creatorcontrib>Xiaochun Yun</creatorcontrib><creatorcontrib>Shafiq, M. 
Z.</creatorcontrib><creatorcontrib>Liyan Wang</creatorcontrib><creatorcontrib>Liu, A. X.</creatorcontrib><creatorcontrib>Zhibin Zhang</creatorcontrib><creatorcontrib>Danfeng Yao</creatorcontrib><creatorcontrib>Yongzheng Zhang</creatorcontrib><creatorcontrib>Li Guo</creatorcontrib><title>A semantics aware approach to automated reverse engineering unknown protocols</title><title>2012 20th IEEE International Conference on Network Protocols (ICNP)</title><addtitle>ICNP</addtitle><description>Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we first discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers SMB protocol with 100% precision and recall. 
For SMTP, ProDecoder achieves approximately 95% precision and recall.</description><subject>Electronic mail</subject><subject>Natural language processing</subject><subject>Postal services</subject><subject>Protocols</subject><subject>Reverse engineering</subject><subject>Semantics</subject><subject>Vectors</subject><issn>1092-1648</issn><issn>2643-3303</issn><isbn>1467324450</isbn><isbn>9781467324458</isbn><isbn>1467324477</isbn><isbn>9781467324472</isbn><isbn>9781467324465</isbn><isbn>1467324469</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2012</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNpFUFtLwzAYjTewzv0A8SV_oDW3Ju3jGE4H8_Kgz-NL-jmra1KSzuG_t-JAOHDg3B4OIVecFZyz-mY5f3wuBOOi0Kqsay2PyAVX2kihlDHHJBNayVxKJk_-jZKdkmxsi5xrVZ2TaUofjDHOpNK6zMjDjCbswA-tSxT2EJFC38cA7p0OgcJuCB0M2NCIXxgTUvSb1iPG1m_ozn_6sPd0zA_BhW26JGdvsE04PfCEvC5uX-b3-erpbjmfrXInhBxyXRmDzjqQUAmorKkU2sY0qGsmNGdWVg5LUNLauhTWWHBNw8FxGJVfTMj1326LiOs-th3E7_XhFvkDuRtUbA</recordid><startdate>201210</startdate><enddate>201210</enddate><creator>Yipeng Wang</creator><creator>Xiaochun Yun</creator><creator>Shafiq, M. Z.</creator><creator>Liyan Wang</creator><creator>Liu, A. X.</creator><creator>Zhibin Zhang</creator><creator>Danfeng Yao</creator><creator>Yongzheng Zhang</creator><creator>Li Guo</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201210</creationdate><title>A semantics aware approach to automated reverse engineering unknown protocols</title><author>Yipeng Wang ; Xiaochun Yun ; Shafiq, M. Z. ; Liyan Wang ; Liu, A. X. 
; Zhibin Zhang ; Danfeng Yao ; Yongzheng Zhang ; Li Guo</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c223t-6877ecbca3a82a8b784ebd7de6902610b38ce5a43bb952b7bacdd1ac1a3bb3bb3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Electronic mail</topic><topic>Natural language processing</topic><topic>Postal services</topic><topic>Protocols</topic><topic>Reverse engineering</topic><topic>Semantics</topic><topic>Vectors</topic><toplevel>online_resources</toplevel><creatorcontrib>Yipeng Wang</creatorcontrib><creatorcontrib>Xiaochun Yun</creatorcontrib><creatorcontrib>Shafiq, M. Z.</creatorcontrib><creatorcontrib>Liyan Wang</creatorcontrib><creatorcontrib>Liu, A. X.</creatorcontrib><creatorcontrib>Zhibin Zhang</creatorcontrib><creatorcontrib>Danfeng Yao</creatorcontrib><creatorcontrib>Yongzheng Zhang</creatorcontrib><creatorcontrib>Li Guo</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Yipeng Wang</au><au>Xiaochun Yun</au><au>Shafiq, M. Z.</au><au>Liyan Wang</au><au>Liu, A. 
X.</au><au>Zhibin Zhang</au><au>Danfeng Yao</au><au>Yongzheng Zhang</au><au>Li Guo</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A semantics aware approach to automated reverse engineering unknown protocols</atitle><btitle>2012 20th IEEE International Conference on Network Protocols (ICNP)</btitle><stitle>ICNP</stitle><date>2012-10</date><risdate>2012</risdate><spage>1</spage><epage>10</epage><pages>1-10</pages><issn>1092-1648</issn><eissn>2643-3303</eissn><isbn>1467324450</isbn><isbn>9781467324458</isbn><eisbn>1467324477</eisbn><eisbn>9781467324472</eisbn><eisbn>9781467324465</eisbn><eisbn>1467324469</eisbn><abstract>Extracting the protocol message format specifications of unknown applications from network traces is important for a variety of applications such as application protocol parsing, vulnerability discovery, and system integration. In this paper, we propose ProDecoder, a network trace based protocol message format inference system that exploits the semantics of protocol messages without the executable code of application protocols. ProDecoder is based on the key insight that the n-grams of protocol traces exhibit highly skewed frequency distribution that can be leveraged for accurate protocol message format inference. In ProDecoder, we first discover the latent relationship among n-grams by first grouping protocol messages with the same semantics and then inferring message formats by keyword based clustering and cluster sequence alignment. We implemented and evaluated ProDecoder to infer message format specifications of SMB (a binary protocol) and SMTP (a textual protocol). Our experimental results show that ProDecoder accurately parses and infers SMB protocol with 100% precision and recall. For SMTP, ProDecoder achieves approximately 95% precision and recall.</abstract><pub>IEEE</pub><doi>10.1109/ICNP.2012.6459963</doi><tpages>10</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1092-1648
ispartof 2012 20th IEEE International Conference on Network Protocols (ICNP), 2012, p.1-10
issn 1092-1648
2643-3303
language eng
recordid cdi_ieee_primary_6459963
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Electronic mail
Natural language processing
Postal services
Protocols
Reverse engineering
Semantics
Vectors
title A semantics aware approach to automated reverse engineering unknown protocols
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T19%3A27%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20semantics%20aware%20approach%20to%20automated%20reverse%20engineering%20unknown%20protocols&rft.btitle=2012%2020th%20IEEE%20International%20Conference%20on%20Network%20Protocols%20(ICNP)&rft.au=Yipeng%20Wang&rft.date=2012-10&rft.spage=1&rft.epage=10&rft.pages=1-10&rft.issn=1092-1648&rft.eissn=2643-3303&rft.isbn=1467324450&rft.isbn_list=9781467324458&rft_id=info:doi/10.1109/ICNP.2012.6459963&rft_dat=%3Cieee_6IE%3E6459963%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1467324477&rft.eisbn_list=9781467324472&rft.eisbn_list=9781467324465&rft.eisbn_list=1467324469&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6459963&rfr_iscdi=true