Extracting Protocol Format as State Machine via Controlled Static Loop Analysis
Reverse engineering of protocol message formats is critical for many security applications. Mainstream techniques use dynamic analysis and inherit its low-coverage problem -- the inferred message formats only reflect the features of their inputs. To achieve high coverage, we choose to use static ana...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Reverse engineering of protocol message formats is critical for many security
applications. Mainstream techniques use dynamic analysis and inherit its
low-coverage problem -- the inferred message formats only reflect the features
of their inputs. To achieve high coverage, we choose to use static analysis to
infer message formats from the implementation of protocol parsers. In this
work, we focus on a class of extremely challenging protocols whose formats are
described via constraint-enhanced regular expressions and parsed using
finite-state machines. Such state machines are often implemented as complicated
parsing loops, which are inherently difficult to analyze via conventional
static analysis. Our new technique extracts a state machine by regarding each
loop iteration as a state and the dependency between loop iterations as state
transitions. To achieve high, i.e., path-sensitive, precision but avoid path
explosion, the analysis is controlled to merge as many paths as possible based
on carefully-designed rules. The evaluation results show that we can infer a
state machine and, thus, the message formats, in five minutes with over 90%
precision and recall, far better than state of the art. We also applied the
state machines to enhance protocol fuzzers, which are improved by 20% to 230%
in terms of coverage and detect ten more zero-days compared to baselines. |
---|---|
DOI: | 10.48550/arxiv.2305.13483 |