Scalably Detecting Third-Party Android Libraries With Two-Stage Bloom Filtering

Bibliographic details
Published in: IEEE Transactions on Software Engineering, 2023-04, Vol. 49 (4), p. 1-14
Main authors: Huang, Jianjun; Xue, Bo; Jiang, Jiasheng; You, Wei; Liang, Bin; Wu, Jingzheng; Wu, Yanjun
Format: Article
Language: English
Description
Summary: Third-party library (TPL) detection is important for Android app security analysis. Unfortunately, existing techniques often scale poorly, and in some situations the detection time is unacceptable. Although a few existing methods run relatively fast, they are not effective enough, especially for apps obfuscated in non-structure-preserving ways, e.g., repackaged and flattened apps. In this paper, we treat TPL detection as a set inclusion problem in order to analyze obfuscated apps effectively and efficiently, and we develop a scalable two-stage detection approach, Libloom. Specifically, package and class signatures are encoded into two levels of Bloom filters, respectively. In the first stage, the package filters identify a limited number of candidate TPLs by measuring set overlap, which avoids unnecessary class-level set analysis. In the second stage, the class filters are used to compute a similarity score between the query app and each candidate to detect the integrated TPLs, and a novel entropy-based metric is presented specifically to handle repackaged and flattened apps. We have evaluated Libloom on large-scale benchmarks involving tens of thousands of TPL instances. The experimental results demonstrate that Libloom outperforms state-of-the-art tools in both effectiveness and efficiency. In particular, the proposed two-stage method runs about ten times faster than straightforward class-level analysis on flattened apps, without loss of accuracy.
ISSN: 0098-5589, 1939-3520
DOI:10.1109/TSE.2022.3215628
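
Illustrative sketch (not part of the catalog record): the two-stage idea in the summary can be pictured as a set-inclusion test on Bloom filters, with a cheap package-level prefilter followed by a class-level score. The code below is a minimal sketch under stated assumptions and is not the Libloom implementation. The names BloomFilter, containment, and detect, the signature strings, the filter size, and both thresholds are hypothetical, and the entropy-based metric for repackaged and flattened apps is not modeled.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit vector."""

    def __init__(self, num_bits=4096, num_hashes=3):
        # Sizes are arbitrary illustration values, not taken from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def add(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            self.bits |= 1 << (int.from_bytes(digest[:8], "big") % self.num_bits)

    @classmethod
    def of(cls, items):
        bf = cls()
        for item in items:
            bf.add(item)
        return bf


def containment(lib_filter, app_filter):
    """Share of the library's set bits that are also set in the app filter."""
    lib_ones = bin(lib_filter.bits).count("1")
    if lib_ones == 0:
        return 0.0
    return bin(lib_filter.bits & app_filter.bits).count("1") / lib_ones


def detect(app_pkg_sigs, app_cls_sigs, libraries,
           pkg_threshold=0.5, cls_threshold=0.8):
    """Return names of libraries that pass both stages (thresholds are assumed)."""
    app_pkg = BloomFilter.of(app_pkg_sigs)
    app_cls = BloomFilter.of(app_cls_sigs)
    detected = []
    for name, (pkg_sigs, cls_sigs) in libraries.items():
        # Stage 1: package-level overlap selects candidates cheaply.
        # A real tool would precompute library filters instead of rebuilding them.
        if containment(BloomFilter.of(pkg_sigs), app_pkg) < pkg_threshold:
            continue
        # Stage 2: class-level similarity is computed only for candidates.
        if containment(BloomFilter.of(cls_sigs), app_cls) >= cls_threshold:
            detected.append(name)
    return detected


# Hypothetical signatures: opaque strings standing in for real ones.
libraries = {
    "libA": ({"pkgA1", "pkgA2"}, {"clsA1", "clsA2", "clsA3"}),
    "libB": ({"pkgB1"}, {"clsB1", "clsB2"}),
}
app_pkgs = {"pkgA1", "pkgA2", "appPkg"}
app_classes = {"clsA1", "clsA2", "clsA3", "appCls1", "appCls2"}
result = detect(app_pkgs, app_classes, libraries)
print(result)
```

Because every signature of libA is also added to the app filters, libA's bits are a subset of the app's bits and both scores are exactly 1.0. Bloom filters can give false positives but not false negatives for this inclusion test, which is why thresholds on the overlap ratio are a natural (assumed) decision rule here.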