Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining
Format: Article
Language: English
Abstract: Graph pattern mining (GPM) is used in diverse application areas, including
social network analysis, bioinformatics, and chemical engineering. Existing GPM
frameworks either provide high-level interfaces that favor productivity at the cost
of expressiveness, or provide low-level interfaces that can express a wide variety
of GPM algorithms at the cost of increased programming complexity. Moreover,
existing systems lack the flexibility to explore combinations of optimizations
to achieve performance competitive with hand-optimized applications.
We present Sandslash, an in-memory GPM framework that uses a novel programming
interface to support productive, expressive, and efficient GPM on large graphs.
Sandslash provides a high-level API that needs only a specification of the GPM
problem; the framework implements fast subgraph enumeration, provides efficient
data structures, and applies high-level optimizations automatically. To achieve
performance competitive with expert-optimized implementations, Sandslash also
provides a low-level API that allows users to express algorithm-specific
optimizations. This enables Sandslash to support both high productivity and high
efficiency without losing expressiveness. We evaluate Sandslash on shared-memory
machines using five GPM applications and a wide range of large real-world graphs.
Experimental results demonstrate that applications written using the Sandslash
high-level or low-level API outperform the state-of-the-art GPM systems AutoMine,
Pangolin, and Peregrine on average by 13.8x, 7.9x, and 5.4x, respectively. We also
show that these Sandslash applications outperform expert-optimized GPM
implementations by 2.3x on average with less programming effort.
DOI: 10.48550/arxiv.2011.03135
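The abstract does not show Sandslash's actual API. As a rough illustration of the two levels it describes, the Python sketch below counts k-cliques by depth-first subgraph enumeration. The caller supplies only the problem (the graph and k). The enumerator applies one optimization automatically: symmetry breaking by orienting edges from lower to higher vertex ID. An optional `prune` callback stands in for a user-supplied, algorithm-specific optimization. The names `from_edges`, `orient`, `enumerate_cliques`, and `prune` are hypothetical and are not Sandslash's interface. The choice of clique counting and of ID-based orientation is likewise an assumption made for illustration.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets
PruneHook = Callable[[List[int], Set[int]], bool]


def from_edges(edges) -> Graph:
    """Build an undirected adjacency-set graph from (u, v) pairs."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def orient(graph: Graph) -> Graph:
    """Symmetry breaking: keep only edges u -> v with u < v, so every
    clique is reached exactly once, in increasing vertex-ID order."""
    return {u: {v for v in nbrs if v > u} for u, nbrs in graph.items()}


def enumerate_cliques(graph: Graph, k: int,
                      prune: PruneHook = lambda emb, cand: False) -> int:
    """Count k-cliques by depth-first extension of partial embeddings.

    Hypothetical "high-level" use: pass only the graph and k.
    Hypothetical "low-level" use: also pass `prune`, which may discard a
    partial embedding (and everything below it) by returning True.
    """
    dag = orient(graph)
    count = 0

    def extend(embedding: List[int], candidates: Set[int]) -> None:
        nonlocal count
        if len(embedding) == k:
            count += 1
            return
        if prune(embedding, candidates):
            return
        for v in sorted(candidates):
            # Every vertex in the new candidate set is a forward neighbor of
            # all vertices in embedding + [v], so the result stays a clique.
            extend(embedding + [v], candidates & dag[v])

    for u in sorted(dag):
        extend([u], set(dag[u]))
    return count


if __name__ == "__main__":
    k4 = from_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
    print(enumerate_cliques(k4, 3))  # 4
    # Example pruning rule: stop when too few candidates remain to reach k.
    too_small = lambda emb, cand: len(emb) + len(cand) < 4
    print(enumerate_cliques(k4, 4, prune=too_small))  # 1
```

Ordering by vertex ID is only one way to break symmetry. Degree-based orderings are common in practice. This abstract does not say which ordering, if any, Sandslash uses.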