Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics
The shear volumes of data generated from earth observation and remote sensing technologies continue to make major impact; leaping key geospatial applications into the dual data and compute intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges....
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The shear volumes of data generated from earth observation and remote sensing
technologies continue to make major impact; leaping key geospatial applications
into the dual data and compute intensive era. As a consequence, this rapid
advancement poses new computational and data processing challenges. We
implement a novel remote sensing data flow (RESFlow) for advanced machine
learning and computing with massive amounts of remotely sensed imagery. The
core contribution is partitioning massive amount of data based on the spectral
and semantic characteristics for distributed imagery analysis. RESFlow takes
advantage of both a unified analytics engine for large-scale data processing
and the availability of modern computing hardware to harness the acceleration
of deep learning inference on expansive remote sensing imagery. The framework
incorporates a strategy to optimize resource utilization across multiple
executors assigned to a single worker. We showcase its deployment across
computationally and data-intensive on pixel-level labeling workloads. The
pipeline invokes deep learning inference at three stages; during deep feature
extraction, deep metric mapping, and deep semantic segmentation. The tasks
impose compute intensive and GPU resource sharing challenges motivating for a
parallelized pipeline for all execution steps. By taking advantage of Apache
Spark, Nvidia DGX1, and DGX2 computing platforms, we demonstrate unprecedented
compute speed-ups for deep learning inference on pixel labeling workloads;
processing 21,028~Terrabytes of imagery data and delivering an output maps at
area rate of 5.245sq.km/sec, amounting to 453,168 sq.km/day - reducing a 28 day
workload to 21~hours. |
---|---|
DOI: | 10.48550/arxiv.1908.04383 |