Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics

The sheer volume of data generated by earth observation and remote sensing technologies continues to make a major impact, propelling key geospatial applications into an era that is both data- and compute-intensive. As a consequence, this rapid advancement poses new computational and data processing challenges....

Bibliographic Details
Published in: arXiv.org 2019-08
Main Authors: Dalton Lunga; Gerrand, Jonathan; Yang, Hsiuhan Lexie; Layton, Christopher; Stewart, Robert
Format: Article
Language: eng
Subjects:
Online Access: Full text
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator Dalton Lunga
Gerrand, Jonathan
Yang, Hsiuhan Lexie
Layton, Christopher
Stewart, Robert
description The sheer volume of data generated by earth observation and remote sensing technologies continues to make a major impact, propelling key geospatial applications into an era that is both data- and compute-intensive. As a consequence, this rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advanced machine learning and computing with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amounts of data based on their spectral and semantic characteristics for distributed imagery analysis. RESFlow takes advantage of both a unified analytics engine for large-scale data processing and the availability of modern computing hardware to accelerate deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment across computationally and data-intensive pixel-level labeling workloads. The pipeline invokes deep learning inference at three stages: deep feature extraction, deep metric mapping, and deep semantic segmentation. These tasks impose compute-intensive and GPU resource sharing challenges, motivating a parallelized pipeline for all execution steps. By taking advantage of Apache Spark and the Nvidia DGX1 and DGX2 computing platforms, we demonstrate unprecedented compute speed-ups for deep learning inference on pixel labeling workloads, processing 21,028 terabytes of imagery data and delivering output maps at an area rate of 5.245 sq. km/sec, amounting to 453,168 sq. km/day, and reducing a 28-day workload to 21 hours.
format Article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2273028902</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2273028902</sourcerecordid><originalsourceid>FETCH-proquest_journals_22730289023</originalsourceid><addsrcrecordid>eNqNy70KwjAUBeAgCBbtOwScC_HG2joWf7DgppNLucTb2hrTmLSDb28GH8DlHDh8Z8IikHKV5GuAGYu974QQsMkgTWXEboVF9SB-seievFCKNDkc6M73RJafCZ1pTcNLU5Mjo4jXveNndE34KNQhg9a6HYiXLwxrYVB_hlb5BZvWqD3Fv56z5fFw3Z0S6_r3SH6oun50QfsKIJMC8q0A-Z_6ApdAQZ4</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2273028902</pqid></control><display><type>article</type><title>Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics</title><source>Free E- Journals</source><creator>Dalton Lunga ; Gerrand, Jonathan ; Yang, Hsiuhan Lexie ; Layton, Christopher ; Stewart, Robert</creator><creatorcontrib>Dalton Lunga ; Gerrand, Jonathan ; Yang, Hsiuhan Lexie ; Layton, Christopher ; Stewart, Robert</creatorcontrib><description>The shear volumes of data generated from earth observation and remote sensing technologies continue to make major impact; leaping key geospatial applications into the dual data and compute intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advanced machine learning and computing with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amount of data based on the spectral and semantic characteristics for distributed imagery analysis. RESFlow takes advantage of both a unified analytics engine for large-scale data processing and the availability of modern computing hardware to harness the acceleration of deep learning inference on expansive remote sensing imagery. 
The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment across computationally and data-intensive on pixel-level labeling workloads. The pipeline invokes deep learning inference at three stages; during deep feature extraction, deep metric mapping, and deep semantic segmentation. The tasks impose compute intensive and GPU resource sharing challenges motivating for a parallelized pipeline for all execution steps. By taking advantage of Apache Spark, Nvidia DGX1, and DGX2 computing platforms, we demonstrate unprecedented compute speed-ups for deep learning inference on pixel labeling workloads; processing 21,028~Terrabytes of imagery data and delivering an output maps at area rate of 5.245sq.km/sec, amounting to 453,168 sq.km/day - reducing a 28 day workload to 21~hours.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Acceleration ; Analytics ; Computation ; Data processing ; Deep learning ; Detection ; Earth observations (from space) ; Feature extraction ; Image segmentation ; Inference ; Labeling ; Machine learning ; Mapping ; Parallel processing ; Pixels ; Remote observing ; Remote sensing ; Satellite imagery ; Semantic segmentation ; Semantics ; Workload ; Workloads</subject><ispartof>arXiv.org, 2019-08</ispartof><rights>2019. This work is published under http://creativecommons.org/publicdomain/zero/1.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Dalton Lunga</creatorcontrib><creatorcontrib>Gerrand, Jonathan</creatorcontrib><creatorcontrib>Yang, Hsiuhan Lexie</creatorcontrib><creatorcontrib>Layton, Christopher</creatorcontrib><creatorcontrib>Stewart, Robert</creatorcontrib><title>Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics</title><title>arXiv.org</title><description>The shear volumes of data generated from earth observation and remote sensing technologies continue to make major impact; leaping key geospatial applications into the dual data and compute intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advanced machine learning and computing with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amount of data based on the spectral and semantic characteristics for distributed imagery analysis. RESFlow takes advantage of both a unified analytics engine for large-scale data processing and the availability of modern computing hardware to harness the acceleration of deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment across computationally and data-intensive on pixel-level labeling workloads. 
The pipeline invokes deep learning inference at three stages; during deep feature extraction, deep metric mapping, and deep semantic segmentation. The tasks impose compute intensive and GPU resource sharing challenges motivating for a parallelized pipeline for all execution steps. By taking advantage of Apache Spark, Nvidia DGX1, and DGX2 computing platforms, we demonstrate unprecedented compute speed-ups for deep learning inference on pixel labeling workloads; processing 21,028~Terrabytes of imagery data and delivering an output maps at area rate of 5.245sq.km/sec, amounting to 453,168 sq.km/day - reducing a 28 day workload to 21~hours.</description><subject>Acceleration</subject><subject>Analytics</subject><subject>Computation</subject><subject>Data processing</subject><subject>Deep learning</subject><subject>Detection</subject><subject>Earth observations (from space)</subject><subject>Feature extraction</subject><subject>Image segmentation</subject><subject>Inference</subject><subject>Labeling</subject><subject>Machine learning</subject><subject>Mapping</subject><subject>Parallel processing</subject><subject>Pixels</subject><subject>Remote observing</subject><subject>Remote sensing</subject><subject>Satellite imagery</subject><subject>Semantic segmentation</subject><subject>Semantics</subject><subject>Workload</subject><subject>Workloads</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNy70KwjAUBeAgCBbtOwScC_HG2joWf7DgppNLucTb2hrTmLSDb28GH8DlHDh8Z8IikHKV5GuAGYu974QQsMkgTWXEboVF9SB-seievFCKNDkc6M73RJafCZ1pTcNLU5Mjo4jXveNndE34KNQhg9a6HYiXLwxrYVB_hlb5BZvWqD3Fv56z5fFw3Z0S6_r3SH6oun50QfsKIJMC8q0A-Z_6ApdAQZ4</recordid><startdate>20190808</startdate><enddate>20190808</enddate><creator>Dalton 
Lunga</creator><creator>Gerrand, Jonathan</creator><creator>Yang, Hsiuhan Lexie</creator><creator>Layton, Christopher</creator><creator>Stewart, Robert</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20190808</creationdate><title>Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics</title><author>Dalton Lunga ; Gerrand, Jonathan ; Yang, Hsiuhan Lexie ; Layton, Christopher ; Stewart, Robert</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_22730289023</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Acceleration</topic><topic>Analytics</topic><topic>Computation</topic><topic>Data processing</topic><topic>Deep learning</topic><topic>Detection</topic><topic>Earth observations (from space)</topic><topic>Feature extraction</topic><topic>Image segmentation</topic><topic>Inference</topic><topic>Labeling</topic><topic>Machine learning</topic><topic>Mapping</topic><topic>Parallel processing</topic><topic>Pixels</topic><topic>Remote observing</topic><topic>Remote sensing</topic><topic>Satellite imagery</topic><topic>Semantic segmentation</topic><topic>Semantics</topic><topic>Workload</topic><topic>Workloads</topic><toplevel>online_resources</toplevel><creatorcontrib>Dalton Lunga</creatorcontrib><creatorcontrib>Gerrand, Jonathan</creatorcontrib><creatorcontrib>Yang, Hsiuhan Lexie</creatorcontrib><creatorcontrib>Layton, Christopher</creatorcontrib><creatorcontrib>Stewart, 
Robert</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dalton Lunga</au><au>Gerrand, Jonathan</au><au>Yang, Hsiuhan Lexie</au><au>Layton, Christopher</au><au>Stewart, Robert</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics</atitle><jtitle>arXiv.org</jtitle><date>2019-08-08</date><risdate>2019</risdate><eissn>2331-8422</eissn><abstract>The shear volumes of data generated from earth observation and remote sensing technologies continue to make major impact; leaping key geospatial applications into the dual data and compute intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges. 
We implement a novel remote sensing data flow (RESFlow) for advanced machine learning and computing with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amount of data based on the spectral and semantic characteristics for distributed imagery analysis. RESFlow takes advantage of both a unified analytics engine for large-scale data processing and the availability of modern computing hardware to harness the acceleration of deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment across computationally and data-intensive on pixel-level labeling workloads. The pipeline invokes deep learning inference at three stages; during deep feature extraction, deep metric mapping, and deep semantic segmentation. The tasks impose compute intensive and GPU resource sharing challenges motivating for a parallelized pipeline for all execution steps. By taking advantage of Apache Spark, Nvidia DGX1, and DGX2 computing platforms, we demonstrate unprecedented compute speed-ups for deep learning inference on pixel labeling workloads; processing 21,028~Terrabytes of imagery data and delivering an output maps at area rate of 5.245sq.km/sec, amounting to 453,168 sq.km/day - reducing a 28 day workload to 21~hours.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2019-08
issn 2331-8422
language eng
recordid cdi_proquest_journals_2273028902
source Free E- Journals
subjects Acceleration
Analytics
Computation
Data processing
Deep learning
Detection
Earth observations (from space)
Feature extraction
Image segmentation
Inference
Labeling
Machine learning
Mapping
Parallel processing
Pixels
Remote observing
Remote sensing
Satellite imagery
Semantic segmentation
Semantics
Workload
Workloads
title Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T06%3A29%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Apache%20Spark%20Accelerated%20Deep%20Learning%20Inference%20for%20Large%20Scale%20Satellite%20Image%20Analytics&rft.jtitle=arXiv.org&rft.au=Dalton%20Lunga&rft.date=2019-08-08&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2273028902%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2273028902&rft_id=info:pmid/&rfr_iscdi=true
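
The throughput figures quoted in the description field can be cross-checked with simple arithmetic. The following is a minimal sketch in Python and is not part of the catalog record or the paper: the variable names are illustrative assumptions, and the only inputs are the numbers quoted in the description (5.245 sq. km/sec, 453,168 sq. km/day, 28 days, 21 hours).

```python
# Consistency check of the throughput figures quoted in the description.
# All variable names are hypothetical; only the numeric inputs come from the record.

SECONDS_PER_DAY = 24 * 60 * 60

area_rate_sqkm_per_sec = 5.245   # area rate quoted in the description
quoted_sqkm_per_day = 453_168    # daily coverage quoted in the description
baseline_days = 28               # workload duration before acceleration
accelerated_hours = 21           # workload duration after acceleration

# Daily coverage implied by the per-second area rate.
sqkm_per_day = area_rate_sqkm_per_sec * SECONDS_PER_DAY

# Speed-up implied by shrinking a 28-day workload to 21 hours.
speedup = (baseline_days * 24) / accelerated_hours

print(f"{sqkm_per_day:,.0f} sq. km/day")
print(f"{speedup:.0f}x speed-up")
```

Under these inputs the per-second rate reproduces the quoted daily figure, and the reduction from 28 days to 21 hours corresponds to a 32-fold speed-up.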