Accelerating Statewide Connected Vehicles Big (Sensor Fusion) Data ETL Pipelines on GPUs
Real-time traffic and sensor data from connected vehicles have the potential to provide insights that will lead to the immediate benefit of efficient management of the transportation infrastructure and related adjacent services. However, the growth of electric vehicles (EVs) and connected vehicles (...
Main authors: | Mussah, Abdul Rashid; Shoman, Maged; Amo-Boateng, Mark; Adu-Gyamfi, Yaw |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Mussah, Abdul Rashid Shoman, Maged Amo-Boateng, Mark Adu-Gyamfi, Yaw |
description | Real-time traffic and sensor data from connected vehicles have the potential
to provide insights that will lead to the immediate benefit of efficient
management of the transportation infrastructure and related adjacent services.
However, the growth of electric vehicles (EVs) and connected vehicles (CVs) has
generated an abundance of CV data and sensor data that has put a strain on the
processing capabilities of existing data center infrastructure. As a result,
the benefits are either delayed or not fully realized. To address this issue,
we propose a solution for processing state-wide CV traffic and sensor data on
GPUs that provides real-time micro-scale insights in both temporal and spatial
dimensions. This is achieved through the use of the NVIDIA RAPIDS framework and
the Dask parallel cluster in Python. Our findings demonstrate a 70x
acceleration in the extraction, transformation, and loading (ETL) of CV data
for the State of Missouri for a full day of all unique CV journeys, reducing
the processing time from approximately 48 hours to just 25 minutes. Because
these results cover thousands of CVs and several thousand individual
journeys with sub-second sensor data, they imply that we can model and obtain
actionable insights for the management of the transportation infrastructure. |
doi_str_mv | 10.48550/arxiv.2305.07454 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2305_07454</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2305_07454</sourcerecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Accelerating Statewide Connected Vehicles Big (Sensor Fusion) Data ETL Pipelines on GPUs</title><source>arXiv.org</source><creator>Mussah, Abdul Rashid ; Shoman, Maged ; Amo-Boateng, Mark ; Adu-Gyamfi, Yaw</creator><identifier>DOI: 10.48550/arxiv.2305.07454</identifier><language>eng</language><subject>Computer Science - Distributed, Parallel, and Cluster Computing</subject><creationdate>2023-05</creationdate><rights>http://creativecommons.org/licenses/by-sa/4.0</rights><oa>free_for_read</oa></display><links><linktorsrc>$$Uhttps://arxiv.org/abs/2305.07454$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2305.07454$$DView paper in arXiv$$Hfree_for_read</backlink></links><addata><au>Mussah, Abdul Rashid</au><au>Shoman, Maged</au><au>Amo-Boateng, Mark</au><au>Adu-Gyamfi, Yaw</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Accelerating Statewide Connected Vehicles Big (Sensor Fusion) Data ETL Pipelines on GPUs</atitle><date>2023-05-08</date><risdate>2023</risdate><doi>10.48550/arxiv.2305.07454</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2305.07454 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2305_07454 |
source | arXiv.org |
subjects | Computer Science - Distributed, Parallel, and Cluster Computing |
title | Accelerating Statewide Connected Vehicles Big (Sensor Fusion) Data ETL Pipelines on GPUs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T05%3A42%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accelerating%20Statewide%20Connected%20Vehicles%20Big%20(Sensor%20Fusion)%20Data%20ETL%20Pipelines%20on%20GPUs&rft.au=Mussah,%20Abdul%20Rashid&rft.date=2023-05-08&rft_id=info:doi/10.48550/arxiv.2305.07454&rft_dat=%3Carxiv_GOX%3E2305_07454%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |