Big Data Trip Classification on the New York City Taxi and Uber Sensor Network

Millions of trips are made every day by taxis and Uber in New York City. We first employ big data technologies to analyze this vast dataset: Apache Spark is used for data processing and classification, Apache Hive is used for data storage, and MapReduce is used for data profiling. Since taxis and Ub...

Bibliographic Details
Published in: Wangji Wanglu Jishu Xuekan = Journal of Internet Technology 2018-01, Vol.19 (2), p.591-598
Main Authors: Sun, Huiyu; Hu, Siyuan; McIntosh, Suzanne; Cao, Yi
Format: Article
Language: chi ; eng
Subjects:
Online Access: Full text
container_end_page 598
container_issue 2
container_start_page 591
container_title Wangji Wanglu Jishu Xuekan = Journal of Internet Technology
container_volume 19
creator Sun, Huiyu
Hu, Siyuan
McIntosh, Suzanne
Cao, Yi
description Millions of trips are made every day by taxis and Uber in New York City. We first employ big data technologies to analyze this vast dataset: Apache Spark is used for data processing and classification, Apache Hive is used for data storage, and MapReduce is used for data profiling. Since taxis and Uber are equipped with GPS sensors, we then visualize a mobile sensor network over New York City separated into fine-sized regions each acting as a mobile sensing node. Each location on the network falls into a region and is classified into one of three categories based on which service dominates the particular region: Yellow taxi, Green taxi, or Uber. We utilize logistic regression to classify a region into one of the three categories. Our classification algorithm is then used to analyze the interaction between taxi and Uber, for example to quantify the expansion of Uber. Experiments run on the Spark cluster show our classifier achieves an accuracy of over 85% scored on the 2014 taxi and Uber dataset. Finally, we propose a trip recommendation system for users using classification results together with a web service application.
doi_str_mv 10.3966/160792642018031902027
format Article
fullrecord <record><control><sourceid>proquest_hyweb</sourceid><recordid>TN_cdi_proquest_journals_2059157343</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2059157343</sourcerecordid><originalsourceid>FETCH-LOGICAL-h202t-ba8a6ee205d7cba6a2ed980c956dfc242438012b6fd7aa26a33e0f2de5d232ca3</originalsourceid><addsrcrecordid>eNotzt9LwzAQB_AgCo65f0AQAj5XL5c2bR61_oQxH-wefCrXJnXR0c40Y-6_N2PCwcHx4Xtfxq4E3Eit1K1QkGtUKYIoQAoNCJifsAnGc5IC6lM2OZjkgM7ZbBxdA4AiQ4liwhb37pM_UCBeebfh5Zoi6FxLwQ09jxNWli_sjn8M_puXLux5Rb-OU2_4srGev9t-HHwkYRfFBTvraD3a2f-esuXTY1W-JPO359fybp6sYr2QNFSQshYhM3nbkCK0RhfQ6kyZrsUUU1mAwEZ1JidCRVJa6NDYzMTaLckpuz7mbvzws7VjqL-Gre_jyzqGapHlMpVRXR7Var-zTW29a2uAVIPKlfwDqARZcw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2059157343</pqid></control><display><type>article</type><title>Big Data Trip Classification on the New York City Taxi and Uber Sensor Network</title><source>Alma/SFX Local Collection</source><creator>Sun, Huiyu ; Hu, Siyuan ; McIntosh, Suzanne ; Cao, Yi</creator><creatorcontrib>Sun, Huiyu ; Hu, Siyuan ; McIntosh, Suzanne ; Cao, Yi</creatorcontrib><description>Millions of trips are made every day by taxis and Uber in New York City. We first employ big data technologies to analyze this vast dataset: Apache Spark is used for data processing and classification, Apache Hive is used for data storage, and MapReduce is used for data profiling. Since taxis and Uber are equipped with GPS sensors, we then visualize a mobile sensor network over New York City separated into fine-sized regions each acting as a mobile sensing node. Each location on the network falls into a region and is classified into one of three categories based on which service dominates the particular region: Yellow taxi, Green taxi, or Uber. We utilize logistic regression to classify a region into one of the three categories. 
Our classification algorithm is then used to analyze the interaction between taxi and Uber, for example to quantify the expansion of Uber. Experiments run on the Spark cluster show our classifier achieves an accuracy of over 85% scored on the 2014 taxi and Uber dataset. Finally, we propose a trip recommendation system for users using classification results together with a web service application</description><identifier>ISSN: 1607-9264</identifier><identifier>EISSN: 2079-4029</identifier><identifier>DOI: 10.3966/160792642018031902027</identifier><language>chi ; eng</language><publisher>台灣: 台灣學術網路管理委員會</publisher><subject>Big Data ; Classification ; Data management ; Data processing ; Data storage ; Recommender systems ; Regression analysis ; Remote sensors ; Taxicabs ; Wireless sensor networks</subject><ispartof>Wangji Wanglu Jishu Xuekan = Journal of Internet Technology, 2018-01, Vol.19 (2), p.591-598</ispartof><rights>Copyright National Dong Hwa University, Computer Center Mar 2018</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Sun, Huiyu</creatorcontrib><creatorcontrib>Hu, Siyuan</creatorcontrib><creatorcontrib>McIntosh, Suzanne</creatorcontrib><creatorcontrib>Cao, Yi</creatorcontrib><title>Big Data Trip Classification on the New York City Taxi and Uber Sensor Network</title><title>Wangji Wanglu Jishu Xuekan = Journal of Internet Technology</title><addtitle>Journal of Internet Technology</addtitle><description>Millions of trips are made every day by taxis and Uber in New York City. 
We first employ big data technologies to analyze this vast dataset: Apache Spark is used for data processing and classification, Apache Hive is used for data storage, and MapReduce is used for data profiling. Since taxis and Uber are equipped with GPS sensors, we then visualize a mobile sensor network over New York City separated into fine-sized regions each acting as a mobile sensing node. Each location on the network falls into a region and is classified into one of three categories based on which service dominates the particular region: Yellow taxi, Green taxi, or Uber. We utilize logistic regression to classify a region into one of the three categories. Our classification algorithm is then used to analyze the interaction between taxi and Uber, for example to quantify the expansion of Uber. Experiments run on the Spark cluster show our classifier achieves an accuracy of over 85% scored on the 2014 taxi and Uber dataset. Finally, we propose a trip recommendation system for users using classification results together with a web service application</description><subject>Big Data</subject><subject>Classification</subject><subject>Data management</subject><subject>Data processing</subject><subject>Data storage</subject><subject>Recommender systems</subject><subject>Regression analysis</subject><subject>Remote sensors</subject><subject>Taxicabs</subject><subject>Wireless sensor 
networks</subject><issn>1607-9264</issn><issn>2079-4029</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNotzt9LwzAQB_AgCo65f0AQAj5XL5c2bR61_oQxH-wefCrXJnXR0c40Y-6_N2PCwcHx4Xtfxq4E3Eit1K1QkGtUKYIoQAoNCJifsAnGc5IC6lM2OZjkgM7ZbBxdA4AiQ4liwhb37pM_UCBeebfh5Zoi6FxLwQ09jxNWli_sjn8M_puXLux5Rb-OU2_4srGev9t-HHwkYRfFBTvraD3a2f-esuXTY1W-JPO359fybp6sYr2QNFSQshYhM3nbkCK0RhfQ6kyZrsUUU1mAwEZ1JidCRVJa6NDYzMTaLckpuz7mbvzws7VjqL-Gre_jyzqGapHlMpVRXR7Var-zTW29a2uAVIPKlfwDqARZcw</recordid><startdate>20180101</startdate><enddate>20180101</enddate><creator>Sun, Huiyu</creator><creator>Hu, Siyuan</creator><creator>McIntosh, Suzanne</creator><creator>Cao, Yi</creator><general>台灣學術網路管理委員會</general><general>National Dong Hwa University, Computer Center</general><scope>DT-</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20180101</creationdate><title>Big Data Trip Classification on the New York City Taxi and Uber Sensor Network</title><author>Sun, Huiyu ; Hu, Siyuan ; McIntosh, Suzanne ; Cao, Yi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-h202t-ba8a6ee205d7cba6a2ed980c956dfc242438012b6fd7aa26a33e0f2de5d232ca3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>chi ; eng</language><creationdate>2018</creationdate><topic>Big Data</topic><topic>Classification</topic><topic>Data management</topic><topic>Data processing</topic><topic>Data storage</topic><topic>Recommender systems</topic><topic>Regression analysis</topic><topic>Remote sensors</topic><topic>Taxicabs</topic><topic>Wireless sensor networks</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sun, Huiyu</creatorcontrib><creatorcontrib>Hu, Siyuan</creatorcontrib><creatorcontrib>McIntosh, Suzanne</creatorcontrib><creatorcontrib>Cao, 
Yi</creatorcontrib><collection>Ericdata Higher Education Knowledge Database</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Wangji Wanglu Jishu Xuekan = Journal of Internet Technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sun, Huiyu</au><au>Hu, Siyuan</au><au>McIntosh, Suzanne</au><au>Cao, Yi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Big Data Trip Classification on the New York City Taxi and Uber Sensor Network</atitle><jtitle>Wangji Wanglu Jishu Xuekan = Journal of Internet Technology</jtitle><addtitle>Journal of Internet Technology</addtitle><date>2018-01-01</date><risdate>2018</risdate><volume>19</volume><issue>2</issue><spage>591</spage><epage>598</epage><pages>591-598</pages><issn>1607-9264</issn><eissn>2079-4029</eissn><abstract>Millions of trips are made every day by taxis and Uber in New York City. We first employ big data technologies to analyze this vast dataset: Apache Spark is used for data processing and classification, Apache Hive is used for data storage, and MapReduce is used for data profiling. Since taxis and Uber are equipped with GPS sensors, we then visualize a mobile sensor network over New York City separated into fine-sized regions each acting as a mobile sensing node. Each location on the network falls into a region and is classified into one of three categories based on which service dominates the particular region: Yellow taxi, Green taxi, or Uber. We utilize logistic regression to classify a region into one of the three categories. 
Our classification algorithm is then used to analyze the interaction between taxi and Uber, for example to quantify the expansion of Uber. Experiments run on the Spark cluster show our classifier achieves an accuracy of over 85% scored on the 2014 taxi and Uber dataset. Finally, we propose a trip recommendation system for users using classification results together with a web service application</abstract><cop>台灣</cop><pub>台灣學術網路管理委員會</pub><doi>10.3966/160792642018031902027</doi><tpages>8</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1607-9264
ispartof Wangji Wanglu Jishu Xuekan = Journal of Internet Technology, 2018-01, Vol.19 (2), p.591-598
issn 1607-9264
2079-4029
language chi ; eng
recordid cdi_proquest_journals_2059157343
source Alma/SFX Local Collection
subjects Big Data
Classification
Data management
Data processing
Data storage
Recommender systems
Regression analysis
Remote sensors
Taxicabs
Wireless sensor networks
title Big Data Trip Classification on the New York City Taxi and Uber Sensor Network
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T18%3A38%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hyweb&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Big%20Data%20Trip%20Classification%20on%20the%20New%20York%20City%20Taxi%20and%20Uber%20Sensor%20Network&rft.jtitle=Wangji%20Wanglu%20Jishu%20Xuekan%20=%20Journal%20of%20Internet%20Technology&rft.au=Sun,%20Huiyu&rft.date=2018-01-01&rft.volume=19&rft.issue=2&rft.spage=591&rft.epage=598&rft.pages=591-598&rft.issn=1607-9264&rft.eissn=2079-4029&rft_id=info:doi/10.3966/160792642018031902027&rft_dat=%3Cproquest_hyweb%3E2059157343%3C/proquest_hyweb%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2059157343&rft_id=info:pmid/&rfr_iscdi=true