Big Data Trip Classification on the New York City Taxi and Uber Sensor Network
Millions of trips are made every day by taxis and Uber in New York City. We first employ big data technologies to analyze this vast dataset: Apache Spark is used for data processing and classification, Apache Hive is used for data storage, and MapReduce is used for data profiling. Since taxis and Ub...
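Published in: | Wangji Wanglu Jishu Xuekan = Journal of Internet Technology 2018-01, Vol.19 (2), p.591-598 |
---|---|
Main authors: | Sun, Huiyu ; Hu, Siyuan ; McIntosh, Suzanne ; Cao, Yi |
Format: | Article |
Language: | chi ; eng |
Subjects: | Big Data ; Classification ; Data management ; Data processing ; Data storage ; Recommender systems ; Regression analysis ; Remote sensors ; Taxicabs ; Wireless sensor networks |
Online access: | Full text |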
container_end_page | 598 |
---|---|
container_issue | 2 |
container_start_page | 591 |
container_title | Wangji Wanglu Jishu Xuekan = Journal of Internet Technology |
container_volume | 19 |
creator | Sun, Huiyu ; Hu, Siyuan ; McIntosh, Suzanne ; Cao, Yi |
description | Millions of trips are made every day by taxis and Uber in New York City. We first employ big data technologies to analyze this vast dataset: Apache Spark is used for data processing and classification, Apache Hive is used for data storage, and MapReduce is used for data profiling. Since taxis and Uber are equipped with GPS sensors, we then visualize a mobile sensor network over New York City separated into fine-sized regions each acting as a mobile sensing node. Each location on the network falls into a region and is classified into one of three categories based on which service dominates the particular region: Yellow taxi, Green taxi, or Uber. We utilize logistic regression to classify a region into one of the three categories. Our classification algorithm is then used to analyze the interaction between taxi and Uber, for example to quantify the expansion of Uber. Experiments run on the Spark cluster show our classifier achieves an accuracy of over 85% scored on the 2014 taxi and Uber dataset. Finally, we propose a trip recommendation system for users using classification results together with a web service application |
doi_str_mv | 10.3966/160792642018031902027 |
format | Article |
lds50 | peer_reviewed |
publisher | 台灣 (Taiwan): 台灣學術網路管理委員會 |
rights | Copyright National Dong Hwa University, Computer Center Mar 2018 |
tpages | 8 |
fulltext | fulltext |
identifier | ISSN: 1607-9264 |
ispartof | Wangji Wanglu Jishu Xuekan = Journal of Internet Technology, 2018-01, Vol.19 (2), p.591-598 |
issn | 1607-9264 ; 2079-4029 |
language | chi ; eng |
recordid | cdi_proquest_journals_2059157343 |
source | Alma/SFX Local Collection |
subjects | Big Data ; Classification ; Data management ; Data processing ; Data storage ; Recommender systems ; Regression analysis ; Remote sensors ; Taxicabs ; Wireless sensor networks |
title | Big Data Trip Classification on the New York City Taxi and Uber Sensor Network |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-15T18%3A38%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hyweb&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Big%20Data%20Trip%20Classification%20on%20the%20New%20York%20City%20Taxi%20and%20Uber%20Sensor%20Network&rft.jtitle=Wangji%20Wanglu%20Jishu%20Xuekan%20=%20Journal%20of%20Internet%20Technology&rft.au=Sun,%20Huiyu&rft.date=2018-01-01&rft.volume=19&rft.issue=2&rft.spage=591&rft.epage=598&rft.pages=591-598&rft.issn=1607-9264&rft.eissn=2079-4029&rft_id=info:doi/10.3966/160792642018031902027&rft_dat=%3Cproquest_hyweb%3E2059157343%3C/proquest_hyweb%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2059157343&rft_id=info:pmid/&rfr_iscdi=true |