Real-time Twitter data analysis using Hadoop ecosystem
Published in: | Cogent Engineering 2018-01, Vol.5 (1), p.1534519 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | In the era of the Internet, social media has become an integral part of modern society. People use social media to share their opinions and to keep up with current trends on a daily basis. Twitter is one of the best-known social media platforms and receives a huge number of tweets each day. This information can be used for economic, industrial, social or governmental purposes by arranging and analyzing the tweets as required. Since Twitter contains a huge volume of data, storing and processing this data is a complex problem. Hadoop is a big data storage and processing tool for analyzing data with the 3Vs, i.e. data with huge volume, variety and velocity. Hadoop is a framework for handling big data, and its family of supporting tools for different processing tasks is grouped under one umbrella called the Hadoop ecosystem. In this paper, we analyze tweets streamed in real time. We use Apache Flume to capture real-time tweets. For the analysis, we propose a method for finding recent trends in tweets and perform sentiment analysis on real-time tweets. The analysis is done using Hadoop ecosystem tools such as Apache Hive and Apache Pig. Performance in terms of execution time is compared for the analysis of real-time tweets using Pig and Hive. From the experimental results, it can be concluded that Pig is more efficient than Hive, as Pig takes less time to execute than Hive. |
---|---|
ISSN: | 2331-1916 |
DOI: | 10.1080/23311916.2018.1534519 |