Performance Evaluation of Yahoo! S4: A First Look

Processing large data sets has been dominated recently by the map/reduce programming model [1], originally proposed by Google and widely adopted through the Apache Hadoop1 implementation. Over the years, developers have identified weaknesses of processing data sets in batches as in MapReduce and hav...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Chauhan, J., Chowdhury, S. A., Makaroff, D.
Format: Tagungsbericht
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Processing large data sets has been dominated recently by the map/reduce programming model [1], originally proposed by Google and widely adopted through the Apache Hadoop1 implementation. Over the years, developers have identified weaknesses of processing data sets in batches as in MapReduce and have proposed alternatives. One such alternative is continuous processing of data streams. This is particularly suitable for applications in online analytics, monitoring, financial data processing and fraud detection that require timely processing of data, making the delay introduced by batch processing highly undesirable. This processing paradigm has led to the development of systems such as Yahoo! S4 [2] and Twitter Storm.2 Yahoo! S4 is a general-purpose, distributed and scalable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. As these frameworks are quite young and new, there is a need to understand their performance for real time applications and find out the existing issues in terms of scalability, execution time and fault tolerance. We did an empirical evaluation of one application on Yahoo! S4 and focused on the performance in terms of scalability, lost events and fault tolerance. Findings of our analyses can be helpful towards understanding the challenges in developing stream-based data intensive computing tools and thus providing a guideline for the future development.
DOI:10.1109/3PGCIC.2012.55