A Survey on Automatic Parameter Tuning for Big Data Processing Systems

Bibliographic Details
Published in:	ACM computing surveys 2021-03, Vol.53 (2), p.1-37, Article 43
Main authors:	Herodotou, Herodotos; Chen, Yuxing; Lu, Jiaheng
Format:	Article
Language:	English
Subjects:	Architectures; Big Data; Computer science; Computer systems organization; Computing platforms; Data processing; Information systems; Information systems applications; Machine learning; Other architectures; Parameters; Performance degradation; Self-organizing autonomic computing; Tuning

Description
Summary:	Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators grapple with understanding and tuning them to achieve good performance. We investigate existing approaches on parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.
ISSN:	0360-0300, 1557-7341
DOI:	10.1145/3381027