Performance Analysis of Not Only SQL Semi-Stream Join Using MongoDB for Real-Time Data Warehousing

Bibliographic details
Published in: IEEE Access 2019, Vol. 7, p. 134215-134225
Main authors: Mehmood, Erum; Anees, Tayyaba
Format: Article
Language: English
Description
Abstract: Data warehousing has been indispensable to enterprises for decades. However, an infrequently updated data warehouse environment does not support quick business decisions or fast data recovery when a transformation or load issue occurs. Implementing a real-time data warehouse solves this update problem. Successful real-time data warehousing requires efficient stream processing of unstructured (NoSQL) and structured (SQL) data from various sources. We compare unstructured and structured semi-stream join processing using the MongoDB database engine in the Extraction-Transformation-Loading (ETL) phase. Semi-stream tuples arriving from different sources are joined with disk-based master data, based on keys in memory, for both unstructured and structured documents (tuples) using a MongoDB server, where the I/O rates of the two inputs differ. Through experiments, we analyze the CPU and memory usage of real-time semi-stream join processing in two types of tests, on unstructured and on structured data streams, using synthetic and real datasets. The results show that memory usage and execution time remain consistent for a given specification irrespective of the nature of the data streams (unstructured or structured), even as the incoming semi-streams grow.
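
The semi-stream join described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use MongoDB: plain dictionaries stand in for documents, and the names `build_key_index`, `semi_stream_join`, and the `product_id` field are hypothetical, chosen only for illustration. It assumes the master data is small enough to index by join key in memory.

```python
from typing import Dict, Hashable, Iterable, Iterator


def build_key_index(master_rows: Iterable[dict], key: str) -> Dict[Hashable, dict]:
    """Read the master data once and index it by join key in memory."""
    return {row[key]: row for row in master_rows}


def semi_stream_join(
    stream: Iterable[dict], index: Dict[Hashable, dict], key: str
) -> Iterator[dict]:
    """Join each incoming stream document with its master record, if any."""
    for doc in stream:
        master = index.get(doc.get(key))
        if master is None:
            # No master record for this key (or the key is missing): skip.
            continue
        # Merge master fields with stream fields; stream values win on clashes.
        yield {**master, **doc}


# Hypothetical master data (stands in for the disk-based relation).
master_rows = [
    {"product_id": 1, "name": "pen", "price": 1.5},
    {"product_id": 2, "name": "ink", "price": 4.0},
]

# Hypothetical stream: fixed-shape and variable-shape documents mixed.
stream = [
    {"product_id": 1, "qty": 3},                          # structured
    {"product_id": 2, "qty": 1, "note": {"gift": True}},  # extra nested field
    {"product_id": 9, "qty": 5},                          # no master match
    {"qty": 2},                                           # join key missing
]

index = build_key_index(master_rows, "product_id")
joined = list(semi_stream_join(stream, index, "product_id"))
```

Because the join only looks up the key, documents with extra or nested fields pass through the same code path as fixed-shape ones. The sketch shows only that both document shapes can share one join routine; it does not measure or reproduce the memory and execution-time results reported in the abstract.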
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2941925