On the distributed software architecture of a data analysis workflow: A case study

Bibliographic details
Published in:	Concurrency and computation 2022-04, Vol.34 (9), p.n/a
Main authors:	Tasgetiren, Nail, Tigrak, Umit, Bozan, Erdal, Gul, Guven, Demirci, Emir, Saribiyik, Hakan, Aktas, Mehmet S.
Format:	Article
Language:	English
Subjects:	Banking; Case studies; Clustering; Computer architecture; Computer networks; Customers; Data analysis; data analysis workflow; Data storage; Distributed processing; distributed software architecture; facade design pattern; lambda software architecture; Libraries; Loans; Machine learning; machine learning workflows; Performance tests; Prototypes; Software; Storage systems; Subsystems; Usability; Workflow

Description
Abstract:	Hybrid distributed computing software architectures gain great importance in data analysis workflows as the number of available underlying machine learning libraries and data storage systems increases. We argue that there is a need for novel approaches for software architecture designs that can enable machine learning data analysis workflows to run on top of different subsystem libraries. To address this need, we propose a hybrid distributed software architecture in this manuscript. The proposed architecture manages machine learning models for both supervised and unsupervised machine learning data analysis workflows. To show the usability of the proposed architecture, we implement a prototype for the banking sector as a case study. The prototype application includes two data analysis workflows: a workflow for predicting the loan usage tendency of customers, and a workflow for clustering the customers based on the usage patterns of banking loans. The prototype was tested on a large-scale banking dataset. Performance tests were carried out to investigate the performance in terms of both responsiveness and scalability of the system. The results obtained reveal the usability of the proposed architecture.
ISSN:	1532-0626, 1532-0634
DOI:	10.1002/cpe.6522