Leverage hadoop framework for large scale clinical informatics applications

In this manuscript, we present our experiences using the Apache Hadoop framework for high data volume and computationally intensive applications, and discuss some best practice guidelines in a clinical informatics setting. There are three main aspects in our approach: (a) process and integrate diver...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	AMIA Summits on Translational Science proceedings 2013, Vol.2013, p.53-53
Hauptverfasser:	Dong, Xiao, Bahroos, Neil, Sadhu, Eugene, Jackson, Tommie, Chukhman, Morris, Johnson, Robert, Boyd, Andrew, Hynes, Denise
Format:	Artikel
Sprache:	eng
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In this manuscript, we present our experiences using the Apache Hadoop framework for high data volume and computationally intensive applications, and discuss some best practice guidelines in a clinical informatics setting. There are three main aspects in our approach: (a) process and integrate diverse, heterogeneous data sources using standard Hadoop programming tools and customized MapReduce programs; (b) after fine-grained aggregate results are obtained, perform data analysis using the Mahout data mining library; (c) leverage the column oriented features in HBase for patient centric modeling and complex temporal reasoning. This framework provides a scalable solution to meet the rapidly increasing, imperative "Big Data" needs of clinical and translational research. The intrinsic advantage of fault tolerance, high availability and scalability of Hadoop platform makes these applications readily deployable at the enterprise level cluster environment.
ISSN:	2153-4063 2153-4063