MaReIA: a cloud MapReduce based high performance whole slide image analysis framework

Recent advancements in systematic analysis of high-resolution whole slide images have increased the efficiency of diagnosis, prognosis and prediction of cancer and other important diseases. Due to the enormous sizes and dimensions of whole slide images, the analysis requires extensive computing resources which...

Detailed description

Saved in:
Bibliographic details
Published in: Distributed and parallel databases : an international journal 2019-06, Vol.37 (2), p.251-272
Main authors: Vo, Hoang, Kong, Jun, Teng, Dejun, Liang, Yanhui, Aji, Ablimit, Teodoro, George, Wang, Fusheng
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 272
container_issue 2
container_start_page 251
container_title Distributed and parallel databases : an international journal
container_volume 37
creator Vo, Hoang
Kong, Jun
Teng, Dejun
Liang, Yanhui
Aji, Ablimit
Teodoro, George
Wang, Fusheng
description Recent advancements in systematic analysis of high-resolution whole slide images have increased the efficiency of diagnosis, prognosis and prediction of cancer and other important diseases. Due to the enormous sizes and dimensions of whole slide images, the analysis requires extensive computing resources which are not commonly available. Images have to be tiled for processing due to computer memory limitations, which leads to inaccurate results because objects that cross tile boundaries are ignored. Thus, we propose a generic and highly scalable cloud-based image analysis framework for whole slide images. The framework enables parallelized integration of image analysis steps, such as segmentation and aggregation of micro-structures, in a single pipeline, and generation of final objects manageable by databases. The core concept relies on the abstraction of objects in whole slide images as different classes of spatial geometries, which in turn can be handled as text-based records in MapReduce. The framework applies an overlapping partitioning scheme to images, and provides parallelization of tiling and image segmentation based on the MapReduce architecture. It further provides robust object normalization and graceful handling of boundary objects, with an efficient spatial-indexing-based matching method to generate accurate results. Our benchmark experiments on Amazon EMR show that MaReIA is highly scalable, generic and extremely cost-effective.
doi_str_mv 10.1007/s10619-018-7237-1
format Article
fullrecord <record>
  <control>
    <sourceid>proquest_pubme</sourceid>
    <recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6583906</recordid>
    <sourceformat>XML</sourceformat>
    <sourcesystem>PC</sourcesystem>
    <sourcerecordid>2244132000</sourcerecordid>
    <sourcetype>Open Access Repository</sourcetype>
    <iscdi>true</iscdi>
    <recordtype>article</recordtype>
    <pqid>2227108119</pqid>
  </control>
  <display>
    <type>article</type>
    <title>MaReIA: a cloud MapReduce based high performance whole slide image analysis framework</title>
    <source>Springer Nature - Complete Springer Journals</source>
    <creator>Vo, Hoang ; Kong, Jun ; Teng, Dejun ; Liang, Yanhui ; Aji, Ablimit ; Teodoro, George ; Wang, Fusheng</creator>
    <identifier>ISSN: 0926-8782</identifier>
    <identifier>EISSN: 1573-7578</identifier>
    <identifier>DOI: 10.1007/s10619-018-7237-1</identifier>
    <identifier>PMID: 31217669</identifier>
    <language>eng</language>
    <publisher>New York: Springer US</publisher>
    <ispartof>Distributed and parallel databases : an international journal, 2019-06, Vol.37 (2), p.251-272</ispartof>
    <rights>Springer Science+Business Media, LLC, part of Springer Nature 2018</rights>
    <rights>Copyright Springer Nature B.V. 2019</rights>
    <lds50>peer_reviewed</lds50>
    <oa>free_for_read</oa>
  </display>
  <links>
    <linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s10619-018-7237-1$$EPDF$$P50$$Gspringer$$H</linktopdf>
    <linktohtml>$$Uhttps://link.springer.com/10.1007/s10619-018-7237-1$$EHTML$$P50$$Gspringer$$H</linktohtml>
    <backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31217669$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink>
  </links>
  <addata>
    <format>journal</format>
    <genre>article</genre>
    <ristype>JOUR</ristype>
    <stitle>Distrib Parallel Databases</stitle>
    <date>2019-06-01</date>
    <volume>37</volume>
    <issue>2</issue>
    <spage>251</spage>
    <epage>272</epage>
    <pages>251-272</pages>
    <tpages>22</tpages>
    <cop>New York</cop>
    <pub>Springer US</pub>
  </addata>
</record>
fulltext fulltext
identifier ISSN: 0926-8782
ispartof Distributed and parallel databases : an international journal, 2019-06, Vol.37 (2), p.251-272
issn 0926-8782
1573-7578
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6583906
source Springer Nature - Complete Springer Journals
subjects Computer memory
Computer Science
Data Structures
Database Management
Image analysis
Image resolution
Image segmentation
Information Systems Applications (incl.Internet)
Medical imaging
Memory Structures
Operating Systems
Parallel processing
Special Issue on Data Management and Analytics for Healthcare
Tiling
title MaReIA: a cloud MapReduce based high performance whole slide image analysis framework
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T19%3A57%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MaReIA:%20a%20cloud%20MapReduce%20based%20high%20performance%20whole%20slide%20image%20analysis%20framework&rft.jtitle=Distributed%20and%20parallel%20databases%20:%20an%20international%20journal&rft.au=Vo,%20Hoang&rft.date=2019-06-01&rft.volume=37&rft.issue=2&rft.spage=251&rft.epage=272&rft.pages=251-272&rft.issn=0926-8782&rft.eissn=1573-7578&rft_id=info:doi/10.1007/s10619-018-7237-1&rft_dat=%3Cproquest_pubme%3E2244132000%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2227108119&rft_id=info:pmid/31217669&rfr_iscdi=true
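
The description field above mentions an overlapping partitioning scheme and the handling of objects that cross tile boundaries. The following is a minimal sketch of that general idea, not MaReIA's actual implementation. The names `Tile`, `overlapping_tiles`, `owner_cell` and `keep_object` are hypothetical. For de-duplication it swaps in a simple centroid-ownership rule in place of the spatial-indexing-based matching the description refers to.

```python
from typing import List, NamedTuple, Tuple


class Tile(NamedTuple):
    """Pixel bounds of one tile, half-open: [x0, x1) x [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int


def overlapping_tiles(width: int, height: int, tile: int, overlap: int) -> List[Tile]:
    """Cover a width x height image with a grid of tile x tile core cells.

    Each core cell is grown by `overlap` pixels on every side and clipped to
    the image, so an object near a cell edge is fully visible in at least one
    tile (as long as it is smaller than the overlap margin).
    """
    tiles = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            tiles.append(Tile(max(0, x - overlap),
                              max(0, y - overlap),
                              min(width, x + tile + overlap),
                              min(height, y + tile + overlap)))
    return tiles


def owner_cell(cx: float, cy: float, tile: int) -> Tuple[int, int]:
    """Return the (column, row) of the core cell that contains a point."""
    return (int(cx) // tile, int(cy) // tile)


def keep_object(cell: Tuple[int, int],
                bbox: Tuple[float, float, float, float],
                tile: int) -> bool:
    """Assumed de-duplication rule: an object segmented in several overlapping
    tiles is kept only by the tile whose core cell contains the centroid of
    its bounding box (x0, y0, x1, y1) in whole-image coordinates."""
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    return owner_cell(cx, cy, tile) == cell


if __name__ == "__main__":
    for t in overlapping_tiles(10, 10, tile=5, overlap=1):
        print(t)
    # An object straddling the boundary between cells (0, 0) and (1, 0).
    print(keep_object((0, 0), (3, 1, 7, 3), tile=5))
    print(keep_object((1, 0), (3, 1, 7, 3), tile=5))
```

In a MapReduce setting, each tile could be processed by one map task, and the ownership test would then discard the duplicate copies of objects found in the overlap margins before the results are merged.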