A data-pipeline processing electrocardiogram recordings for use in artificial intelligence algorithms


Bibliographic details
Published in: European heart journal 2021-10, Vol.42 (Supplement_1)
Authors: Prim, J; Uhlemann, T; Gumpfer, N; Gruen, D; Wegener, S; Krug, S; Hannig, J; Keller, T; Guckert, M
Format: Article
Language: eng
Online access: Full text
Description:
Abstract

Introduction: Artificial intelligence (AI) can be used for various tasks in medicine, and specifically in cardiology. Medical data such as electrocardiogram recordings (ECGs) are widely used and universally accepted as diagnostic and prognostic tools. It has been shown that deep learning methods using ECGs yield excellent results in detecting cardiac pathologies. A significant amount of reliable data is required for supervised learning algorithms such as deep learning models. However, only a small fraction of the ECG data generated in daily practice is available in a fully digital and machine-readable format such as XML. Frequently used ECG devices produce PDF files or even paper-based printouts, which need to be digitised later for inclusion in clinical information systems. Such ECGs cannot be used for training or application of deep learning models without further effort. Therefore, the aim of the present project was to develop a data pipeline that generates machine-readable ECG data for AI use irrespective of the initial ECG format.

Methods: We propose an end-to-end pipeline that can not only process data from modern digital ECG devices but is also capable of extracting all necessary information from PDF files (both scanned hard copies and digitally generated PDFs) (see Figure 1). By using different techniques, including adaptation of open-source libraries for vectorisation of image data and modern computer vision technologies such as optical character recognition (OCR), our pipeline is able to flexibly process data from different recording devices and to read both data in PDF format and data from native digital devices delivered in XML. The processed files from the various sources are either saved in a common and easily accessible CSV file format or processed directly with deep learning models (see Figure 2).

Results: The developed data pipeline was validated using data from a set of 113 12-lead ECGs for which data was available in multiple formats. Each format's dataset was separately processed by our pipeline and then used for training and validation of a deep learning architecture for myocardial scar detection based on raw ECG signals. The quality of the extraction process was assessed via the respective deep learning models, with their predictive capability depicted by receiver operating characteristic (ROC) analyses. Comparing the benchmark model trained on XML data against a model trained purely on PDF data processed by the pipeline shows that both models produced comparable results, reaching area under the curve (AUC) values of 0.79±0.10 (XML) and 0.83±0.07 (PDF).

Conclusion: The data pipeline facilitates acceleration of ECG-based AI research and application of AI algorithms by providing access to ECG data irrespective of the format of the stored ECG. Future work will focus on independent validation as well as expanding the pipeline to include additional ECG types.

Funding: Type of funding sources: Public Institution(s). Main funding source(s): Flexi Funds by Forschungscampus Mittelhessen.
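The pipeline's common output step, writing per-lead signals to an easily accessible CSV format, can be sketched in a few lines. The XML tag names and layout below are hypothetical (real ECG exports differ by vendor), and the PDF vectorisation/OCR branch is omitted; this only illustrates the XML-to-CSV path under those assumptions.

```python
# Sketch: parse a (hypothetical) digital ECG XML export into per-lead signal
# lists, then write them to a common CSV format with one column per lead.
import csv
import io
import xml.etree.ElementTree as ET

def leads_from_xml(xml_text):
    """Parse a hypothetical ECG export: <lead name="I">space-separated samples</lead>."""
    root = ET.fromstring(xml_text)
    leads = {}
    for lead in root.iter("lead"):
        leads[lead.get("name")] = [float(v) for v in lead.text.split()]
    return leads

def leads_to_csv(leads):
    """Write the per-lead signals as CSV: header row of lead names, then samples."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    names = sorted(leads)
    writer.writerow(names)
    for row in zip(*(leads[n] for n in names)):
        writer.writerow(row)
    return buf.getvalue()

example = "<ecg><lead name='I'>0.1 0.2 0.3</lead><lead name='II'>0.0 -0.1 0.2</lead></ecg>"
csv_text = leads_to_csv(leads_from_xml(example))
```

In a real pipeline the PDF branch would produce the same `{lead_name: [samples]}` dictionary after vectorisation and OCR, so both input paths converge on one output routine.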
DOI: 10.1093/eurheartj/ehab724.3041
Publisher: Oxford University Press
Published: 2021-10-12
Rights: Published on behalf of the European Society of Cardiology. All rights reserved. © The Author(s) 2021. For permissions, please email: journals.permissions@oup.com.
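The Results section compares models by ROC area under the curve (AUC). AUC equals the Mann-Whitney statistic: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one, with ties counted as one half. A minimal sketch with illustrative labels and scores (not data from the study):

```python
# AUC from raw scores via the Mann-Whitney statistic: count, over all
# positive/negative pairs, how often the positive case outscores the
# negative one (ties count 0.5), and normalise by the number of pairs.
def roc_auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]          # illustrative ground truth
scores = [0.9, 0.4, 0.6, 0.2]  # illustrative model scores
auc = roc_auc(labels, scores)  # 3 of 4 pairs correctly ordered -> 0.75
```

This rank-based view is why AUC is insensitive to any monotone rescaling of the model's output scores, which makes it a fair way to compare models trained on differently extracted versions of the same ECGs.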
ISSN: 0195-668X
EISSN: 1522-9645
Source: Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Oxford University Press Journals All Titles (1996-Current); Alma/SFX Local Collection