Development of a Semiautomated Database for Patients With Adult Congenital Heart Disease

Bibliographic Details
Published in: Canadian Journal of Cardiology, 2022-10, Vol. 38 (10), pp. 1634-1640
Authors: Verma, Shourya, Alkan, Muhammet, Deligianni, Fani, Anagnostopoulos, Christos, Diller, Gerhard, Walker, Lisa, Johnston, Fiona C., Danton, Mark, Walker, Hamish, Swan, Lorna, Hunter, Amanda, McGuire, Alex, Dawes, Martin, Stott, Sharon, Lyndsey, Mitchell, Walker, Niki, Veldtman, Gruschen
Format: Article
Language: English
Description
Abstract: Databases for congenital heart disease (CHD) are effective in delivering accessible datasets ready for statistical inference. Hitherto, however, data collection has been labour- and time-intensive and has required substantial financial support to ensure sustainability. We propose here the creation and piloting of a semiautomated technique for extracting data from clinic letters to populate a clinical database.
PDF-formatted clinic letters stored in a local folder underwent data extraction, preprocessing, and analysis through a series of algorithms. Specific patient information (diagnoses, diagnostic complexity, interventions, arrhythmia, medications, and demographic data) was processed into text files and structured data tables, which were used to populate a database. A data validation schema was predefined to verify and accommodate the information populating the database. Unsupervised learning, in the form of a dimensionality reduction technique, was used to project the data into 2 dimensions and visualize their intrinsic structure in relation to the diagnosis, medication, and intervention lists and the European Society of Cardiology (ESC) classification of disease complexity. Ninety-three randomly selected letters were reviewed manually for accuracy.
A total of 1409 consecutive outpatient clinic letters were used to populate the Scottish Adult Congenital Cardiac Database. Mean patient age was 35.4 years, and 47.6% of patients were female; 698 (49.5%) had moderately complex, 369 (26.1%) greatly complex, and 284 (20.1%) mildly complex lesions. Individual diagnoses were successfully extracted from 96.95% of letters and demographic data from 100%. Data extraction, database upload, data analysis, and visualization took 571 seconds (9.51 minutes). Compared with manual data extraction, the computer algorithm's accuracy was 94% for diagnoses, 93% for interventions, and 93% for medications.
Semiautomated data extraction from clinic letters into a database can be achieved successfully with a high degree of accuracy and efficiency.
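The abstract describes the pipeline only at a high level: letters are turned into text, specific fields are extracted, the results are checked against a predefined validation schema, and valid records populate structured tables. The Python below is a minimal sketch of those steps. It assumes the letters have already been converted from PDF to plain text and that they follow a hypothetical "Age:"/"Sex:" layout. The keyword vocabulary, regular expressions, schema, the rule that a patient takes the class of their most complex lesion, and the function names `extract`, `validate`, and `build_table` are all illustrative assumptions. They are not the authors' algorithms or diagnosis lists.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny vocabulary mapping diagnosis keywords to
# ESC complexity classes; illustrative only, not the study's lists.
DIAGNOSIS_COMPLEXITY = {
    "fontan": "great",
    "coarctation": "moderate",
    "bicuspid aortic valve": "mild",
}
RANK = {"mild": 1, "moderate": 2, "great": 3}

# Assumed letter layout: demographics on lines such as "Age: 35" and "Sex: F".
AGE_RE = re.compile(r"^Age:\s*(\d+)", re.MULTILINE)
SEX_RE = re.compile(r"^Sex:\s*([MF])", re.MULTILINE)

# Predefined validation schema: field name -> predicate the value must satisfy.
SCHEMA = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "sex": lambda v: v in {"M", "F"},
    "diagnoses": lambda v: isinstance(v, list),
    "complexity": lambda v: v in {"mild", "moderate", "great", None},
}


def extract(letter_text):
    """Turn one letter's plain text into a structured record."""
    lowered = letter_text.lower()
    diagnoses = [term for term in DIAGNOSIS_COMPLEXITY if term in lowered]
    classes = [DIAGNOSIS_COMPLEXITY[d] for d in diagnoses]
    # Assumption: a patient is classed by the most complex lesion found.
    complexity = max(classes, key=RANK.get) if classes else None
    age = AGE_RE.search(letter_text)
    sex = SEX_RE.search(letter_text)
    return {
        "age": int(age.group(1)) if age else None,
        "sex": sex.group(1) if sex else None,
        "diagnoses": diagnoses,
        "complexity": complexity,
    }


def validate(record):
    """Return the names of fields that fail the schema (empty list = valid)."""
    return [name for name, ok in SCHEMA.items() if not ok(record.get(name))]


def build_table(letters):
    """Split extracted records into database-ready rows and rejects."""
    rows, rejected = [], []
    for text in letters:
        record = extract(text)
        (rejected if validate(record) else rows).append(record)
    return rows, rejected


# Usage with three made-up letters; the third lacks an age and is rejected.
letters = [
    "Age: 35\nSex: F\nDiagnosis: coarctation repaired in infancy.",
    "Age: 41\nSex: M\nFontan circulation; bicuspid aortic valve.",
    "Sex: F\nNo age recorded.",
]
rows, rejected = build_table(letters)
counts = Counter(r["complexity"] for r in rows)
print(counts)
```

Plain substring matching cannot tell "no coarctation" from "coarctation", so a practical version would need negation handling and richer vocabularies. That limitation is one reason a manual accuracy check against a sample of letters matters.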
ISSN: 0828-282X; 1916-7075
DOI: 10.1016/j.cjca.2022.05.022