Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology

Bibliographic details
Published in: Journal of the American Medical Informatics Association : JAMIA 2014-02, Vol.21 (e1), p.e136-e142
Main authors: Perry, Thomas Ernest; Zha, Hongyuan; Zhou, Ke; Frias, Patricio; Zeng, Dadan; Braunstein, Mark
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e142
container_issue e1
container_start_page e136
container_title Journal of the American Medical Informatics Association : JAMIA
container_volume 21
creator Perry, Thomas Ernest
Zha, Hongyuan
Zhou, Ke
Frias, Patricio
Zeng, Dadan
Braunstein, Mark
description Electronic health records contain critical predictive information for machine-learning-based diagnostic aids. However, many traditional machine learning methods fail to simultaneously integrate textual data into the prediction process because of its high dimensionality. In this paper, we present a supervised method using Laplacian Eigenmaps that enables existing machine learning methods to estimate, at the same time, both low-dimensional representations of textual data and accurate predictors based on these representations. The method enhances predictive models by embedding textual predictors into a low-dimensional latent space that preserves the local similarities among the textual data in the original high-dimensional space. The proposed implementation performs alternating optimization using gradient descent. For the evaluation, we applied our method to over 2000 patient records from a large single-center pediatric cardiology practice to predict whether patients were diagnosed with cardiac disease. Because of data availability, our experiments consider relatively short textual descriptions. We compared our method with latent semantic indexing, latent Dirichlet allocation, and local Fisher discriminant analysis. The results were assessed using four metrics: the area under the receiver operating characteristic curve (AUC), the Matthews correlation coefficient (MCC), specificity, and sensitivity. Supervised Laplacian Eigenmaps was the highest-performing method in our study, achieving an AUC of 0.782 and an MCC of 0.374. It improved AUC by 8.16% and MCC by 20.6% over the baseline that excluded textual data, and AUC by 2.69% and MCC by 5.35% over unsupervised Laplacian Eigenmaps. In conclusion, we present a supervised Laplacian Eigenmap method that embeds textual predictors into a low-dimensional Euclidean space. 
This method allows many existing machine learning predictors to effectively and efficiently capture the potential of textual predictors, especially those based on short texts.
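The abstract describes the method only at a high level. As a hedged illustration, the NumPy sketch below first computes classical (unsupervised) Laplacian Eigenmaps on a heat-kernel k-nearest-neighbor graph, then shows one possible supervised variant: a logistic-regression predictor and the embedding are updated in alternation by gradient descent on a joint objective (mean log-loss plus a graph-smoothness penalty `alpha * tr(Y^T L Y)`). The graph construction, the median bandwidth heuristic, the choice of logistic regression, the objective, the hyperparameters (`k`, `alpha`, `lr`, `reg`), and all function names are assumptions for illustration, not the authors' formulation. The sketch also omits the non-textual predictors and the `Y^T D Y = I` constraint of classical Laplacian Eigenmaps.

```python
import numpy as np


def heat_kernel_knn_graph(X, k=5, t=None):
    """Symmetric k-NN affinity matrix with heat-kernel weights exp(-d^2 / t)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if t is None:
        t = np.median(d2[d2 > 0])  # assumed bandwidth heuristic
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        row = d2[i].copy()
        row[i] = np.inf  # a point is not its own neighbor
        nbrs = np.argsort(row)[:k]
        W[i, nbrs] = np.exp(-row[nbrs] / t)
    return np.maximum(W, W.T)  # keep an edge if either endpoint chose it


def laplacian_eigenmap(W, dim=2):
    """Unsupervised Laplacian Eigenmaps: solve L v = lambda D v, with L = D - W."""
    d = W.sum(axis=1)
    d_is = 1.0 / np.sqrt(d)
    # I - D^-1/2 W D^-1/2 has the same eigenvalues as the generalized problem.
    L_sym = np.eye(len(d)) - d_is[:, None] * W * d_is[None, :]
    _, U = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    V = d_is[:, None] * U  # map back: v = D^-1/2 u
    return V[:, 1:dim + 1]  # drop the first (trivial) eigenvector


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def objective(Y, w, b, labels, L, alpha, reg):
    """Joint loss: mean log-loss + alpha * tr(Y^T L Y) + (reg / 2) * ||w||^2."""
    p = np.clip(_sigmoid(Y @ w + b), 1e-12, 1 - 1e-12)
    logloss = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return logloss + alpha * np.trace(Y.T @ L @ Y) + 0.5 * reg * (w @ w)


def supervised_eigenmap(W, labels, dim=2, alpha=0.1, lr=0.05, reg=1e-3, n_iter=300):
    """Alternate one gradient step on the predictor (w, b) and one on the embedding Y."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    Y = laplacian_eigenmap(W, dim)  # warm start from the unsupervised embedding
    w, b = np.zeros(dim), 0.0
    history = [objective(Y, w, b, labels, L, alpha, reg)]
    for _ in range(n_iter):
        # (1) Embedding fixed: gradient step on the logistic-regression predictor.
        r = _sigmoid(Y @ w + b) - labels
        w = w - lr * (Y.T @ r / n + reg * w)
        b = b - lr * r.mean()
        # (2) Predictor fixed: gradient step on the embedding (d tr(Y^T L Y)/dY = 2 L Y).
        r = _sigmoid(Y @ w + b) - labels
        Y = Y - lr * (np.outer(r, w) / n + 2.0 * alpha * L @ Y)
        history.append(objective(Y, w, b, labels, L, alpha, reg))
    return Y, w, b, history
```

In this sketch `X` would hold vectorized short texts (for example TF-IDF rows) and `labels` the 0/1 diagnosis. Warm-starting from the unsupervised solution makes the supervised run start from the unsupervised Laplacian Eigenmaps embedding, which is the comparison the abstract reports.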
doi_str_mv 10.1136/amiajnl-2013-001792
format Article
fullrecord <record><control><sourceid>pubmed_cross</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3957389</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>24076750</sourcerecordid><originalsourceid>FETCH-LOGICAL-c405t-6711a2aee6f9800e8291c1fbfeefe6d8e72d5ecc060b1f595d20050477a071fa3</originalsourceid><addsrcrecordid>eNpVkNtKAzEQhoMoHqpPIEheYHWSbTbdG0GKJxC8UMG7kCaTNrK7WZJU7du7pbXo1cww8_0DHyHnDC4ZK6sr3Xr90TUFB1YWAEzWfI8cM8FlUcvx-_7QQyULAVwekZOUPoabipfikBzxMchKCjgm7cuyx_jpE1qK7Qyt9d2cBkczfuelbmgf0XqTQ0z0y-cF1X3feKOzD12ivqOm8d0wN9R6Pe9Cyt4k6kKk_cDpHL2hRkfrQxPmq1Ny4HST8GxbR-Tt7vZ1-lA8Pd8_Tm-eCjMGkYtKMqa5RqxcPQHACa-ZYW7mEB1WdoKSW4HGQAUz5kQtLAcQMJZSg2ROlyNyvcntl7MWrcEuR92oPvpWx5UK2qv_m84v1Dx8qrIWspzUQ0C5CTAxpBTR7VgGam1fbe2rtX21sT9QF3_f7phf3eUPsQGHcw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology</title><source>MEDLINE</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>Oxford University Press Journals All Titles (1996-Current)</source><source>PubMed Central</source><creator>Perry, Thomas Ernest ; Zha, Hongyuan ; Zhou, Ke ; Frias, Patricio ; Zeng, Dadan ; Braunstein, Mark</creator><creatorcontrib>Perry, Thomas Ernest ; Zha, Hongyuan ; Zhou, Ke ; Frias, Patricio ; Zeng, Dadan ; Braunstein, Mark</creatorcontrib><identifier>ISSN: 1067-5027</identifier><identifier>EISSN: 1527-974X</identifier><identifier>DOI: 10.1136/amiajnl-2013-001792</identifier><identifier>PMID: 24076750</identifier><language>eng</language><publisher>England: BMJ Publishing Group</publisher><subject>Algorithms ; Area Under Curve ; Artificial Intelligence ; Cardiology - methods ; Diagnosis ; Discriminant Analysis ; Humans ; Pattern Recognition, Automated - methods ; Pediatrics - methods ; Research and Applications ; ROC Curve ; Sensitivity and Specificity</subject><ispartof>Journal of the American Medical Informatics Association : JAMIA, 2014-02, Vol.21 (e1), p.e136-e142</ispartof><rights>Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions 2014</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c405t-6711a2aee6f9800e8291c1fbfeefe6d8e72d5ecc060b1f595d20050477a071fa3</citedby><cites>FETCH-LOGICAL-c405t-6711a2aee6f9800e8291c1fbfeefe6d8e72d5ecc060b1f595d20050477a071fa3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3957389/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3957389/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/24076750$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Perry, Thomas Ernest</creatorcontrib><creatorcontrib>Zha, Hongyuan</creatorcontrib><creatorcontrib>Zhou, Ke</creatorcontrib><creatorcontrib>Frias, Patricio</creatorcontrib><creatorcontrib>Zeng, Dadan</creatorcontrib><creatorcontrib>Braunstein, Mark</creatorcontrib><title>Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology</title><title>Journal of the American Medical Informatics Association : JAMIA</title><addtitle>J Am Med Inform Assoc</addtitle><subject>Algorithms</subject><subject>Area Under Curve</subject><subject>Artificial Intelligence</subject><subject>Cardiology - methods</subject><subject>Diagnosis</subject><subject>Discriminant Analysis</subject><subject>Humans</subject><subject>Pattern Recognition, Automated - methods</subject><subject>Pediatrics - methods</subject><subject>Research and Applications</subject><subject>ROC Curve</subject><subject>Sensitivity and Specificity</subject><issn>1067-5027</issn><issn>1527-974X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpVkNtKAzEQhoMoHqpPIEheYHWSbTbdG0GKJxC8UMG7kCaTNrK7WZJU7du7pbXo1cww8_0DHyHnDC4ZK6sr3Xr90TUFB1YWAEzWfI8cM8FlUcvx-_7QQyULAVwekZOUPoabipfikBzxMchKCjgm7cuyx_jpE1qK7Qyt9d2cBkczfuelbmgf0XqTQ0z0y-cF1X3feKOzD12ivqOm8d0wN9R6Pe9Cyt4k6kKk_cDpHL2hRkfrQxPmq1Ny4HST8GxbR-Tt7vZ1-lA8Pd8_Tm-eCjMGkYtKMqa5RqxcPQHACa-ZYW7mEB1WdoKSW4HGQAUz5kQtLAcQMJZSg2ROlyNyvcntl7MWrcEuR92oPvpWx5UK2qv_m84v1Dx8qrIWspzUQ0C5CTAxpBTR7VgGam1fbe2rtX21sT9QF3_f7phf3eUPsQGHcw</recordid><startdate>20140201</startdate><enddate>20140201</enddate><creator>Perry, Thomas Ernest</creator><creator>Zha, Hongyuan</creator><creator>Zhou, Ke</creator><creator>Frias, Patricio</creator><creator>Zeng, Dadan</creator><creator>Braunstein, Mark</creator><general>BMJ Publishing Group</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>5PM</scope></search><sort><creationdate>20140201</creationdate><title>Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology</title><author>Perry, Thomas Ernest ; Zha, Hongyuan ; Zhou, Ke ; Frias, Patricio ; Zeng, Dadan ; Braunstein, Mark</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c405t-6711a2aee6f9800e8291c1fbfeefe6d8e72d5ecc060b1f595d20050477a071fa3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Algorithms</topic><topic>Area Under Curve</topic><topic>Artificial Intelligence</topic><topic>Cardiology - methods</topic><topic>Diagnosis</topic><topic>Discriminant Analysis</topic><topic>Humans</topic><topic>Pattern Recognition, Automated - methods</topic><topic>Pediatrics - methods</topic><topic>Research and Applications</topic><topic>ROC Curve</topic><topic>Sensitivity and Specificity</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Perry, Thomas Ernest</creatorcontrib><creatorcontrib>Zha, Hongyuan</creatorcontrib><creatorcontrib>Zhou, Ke</creatorcontrib><creatorcontrib>Frias, Patricio</creatorcontrib><creatorcontrib>Zeng, Dadan</creatorcontrib><creatorcontrib>Braunstein, Mark</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Perry, Thomas Ernest</au><au>Zha, Hongyuan</au><au>Zhou, Ke</au><au>Frias, Patricio</au><au>Zeng, Dadan</au><au>Braunstein, Mark</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology</atitle><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle><addtitle>J Am Med Inform Assoc</addtitle><date>2014-02-01</date><risdate>2014</risdate><volume>21</volume><issue>e1</issue><spage>e136</spage><epage>e142</epage><pages>e136-e142</pages><issn>1067-5027</issn><eissn>1527-974X</eissn><cop>England</cop><pub>BMJ Publishing Group</pub><pmid>24076750</pmid><doi>10.1136/amiajnl-2013-001792</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1067-5027
ispartof Journal of the American Medical Informatics Association : JAMIA, 2014-02, Vol.21 (e1), p.e136-e142
issn 1067-5027
1527-974X
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3957389
source MEDLINE; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; Oxford University Press Journals All Titles (1996-Current); PubMed Central
subjects Algorithms
Area Under Curve
Artificial Intelligence
Cardiology - methods
Diagnosis
Discriminant Analysis
Humans
Pattern Recognition, Automated - methods
Pediatrics - methods
Research and Applications
ROC Curve
Sensitivity and Specificity
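The study reports four evaluation metrics: AUC, MCC, sensitivity, and specificity. Of these, MCC is probably the least familiar. It uses all four confusion-matrix cells, which is why it is often preferred over accuracy when classes are imbalanced. The pure-Python sketch below computes all four, with AUC taken in its rank (Mann-Whitney) form: the probability that a randomly chosen positive case scores above a randomly chosen negative one, with ties counting one half. The function names are hypothetical and this is not the study's evaluation code. The abstract also does not say how scores were thresholded to obtain the class predictions used for MCC, sensitivity, and specificity.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)  # true-positive rate (recall)


def specificity(tn, fp):
    return tn / (tn + fp)  # true-negative rate


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, with TP = 3, TN = 4, FP = 1 and FN = 2, these give sensitivity 0.6, specificity 0.8 and MCC = 10/sqrt(600), approximately 0.408. The pairwise AUC is quadratic in the number of cases, which is fine for a few thousand records. A sort-based rank computation would scale better.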
title Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-18T22%3A50%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pubmed_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Supervised%20embedding%20of%20textual%20predictors%20with%20applications%20in%20clinical%20diagnostics%20for%20pediatric%20cardiology&rft.jtitle=Journal%20of%20the%20American%20Medical%20Informatics%20Association%20:%20JAMIA&rft.au=Perry,%20Thomas%20Ernest&rft.date=2014-02-01&rft.volume=21&rft.issue=e1&rft.spage=e136&rft.epage=e142&rft.pages=e136-e142&rft.issn=1067-5027&rft.eissn=1527-974X&rft_id=info:doi/10.1136/amiajnl-2013-001792&rft_dat=%3Cpubmed_cross%3E24076750%3C/pubmed_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/24076750&rfr_iscdi=true