FasTag: Automatic text classification of unstructured medical narratives

Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are...

Bibliographic details
Published in: PloS one 2020-06, Vol.15 (6), p.e0234647-e0234647
Main authors: Venkataraman, Guhan Ram, Pineda, Arturo Lopez, Bear Don't Walk Iv, Oliver J, Zehnder, Ashley M, Ayyar, Sandeep, Page, Rodney L, Bustamante, Carlos D, Rivas, Manuel A
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e0234647
container_issue 6
container_start_page e0234647
container_title PloS one
container_volume 15
creator Venkataraman, Guhan Ram
Pineda, Arturo Lopez
Bear Don't Walk Iv, Oliver J
Zehnder, Ashley M
Ayyar, Sandeep
Page, Rodney L
Bustamante, Carlos D
Rivas, Manuel A
description Unstructured clinical narratives are continuously being recorded as part of the delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories, with average weighted macro F1 scores of 0.74 and 0.68, respectively. In the "neoplasia" category, the model trained on veterinary data achieved high validation accuracy on veterinary data and moderate accuracy on human data, with F1 scores of 0.91 and 0.70, respectively. Our LSTM method scored slightly higher than the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. Our approach is a step forward for these two domains to learn from and inform one another.
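The description reports "average weighted macro F1" scores without spelling out the averaging. The sketch below shows one common reading, a support-weighted mean of per-class F1, next to the plain (unweighted) macro mean. It is an illustration only: it assumes a single label per record, the function names `per_class_f1`, `weighted_f1`, and `macro_f1` are hypothetical, and the tiny label lists are made up rather than taken from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} assuming one label per record."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (f1, actual)
    return scores


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1, weighted by each class's number of true records."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


if __name__ == "__main__":
    # Made-up labels for illustration; not data from the paper.
    truth = ["neoplasia", "neoplasia", "neoplasia", "other"]
    preds = ["neoplasia", "neoplasia", "other", "other"]
    print(round(weighted_f1(truth, preds), 4))
    print(round(macro_f1(truth, preds), 4))
```

Weighting by support lets frequent categories dominate the average, whereas the unweighted macro mean treats every category equally; which of the two the reported figures correspond to cannot be determined from this record alone.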
doi_str_mv 10.1371/journal.pone.0234647
format Article
recordtype article
contributor Clegg, Simon
date 2020-06-22
publisher United States: Public Library of Science
eissn 1932-6203
pmid 32569327
orcid https://orcid.org/0000-0003-1457-9925
rights 2020 Venkataraman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
toplevel peer_reviewed; online_resources
oa free_for_read
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7307763/pdf/
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7307763/
backlink https://www.ncbi.nlm.nih.gov/pubmed/32569327
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2020-06, Vol.15 (6), p.e0234647-e0234647
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_2415814189
source MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects Animals
Annotations
Artificial neural networks
Automation
Biology and Life Sciences
Cancer
Classification
Clinical coding
Clinical medicine
Codes
Computer and Information Sciences
Data Mining
Data processing
Data storage
Databases as Topic
Decision trees
Dictionaries
Disease
Domains
Electronic health records
Electronic medical records
Electronic records
Engineering and Technology
Humans
Long short-term memory
Machine learning
Management
Medical coding
Medical diagnosis
Medical records
Medicine and Health Sciences
Methods
Model accuracy
Narrative Medicine
Narratives
Natural language processing
Neural networks
Oncology
Physical work
Recurrent neural networks
Reproducibility of Results
Research and Analysis Methods
Schools
Semantics
Short term memory
Software
Species Specificity
Tagging
Technology application
Unstructured data
Veterinary colleges
Veterinary medicine
title FasTag: Automatic text classification of unstructured medical narratives
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A23%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FasTag:%20Automatic%20text%20classification%20of%20unstructured%20medical%20narratives&rft.jtitle=PloS%20one&rft.au=Venkataraman,%20Guhan%20Ram&rft.date=2020-06-22&rft.volume=15&rft.issue=6&rft.spage=e0234647&rft.epage=e0234647&rft.pages=e0234647-e0234647&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0234647&rft_dat=%3Cgale_plos_%3EA627340808%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2415814189&rft_id=info:pmid/32569327&rft_galeid=A627340808&rft_doaj_id=oai_doaj_org_article_eaecb0fb69b04760ae33474f226bb9e1&rfr_iscdi=true
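The url field above is an OpenURL-style link whose query string carries the citation as `rft.*` key-value pairs and repeated `rft_id` identifiers. As a minimal sketch, the standard-library `urllib.parse` module can pull those fields back out; the helper name `citation_from_openurl` and the shortened sample URL are assumptions made for illustration, not part of the record.

```python
from urllib.parse import parse_qs, urlsplit


def citation_from_openurl(url):
    """Extract a few citation fields from an OpenURL-style query string."""
    params = parse_qs(urlsplit(url).query)  # values are percent-decoded lists

    def first(key):
        values = params.get(key)
        return values[0] if values else None

    citation = {
        "title": first("rft.atitle"),
        "journal": first("rft.jtitle"),
        "volume": first("rft.volume"),
        "issue": first("rft.issue"),
        "doi": None,
        "pmid": None,
    }
    # rft_id may repeat, one entry per identifier scheme.
    for identifier in params.get("rft_id", []):
        if identifier.startswith("info:doi/"):
            citation["doi"] = identifier[len("info:doi/"):]
        elif identifier.startswith("info:pmid/"):
            citation["pmid"] = identifier[len("info:pmid/"):]
    return citation


if __name__ == "__main__":
    # Shortened version of the record's URL, kept to a handful of parameters.
    sample = (
        "https://sfx.bib-bvb.de/sfx_tum?rft.genre=article"
        "&rft.atitle=FasTag:%20Automatic%20text%20classification"
        "&rft.jtitle=PloS%20one&rft.volume=15&rft.issue=6"
        "&rft_id=info:doi/10.1371/journal.pone.0234647"
        "&rft_id=info:pmid/32569327"
    )
    print(citation_from_openurl(sample))
```

Because `parse_qs` returns a list for every key, repeated parameters such as `rft_id` are preserved rather than overwritten, which is why the identifiers are scanned in a loop.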