FasTag: Automatic text classification of unstructured medical narratives
Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are...
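Published in: | PloS one 2020-06, Vol.15 (6), p.e0234647-e0234647 |
---|---|
Main authors: | Venkataraman, Guhan Ram; Pineda, Arturo Lopez; Bear Don't Walk Iv, Oliver J; Zehnder, Ashley M; Ayyar, Sandeep; Page, Rodney L; Bustamante, Carlos D; Rivas, Manuel A |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |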
container_end_page | e0234647 |
---|---|
container_issue | 6 |
container_start_page | e0234647 |
container_title | PloS one |
container_volume | 15 |
creator | Venkataraman, Guhan Ram; Pineda, Arturo Lopez; Bear Don't Walk Iv, Oliver J; Zehnder, Ashley M; Ayyar, Sandeep; Page, Rodney L; Bustamante, Carlos D; Rivas, Manuel A |
description | Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. 
Our approach is a step forward for these two domains to learn from and inform one another. |
doi_str_mv | 10.1371/journal.pone.0234647 |
format | Article |
fullrecord | recordid: TN_cdi_plos_journals_2415814189; sourceid: gale_plos_; galeid: A627340808; doaj_id: oai_doaj_org_article_eaecb0fb69b04760ae33474f226bb9e1; pqid: 2415814189; PMID: 32569327; ISSN: 1932-6203; EISSN: 1932-6203; publisher: Public Library of Science (United States); date: 2020-06-22; contributor: Clegg, Simon; ORCID: https://orcid.org/0000-0003-1457-9925; peer reviewed; free to read; PubMed Central: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7307763/; PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32569327; rights: 2020 Venkataraman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
fulltext | fulltext |
identifier | ISSN: 1932-6203 |
ispartof | PloS one, 2020-06, Vol.15 (6), p.e0234647-e0234647 |
issn | 1932-6203 1932-6203 |
language | eng |
recordid | cdi_plos_journals_2415814189 |
source | MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Animals; Annotations; Artificial neural networks; Automation; Biology and Life Sciences; Cancer; Classification; Clinical coding; Clinical medicine; Codes; Computer and Information Sciences; Data Mining; Data processing; Data storage; Databases as Topic; Decision trees; Dictionaries; Disease; Domains; Electronic health records; Electronic medical records; Electronic records; Engineering and Technology; Humans; Long short-term memory; Machine learning; Management; Medical coding; Medical diagnosis; Medical records; Medicine and Health Sciences; Methods; Model accuracy; Narrative Medicine; Narratives; Natural language processing; Neural networks; Oncology; Physical work; Recurrent neural networks; Reproducibility of Results; Research and Analysis Methods; Schools; Semantics; Short term memory; Software; Species Specificity; Tagging; Technology application; Unstructured data; Veterinary colleges; Veterinary medicine |
title | FasTag: Automatic text classification of unstructured medical narratives |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A23%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FasTag:%20Automatic%20text%20classification%20of%20unstructured%20medical%20narratives&rft.jtitle=PloS%20one&rft.au=Venkataraman,%20Guhan%20Ram&rft.date=2020-06-22&rft.volume=15&rft.issue=6&rft.spage=e0234647&rft.epage=e0234647&rft.pages=e0234647-e0234647&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0234647&rft_dat=%3Cgale_plos_%3EA627340808%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2415814189&rft_id=info:pmid/32569327&rft_galeid=A627340808&rft_doaj_id=oai_doaj_org_article_eaecb0fb69b04760ae33474f226bb9e1&rfr_iscdi=true |
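
The description in this record reports an "average weighted macro F1 score" across top-level categories. As a rough illustration only, and not the authors' evaluation code, the sketch below computes per-category F1 scores and a support-weighted average in plain Python. It assumes that "weighted" means each category's F1 is weighted by its number of true records; the record does not define the term. The function names, every category name other than "neoplasia", and all toy labels are hypothetical.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: (f1, support)} for multi-label data given as sets of labels."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        support = tp + fn              # number of records that truly carry the label
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support)
    return scores


def weighted_f1(scores):
    """Average the per-class F1 values, weighting each class by its support."""
    total = sum(support for _, support in scores.values())
    if total == 0:
        return 0.0
    return sum(f1 * support for f1, support in scores.values()) / total


# Toy data: each record may carry several top-level categories (hypothetical values).
labels = ["neoplasia", "infectious", "circulatory"]
y_true = [{"neoplasia"}, {"neoplasia", "circulatory"}, {"infectious"}, {"circulatory"}]
y_pred = [{"neoplasia"}, {"neoplasia"}, {"infectious", "neoplasia"}, {"circulatory"}]

scores = per_class_f1(y_true, y_pred, labels)
average = weighted_f1(scores)
for label in labels:
    f1, support = scores[label]
    print(f"{label}: F1={f1:.2f} (support={support})")
print(f"support-weighted F1: {average:.2f}")
```

Weighting by support keeps rare categories from dominating the average; an unweighted macro average would instead treat every category equally, so the two can differ noticeably on imbalanced data.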