FasTag: Automatic text classification of unstructured medical narratives




Bibliographic details
Published in:	PloS one 2020-06, Vol.15 (6), p.e0234647-e0234647
Main authors:	Venkataraman, Guhan Ram; Pineda, Arturo Lopez; Bear Don't Walk IV, Oliver J; Zehnder, Ashley M; Ayyar, Sandeep; Page, Rodney L; Bustamante, Carlos D; Rivas, Manuel A
Format:	Article
Language:	English
Subject headings:	Animals; Annotations; Artificial neural networks; Automation; Biology and Life Sciences; Cancer; Classification; Clinical coding; Clinical medicine; Codes; Computer and Information Sciences; Data Mining; Data processing; Data storage; Databases as Topic; Decision trees; Dictionaries; Disease; Domains; Electronic health records; Electronic medical records; Electronic records; Engineering and Technology; Humans; Long short-term memory; Machine learning; Management; Medical coding; Medical diagnosis; Medical records; Medicine and Health Sciences; Methods; Model accuracy; Narrative Medicine; Narratives; Natural language processing; Neural networks; Oncology; Physical work; Recurrent neural networks; Reproducibility of Results; Research and Analysis Methods; Schools; Semantics; Short term memory; Software; Species Specificity; Tagging; Technology application; Unstructured data; Veterinary colleges; Veterinary medicine
Online access:	Full text

container_end_page	e0234647
container_issue	6
container_start_page	e0234647
container_title	PloS one
container_volume	15
creator	Venkataraman, Guhan Ram; Pineda, Arturo Lopez; Bear Don't Walk IV, Oliver J; Zehnder, Ashley M; Ayyar, Sandeep; Page, Rodney L; Bustamante, Carlos D; Rivas, Manuel A
description	Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. 
Our approach is a step forward for these two domains to learn from and inform one another.
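The description above summarizes a pipeline in which full ICD-9 codes are collapsed into top-level categories and predictions are scored with per-category and weighted F1. The Python sketch below illustrates only those two steps, under stated assumptions; it is not the authors' FasTag code and contains no LSTM, Decision Tree, Random Forest, or MetaMap Lite step. The chapter table follows the standard ICD-9-CM numeric ranges. The function names `top_level`, `record_labels`, `f1_per_category` and `weighted_f1`, the grouping of all V codes into one label, the skipping of E codes, and the support weighting (in the spirit of scikit-learn's `average="weighted"`) are illustrative assumptions, not details confirmed by this record.

```python
"""Hypothetical sketch, not the FasTag implementation.

Collapses full ICD-9-CM codes into top-level chapters (multi-label targets)
and scores predicted label sets with per-category and support-weighted F1.
Assumptions: codes are strings like "174.9" or "V10.3"; V codes form one
extra group; E codes (external causes) are skipped.
"""
from typing import Iterable, List, Optional, Set

# Standard ICD-9-CM chapter ranges for numeric codes (inclusive bounds).
ICD9_CHAPTERS = [
    (1, 139, "infectious and parasitic diseases"),
    (140, 239, "neoplasms"),
    (240, 279, "endocrine, nutritional, metabolic and immunity disorders"),
    (280, 289, "blood and blood-forming organs"),
    (290, 319, "mental disorders"),
    (320, 389, "nervous system and sense organs"),
    (390, 459, "circulatory system"),
    (460, 519, "respiratory system"),
    (520, 579, "digestive system"),
    (580, 629, "genitourinary system"),
    (630, 679, "pregnancy, childbirth and the puerperium"),
    (680, 709, "skin and subcutaneous tissue"),
    (710, 739, "musculoskeletal system and connective tissue"),
    (740, 759, "congenital anomalies"),
    (760, 779, "perinatal conditions"),
    (780, 799, "symptoms, signs and ill-defined conditions"),
    (800, 999, "injury and poisoning"),
]
V_GROUP = "supplementary (V codes)"  # hypothetical label for all V codes


def top_level(code: str) -> Optional[str]:
    """Return the chapter name for one code, or None for E codes or unparseable input."""
    code = code.strip().upper()
    if code.startswith("V"):
        return V_GROUP
    if code.startswith("E"):
        return None  # assumption: external-cause codes are not used as labels
    try:
        number = int(code.split(".")[0])
    except ValueError:
        return None
    for low, high, name in ICD9_CHAPTERS:
        if low <= number <= high:
            return name
    return None


def record_labels(codes: Iterable[str]) -> Set[str]:
    """Multi-label target for one record: the set of its top-level chapters."""
    return {label for label in map(top_level, codes) if label is not None}


def f1_per_category(gold: List[Set[str]], pred: List[Set[str]], category: str) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN) for one category across all records."""
    tp = sum(category in g and category in p for g, p in zip(gold, pred))
    fp = sum(category not in g and category in p for g, p in zip(gold, pred))
    fn = sum(category in g and category not in p for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Average of per-category F1, weighted by each category's gold support."""
    categories = set().union(*gold)
    support = {c: sum(c in g for g in gold) for c in categories}
    total = sum(support.values())
    if total == 0:
        return 0.0
    return sum(f1_per_category(gold, pred, c) * support[c] for c in categories) / total


if __name__ == "__main__":
    # Toy records for illustration only; not data from the paper.
    gold = [record_labels(["174.9", "401.9"]), record_labels(["V10.3"]), record_labels(["162.9"])]
    pred = [{"neoplasms"}, {V_GROUP}, {"neoplasms", "circulatory system"}]
    print(f1_per_category(gold, pred, "neoplasms"))  # 1.0
    print(weighted_f1(gold, pred))  # 0.75
```

On these toy records the neoplasm category scores F1 = 1.0, the circulatory category scores 0.0, and the support-weighted average is 0.75. The numbers are illustrative only and unrelated to the F1 scores reported in the paper. Weighting by support lets frequent categories dominate the average, which is one reason a weighted score can differ noticeably from a plain per-category mean.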
doi_str_mv	10.1371/journal.pone.0234647
format	Article
fullrecord	<record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2415814189</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A627340808</galeid><doaj_id>oai_doaj_org_article_eaecb0fb69b04760ae33474f226bb9e1</doaj_id><sourcerecordid>A627340808</sourcerecordid><originalsourceid>FETCH-LOGICAL-c692t-bfbbda2f7d111d98b9fc67c8faacf05248733a1765581511534e0bbc9eb907e3</originalsourceid><addsrcrecordid>eNqNkl2L1DAUhoso7of-A9GCsOjFjPlq0nohDIvrDiws6OBtOEmTmQ6ZZkzaRf-96U53mcpeSC9SznnOe3JO3ix7g9EcU4E_bX0fWnDzvW_NHBHKOBPPslNcUTLjBNHnR_8n2VmMW4QKWnL-MjuhpOApJ06z6yuIK1h_zhd953fQNTrvzO8u1w5ibGyjU8i3ubd538Yu9Lrrg6nznalTyuUthJCIOxNfZS8suGhej-d5trr6urq8nt3cflteLm5mmlekmymrVA3EihpjXFelqqzmQpcWQFtUEFYKSgELXhQlLjAuKDNIKV0ZVSFh6Hn27iC7dz7KcQdREoYTz3BZJWJ5IGoPW7kPzQ7CH-mhkfcBH9YSQprTGWnAaIWs4pVCTHAEhlImmCWEK1UZnLS-jN16lUbWpu0CuInoNNM2G7n2d1JQJASnSeDDKBD8r97ETu6aqI1z0Brf39-bE1EgPPR6_w_69HQjtYY0QNNan_rqQVQukhJlqERlouZPUOmrza7RyTC2SfFJwcdJQWIGG6yhj1Euf3z_f_b255S9OGI3Bly3id71g6niFGQHUAcfYzD2cckYycHvD9uQg9_l6PdU9vb4gR6LHgxO_wKl9fsF</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2415814189</pqid></control><display><type>article</type><title>FasTag: Automatic text classification of unstructured medical narratives</title><source>MEDLINE</source><source>DOAJ Directory of Open Access Journals</source><source>Public Library of Science (PLoS) Journals Open Access</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Venkataraman, Guhan Ram ; Pineda, Arturo Lopez ; Bear Don't Walk Iv, Oliver J ; Zehnder, Ashley M ; Ayyar, Sandeep ; Page, Rodney L ; Bustamante, Carlos D ; Rivas, Manuel A</creator><contributor>Clegg, Simon</contributor><creatorcontrib>Venkataraman, Guhan Ram ; Pineda, Arturo Lopez ; Bear Don't Walk Iv, Oliver J ; Zehnder, Ashley M ; Ayyar, Sandeep ; Page, Rodney L 
; Bustamante, Carlos D ; Rivas, Manuel A ; Clegg, Simon</creatorcontrib><description>Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. 
Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. Our approach is a step forward for these two domains to learn from and inform one another.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0234647</identifier><identifier>PMID: 32569327</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Animals ; Annotations ; Artificial neural networks ; Automation ; Biology and Life Sciences ; Cancer ; Classification ; Clinical coding ; Clinical medicine ; Codes ; Computer and Information Sciences ; Data Mining ; Data processing ; Data storage ; Databases as Topic ; Decision trees ; Dictionaries ; Disease ; Domains ; Electronic health records ; Electronic medical records ; Electronic records ; Engineering and Technology ; Humans ; Long short-term memory ; Machine learning ; Management ; Medical coding ; Medical diagnosis ; Medical records ; Medicine and Health Sciences ; Methods ; Model accuracy ; Narrative Medicine ; Narratives ; Natural language processing ; Neural networks ; Oncology ; Physical work ; Recurrent neural networks ; Reproducibility of Results ; Research and Analysis Methods ; Schools ; Semantics ; Short term memory ; Software ; Species Specificity ; Tagging ; Technology application ; Unstructured data ; Veterinary colleges ; Veterinary medicine</subject><ispartof>PloS one, 2020-06, Vol.15 (6), p.e0234647-e0234647</ispartof><rights>COPYRIGHT 2020 Public Library of Science</rights><rights>2020 Venkataraman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2020 Venkataraman et al 2020 Venkataraman et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c692t-bfbbda2f7d111d98b9fc67c8faacf05248733a1765581511534e0bbc9eb907e3</citedby><cites>FETCH-LOGICAL-c692t-bfbbda2f7d111d98b9fc67c8faacf05248733a1765581511534e0bbc9eb907e3</cites><orcidid>0000-0003-1457-9925</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7307763/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7307763/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793,79600,79601</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/32569327$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Clegg, Simon</contributor><creatorcontrib>Venkataraman, Guhan Ram</creatorcontrib><creatorcontrib>Pineda, Arturo Lopez</creatorcontrib><creatorcontrib>Bear Don't Walk Iv, Oliver J</creatorcontrib><creatorcontrib>Zehnder, Ashley M</creatorcontrib><creatorcontrib>Ayyar, Sandeep</creatorcontrib><creatorcontrib>Page, Rodney L</creatorcontrib><creatorcontrib>Bustamante, Carlos D</creatorcontrib><creatorcontrib>Rivas, Manuel A</creatorcontrib><title>FasTag: Automatic text classification of unstructured medical narratives</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually 
assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. 
Our approach is a step forward for these two domains to learn from and inform one another.</description><subject>Animals</subject><subject>Annotations</subject><subject>Artificial neural networks</subject><subject>Automation</subject><subject>Biology and Life Sciences</subject><subject>Cancer</subject><subject>Classification</subject><subject>Clinical coding</subject><subject>Clinical medicine</subject><subject>Codes</subject><subject>Computer and Information Sciences</subject><subject>Data Mining</subject><subject>Data processing</subject><subject>Data storage</subject><subject>Databases as Topic</subject><subject>Decision trees</subject><subject>Dictionaries</subject><subject>Disease</subject><subject>Domains</subject><subject>Electronic health records</subject><subject>Electronic medical records</subject><subject>Electronic records</subject><subject>Engineering and Technology</subject><subject>Humans</subject><subject>Long short-term memory</subject><subject>Machine learning</subject><subject>Management</subject><subject>Medical coding</subject><subject>Medical diagnosis</subject><subject>Medical records</subject><subject>Medicine and Health Sciences</subject><subject>Methods</subject><subject>Model accuracy</subject><subject>Narrative Medicine</subject><subject>Narratives</subject><subject>Natural language processing</subject><subject>Neural networks</subject><subject>Oncology</subject><subject>Physical work</subject><subject>Recurrent neural networks</subject><subject>Reproducibility of Results</subject><subject>Research and Analysis Methods</subject><subject>Schools</subject><subject>Semantics</subject><subject>Short term memory</subject><subject>Software</subject><subject>Species Specificity</subject><subject>Tagging</subject><subject>Technology application</subject><subject>Unstructured data</subject><subject>Veterinary colleges</subject><subject>Veterinary 
medicine</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqNkl2L1DAUhoso7of-A9GCsOjFjPlq0nohDIvrDiws6OBtOEmTmQ6ZZkzaRf-96U53mcpeSC9SznnOe3JO3ix7g9EcU4E_bX0fWnDzvW_NHBHKOBPPslNcUTLjBNHnR_8n2VmMW4QKWnL-MjuhpOApJ06z6yuIK1h_zhd953fQNTrvzO8u1w5ibGyjU8i3ubd538Yu9Lrrg6nznalTyuUthJCIOxNfZS8suGhej-d5trr6urq8nt3cflteLm5mmlekmymrVA3EihpjXFelqqzmQpcWQFtUEFYKSgELXhQlLjAuKDNIKV0ZVSFh6Hn27iC7dz7KcQdREoYTz3BZJWJ5IGoPW7kPzQ7CH-mhkfcBH9YSQprTGWnAaIWs4pVCTHAEhlImmCWEK1UZnLS-jN16lUbWpu0CuInoNNM2G7n2d1JQJASnSeDDKBD8r97ETu6aqI1z0Brf39-bE1EgPPR6_w_69HQjtYY0QNNan_rqQVQukhJlqERlouZPUOmrza7RyTC2SfFJwcdJQWIGG6yhj1Euf3z_f_b255S9OGI3Bly3id71g6niFGQHUAcfYzD2cckYycHvD9uQg9_l6PdU9vb4gR6LHgxO_wKl9fsF</recordid><startdate>20200622</startdate><enddate>20200622</enddate><creator>Venkataraman, Guhan Ram</creator><creator>Pineda, Arturo Lopez</creator><creator>Bear Don't Walk Iv, Oliver J</creator><creator>Zehnder, Ashley M</creator><creator>Ayyar, Sandeep</creator><creator>Page, Rodney L</creator><creator>Bustamante, Carlos D</creator><creator>Rivas, Manuel A</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0003-1457-9925</orcidid></search><sort><creationdate>20200622</creationdate><title>FasTag: Automatic text classification of unstructured medical narratives</title><author>Venkataraman, Guhan Ram ; Pineda, Arturo Lopez ; Bear Don't Walk Iv, Oliver J ; Zehnder, Ashley M ; Ayyar, Sandeep ; Page, Rodney L ; Bustamante, Carlos D ; Rivas, Manuel 
A</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c692t-bfbbda2f7d111d98b9fc67c8faacf05248733a1765581511534e0bbc9eb907e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Animals</topic><topic>Annotations</topic><topic>Artificial neural networks</topic><topic>Automation</topic><topic>Biology and Life Sciences</topic><topic>Cancer</topic><topic>Classification</topic><topic>Clinical coding</topic><topic>Clinical medicine</topic><topic>Codes</topic><topic>Computer and Information Sciences</topic><topic>Data Mining</topic><topic>Data processing</topic><topic>Data storage</topic><topic>Databases as Topic</topic><topic>Decision trees</topic><topic>Dictionaries</topic><topic>Disease</topic><topic>Domains</topic><topic>Electronic health records</topic><topic>Electronic medical records</topic><topic>Electronic records</topic><topic>Engineering and Technology</topic><topic>Humans</topic><topic>Long short-term memory</topic><topic>Machine learning</topic><topic>Management</topic><topic>Medical coding</topic><topic>Medical diagnosis</topic><topic>Medical records</topic><topic>Medicine and Health Sciences</topic><topic>Methods</topic><topic>Model accuracy</topic><topic>Narrative Medicine</topic><topic>Narratives</topic><topic>Natural language processing</topic><topic>Neural networks</topic><topic>Oncology</topic><topic>Physical work</topic><topic>Recurrent neural networks</topic><topic>Reproducibility of Results</topic><topic>Research and Analysis Methods</topic><topic>Schools</topic><topic>Semantics</topic><topic>Short term memory</topic><topic>Software</topic><topic>Species Specificity</topic><topic>Tagging</topic><topic>Technology application</topic><topic>Unstructured data</topic><topic>Veterinary colleges</topic><topic>Veterinary medicine</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Venkataraman, Guhan 
Ram</creatorcontrib><creatorcontrib>Pineda, Arturo Lopez</creatorcontrib><creatorcontrib>Bear Don't Walk Iv, Oliver J</creatorcontrib><creatorcontrib>Zehnder, Ashley M</creatorcontrib><creatorcontrib>Ayyar, Sandeep</creatorcontrib><creatorcontrib>Page, Rodney L</creatorcontrib><creatorcontrib>Bustamante, Carlos D</creatorcontrib><creatorcontrib>Rivas, Manuel A</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Nursing & Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological & Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni 
Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>Agricultural & Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Materials Science Database</collection><collection>Nursing & Allied Health Database (Alumni Edition)</collection><collection>Meteorological & Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Agricultural Science Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Nursing & 
Allied Health Premium</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Venkataraman, Guhan Ram</au><au>Pineda, Arturo Lopez</au><au>Bear Don't Walk Iv, Oliver J</au><au>Zehnder, Ashley M</au><au>Ayyar, Sandeep</au><au>Page, Rodney L</au><au>Bustamante, Carlos D</au><au>Rivas, Manuel A</au><au>Clegg, Simon</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>FasTag: Automatic text classification of unstructured medical narratives</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2020-06-22</date><risdate>2020</risdate><volume>15</volume><issue>6</issue><spage>e0234647</spage><epage>e0234647</epage><pages>e0234647-e0234647</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing 
purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. 
Our approach is a step forward for these two domains to learn from and inform one another.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>32569327</pmid><doi>10.1371/journal.pone.0234647</doi><tpages>e0234647</tpages><orcidid>https://orcid.org/0000-0003-1457-9925</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1932-6203
ispartof	PloS one, 2020-06, Vol.15 (6), p.e0234647-e0234647
issn	1932-6203 1932-6203
language	eng
recordid	cdi_plos_journals_2415814189
source	MEDLINE; DOAJ Directory of Open Access Journals; Public Library of Science (PLoS) Journals Open Access; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects	Animals; Annotations; Artificial neural networks; Automation; Biology and Life Sciences; Cancer; Classification; Clinical coding; Clinical medicine; Codes; Computer and Information Sciences; Data Mining; Data processing; Data storage; Databases as Topic; Decision trees; Dictionaries; Disease; Domains; Electronic health records; Electronic medical records; Electronic records; Engineering and Technology; Humans; Long short-term memory; Machine learning; Management; Medical coding; Medical diagnosis; Medical records; Medicine and Health Sciences; Methods; Model accuracy; Narrative Medicine; Narratives; Natural language processing; Neural networks; Oncology; Physical work; Recurrent neural networks; Reproducibility of Results; Research and Analysis Methods; Schools; Semantics; Short term memory; Software; Species Specificity; Tagging; Technology application; Unstructured data; Veterinary colleges; Veterinary medicine
title	FasTag: Automatic text classification of unstructured medical narratives
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T17%3A23%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=FasTag:%20Automatic%20text%20classification%20of%20unstructured%20medical%20narratives&rft.jtitle=PloS%20one&rft.au=Venkataraman,%20Guhan%20Ram&rft.date=2020-06-22&rft.volume=15&rft.issue=6&rft.spage=e0234647&rft.epage=e0234647&rft.pages=e0234647-e0234647&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0234647&rft_dat=%3Cgale_plos_%3EA627340808%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2415814189&rft_id=info:pmid/32569327&rft_galeid=A627340808&rft_doaj_id=oai_doaj_org_article_eaecb0fb69b04760ae33474f226bb9e1&rfr_iscdi=true