Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs
A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised el...
Published in: | PloS one 2021-12, Vol.16 (12), p.e0260402-e0260402 |
---|---|
Main authors: | Noble, Peter-John Mäntylä; Appleton, Charlotte; Radford, Alan David; Nenadic, Goran |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | e0260402 |
---|---|
container_issue | 12 |
container_start_page | e0260402 |
container_title | PloS one |
container_volume | 16 |
creator | Noble, Peter-John Mäntylä; Appleton, Charlotte; Radford, Alan David; Nenadic, Goran |
description | A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the 'gastroenteric' MPC identifying cases involved in the disease outbreak. Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived 'gastroenteric' MPC (Spearman correlation 0.978); no other topics showed a similar time course. Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. 
This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives. |
doi_str_mv | 10.1371/journal.pone.0260402 |
format | Article |
fullrecord | <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2608443626</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A686069987</galeid><doaj_id>oai_doaj_org_article_16872df6ecdc41a5b9604886a51b49c5</doaj_id><sourcerecordid>A686069987</sourcerecordid><originalsourceid>FETCH-LOGICAL-c692t-aa7c51ed2e233356e480abfe537ddbe4c5152f99c5c790ca5acd5eca91e55efd3</originalsourceid><addsrcrecordid>eNqNk1-L1DAUxYso7rr6DUQDgujDjEnTpO2LsCz-GVxYUMfXkCa3naydZDZJFxf88KZOd5nKPkgf2ia_c25ykptlzwleElqSd5du8Fb2y52zsMQ5xwXOH2THpKb5gueYPjz4PsqehHCJMaMV54-zI1pUVV6S4jj7vQ7Gdii6nVFo6zT0_fjfOo8GG4Yd-GsTQCNprYsyGmeRaxH0oKJ3Nmk2IPu4QR6U8zokI2Q02Gjam6RBboiNB_lzFOlkJAMgY9H6C9KuC0-zR63sAzyb3ifZ-uOH72efF-cXn1Znp-cLxes8LqQsFSOgc8gppYxDUWHZtMBoqXUDRZpkeVvXiqmyxkoyqTQDJWsCjEGr6Un2cu-7610QU3BBpMyqoqA854lY7Qnt5KXYebOV_kY4acTfAec7IX00qgdBeFXmuuWgtCqIZE2doq8qLhlpirSG5PV-qjY0W9AqpeFlPzOdz1izEZ27FhVnNSdlMngzGXh3NUCIYmuCSicjLbhhv25Gi1Q0oa_-Qe_f3UR1Mm3A2Nalumo0Fae84pjXdTWWXd5DpUfD1qh0y1qTxmeCtzNBYiL8ip0cQhCrb1__n734MWdfH7D7CxZcP4y3L8zBYg8q70Lw0N6FTLAYm-Q2DTE2iZiaJMleHB7Qnei2K-gfRWQPLA</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2608443626</pqid></control><display><type>article</type><title>Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs</title><source>Public Library of Science (PLoS) Journals Open Access</source><source>MEDLINE</source><source>EZB Free E-Journals</source><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Noble, Peter-John Mäntylä ; Appleton, Charlotte ; Radford, Alan David ; Nenadic, Goran</creator><contributor>Dórea, Fernanda C.</contributor><creatorcontrib>Noble, Peter-John Mäntylä ; Appleton, Charlotte ; Radford, Alan David ; Nenadic, Goran ; Dórea, Fernanda 
C.</creatorcontrib><description>A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the 'gastroenteric' MPC identifying cases involved in the disease outbreak. Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived 'gastroenteric' MPC (Spearman correlation 0.978); no other topics showed a similar time course. Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. 
This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0260402</identifier><identifier>PMID: 34882714</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Animals ; Annotations ; Anomalies ; Biology and Life Sciences ; Computer and Information Sciences ; Data Curation ; Dirichlet problem ; Disease Outbreaks - veterinary ; Dog Diseases - epidemiology ; Dogs ; Electronic Health Records ; Electronic medical records ; Electronic records ; Epidemics ; Gastroenteritis - epidemiology ; Gastroenteritis - veterinary ; Graphical representations ; Health aspects ; Health surveillance ; Identification methods ; Immunization ; Kidney diseases ; Management ; Medical records ; Medicine and Health Sciences ; Modelling ; Narratives ; Outbreaks ; Population Surveillance ; Pruritus ; Respiratory diseases ; Social Sciences ; Software ; United Kingdom - epidemiology ; Unsupervised Machine Learning ; Vomiting</subject><ispartof>PloS one, 2021-12, Vol.16 (12), p.e0260402-e0260402</ispartof><rights>COPYRIGHT 2021 Public Library of Science</rights><rights>2021 Noble et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 Noble et al 2021 Noble et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c692t-aa7c51ed2e233356e480abfe537ddbe4c5152f99c5c790ca5acd5eca91e55efd3</citedby><cites>FETCH-LOGICAL-c692t-aa7c51ed2e233356e480abfe537ddbe4c5152f99c5c790ca5acd5eca91e55efd3</cites><orcidid>0000-0002-2275-2014</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659617/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659617/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793,79600,79601</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34882714$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Dórea, Fernanda C.</contributor><creatorcontrib>Noble, Peter-John Mäntylä</creatorcontrib><creatorcontrib>Appleton, Charlotte</creatorcontrib><creatorcontrib>Radford, Alan David</creatorcontrib><creatorcontrib>Nenadic, Goran</creatorcontrib><title>Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. 
We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the 'gastroenteric' MPC identifying cases involved in the disease outbreak. Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived 'gastroenteric' MPC (Spearman correlation 0.978); no other topics showed a similar time course. Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. 
This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives.</description><subject>Animals</subject><subject>Annotations</subject><subject>Anomalies</subject><subject>Biology and Life Sciences</subject><subject>Computer and Information Sciences</subject><subject>Data Curation</subject><subject>Dirichlet problem</subject><subject>Disease Outbreaks - veterinary</subject><subject>Dog Diseases - epidemiology</subject><subject>Dogs</subject><subject>Electronic Health Records</subject><subject>Electronic medical records</subject><subject>Electronic records</subject><subject>Epidemics</subject><subject>Gastroenteritis - epidemiology</subject><subject>Gastroenteritis - veterinary</subject><subject>Graphical representations</subject><subject>Health aspects</subject><subject>Health surveillance</subject><subject>Identification methods</subject><subject>Immunization</subject><subject>Kidney diseases</subject><subject>Management</subject><subject>Medical records</subject><subject>Medicine and Health Sciences</subject><subject>Modelling</subject><subject>Narratives</subject><subject>Outbreaks</subject><subject>Population Surveillance</subject><subject>Pruritus</subject><subject>Respiratory diseases</subject><subject>Social Sciences</subject><subject>Software</subject><subject>United Kingdom - epidemiology</subject><subject>Unsupervised Machine 
Learning</subject><subject>Vomiting</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqNk1-L1DAUxYso7rr6DUQDgujDjEnTpO2LsCz-GVxYUMfXkCa3naydZDZJFxf88KZOd5nKPkgf2ia_c25ykptlzwleElqSd5du8Fb2y52zsMQ5xwXOH2THpKb5gueYPjz4PsqehHCJMaMV54-zI1pUVV6S4jj7vQ7Gdii6nVFo6zT0_fjfOo8GG4Yd-GsTQCNprYsyGmeRaxH0oKJ3Nmk2IPu4QR6U8zokI2Q02Gjam6RBboiNB_lzFOlkJAMgY9H6C9KuC0-zR63sAzyb3ifZ-uOH72efF-cXn1Znp-cLxes8LqQsFSOgc8gppYxDUWHZtMBoqXUDRZpkeVvXiqmyxkoyqTQDJWsCjEGr6Un2cu-7610QU3BBpMyqoqA854lY7Qnt5KXYebOV_kY4acTfAec7IX00qgdBeFXmuuWgtCqIZE2doq8qLhlpirSG5PV-qjY0W9AqpeFlPzOdz1izEZ27FhVnNSdlMngzGXh3NUCIYmuCSicjLbhhv25Gi1Q0oa_-Qe_f3UR1Mm3A2Nalumo0Fae84pjXdTWWXd5DpUfD1qh0y1qTxmeCtzNBYiL8ip0cQhCrb1__n734MWdfH7D7CxZcP4y3L8zBYg8q70Lw0N6FTLAYm-Q2DTE2iZiaJMleHB7Qnei2K-gfRWQPLA</recordid><startdate>20211209</startdate><enddate>20211209</enddate><creator>Noble, Peter-John Mäntylä</creator><creator>Appleton, Charlotte</creator><creator>Radford, Alan David</creator><creator>Nenadic, Goran</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>COVID</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-2275-2014</orcidid></search><sort><creationdate>20211209</creationdate><title>Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs</title><author>Noble, Peter-John Mäntylä ; Appleton, Charlotte ; Radford, Alan David ; Nenadic, 
Goran</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c692t-aa7c51ed2e233356e480abfe537ddbe4c5152f99c5c790ca5acd5eca91e55efd3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Animals</topic><topic>Annotations</topic><topic>Anomalies</topic><topic>Biology and Life Sciences</topic><topic>Computer and Information Sciences</topic><topic>Data Curation</topic><topic>Dirichlet problem</topic><topic>Disease Outbreaks - veterinary</topic><topic>Dog Diseases - epidemiology</topic><topic>Dogs</topic><topic>Electronic Health Records</topic><topic>Electronic medical records</topic><topic>Electronic records</topic><topic>Epidemics</topic><topic>Gastroenteritis - epidemiology</topic><topic>Gastroenteritis - veterinary</topic><topic>Graphical representations</topic><topic>Health aspects</topic><topic>Health surveillance</topic><topic>Identification methods</topic><topic>Immunization</topic><topic>Kidney diseases</topic><topic>Management</topic><topic>Medical records</topic><topic>Medicine and Health Sciences</topic><topic>Modelling</topic><topic>Narratives</topic><topic>Outbreaks</topic><topic>Population Surveillance</topic><topic>Pruritus</topic><topic>Respiratory diseases</topic><topic>Social Sciences</topic><topic>Software</topic><topic>United Kingdom - epidemiology</topic><topic>Unsupervised Machine Learning</topic><topic>Vomiting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Noble, Peter-John Mäntylä</creatorcontrib><creatorcontrib>Appleton, Charlotte</creatorcontrib><creatorcontrib>Radford, Alan David</creatorcontrib><creatorcontrib>Nenadic, Goran</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In 
Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Nursing & Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological & Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>Agricultural & Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science 
Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Materials Science Database</collection><collection>Nursing & Allied Health Database (Alumni Edition)</collection><collection>Meteorological & Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>Biological Sciences</collection><collection>Agriculture Science Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Nursing & Allied Health Premium</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI 
Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Noble, Peter-John Mäntylä</au><au>Appleton, Charlotte</au><au>Radford, Alan David</au><au>Nenadic, Goran</au><au>Dórea, Fernanda C.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2021-12-09</date><risdate>2021</risdate><volume>16</volume><issue>12</issue><spage>e0260402</spage><epage>e0260402</epage><pages>e0260402-e0260402</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the 'gastroenteric' MPC identifying cases involved in the disease outbreak. 
Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived 'gastroenteric' MPC (Spearman correlation 0.978); no other topics showed a similar time course. Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>34882714</pmid><doi>10.1371/journal.pone.0260402</doi><tpages>e0260402</tpages><orcidid>https://orcid.org/0000-0002-2275-2014</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1932-6203 |
ispartof | PloS one, 2021-12, Vol.16 (12), p.e0260402-e0260402 |
issn | 1932-6203 1932-6203 |
language | eng |
recordid | cdi_plos_journals_2608443626 |
source | Public Library of Science (PLoS) Journals Open Access; MEDLINE; EZB Free E-Journals; DOAJ Directory of Open Access Journals; PubMed Central; Free Full-Text Journals in Chemistry |
subjects | Animals; Annotations; Anomalies; Biology and Life Sciences; Computer and Information Sciences; Data Curation; Dirichlet problem; Disease Outbreaks - veterinary; Dog Diseases - epidemiology; Dogs; Electronic Health Records; Electronic medical records; Electronic records; Epidemics; Gastroenteritis - epidemiology; Gastroenteritis - veterinary; Graphical representations; Health aspects; Health surveillance; Identification methods; Immunization; Kidney diseases; Management; Medical records; Medicine and Health Sciences; Modelling; Narratives; Outbreaks; Population Surveillance; Pruritus; Respiratory diseases; Social Sciences; Software; United Kingdom - epidemiology; Unsupervised Machine Learning; Vomiting |
title | Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T16%3A36%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20topic%20modelling%20for%20unsupervised%20annotation%20of%20electronic%20health%20records%20to%20identify%20an%20outbreak%20of%20disease%20in%20UK%20dogs&rft.jtitle=PloS%20one&rft.au=Noble,%20Peter-John%20M%C3%A4ntyl%C3%A4&rft.date=2021-12-09&rft.volume=16&rft.issue=12&rft.spage=e0260402&rft.epage=e0260402&rft.pages=e0260402-e0260402&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0260402&rft_dat=%3Cgale_plos_%3EA686069987%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2608443626&rft_id=info:pmid/34882714&rft_galeid=A686069987&rft_doaj_id=oai_doaj_org_article_16872df6ecdc41a5b9604886a51b49c5&rfr_iscdi=true |
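
The abstract describes labelling each narrative with the single LDA topic that best represents it (using Gensim's `LdaModel`) and then checking whether one topic's counts over time track the practitioner-assigned 'gastroenteric' MPC counts (Spearman correlation 0.978). The Python sketch below illustrates that comparison under stated assumptions; it is not the authors' code. It starts from per-narrative lists of `(topic_id, probability)` pairs, the shape Gensim's `LdaModel.get_document_topics(bow)` returns, so it runs without Gensim installed. It assumes "best representing" means the highest-probability topic and bins counts by week (the abstract does not state the time resolution). The function names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# One narrative's topic distribution as (topic_id, probability) pairs,
# the shape returned by Gensim's LdaModel.get_document_topics(bow).
DocTopics = List[Tuple[int, float]]


def dominant_topic(doc_topics: DocTopics) -> int:
    """Label a narrative with its highest-probability topic (assumed meaning
    of 'the topic best representing the text')."""
    topic_id, _prob = max(doc_topics, key=lambda pair: pair[1])
    return topic_id


def weekly_topic_counts(weeks: Sequence[int], labels: Sequence[int],
                        topic: int, n_weeks: int) -> List[int]:
    """Count narratives per week (0..n_weeks-1) labelled with `topic`."""
    counts = Counter(week for week, label in zip(weeks, labels) if label == topic)
    return [counts[week] for week in range(n_weeks)]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when a series is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical toy data: six narratives from a 3-topic model over 3 weeks.
    doc_topics = [
        [(0, 0.7), (1, 0.2), (2, 0.1)],
        [(1, 0.6), (0, 0.3), (2, 0.1)],
        [(1, 0.8), (2, 0.2)],
        [(1, 0.5), (0, 0.4), (2, 0.1)],
        [(2, 0.9), (1, 0.1)],
        [(1, 0.7), (0, 0.3)],
    ]
    weeks = [0, 0, 1, 1, 2, 2]
    labels = [dominant_topic(d) for d in doc_topics]  # [0, 1, 1, 1, 2, 1]
    topic_series = weekly_topic_counts(weeks, labels, topic=1, n_weeks=3)
    mpc_series = [3, 9, 5]  # hypothetical weekly 'gastroenteric' MPC counts
    print(topic_series, round(spearman(topic_series, mpc_series), 3))
    # prints: [1, 2, 1] 0.866
```

Because Spearman's rho works on ranks, it rewards a topic whose counts rise and fall in the same order as the MPC series even when the two are on different scales, which suits comparing a topic label against a separately assigned reference count. On real data, `scipy.stats.spearmanr` (which also averages tied ranks) should give the same coefficient.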