Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs

A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised el...

Bibliographic details
Published in: PloS one 2021-12, Vol.16 (12), p.e0260402-e0260402
Main authors: Noble, Peter-John Mäntylä; Appleton, Charlotte; Radford, Alan David; Nenadic, Goran
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page e0260402
container_issue 12
container_start_page e0260402
container_title PloS one
container_volume 16
creator Noble, Peter-John Mäntylä
Appleton, Charlotte
Radford, Alan David
Nenadic, Goran
description A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the 'gastroenteric' MPC identifying cases involved in the disease outbreak. Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived 'gastroenteric' MPC (Spearman correlation 0.978); no other topics showed a similar time course. Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. 
This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives.
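The annotation and comparison steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy records, the week indices, the `mpc_counts` values and all function names are hypothetical. It assumes each narrative already has a topic distribution as a list of `(topic_id, probability)` pairs (the shape Gensim's `LdaModel.get_document_topics` returns), labels each narrative with its most probable topic, counts labelled narratives per week, and compares that series with practitioner-derived MPC counts using a Spearman rank correlation written in plain Python.

```python
from collections import Counter


def dominant_topic(topic_probs):
    """Return the topic id with the highest probability for one narrative."""
    return max(topic_probs, key=lambda pair: pair[1])[0]


def weekly_topic_counts(records, topic_id, n_weeks):
    """Count, per week, the narratives whose dominant topic is topic_id."""
    counts = Counter(
        week for week, probs in records if dominant_topic(probs) == topic_id
    )
    return [counts.get(week, 0) for week in range(n_weeks)]


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed series."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data: (week index, topic distribution) per narrative.
records = [
    (0, [(0, 0.7), (1, 0.3)]),
    (1, [(1, 0.6), (0, 0.4)]),
    (1, [(1, 0.9), (0, 0.1)]),
    (2, [(1, 0.8), (0, 0.2)]),
    (2, [(1, 0.55), (0, 0.45)]),
    (2, [(1, 0.7), (0, 0.3)]),
    (3, [(0, 0.6), (1, 0.4)]),
    (3, [(1, 0.9), (0, 0.1)]),
]

# Hypothetical weekly counts of consultations with a given MPC.
mpc_counts = [1, 4, 6, 2]

topic_counts = weekly_topic_counts(records, topic_id=1, n_weeks=4)
rho = spearman(topic_counts, mpc_counts)
print(topic_counts, rho)
```

In practice the same comparison would be repeated for every topic in the model, keeping the topic whose weekly counts track the reference series most closely; a library routine such as `scipy.stats.spearmanr` could replace the hand-written correlation.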
doi_str_mv 10.1371/journal.pone.0260402
format Article
fullrecord <record><control><sourceid>gale_plos_</sourceid><recordid>TN_cdi_plos_journals_2608443626</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A686069987</galeid><doaj_id>oai_doaj_org_article_16872df6ecdc41a5b9604886a51b49c5</doaj_id><sourcerecordid>A686069987</sourcerecordid><originalsourceid>FETCH-LOGICAL-c692t-aa7c51ed2e233356e480abfe537ddbe4c5152f99c5c790ca5acd5eca91e55efd3</originalsourceid><addsrcrecordid>eNqNk1-L1DAUxYso7rr6DUQDgujDjEnTpO2LsCz-GVxYUMfXkCa3naydZDZJFxf88KZOd5nKPkgf2ia_c25ykptlzwleElqSd5du8Fb2y52zsMQ5xwXOH2THpKb5gueYPjz4PsqehHCJMaMV54-zI1pUVV6S4jj7vQ7Gdii6nVFo6zT0_fjfOo8GG4Yd-GsTQCNprYsyGmeRaxH0oKJ3Nmk2IPu4QR6U8zokI2Q02Gjam6RBboiNB_lzFOlkJAMgY9H6C9KuC0-zR63sAzyb3ifZ-uOH72efF-cXn1Znp-cLxes8LqQsFSOgc8gppYxDUWHZtMBoqXUDRZpkeVvXiqmyxkoyqTQDJWsCjEGr6Un2cu-7610QU3BBpMyqoqA854lY7Qnt5KXYebOV_kY4acTfAec7IX00qgdBeFXmuuWgtCqIZE2doq8qLhlpirSG5PV-qjY0W9AqpeFlPzOdz1izEZ27FhVnNSdlMngzGXh3NUCIYmuCSicjLbhhv25Gi1Q0oa_-Qe_f3UR1Mm3A2Nalumo0Fae84pjXdTWWXd5DpUfD1qh0y1qTxmeCtzNBYiL8ip0cQhCrb1__n734MWdfH7D7CxZcP4y3L8zBYg8q70Lw0N6FTLAYm-Q2DTE2iZiaJMleHB7Qnei2K-gfRWQPLA</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2608443626</pqid></control><display><type>article</type><title>Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs</title><source>Public Library of Science (PLoS) Journals Open Access</source><source>MEDLINE</source><source>EZB Free E-Journals</source><source>DOAJ Directory of Open Access Journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Noble, Peter-John Mäntylä ; Appleton, Charlotte ; Radford, Alan David ; Nenadic, Goran</creator><contributor>Dórea, Fernanda C.</contributor><creatorcontrib>Noble, Peter-John Mäntylä ; Appleton, Charlotte ; Radford, Alan David ; Nenadic, Goran ; Dórea, Fernanda C.</creatorcontrib><description>A 
key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the 'gastroenteric' MPC identifying cases involved in the disease outbreak. Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived 'gastroenteric' MPC (Spearman correlation 0.978); no other topics showed a similar time course. Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. 
This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives.</description><identifier>ISSN: 1932-6203</identifier><identifier>EISSN: 1932-6203</identifier><identifier>DOI: 10.1371/journal.pone.0260402</identifier><identifier>PMID: 34882714</identifier><language>eng</language><publisher>United States: Public Library of Science</publisher><subject>Animals ; Annotations ; Anomalies ; Biology and Life Sciences ; Computer and Information Sciences ; Data Curation ; Dirichlet problem ; Disease Outbreaks - veterinary ; Dog Diseases - epidemiology ; Dogs ; Electronic Health Records ; Electronic medical records ; Electronic records ; Epidemics ; Gastroenteritis - epidemiology ; Gastroenteritis - veterinary ; Graphical representations ; Health aspects ; Health surveillance ; Identification methods ; Immunization ; Kidney diseases ; Management ; Medical records ; Medicine and Health Sciences ; Modelling ; Narratives ; Outbreaks ; Population Surveillance ; Pruritus ; Respiratory diseases ; Social Sciences ; Software ; United Kingdom - epidemiology ; Unsupervised Machine Learning ; Vomiting</subject><ispartof>PloS one, 2021-12, Vol.16 (12), p.e0260402-e0260402</ispartof><rights>COPYRIGHT 2021 Public Library of Science</rights><rights>2021 Noble et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2021 Noble et al 2021 Noble et al</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c692t-aa7c51ed2e233356e480abfe537ddbe4c5152f99c5c790ca5acd5eca91e55efd3</citedby><cites>FETCH-LOGICAL-c692t-aa7c51ed2e233356e480abfe537ddbe4c5152f99c5c790ca5acd5eca91e55efd3</cites><orcidid>0000-0002-2275-2014</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659617/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC8659617/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,2102,2928,23866,27924,27925,53791,53793,79600,79601</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34882714$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Dórea, Fernanda C.</contributor><creatorcontrib>Noble, Peter-John Mäntylä</creatorcontrib><creatorcontrib>Appleton, Charlotte</creatorcontrib><creatorcontrib>Radford, Alan David</creatorcontrib><creatorcontrib>Nenadic, Goran</creatorcontrib><title>Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs</title><title>PloS one</title><addtitle>PLoS One</addtitle><description>A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. 
We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the 'gastroenteric' MPC identifying cases involved in the disease outbreak. Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived 'gastroenteric' MPC (Spearman correlation 0.978); no other topics showed a similar time course. Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. 
This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives.</description><subject>Animals</subject><subject>Annotations</subject><subject>Anomalies</subject><subject>Biology and Life Sciences</subject><subject>Computer and Information Sciences</subject><subject>Data Curation</subject><subject>Dirichlet problem</subject><subject>Disease Outbreaks - veterinary</subject><subject>Dog Diseases - epidemiology</subject><subject>Dogs</subject><subject>Electronic Health Records</subject><subject>Electronic medical records</subject><subject>Electronic records</subject><subject>Epidemics</subject><subject>Gastroenteritis - epidemiology</subject><subject>Gastroenteritis - veterinary</subject><subject>Graphical representations</subject><subject>Health aspects</subject><subject>Health surveillance</subject><subject>Identification methods</subject><subject>Immunization</subject><subject>Kidney diseases</subject><subject>Management</subject><subject>Medical records</subject><subject>Medicine and Health Sciences</subject><subject>Modelling</subject><subject>Narratives</subject><subject>Outbreaks</subject><subject>Population Surveillance</subject><subject>Pruritus</subject><subject>Respiratory diseases</subject><subject>Social Sciences</subject><subject>Software</subject><subject>United Kingdom - epidemiology</subject><subject>Unsupervised Machine 
Learning</subject><subject>Vomiting</subject><issn>1932-6203</issn><issn>1932-6203</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><sourceid>DOA</sourceid><recordid>eNqNk1-L1DAUxYso7rr6DUQDgujDjEnTpO2LsCz-GVxYUMfXkCa3naydZDZJFxf88KZOd5nKPkgf2ia_c25ykptlzwleElqSd5du8Fb2y52zsMQ5xwXOH2THpKb5gueYPjz4PsqehHCJMaMV54-zI1pUVV6S4jj7vQ7Gdii6nVFo6zT0_fjfOo8GG4Yd-GsTQCNprYsyGmeRaxH0oKJ3Nmk2IPu4QR6U8zokI2Q02Gjam6RBboiNB_lzFOlkJAMgY9H6C9KuC0-zR63sAzyb3ifZ-uOH72efF-cXn1Znp-cLxes8LqQsFSOgc8gppYxDUWHZtMBoqXUDRZpkeVvXiqmyxkoyqTQDJWsCjEGr6Un2cu-7610QU3BBpMyqoqA854lY7Qnt5KXYebOV_kY4acTfAec7IX00qgdBeFXmuuWgtCqIZE2doq8qLhlpirSG5PV-qjY0W9AqpeFlPzOdz1izEZ27FhVnNSdlMngzGXh3NUCIYmuCSicjLbhhv25Gi1Q0oa_-Qe_f3UR1Mm3A2Nalumo0Fae84pjXdTWWXd5DpUfD1qh0y1qTxmeCtzNBYiL8ip0cQhCrb1__n734MWdfH7D7CxZcP4y3L8zBYg8q70Lw0N6FTLAYm-Q2DTE2iZiaJMleHB7Qnei2K-gfRWQPLA</recordid><startdate>20211209</startdate><enddate>20211209</enddate><creator>Noble, Peter-John Mäntylä</creator><creator>Appleton, Charlotte</creator><creator>Radford, Alan David</creator><creator>Nenadic, Goran</creator><general>Public Library of Science</general><general>Public Library of Science 
(PLoS)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>IOV</scope><scope>ISR</scope><scope>3V.</scope><scope>7QG</scope><scope>7QL</scope><scope>7QO</scope><scope>7RV</scope><scope>7SN</scope><scope>7SS</scope><scope>7T5</scope><scope>7TG</scope><scope>7TM</scope><scope>7U9</scope><scope>7X2</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AO</scope><scope>8C1</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>ATCPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>C1K</scope><scope>CCPQU</scope><scope>COVID</scope><scope>D1I</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>H94</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>KB.</scope><scope>KB0</scope><scope>KL.</scope><scope>L6V</scope><scope>LK8</scope><scope>M0K</scope><scope>M0S</scope><scope>M1P</scope><scope>M7N</scope><scope>M7P</scope><scope>M7S</scope><scope>NAPCQ</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PATMY</scope><scope>PDBOC</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope><scope>PYCSY</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-2275-2014</orcidid></search><sort><creationdate>20211209</creationdate><title>Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs</title><author>Noble, Peter-John Mäntylä ; Appleton, Charlotte ; Radford, Alan David ; Nenadic, 
Goran</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c692t-aa7c51ed2e233356e480abfe537ddbe4c5152f99c5c790ca5acd5eca91e55efd3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Animals</topic><topic>Annotations</topic><topic>Anomalies</topic><topic>Biology and Life Sciences</topic><topic>Computer and Information Sciences</topic><topic>Data Curation</topic><topic>Dirichlet problem</topic><topic>Disease Outbreaks - veterinary</topic><topic>Dog Diseases - epidemiology</topic><topic>Dogs</topic><topic>Electronic Health Records</topic><topic>Electronic medical records</topic><topic>Electronic records</topic><topic>Epidemics</topic><topic>Gastroenteritis - epidemiology</topic><topic>Gastroenteritis - veterinary</topic><topic>Graphical representations</topic><topic>Health aspects</topic><topic>Health surveillance</topic><topic>Identification methods</topic><topic>Immunization</topic><topic>Kidney diseases</topic><topic>Management</topic><topic>Medical records</topic><topic>Medicine and Health Sciences</topic><topic>Modelling</topic><topic>Narratives</topic><topic>Outbreaks</topic><topic>Population Surveillance</topic><topic>Pruritus</topic><topic>Respiratory diseases</topic><topic>Social Sciences</topic><topic>Software</topic><topic>United Kingdom - epidemiology</topic><topic>Unsupervised Machine Learning</topic><topic>Vomiting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Noble, Peter-John Mäntylä</creatorcontrib><creatorcontrib>Appleton, Charlotte</creatorcontrib><creatorcontrib>Radford, Alan David</creatorcontrib><creatorcontrib>Nenadic, Goran</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In 
Context: Opposing Viewpoints</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Animal Behavior Abstracts</collection><collection>Bacteriology Abstracts (Microbiology B)</collection><collection>Biotechnology Research Abstracts</collection><collection>Nursing &amp; Allied Health Database</collection><collection>Ecology Abstracts</collection><collection>Entomology Abstracts (Full archive)</collection><collection>Immunology Abstracts</collection><collection>Meteorological &amp; Geoastrophysical Abstracts</collection><collection>Nucleic Acids Abstracts</collection><collection>Virology and AIDS Abstracts</collection><collection>Agricultural Science Collection</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Public Health Database</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>Agricultural &amp; Environmental Science Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science 
Collection</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Materials Science Collection</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>AIDS and Cancer Research Abstracts</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Materials Science Database</collection><collection>Nursing &amp; Allied Health Database (Alumni Edition)</collection><collection>Meteorological &amp; Geoastrophysical Abstracts - Academic</collection><collection>ProQuest Engineering Collection</collection><collection>Biological Sciences</collection><collection>Agriculture Science Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Algology Mycology and Protozoology Abstracts (Microbiology C)</collection><collection>Biological Science Database</collection><collection>Engineering Database</collection><collection>Nursing &amp; Allied Health Premium</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Environmental Science Database</collection><collection>Materials Science Collection</collection><collection>Access via ProQuest (Open Access)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One 
Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><collection>Environmental Science Collection</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PloS one</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Noble, Peter-John Mäntylä</au><au>Appleton, Charlotte</au><au>Radford, Alan David</au><au>Nenadic, Goran</au><au>Dórea, Fernanda C.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs</atitle><jtitle>PloS one</jtitle><addtitle>PLoS One</addtitle><date>2021-12-09</date><risdate>2021</risdate><volume>16</volume><issue>12</issue><spage>e0260402</spage><epage>e0260402</epage><pages>e0260402-e0260402</pages><issn>1932-6203</issn><eissn>1932-6203</eissn><abstract>A key goal of disease surveillance is to identify outbreaks of known or novel diseases in a timely manner. Such an outbreak occurred in the UK associated with acute vomiting in dogs between December 2019 and March 2020. We tracked this outbreak using the clinical free text component of anonymised electronic health records (EHRs) collected from a sentinel network of participating veterinary practices. We sourced the free text (narrative) component of each EHR supplemented with one of 10 practitioner-derived main presenting complaints (MPCs), with the 'gastroenteric' MPC identifying cases involved in the disease outbreak. 
Such clinician-derived annotation systems can suffer from poor compliance requiring retrospective, often manual, coding, thereby limiting real-time usability, especially where an outbreak of a novel disease might not present clinically as a currently recognised syndrome or MPC. Here, we investigate the use of an unsupervised method of EHR annotation using latent Dirichlet allocation topic-modelling to identify topics inherent within the clinical narrative component of EHRs. The model comprised 30 topics which were used to annotate EHRs spanning the natural disease outbreak and investigate whether any given topic might mirror the outbreak time-course. Narratives were annotated using the Gensim Library LdaModel module for the topic best representing the text within them. Counts for narratives labelled with one of the topics significantly matched the disease outbreak based on the practitioner-derived 'gastroenteric' MPC (Spearman correlation 0.978); no other topics showed a similar time course. Using artificially injected outbreaks, it was possible to see other topics that would match other MPCs including respiratory disease. The underlying topics were readily evaluated using simple word-cloud representations and using a freely available package (LDAVis) providing rapid insight into the clinical basis of each topic. This work clearly shows that unsupervised record annotation using topic modelling linked to simple text visualisations can provide an easily interrogable method to identify and characterise outbreaks and other anomalies of known and previously un-characterised diseases based on changes in clinical narratives.</abstract><cop>United States</cop><pub>Public Library of Science</pub><pmid>34882714</pmid><doi>10.1371/journal.pone.0260402</doi><tpages>e0260402</tpages><orcidid>https://orcid.org/0000-0002-2275-2014</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2021-12, Vol.16 (12), p.e0260402-e0260402
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_2608443626
source Public Library of Science (PLoS) Journals Open Access; MEDLINE; EZB Free E-Journals; DOAJ Directory of Open Access Journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects Animals
Annotations
Anomalies
Biology and Life Sciences
Computer and Information Sciences
Data Curation
Dirichlet problem
Disease Outbreaks - veterinary
Dog Diseases - epidemiology
Dogs
Electronic Health Records
Electronic medical records
Electronic records
Epidemics
Gastroenteritis - epidemiology
Gastroenteritis - veterinary
Graphical representations
Health aspects
Health surveillance
Identification methods
Immunization
Kidney diseases
Management
Medical records
Medicine and Health Sciences
Modelling
Narratives
Outbreaks
Population Surveillance
Pruritus
Respiratory diseases
Social Sciences
Software
United Kingdom - epidemiology
Unsupervised Machine Learning
Vomiting
title Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T16%3A36%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20topic%20modelling%20for%20unsupervised%20annotation%20of%20electronic%20health%20records%20to%20identify%20an%20outbreak%20of%20disease%20in%20UK%20dogs&rft.jtitle=PloS%20one&rft.au=Noble,%20Peter-John%20M%C3%A4ntyl%C3%A4&rft.date=2021-12-09&rft.volume=16&rft.issue=12&rft.spage=e0260402&rft.epage=e0260402&rft.pages=e0260402-e0260402&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0260402&rft_dat=%3Cgale_plos_%3EA686069987%3C/gale_plos_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2608443626&rft_id=info:pmid/34882714&rft_galeid=A686069987&rft_doaj_id=oai_doaj_org_article_16872df6ecdc41a5b9604886a51b49c5&rfr_iscdi=true