Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation

Abstract. Objective: Recent years have seen increased worldwide popularity of e-cigarette use. However, the risks of e-cigarettes are underexamined. Most e-cigarette adverse event studies have achieved low detection rates due to limited subject sample sizes in the experiments and surveys. Social media...

Bibliographic details
Published in:	Journal of the American Medical Informatics Association : JAMIA 2018-01, Vol.25 (1), p.72-80
Main authors:	Xie, Jiaheng; Liu, Xiao; Dajun Zeng, Daniel
Format:	Article
Language:	English
Subjects:	Data Mining - methods; Electronic Nicotine Delivery Systems; Humans; Neural Networks (Computer); Research and Applications; Semantics; Social Media; Vaping - adverse effects
Online access:	Full text

container_end_page	80
container_issue	1
container_start_page	72
container_title	Journal of the American Medical Informatics Association : JAMIA
container_volume	25
creator	Xie, Jiaheng; Liu, Xiao; Dajun Zeng, Daniel
description	Abstract. Objective: Recent years have seen increased worldwide popularity of e-cigarette use. However, the risks of e-cigarettes are underexamined. Most e-cigarette adverse event studies have achieved low detection rates due to limited subject sample sizes in the experiments and surveys. Social media provides a large data repository of consumers’ e-cigarette feedback and experiences, which are useful for e-cigarette safety surveillance. However, it is difficult to automatically interpret the informal and nontechnical consumer vocabulary about e-cigarettes in social media. This issue hinders the use of social media content for e-cigarette safety surveillance. Recent developments in deep neural network methods have shown promise for named entity extraction from noisy text. Motivated by these observations, we aimed to design a deep neural network approach to extract e-cigarette safety information in social media. Methods: Our deep neural language model utilizes word embedding as the representation of text input and recognizes named entity types with the state-of-the-art Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network. Results: Our Bi-LSTM model achieved the best performance compared to 3 baseline models, with a precision of 94.10%, a recall of 91.80%, and an F-measure of 92.94%. We identified 1591 unique adverse events and 9930 unique e-cigarette components (ie, chemicals, flavors, and devices) from our research testbed. Conclusion: Although the conditional random field baseline model had slightly better precision than our approach, our Bi-LSTM model achieved much higher recall, resulting in the best F-measure. Our method can be generalized to extract medical concepts from social media for other medical applications.
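The description summarizes the method only at a high level: word embeddings feed a Bidirectional LSTM that assigns a named-entity type to each token. What follows is a minimal NumPy sketch of that general technique, not the authors' implementation. Each token is looked up in an embedding table, one LSTM reads the sentence left to right and another right to left, and the two hidden states for each token are concatenated and projected to tag scores. Everything specific here is a hypothetical assumption: the class and function names, the toy vocabulary, the BIO tag set with `AE` (adverse event) and `COMP` (e-cigarette component) labels, the dimensions, and the random, untrained weights. The `f_measure` helper checks the reported figures. The harmonic mean 2PR/(P+R) of 94.10% precision and 91.80% recall is about 92.94%, which matches the stated F-measure.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMDirection:
    """A single-direction LSTM with random (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        # Input, forget, candidate and output gate weights stacked row-wise.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim))
        self.U = rng.normal(0.0, 0.1, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def run(self, xs):
        """Return the hidden state after each input vector in xs."""
        H = self.hidden_dim
        h, c = np.zeros(H), np.zeros(H)
        states = []
        for x in xs:
            z = self.W @ x + self.U @ h + self.b
            i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
            g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
            c = f * c + i * g    # update the memory cell
            h = o * np.tanh(c)   # expose a gated view of the cell
            states.append(h)
        return np.stack(states)


class BiLSTMTagger:
    """Hypothetical toy tagger: embeddings -> Bi-LSTM -> one tag per token."""

    def __init__(self, vocab, tags, embed_dim=8, hidden_dim=6, seed=0):
        rng = np.random.default_rng(seed)
        # Index 0 is reserved for out-of-vocabulary tokens.
        self.index = {w: k + 1 for k, w in enumerate(vocab)}
        self.tags = list(tags)
        # Random embeddings stand in for pretrained or learned ones (assumption).
        self.embeddings = rng.normal(0.0, 0.1, (len(vocab) + 1, embed_dim))
        self.fwd = LSTMDirection(embed_dim, hidden_dim, rng)
        self.bwd = LSTMDirection(embed_dim, hidden_dim, rng)
        self.W_out = rng.normal(0.0, 0.1, (len(self.tags), 2 * hidden_dim))

    def predict(self, tokens):
        if not tokens:
            return []
        ids = [self.index.get(t.lower(), 0) for t in tokens]
        xs = self.embeddings[ids]                    # (n_tokens, embed_dim)
        h_fwd = self.fwd.run(xs)                     # left-to-right context
        h_bwd = self.bwd.run(xs[::-1])[::-1]         # right-to-left, re-aligned
        h = np.concatenate([h_fwd, h_bwd], axis=1)   # (n_tokens, 2 * hidden_dim)
        scores = h @ self.W_out.T                    # (n_tokens, n_tags)
        return [self.tags[k] for k in scores.argmax(axis=1)]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tagger = BiLSTMTagger(
        vocab=["this", "vape", "gave", "me", "a", "headache"],
        tags=["O", "B-AE", "I-AE", "B-COMP", "I-COMP"],
    )
    print(tagger.predict("This vape gave me a headache".split()))
    print(round(f_measure(94.10, 91.80), 2))  # 92.94
```

Because the weights are random, the predicted tags carry no meaning until the model is trained, for example with a per-token cross-entropy loss. The sketch only shows how the two directions are combined. Each token's final representation therefore reflects both its left and its right context, which is the motivation for using a bidirectional rather than a unidirectional LSTM.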
doi_str_mv	10.1093/jamia/ocx045
format	Article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6455898</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/jamia/ocx045</oup_id><sourcerecordid>1899404203</sourcerecordid><originalsourceid>FETCH-LOGICAL-c416t-5eb7cc230fbb393258343d7c2dc9f12d7b994304f78c2bf9f34a1e6d719cf2783</originalsourceid><addsrcrecordid>eNp9kb1vFDEQxS0UREKgo0buQsESf67XTSSIwod0EQVBorO89uzF4dY-7N1Lovzz-LgQQZNqRprfvHmjh9ArSt5RovnxlR2DPU7uhgj5BB1QyVSjlfixV3vSqkYSpvbR81KuCKEt4_IZ2medJJJ15ADdnYcY4hJD48LSZpgmwNZvIBfAsIE4FRwiLskFu8Ij-GDxXLYLH0Kz-HZxjjO4OecK4ghzrlCE6Trln_g6TJe4dh7D2IP326UM6wylwnYKKb5ATwe7KvDyvh6i7x_PLk4_N4uvn76cvl80TtB2aiT0yjnGydD3XHMmOy64V455pwfKvOq1FpyIQXWO9YMeuLAUWq-odgNTHT9EJzvd9dzXF1y9X42adQ6jzbcm2WD-n8RwaZZpY1ohZae3Am_uBXL6NUOZzBiKg9XKRkhzMbSrDohghFf07Q51OZWSYXg4Q4nZ5mX-5GV2eVX89b_WHuC_AVXgaAekef241G_AmKPW</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1899404203</pqid></control><display><type>article</type><title>Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation</title><source>MEDLINE</source><source>Oxford University Press Journals All Titles (1996-Current)</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><creator>Xie, Jiaheng ; Liu, Xiao ; Dajun Zeng, Daniel</creator><creatorcontrib>Xie, Jiaheng ; Liu, Xiao ; Dajun Zeng, Daniel</creatorcontrib><description>Abstract Objective Recent years have seen increased worldwide popularity of e-cigarette use. However, the risks of e-cigarettes are underexamined. Most e-cigarette adverse event studies have achieved low detection rates due to limited subject sample sizes in the experiments and surveys. 
Social media provides a large data repository of consumers’ e-cigarette feedback and experiences, which are useful for e-cigarette safety surveillance. However, it is difficult to automatically interpret the informal and nontechnical consumer vocabulary about e-cigarettes in social media. This issue hinders the use of social media content for e-cigarette safety surveillance. Recent developments in deep neural network methods have shown promise for named entity extraction from noisy text. Motivated by these observations, we aimed to design a deep neural network approach to extract e-cigarette safety information in social media. Methods Our deep neural language model utilizes word embedding as the representation of text input and recognizes named entity types with the state-of-the-art Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network. Results Our Bi-LSTM model achieved the best performance compared to 3 baseline models, with a precision of 94.10%, a recall of 91.80%, and an F-measure of 92.94%. We identified 1591 unique adverse events and 9930 unique e-cigarette components (ie, chemicals, flavors, and devices) from our research testbed. Conclusion Although the conditional random field baseline model had slightly better precision than our approach, our Bi-LSTM model achieved much higher recall, resulting in the best F-measure. 
Our method can be generalized to extract medical concepts from social media for other medical applications.</description><identifier>ISSN: 1067-5027</identifier><identifier>EISSN: 1527-974X</identifier><identifier>DOI: 10.1093/jamia/ocx045</identifier><identifier>PMID: 28505280</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Data Mining - methods ; Electronic Nicotine Delivery Systems ; Humans ; Neural Networks (Computer) ; Research and Applications ; Semantics ; Social Media ; Vaping - adverse effects</subject><ispartof>Journal of the American Medical Informatics Association : JAMIA, 2018-01, Vol.25 (1), p.72-80</ispartof><rights>The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com 2017</rights><rights>The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. 
For Permissions, please email: journals.permissions@oup.com</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c416t-5eb7cc230fbb393258343d7c2dc9f12d7b994304f78c2bf9f34a1e6d719cf2783</citedby><cites>FETCH-LOGICAL-c416t-5eb7cc230fbb393258343d7c2dc9f12d7b994304f78c2bf9f34a1e6d719cf2783</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6455898/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6455898/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,1584,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28505280$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Xie, Jiaheng</creatorcontrib><creatorcontrib>Liu, Xiao</creatorcontrib><creatorcontrib>Dajun Zeng, Daniel</creatorcontrib><title>Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation</title><title>Journal of the American Medical Informatics Association : JAMIA</title><addtitle>J Am Med Inform Assoc</addtitle><description>Abstract Objective Recent years have seen increased worldwide popularity of e-cigarette use. However, the risks of e-cigarettes are underexamined. Most e-cigarette adverse event studies have achieved low detection rates due to limited subject sample sizes in the experiments and surveys. Social media provides a large data repository of consumers’ e-cigarette feedback and experiences, which are useful for e-cigarette safety surveillance. However, it is difficult to automatically interpret the informal and nontechnical consumer vocabulary about e-cigarettes in social media. 
This issue hinders the use of social media content for e-cigarette safety surveillance. Recent developments in deep neural network methods have shown promise for named entity extraction from noisy text. Motivated by these observations, we aimed to design a deep neural network approach to extract e-cigarette safety information in social media. Methods Our deep neural language model utilizes word embedding as the representation of text input and recognizes named entity types with the state-of-the-art Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network. Results Our Bi-LSTM model achieved the best performance compared to 3 baseline models, with a precision of 94.10%, a recall of 91.80%, and an F-measure of 92.94%. We identified 1591 unique adverse events and 9930 unique e-cigarette components (ie, chemicals, flavors, and devices) from our research testbed. Conclusion Although the conditional random field baseline model had slightly better precision than our approach, our Bi-LSTM model achieved much higher recall, resulting in the best F-measure. 
Our method can be generalized to extract medical concepts from social media for other medical applications.</description><subject>Data Mining - methods</subject><subject>Electronic Nicotine Delivery Systems</subject><subject>Humans</subject><subject>Neural Networks (Computer)</subject><subject>Research and Applications</subject><subject>Semantics</subject><subject>Social Media</subject><subject>Vaping - adverse effects</subject><issn>1067-5027</issn><issn>1527-974X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNp9kb1vFDEQxS0UREKgo0buQsESf67XTSSIwod0EQVBorO89uzF4dY-7N1Lovzz-LgQQZNqRprfvHmjh9ArSt5RovnxlR2DPU7uhgj5BB1QyVSjlfixV3vSqkYSpvbR81KuCKEt4_IZ2medJJJ15ADdnYcY4hJD48LSZpgmwNZvIBfAsIE4FRwiLskFu8Ij-GDxXLYLH0Kz-HZxjjO4OecK4ghzrlCE6Trln_g6TJe4dh7D2IP326UM6wylwnYKKb5ATwe7KvDyvh6i7x_PLk4_N4uvn76cvl80TtB2aiT0yjnGydD3XHMmOy64V455pwfKvOq1FpyIQXWO9YMeuLAUWq-odgNTHT9EJzvd9dzXF1y9X42adQ6jzbcm2WD-n8RwaZZpY1ohZae3Am_uBXL6NUOZzBiKg9XKRkhzMbSrDohghFf07Q51OZWSYXg4Q4nZ5mX-5GV2eVX89b_WHuC_AVXgaAekef241G_AmKPW</recordid><startdate>20180101</startdate><enddate>20180101</enddate><creator>Xie, Jiaheng</creator><creator>Liu, Xiao</creator><creator>Dajun Zeng, Daniel</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20180101</creationdate><title>Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation</title><author>Xie, Jiaheng ; Liu, Xiao ; Dajun Zeng, 
Daniel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c416t-5eb7cc230fbb393258343d7c2dc9f12d7b994304f78c2bf9f34a1e6d719cf2783</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Data Mining - methods</topic><topic>Electronic Nicotine Delivery Systems</topic><topic>Humans</topic><topic>Neural Networks (Computer)</topic><topic>Research and Applications</topic><topic>Semantics</topic><topic>Social Media</topic><topic>Vaping - adverse effects</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xie, Jiaheng</creatorcontrib><creatorcontrib>Liu, Xiao</creatorcontrib><creatorcontrib>Dajun Zeng, Daniel</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xie, Jiaheng</au><au>Liu, Xiao</au><au>Dajun Zeng, Daniel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation</atitle><jtitle>Journal of the American Medical Informatics Association : JAMIA</jtitle><addtitle>J Am Med Inform Assoc</addtitle><date>2018-01-01</date><risdate>2018</risdate><volume>25</volume><issue>1</issue><spage>72</spage><epage>80</epage><pages>72-80</pages><issn>1067-5027</issn><eissn>1527-974X</eissn><abstract>Abstract Objective Recent years have seen increased worldwide popularity of e-cigarette use. 
However, the risks of e-cigarettes are underexamined. Most e-cigarette adverse event studies have achieved low detection rates due to limited subject sample sizes in the experiments and surveys. Social media provides a large data repository of consumers’ e-cigarette feedback and experiences, which are useful for e-cigarette safety surveillance. However, it is difficult to automatically interpret the informal and nontechnical consumer vocabulary about e-cigarettes in social media. This issue hinders the use of social media content for e-cigarette safety surveillance. Recent developments in deep neural network methods have shown promise for named entity extraction from noisy text. Motivated by these observations, we aimed to design a deep neural network approach to extract e-cigarette safety information in social media. Methods Our deep neural language model utilizes word embedding as the representation of text input and recognizes named entity types with the state-of-the-art Bidirectional Long Short-Term Memory (Bi-LSTM) Recurrent Neural Network. Results Our Bi-LSTM model achieved the best performance compared to 3 baseline models, with a precision of 94.10%, a recall of 91.80%, and an F-measure of 92.94%. We identified 1591 unique adverse events and 9930 unique e-cigarette components (ie, chemicals, flavors, and devices) from our research testbed. Conclusion Although the conditional random field baseline model had slightly better precision than our approach, our Bi-LSTM model achieved much higher recall, resulting in the best F-measure. Our method can be generalized to extract medical concepts from social media for other medical applications.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>28505280</pmid><doi>10.1093/jamia/ocx045</doi><tpages>9</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1067-5027
ispartof	Journal of the American Medical Informatics Association : JAMIA, 2018-01, Vol.25 (1), p.72-80
issn	1067-5027 1527-974X
language	eng
recordid	cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6455898
source	MEDLINE; Oxford University Press Journals All Titles (1996-Current); EZB-FREE-00999 freely available EZB journals; PubMed Central
subjects	Data Mining - methods; Electronic Nicotine Delivery Systems; Humans; Neural Networks (Computer); Research and Applications; Semantics; Social Media; Vaping - adverse effects
title	Mining e-cigarette adverse events in social media using Bi-LSTM recurrent neural network with word embedding representation
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T15%3A38%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mining%20e-cigarette%20adverse%20events%20in%20social%20media%20using%20Bi-LSTM%20recurrent%20neural%20network%20with%20word%20embedding%20representation&rft.jtitle=Journal%20of%20the%20American%20Medical%20Informatics%20Association%20:%20JAMIA&rft.au=Xie,%20Jiaheng&rft.date=2018-01-01&rft.volume=25&rft.issue=1&rft.spage=72&rft.epage=80&rft.pages=72-80&rft.issn=1067-5027&rft.eissn=1527-974X&rft_id=info:doi/10.1093/jamia/ocx045&rft_dat=%3Cproquest_pubme%3E1899404203%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=1899404203&rft_id=info:pmid/28505280&rft_oup_id=10.1093/jamia/ocx045&rfr_iscdi=true