De-identification of electronic health record using neural network

According to a recent study, around 99% of hospitals across the US now use electronic health record systems (EHRs). One of the most common types of EHR data is unstructured text, and unlocking hidden details from this data is critical for improving current medical practices and research endea...

Bibliographic details
Published in: Scientific reports 2020-10, Vol.10 (1), p.18600-18600, Article 18600
Main authors: Ahmed, Tanbir ; Aziz, Md Momin Al ; Mohammed, Noman
Format: Article
Language: eng
Online access: Full text
container_end_page 18600
container_issue 1
container_start_page 18600
container_title Scientific reports
container_volume 10
creator Ahmed, Tanbir
Aziz, Md Momin Al
Mohammed, Noman
description According to a recent study, around 99% of hospitals across the US now use electronic health record systems (EHRs). One of the most common types of EHR data is unstructured text, and unlocking hidden details from this data is critical for improving current medical practices and research endeavors. However, these textual data contain sensitive information, which could compromise our privacy. Therefore, medical textual data cannot be released publicly without first undergoing privacy-protective measures. De-identification is the process of detecting and removing all sensitive information present in EHRs, and it is a necessary step towards privacy-preserving EHR data sharing. Over the last decade, there have been several proposals to de-identify textual data using manual, rule-based, and machine learning methods. In this article, we propose new methods to de-identify textual data based on the self-attention mechanism and a stacked Recurrent Neural Network. To the best of our knowledge, we are the first to employ these techniques. Experimental results on three different datasets show that our model performs better than all state-of-the-art mechanisms irrespective of the dataset. Additionally, our proposed method is significantly faster than the existing techniques. Finally, we introduce three utility metrics to judge the quality of the de-identified data.
doi_str_mv 10.1038/s41598-020-75544-1
format Article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7596089</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2456416320</sourcerecordid><originalsourceid>FETCH-LOGICAL-c499t-451eefba3ff6503bbf04db5f5e8e823f41871a52d78ae6eba88d17608acca323</originalsourceid><addsrcrecordid>eNp9kUtLxDAUhYMoKqN_wFXBjZtqnm26EXwrDLiZfUjTm5loJxmTVvHfGx3xtTCbG8h3DufmIHRA8DHBTJ4kTkQjS0xxWQvBeUk20C7FXJSUUbr5476D9lN6wPkI2nDSbKMdxgilNRO76PwSSteBH5x1Rg8u-CLYAnowQwzemWIBuh8WRQQTYleMyfl54WGMus9jeAnxcQ9tWd0n2P-cEzS7vppd3JbT-5u7i7NpaXjTDCUXBMC2mllbCcza1mLetcIKkCAps5zImmhBu1pqqKDVUnakrrDUxmhG2QSdrm1XY7uEzuTMOYRaRbfU8VUF7dTvF-8Wah6eVS2a7NJkg6NPgxieRkiDWrpkoO-1hzAmRbmoOKkYxRk9_IM-hDH6vF2mapK_u8F1puiaMjGkFMF-hSFYvZek1iWpXJL6KEmRLGJrUcqwn0P8tv5H9QaXvpQG</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2471544907</pqid></control><display><type>article</type><title>De-identification of electronic health record using neural network</title><source>DOAJ Directory of Open Access Journals</source><source>Springer Nature OA Free Journals</source><source>Nature Free</source><source>EZB-FREE-00999 freely available EZB journals</source><source>PubMed Central</source><source>Free Full-Text Journals in Chemistry</source><creator>Ahmed, Tanbir ; Aziz, Md Momin Al ; Mohammed, Noman</creator><creatorcontrib>Ahmed, Tanbir ; Aziz, Md Momin Al ; Mohammed, Noman</creatorcontrib><description>According to a recent study, around 99% of hospitals across the US now use electronic health record systems (EHRs). One of the most common types of EHR is the unstructured textual data, and unlocking hidden details from this data is critical for improving current medical practices and research endeavors. However, these textual data contain sensitive information, which could compromise our privacy. 
Therefore, medical textual data cannot be released publicly without undergoing any privacy-protective measures. De-identification is a process of detecting and removing all sensitive information present in EHRs, and it is a necessary step towards privacy-preserving EHR data sharing. Over the last decade, there have been several proposals to de-identify textual data using manual, rule-based, and machine learning methods. In this article, we propose new methods to de-identify textual data based on the self-attention mechanism and stacked Recurrent Neural Network. To the best of our knowledge, we are the first to employ these techniques. Experimental results on three different datasets show that our model performs better than all state-of-the-art mechanism irrespective of the dataset. Additionally, our proposed method is significantly faster than the existing techniques. Finally, we introduced three utility metrics to judge the quality of the de-identified data.</description><identifier>ISSN: 2045-2322</identifier><identifier>EISSN: 2045-2322</identifier><identifier>DOI: 10.1038/s41598-020-75544-1</identifier><identifier>PMID: 33122735</identifier><language>eng</language><publisher>London: Nature Publishing Group UK</publisher><subject>639/166/985 ; 692/700/1538 ; Electronic medical records ; Humanities and Social Sciences ; Learning algorithms ; Machine learning ; multidisciplinary ; Neural networks ; Privacy ; Science ; Science (multidisciplinary)</subject><ispartof>Scientific reports, 2020-10, Vol.10 (1), p.18600-18600, Article 18600</ispartof><rights>The Author(s) 2020</rights><rights>The Author(s) 2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c499t-451eefba3ff6503bbf04db5f5e8e823f41871a52d78ae6eba88d17608acca323</citedby><cites>FETCH-LOGICAL-c499t-451eefba3ff6503bbf04db5f5e8e823f41871a52d78ae6eba88d17608acca323</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7596089/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC7596089/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,864,885,27924,27925,41120,42189,51576,53791,53793</link.rule.ids></links><search><creatorcontrib>Ahmed, Tanbir</creatorcontrib><creatorcontrib>Aziz, Md Momin Al</creatorcontrib><creatorcontrib>Mohammed, Noman</creatorcontrib><title>De-identification of electronic health record using neural network</title><title>Scientific reports</title><addtitle>Sci Rep</addtitle><description>According to a recent study, around 99% of hospitals across the US now use electronic health record systems (EHRs). One of the most common types of EHR is the unstructured textual data, and unlocking hidden details from this data is critical for improving current medical practices and research endeavors. However, these textual data contain sensitive information, which could compromise our privacy. Therefore, medical textual data cannot be released publicly without undergoing any privacy-protective measures. De-identification is a process of detecting and removing all sensitive information present in EHRs, and it is a necessary step towards privacy-preserving EHR data sharing. 
Over the last decade, there have been several proposals to de-identify textual data using manual, rule-based, and machine learning methods. In this article, we propose new methods to de-identify textual data based on the self-attention mechanism and stacked Recurrent Neural Network. To the best of our knowledge, we are the first to employ these techniques. Experimental results on three different datasets show that our model performs better than all state-of-the-art mechanism irrespective of the dataset. Additionally, our proposed method is significantly faster than the existing techniques. Finally, we introduced three utility metrics to judge the quality of the de-identified data.</description><subject>639/166/985</subject><subject>692/700/1538</subject><subject>Electronic medical records</subject><subject>Humanities and Social Sciences</subject><subject>Learning algorithms</subject><subject>Machine learning</subject><subject>multidisciplinary</subject><subject>Neural networks</subject><subject>Privacy</subject><subject>Science</subject><subject>Science 
(multidisciplinary)</subject><issn>2045-2322</issn><issn>2045-2322</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>C6C</sourceid><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><sourceid>GNUQQ</sourceid><recordid>eNp9kUtLxDAUhYMoKqN_wFXBjZtqnm26EXwrDLiZfUjTm5loJxmTVvHfGx3xtTCbG8h3DufmIHRA8DHBTJ4kTkQjS0xxWQvBeUk20C7FXJSUUbr5476D9lN6wPkI2nDSbKMdxgilNRO76PwSSteBH5x1Rg8u-CLYAnowQwzemWIBuh8WRQQTYleMyfl54WGMus9jeAnxcQ9tWd0n2P-cEzS7vppd3JbT-5u7i7NpaXjTDCUXBMC2mllbCcza1mLetcIKkCAps5zImmhBu1pqqKDVUnakrrDUxmhG2QSdrm1XY7uEzuTMOYRaRbfU8VUF7dTvF-8Wah6eVS2a7NJkg6NPgxieRkiDWrpkoO-1hzAmRbmoOKkYxRk9_IM-hDH6vF2mapK_u8F1puiaMjGkFMF-hSFYvZek1iWpXJL6KEmRLGJrUcqwn0P8tv5H9QaXvpQG</recordid><startdate>20201029</startdate><enddate>20201029</enddate><creator>Ahmed, Tanbir</creator><creator>Aziz, Md Momin Al</creator><creator>Mohammed, Noman</creator><general>Nature Publishing Group UK</general><general>Nature Publishing Group</general><scope>C6C</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>88I</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M2P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20201029</creationdate><title>De-identification of electronic health record using neural 
network</title><author>Ahmed, Tanbir ; Aziz, Md Momin Al ; Mohammed, Noman</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c499t-451eefba3ff6503bbf04db5f5e8e823f41871a52d78ae6eba88d17608acca323</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>639/166/985</topic><topic>692/700/1538</topic><topic>Electronic medical records</topic><topic>Humanities and Social Sciences</topic><topic>Learning algorithms</topic><topic>Machine learning</topic><topic>multidisciplinary</topic><topic>Neural networks</topic><topic>Privacy</topic><topic>Science</topic><topic>Science (multidisciplinary)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ahmed, Tanbir</creatorcontrib><creatorcontrib>Aziz, Md Momin Al</creatorcontrib><creatorcontrib>Mohammed, Noman</creatorcontrib><collection>Springer Nature OA Free Journals</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest 
One Community College</collection><collection>ProQuest Central Korea</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>ProQuest Biological Science Collection</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Science Database</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Scientific reports</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ahmed, Tanbir</au><au>Aziz, Md Momin Al</au><au>Mohammed, Noman</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>De-identification of electronic health record using neural network</atitle><jtitle>Scientific reports</jtitle><stitle>Sci Rep</stitle><date>2020-10-29</date><risdate>2020</risdate><volume>10</volume><issue>1</issue><spage>18600</spage><epage>18600</epage><pages>18600-18600</pages><artnum>18600</artnum><issn>2045-2322</issn><eissn>2045-2322</eissn><abstract>According to a recent study, around 99% of hospitals across the US now use electronic health record systems (EHRs). 
One of the most common types of EHR is the unstructured textual data, and unlocking hidden details from this data is critical for improving current medical practices and research endeavors. However, these textual data contain sensitive information, which could compromise our privacy. Therefore, medical textual data cannot be released publicly without undergoing any privacy-protective measures. De-identification is a process of detecting and removing all sensitive information present in EHRs, and it is a necessary step towards privacy-preserving EHR data sharing. Over the last decade, there have been several proposals to de-identify textual data using manual, rule-based, and machine learning methods. In this article, we propose new methods to de-identify textual data based on the self-attention mechanism and stacked Recurrent Neural Network. To the best of our knowledge, we are the first to employ these techniques. Experimental results on three different datasets show that our model performs better than all state-of-the-art mechanism irrespective of the dataset. Additionally, our proposed method is significantly faster than the existing techniques. Finally, we introduced three utility metrics to judge the quality of the de-identified data.</abstract><cop>London</cop><pub>Nature Publishing Group UK</pub><pmid>33122735</pmid><doi>10.1038/s41598-020-75544-1</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2045-2322
ispartof Scientific reports, 2020-10, Vol.10 (1), p.18600-18600, Article 18600
issn 2045-2322
2045-2322
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7596089
source DOAJ Directory of Open Access Journals; Springer Nature OA Free Journals; Nature Free; EZB-FREE-00999 freely available EZB journals; PubMed Central; Free Full-Text Journals in Chemistry
subjects 639/166/985
692/700/1538
Electronic medical records
Humanities and Social Sciences
Learning algorithms
Machine learning
multidisciplinary
Neural networks
Privacy
Science
Science (multidisciplinary)
title De-identification of electronic health record using neural network
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T01%3A28%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=De-identification%20of%20electronic%20health%20record%20using%20neural%20network&rft.jtitle=Scientific%20reports&rft.au=Ahmed,%20Tanbir&rft.date=2020-10-29&rft.volume=10&rft.issue=1&rft.spage=18600&rft.epage=18600&rft.pages=18600-18600&rft.artnum=18600&rft.issn=2045-2322&rft.eissn=2045-2322&rft_id=info:doi/10.1038/s41598-020-75544-1&rft_dat=%3Cproquest_pubme%3E2456416320%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2471544907&rft_id=info:pmid/33122735&rfr_iscdi=true