Pragmatic De-Identification of Cross-Domain Unstructured Documents: A Utility-Preserving Approach with Relation Extraction Filtering

Bibliographic details
Published in:	AMIA Summits on Translational Science proceedings 2024, Vol. 2024, p. 85-94
Main authors:	Nedoshivina, Liubov; Halimi, Anisa; Bettencourt-Silva, Joao; Braghin, Stefano
Format:	Article
Language:	English
Online access:	Full text

container_end_page	94
container_issue
container_start_page	85
container_title	AMIA Summits on Translational Science proceedings
container_volume	2024
creator	Nedoshivina, Liubov; Halimi, Anisa; Bettencourt-Silva, Joao; Braghin, Stefano
description	The volume of information, and in particular personal information, generated each day is increasing at a staggering rate. The ability to leverage such information depends greatly on satisfying the many compliance and privacy regulations appearing all over the world. We present READI (Relation Extraction-based Approach for De-Identification), a utility-preserving framework for de-identifying unstructured documents. READI uses Named Entity Recognition and Relation Extraction to improve the quality of entity detection, thereby improving the overall quality of the de-identification process. In this proof-of-concept study, we evaluate the proposed approach on two different datasets and compare it with existing state-of-the-art approaches. We show that READI notably reduces the number of false positives and improves the utility of the de-identified text.
format	Article
pmid	38827069
abbreviated_title	AMIA Jt Summits Transl Sci Proc
publisher	United States
rights	2024 AMIA - All rights reserved.
backlink	https://www.ncbi.nlm.nih.gov/pubmed/38827069 (View this record in MEDLINE/PubMed)
fulltext	fulltext
identifier	EISSN: 2153-4063
ispartof	AMIA Summits on Translational Science proceedings, 2024, Vol.2024, p.85-94
issn	2153-4063
language	eng
recordid	cdi_proquest_miscellaneous_3064140017
source	Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central
title	Pragmatic De-Identification of Cross-Domain Unstructured Documents: A Utility-Preserving Approach with Relation Extraction Filtering
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T01%3A23%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Pragmatic%20De-Identification%20of%20Cross-Domain%20Unstructured%20Documents:%20A%20Utility-Preserving%20Approach%20with%20Relation%20Extraction%20Filtering&rft.jtitle=AMIA%20Summits%20on%20Translational%20Science%20proceedings&rft.au=Nedoshivina,%20Liubov&rft.date=2024&rft.volume=2024&rft.spage=85&rft.epage=94&rft.pages=85-94&rft.eissn=2153-4063&rft_id=info:doi/&rft_dat=%3Cproquest_pubme%3E3064140017%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3064140017&rft_id=info:pmid/38827069&rfr_iscdi=true
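The description field states that READI combines Named Entity Recognition with Relation Extraction to cut false positives, but this record does not describe the mechanism. The Python below is a minimal, hypothetical sketch of relation-based filtering of NER candidates, not the authors' implementation: the `Entity` type, the relation labels in `SUPPORTING_RELATIONS`, the `keep_threshold` value, and the functions `filter_candidates` and `mask` are illustrative assumptions, and hand-written spans and triples stand in for real NER and RE model output. Under these assumptions a candidate is masked only if its NER score is very high or an extracted relation supports it; everything else stays in the text to preserve utility.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    """A candidate span from a (hypothetical) NER model."""
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    label: str    # entity type, e.g. "PERSON" or "DATE"
    score: float  # NER confidence in [0, 1]

# Hypothetical table (an assumption, not from the paper): relation types
# that make a candidate of each label credible PII.
SUPPORTING_RELATIONS = {
    "PERSON": {"admitted_on", "treated_by", "relative_of"},
    "DATE": {"admitted_on", "born_on"},
    "LOCATION": {"lives_in", "admitted_to"},
}

def filter_candidates(entities, relations, keep_threshold=0.95):
    """Keep candidates that are high-confidence or supported by a relation.

    `relations` holds (head_index, relation_type, tail_index) triples that
    index into `entities`, as a hypothetical RE model might emit them.
    """
    supported = set()
    for head, rel, tail in relations:
        for idx in (head, tail):
            # A relation supports an endpoint only if its type fits that label.
            if rel in SUPPORTING_RELATIONS.get(entities[idx].label, set()):
                supported.add(idx)
    return [ent for i, ent in enumerate(entities)
            if ent.score >= keep_threshold or i in supported]

def mask(text, entities):
    """Replace each kept span with a [LABEL] placeholder.

    Spans are processed right to left so earlier offsets stay valid.
    """
    for ent in sorted(entities, key=lambda item: item.start, reverse=True):
        text = text[:ent.start] + f"[{ent.label}]" + text[ent.end:]
    return text

# Toy input: hand-written spans and triples stand in for real model output.
text = "John Smith was admitted on 12 March. Parkinson disease was noted."
entities = [
    Entity(0, 10, "PERSON", 0.80),   # "John Smith"
    Entity(27, 35, "DATE", 0.70),    # "12 March"
    Entity(37, 46, "PERSON", 0.60),  # "Parkinson": likely a false positive
]
relations = [(0, "admitted_on", 1)]

kept = filter_candidates(entities, relations)
print(mask(text, kept))  # [PERSON] was admitted on [DATE]. Parkinson disease was noted.
```

In this toy example, "Parkinson" is a low-confidence PERSON candidate with no supporting relation, so the sketch treats it as a false positive and leaves the clinically useful term unmasked.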