PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database

Privacy and security issues limit the query executions over genomics datasets, notably single nucleotide polymorphisms (SNPs), raised by the sensitivity of this type of data. Therefore, it is important to ensure that executing queries on these datasets do not reveal sensitive information, such as th...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Informatics in medicine unlocked 2022, Vol.31, p.100988, Article 100988
Hauptverfasser: Jafarbeiki, Sara, Sakzad, Amin, Kasra Kermanshahi, Shabnam, Gaire, Raj, Steinfeld, Ron, Lai, Shangqi, Abraham, Gad, Thapa, Chandra
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Privacy and security issues limit the query executions over genomics datasets, notably single nucleotide polymorphisms (SNPs), raised by the sensitivity of this type of data. Therefore, it is important to ensure that executing queries on these datasets do not reveal sensitive information, such as the identity of the individuals and their genetic traits, to a data server. In this paper, we propose and present a novel model, we call PrivGenDB, to ensure the confidentiality of SNP-phenotype data while executing queries. The confidentiality in PrivGenDB is enabled by its system architecture and the search functionality provided by searchable symmetric encryption (SSE). To the best of our knowledge, PrivGenDB construction is the first SSE-based approach ensuring the confidentiality of SNP-phenotype data as the current SSE-based approaches for genomic data are limited only to substring search and range queries on a sequence of genomic data. Besides, a new data encoding mechanism is proposed and incorporated in the PrivGenDB model. This enables PrivGenDB to handle the dataset containing both genotype and phenotype and also support storing and managing other metadata, like gender and ethnicity, privately. Furthermore, different queries, namely Count, Boolean, Negation and k′-out-of-k match queries used for genomic data analysis, are supported and executed by PrivGenDB. The execution of these queries on genomic data in PrivGenDB is efficient and scalable for biomedical research and services. These are demonstrated by our analytical and empirical analysis presented in this paper. Specifically, our empirical studies on a dataset with 5,000 entries (records) containing 1,000 SNPs demonstrate that a count/Boolean query and a k′-out-of-k match query over 40 SNPs take approximately 4.3s and 86.4μs, respectively, outperforming the existing schemes.
ISSN:2352-9148
2352-9148
DOI:10.1016/j.imu.2022.100988