Optimization of Privacy Budget Allocation In Differential Privacy-Based Public Transit Trajectory Data Publishing for Smart Mobility Applications

Bibliographic details
Published in: IEEE Transactions on Intelligent Transportation Systems, 2023-12, Vol. 24 (12), pp. 15158-15168
Main authors: Chen, Chenxi; Hu, Xianbiao; Li, Yang; Tang, Qing
Format: Article
Language: English
Description
Abstract: Trajectory datasets are widely used in transportation research, but sharing them carries a risk of privacy breaches. Privacy budget allocation is a key step in developing differential privacy (DP)-based privacy-preserving data publishing (PPDP) algorithms, as it directly affects the utility of the released dataset. Most prior research allocated privacy budgets with simple rules, such as distributing them evenly among tree levels, without theoretical support for optimality. This manuscript presents an optimal privacy budget allocation algorithm for transit smart card data, with the goal of publishing non-interactive sanitized trajectory data under a differential privacy definition. To this end, the smart card trajectory data are first stored in a prefix tree structure, and a query probability model is developed to quantify the probability that a trajectory location pair is queried. Next, the privacy budget is optimized for each prefix tree node to minimize the query error while satisfying the differential privacy definition. The Lagrangian relaxation method is adopted to derive the optimal privacy budget values, and several propositions on the properties of the solution are stated and proved. Real-life metro smart card data from Shenzhen, China, covering 2.8 million individual travelers and over 220 million records, are used in the case study. The developed algorithm is shown to output a sanitized dataset with the highest utility when compared with three benchmark algorithms. Sensitivity analysis shows that the resulting data utility remains stable when the privacy budget changes. The runtime of the proposed algorithm is less than 160 seconds in all experiments, indicating good computational efficiency.
ISSN: 1524-9050, 1558-0016
DOI:10.1109/TITS.2023.3309783
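
Illustrative note on the abstract: the record does not reproduce the paper's query probability model or its per-node formulation, so the following is only a minimal sketch of the general idea, under simplifying assumptions that are not taken from the paper. It assumes one budget per prefix tree level, Laplace noise with sensitivity 1 (variance 2/eps^2), sequential composition along a root-to-leaf path (the level budgets sum to the total), and a hypothetical weight per level standing in for the query probability. Minimizing the weighted noise variance with a Lagrange multiplier on the sum constraint then gives budgets proportional to the cube root of the weights. The names `allocate_budget` and `expected_error` are invented for this example.

```python
def allocate_budget(weights, total_epsilon):
    """Split total_epsilon across tree levels.

    Minimises sum_i w_i * 2 / eps_i**2 subject to sum_i eps_i == total_epsilon.
    Setting the derivative of the Lagrangian to zero gives
    eps_i proportional to w_i ** (1/3).
    Assumption: weights are hypothetical per-level query probabilities.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")
    cube_roots = [w ** (1.0 / 3.0) for w in weights]
    scale = total_epsilon / sum(cube_roots)
    return [r * scale for r in cube_roots]


def expected_error(weights, epsilons):
    """Weighted Laplace noise variance, assuming sensitivity 1 at every level."""
    return sum(w * 2.0 / (e ** 2) for w, e in zip(weights, epsilons))


if __name__ == "__main__":
    weights = [0.5, 0.3, 0.2]          # hypothetical query probabilities
    total_epsilon = 1.0                # hypothetical total privacy budget
    optimal = allocate_budget(weights, total_epsilon)
    uniform = [total_epsilon / len(weights)] * len(weights)
    print("optimal budgets:", optimal)
    print("error (optimal):", expected_error(weights, optimal))
    print("error (uniform):", expected_error(weights, uniform))
```

Under these assumptions, an even split is recovered only when all weights are equal; otherwise levels that are queried more often receive a larger share of the budget.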