An Application Programming Interface (API) Sensitive Data Identification Method Based on the Federated Large Language Model

Bibliographic details
Published in:	Applied sciences 2024-11, Vol.14 (22), p.10162
Main authors:	Wu, Jianping, Chen, Lifeng, Fang, Siyuan, Wu, Chunming
Format:	Article
Language:	English
Keywords:	Accuracy; Algorithms; API; Application programming interface; Applications programming; Data integrity; Data security; Deep learning; Efficiency; federated learning; Generative artificial intelligence; Human error; Identification; Language; large language model; Large language models; Machine learning; Methods; Network security; Privacy; sensitive data identification

Description
Abstract:	Traditional methods for identifying sensitive data in APIs fall mainly into rule-based and machine learning-based approaches. However, these methods are inadequate in terms of security and robustness, exhibit high false positive rates, and struggle to cope with evolving threat landscapes. This paper proposes a method for detecting sensitive data in APIs based on a Federated Large Language Model (FedAPILLM). The method applies the large language model Qwen2.5 and the LoRA instruction tuning technique, within a federated learning (FL) framework, to the field of data security. While protecting data privacy, a domain-specific corpus and knowledge base are constructed for pre-training and fine-tuning, yielding a large language model specifically designed for identifying sensitive data in APIs. The paper conducts comparative experiments involving Llama3 8B, Llama3.1 8B, and Qwen2.5 14B. The results demonstrate that Qwen2.5 14B can achieve performance similar to or better than that of the Llama3.1 8B model with fewer training iterations.
ISSN:	2076-3417
DOI:	10.3390/app142210162
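The abstract describes LoRA instruction tuning of Qwen2.5 inside a federated learning framework, but it does not state how client updates are combined. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes standard FedAvg (sample-weighted averaging) applied only to the LoRA adapter matrices A and B, with the base weights frozen and identical on every client. The standard LoRA update is W = W0 + (alpha / r) * B @ A. The function names `lora_delta` and `fedavg_adapters` are hypothetical.

```python
from typing import Dict, List

# A dense matrix represented as a list of rows.
Matrix = List[List[float]]


def lora_delta(A: Matrix, B: Matrix, alpha: float, r: int) -> Matrix:
    """Return the LoRA weight update (alpha / r) * (B @ A).

    A has shape (r, k) and B has shape (d, r), so the update has shape (d, k)
    and would be added to a frozen base weight W0 of the same shape.
    """
    scale = alpha / r
    d, k = len(B), len(A[0])
    return [
        [scale * sum(B[i][j] * A[j][c] for j in range(r)) for c in range(k)]
        for i in range(d)
    ]


def fedavg_adapters(
    client_adapters: List[Dict[str, Matrix]], num_samples: List[int]
) -> Dict[str, Matrix]:
    """Sample-weighted average (FedAvg) of each named adapter matrix.

    Only the LoRA adapter matrices are averaged here; the frozen base weights
    are assumed to be identical on every client and are not aggregated.
    """
    if not client_adapters or len(client_adapters) != len(num_samples):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(num_samples)
    averaged: Dict[str, Matrix] = {}
    for name, first in client_adapters[0].items():
        rows, cols = len(first), len(first[0])
        averaged[name] = [
            [
                sum(n * client[name][i][j] for client, n in zip(client_adapters, num_samples))
                / total
                for j in range(cols)
            ]
            for i in range(rows)
        ]
    return averaged
```

For example, with two clients holding 1 and 3 training samples, each entry of the second client's adapters receives three times the weight of the first client's. Note that averaging A and B separately generally differs from averaging the products B @ A, so this per-matrix FedAvg is a common simplification rather than an exact aggregate of the clients' weight updates.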