An Application Programming Interface (API) Sensitive Data Identification Method Based on the Federated Large Language Model
The traditional methods for identifying sensitive data in APIs mainly encompass rule-based and machine learning-based approaches. However, these methods suffer from inadequacies in terms of security and robustness, exhibit high false positive rates, and struggle to cope with evolving threat landscap...
Gespeichert in:
Veröffentlicht in: | Applied sciences 2024-11, Vol.14 (22), p.10162 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The traditional methods for identifying sensitive data in APIs mainly encompass rule-based and machine learning-based approaches. However, these methods suffer from inadequacies in terms of security and robustness, exhibit high false positive rates, and struggle to cope with evolving threat landscapes. This paper proposes a method for detecting sensitive data in APIs based on the Federated Large Language Model (FedAPILLM). This method applies the large language model Qwen2.5 and the LoRA instruction tuning technique within the framework of federated learning (FL) to the field of data security. Under the premise of protecting data privacy, a domain-specific corpus and knowledge base are constructed for pre-training and fine-tuning, resulting in a large language model specifically designed for identifying sensitive data in APIs. This paper conducts comparative experiments involving Llama3 8B, Llama3.1 8B, and Qwen2.5 14B. The results demonstrate that Qwen2.5 14B can achieve similar or better performance levels compared to the Llama3.1 8B model with fewer training iterations. |
---|---|
ISSN: | 2076-3417 2076-3417 |
DOI: | 10.3390/app142210162 |