Can Open-source LLMs Enhance Data Synthesis for Toxic Detection?: An Experimental Study
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Effective toxic content detection relies heavily on high-quality and diverse
data, which serves as the foundation for robust content moderation models. This
study explores the potential of open-source LLMs for harmful data synthesis,
utilizing prompt engineering and fine-tuning techniques to enhance data quality
and diversity. In a two-stage evaluation, we first examine the capabilities of
six open-source LLMs in generating harmful data across multiple datasets using
prompt engineering. In the second stage, we fine-tune these models to improve
data generation while addressing challenges such as hallucination, data
duplication, and overfitting. Our findings reveal that Mistral excels in
generating high-quality and diverse harmful data with minimal hallucination.
Furthermore, fine-tuning enhances data quality, offering scalable and
cost-effective solutions for augmenting datasets for specific toxic content
detection tasks. These results emphasize the significance of data synthesis in
building robust, standalone detection models and highlight the potential of
open-source LLMs to advance smaller downstream content moderation systems. We
implemented this approach in real-world industrial settings, demonstrating the
feasibility and efficiency of fine-tuned open-source LLMs for harmful data
synthesis. |
---|---|
DOI: | 10.48550/arxiv.2411.15175 |
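The record does not include the paper's code. As a rough illustration of the first-stage setup the abstract describes (prompting an open-source LLM to synthesize training examples for toxic-content detection), a minimal sketch follows. The checkpoint name, prompt wording, and decoding parameters are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not the paper's released code) of prompt-engineering-based
# data synthesis with an open-source LLM. Model name, prompt text, and sampling
# settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint; the paper evaluates six open-source LLMs

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def synthesize_examples(seed_example: str, n: int = 5) -> list[str]:
    """Generate n synthetic training examples of the same category as one seed."""
    prompt = (
        "You are helping build a content-moderation training set.\n"
        f"Seed example of the target category:\n{seed_example}\n"
        f"Write {n} new, diverse examples of the same category, one per line."
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(
        input_ids,
        max_new_tokens=512,
        do_sample=True,       # sampling rather than greedy decoding, to encourage diversity
        temperature=0.9,
        top_p=0.95,
    )
    # Decode only the newly generated tokens, then apply a simple post-filter
    # that drops empty lines and verbatim copies of the seed (data duplication).
    text = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return [line.strip() for line in text.splitlines()
            if line.strip() and line.strip() != seed_example.strip()]
```

The sketch covers only the prompt-engineering stage; in the paper's second stage, the same models are fine-tuned to improve generation quality while addressing hallucination, data duplication, and overfitting.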