GiusBERTo: A Legal Language Model for Personal Data De-identification in Italian Court of Auditors Decisions
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Recent advances in Natural Language Processing have demonstrated the
effectiveness of pretrained language models like BERT for a variety of
downstream tasks. We present GiusBERTo, the first BERT-based model specialized
for anonymizing personal data in Italian legal documents. GiusBERTo is trained
on a large dataset of Court of Auditors decisions to recognize entities to
anonymize, including names, dates, and locations, while retaining contextual
relevance. We evaluate GiusBERTo on a held-out test set and achieve 97%
token-level accuracy. GiusBERTo provides the Italian legal community with an
accurate and tailored BERT model for de-identification, balancing privacy and
data protection. |
---|---|
DOI: | 10.48550/arxiv.2406.15032 |