Jargon: A Suite of Language Models and Evaluation Tasks for French Specialized Domains: Data

Bibliographic details
Main authors: Segonne, Vincent, Mannion, Aidan, Alonzo Canul, Laura Cristina, Audibert, Alexandre, Liu, Xingyu, Macaire, Cécile, Pupier, Adrien, Zhou, Yongxin, Aguiar, Mathilde, Herron, Felix, Norré, Magali, Amini, Massih-Reza, Bouillon, Pierrette, Eshkol-Taravella, Iris, Esperança-Rodier, Emmanuelle, François, Thomas, Goeuriot, Lorraine, Goulian, Jérôme, Lafourcade, Mathieu, Lecouteux, Benjamin, Portet, François, Ringeval, Fabien, Vandeghinste, Vincent, Coavoux, Maximin, Dinarelli, Marco, Schwab, Didier
Format: Dataset
Language: eng
Description
Summary: This repository contains the data for the following paper: Vincent Segonne, Aidan Mannion, Laura Cristina Alonzo Canul, Alexandre Daniel Audibert, Xingyu Liu, Cécile Macaire, Adrien Pupier, Yongxin Zhou, Mathilde Aguiar, Felix E. Herron, Magali Norré, Massih R Amini, Pierrette Bouillon, Iris Eshkol-Taravella, Emmanuelle Esperança-Rodier, Thomas François, Lorraine Goeuriot, Jérôme Goulian, Mathieu Lafourcade, et al. 2024. Jargon: A Suite of Language Models and Evaluation Tasks for French Specialized Domains. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 9463–9476, Torino, Italia. ELRA and ICCL. The data comprise: 1) pretraining data for the Jargon specialized language models; 2) the ECTHR_FR dataset for text classification in the French legal domain.
DOI: 10.5281/zenodo.10865546