Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | In this work, we evaluate the performance of recent text embeddings for the
automatic detection of events in a stream of tweets. We model this task as a
dynamic clustering problem. Our experiments are conducted on a publicly
available corpus of tweets in English and on a similar dataset in French
annotated by our team. We show that recent techniques based on deep neural
networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising on
many applications, are not very suitable for this task. We also experiment with
different types of fine-tuning to improve these results on French data.
Finally, we propose a detailed analysis of the results obtained, showing the
superiority of tf-idf approaches for this task. |
---|---|
DOI: | 10.48550/arxiv.2001.04139 |
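
The summary frames event detection as dynamic clustering over tf-idf vectors. The sketch below illustrates that general idea only; it is not the authors' algorithm. It assumes a simple threshold rule (a tweet joins the most similar existing cluster, or opens a new one), whitespace tokenisation, and idf statistics computed over the whole batch rather than over a sliding window. The function names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one unit-length tf-idf vector (as a dict) per tokenised document.

    Assumption: idf is computed over the whole batch with a smoothed formula,
    log((1 + n) / (1 + df)) + 1. A real stream would need incremental statistics.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Sparse dot product of two dict-based vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cluster_stream(vectors, threshold=0.3):
    """Assign each vector, in arrival order, to a cluster id.

    Each cluster is represented by the sum of its members' vectors; similarity
    is the cosine between the incoming vector and that sum.
    """
    sums = []    # one summed vector per cluster
    labels = []
    for vec in vectors:
        best, best_sim = None, 0.0
        for i, s in enumerate(sums):
            s_norm = math.sqrt(sum(w * w for w in s.values()))
            sim = dot(vec, s) / s_norm  # vec already has unit length
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            for t, w in vec.items():
                sums[best][t] = sums[best].get(t, 0.0) + w
            labels.append(best)
        else:
            sums.append(dict(vec))  # open a new cluster for this tweet
            labels.append(len(sums) - 1)
    return labels


if __name__ == "__main__":
    tweets = [
        "earthquake hits city centre",
        "strong earthquake hits the city",
        "team wins the football final",
        "football final team wins trophy",
    ]
    docs = [t.split() for t in tweets]
    print(cluster_stream(tfidf_vectors(docs)))
```

Lowering `threshold` merges more tweets into existing clusters, while raising it creates more, smaller clusters; a suitable value would have to be tuned on annotated data.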