Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais

In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem. Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French ann...

Bibliographic details
Main authors:	Mazoyer, Béatrice; Hervé, Nicolas; Hudelot, Céline; Cage, Julia
Format:	Article
Language:	eng
Subjects:	Computer Science - Information Retrieval; Computer Science - Social and Information Networks

creator	Mazoyer, Béatrice; Hervé, Nicolas; Hudelot, Céline; Cage, Julia
description	In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem. Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French annotated by our team. We show that recent techniques based on deep neural networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising in many applications, are not well suited to this task. We also experiment with different types of fine-tuning to improve these results on French data. Finally, we propose a detailed analysis of the results obtained, showing the superiority of tf-idf approaches for this task.
doi_str_mv	10.48550/arxiv.2001.04139
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2001_04139</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2001_04139</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2001_041393</originalsourceid><addsrcrecordid>eNqFj0FrAjEQhXPpQWp_gKfOzZNrtiqoV7H0LD0Ky5DMlkDMhkyyXSn9If23nRXvHh5vePP44Ck1q3W13m42eolpcH31pnVd6XW92k3U34liOs-JKWTMrgsMngZn0BND7EoCj2ClkMmMbwgiLpFS71hiAjsX60WBLgJhsCiQEqD1ZQBLkL-JJN7DSCkSsFCt4E2XYmFoE4az-TG_6BgoA4YvL-dUPbXomV7u_qxe34-fh4_FbUQTk7tgujbjmOY2ZvW48Q8ALlcu</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Repr\'esentations lexicales pour la d\'etection non supervis\'ee d'\'ev\'enements dans un flux de tweets : \'etude sur des corpus fran\c{c}ais et anglais</title><source>arXiv.org</source><creator>Mazoyer, Béatrice ; Hervé, Nicolas ; Hudelot, Céline ; Cage, Julia</creator><creatorcontrib>Mazoyer, Béatrice ; Hervé, Nicolas ; Hudelot, Céline ; Cage, Julia</creatorcontrib><description>In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem.Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French annotated by our team. We show that recent techniques based on deep neural networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising on many applications, are not very suitable for this task. We also experiment with different types of fine-tuning to improve these results on French data. 
Finally, we propose a detailed analysis of the results obtained, showing the superiority of tf-idf approaches for this task.</description><identifier>DOI: 10.48550/arxiv.2001.04139</identifier><language>eng</language><subject>Computer Science - Information Retrieval ; Computer Science - Social and Information Networks</subject><creationdate>2020-01</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2001.04139$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2001.04139$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Mazoyer, Béatrice</creatorcontrib><creatorcontrib>Hervé, Nicolas</creatorcontrib><creatorcontrib>Hudelot, Céline</creatorcontrib><creatorcontrib>Cage, Julia</creatorcontrib><title>Repr\'esentations lexicales pour la d\'etection non supervis\'ee d'\'ev\'enements dans un flux de tweets : \'etude sur des corpus fran\c{c}ais et anglais</title><description>In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem.Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French annotated by our team. We show that recent techniques based on deep neural networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising on many applications, are not very suitable for this task. We also experiment with different types of fine-tuning to improve these results on French data. 
Finally, we propose a detailed analysis of the results obtained, showing the superiority of tf-idf approaches for this task.</description><subject>Computer Science - Information Retrieval</subject><subject>Computer Science - Social and Information Networks</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNqFj0FrAjEQhXPpQWp_gKfOzZNrtiqoV7H0LD0Ky5DMlkDMhkyyXSn9If23nRXvHh5vePP44Ck1q3W13m42eolpcH31pnVd6XW92k3U34liOs-JKWTMrgsMngZn0BND7EoCj2ClkMmMbwgiLpFS71hiAjsX60WBLgJhsCiQEqD1ZQBLkL-JJN7DSCkSsFCt4E2XYmFoE4az-TG_6BgoA4YvL-dUPbXomV7u_qxe34-fh4_FbUQTk7tgujbjmOY2ZvW48Q8ALlcu</recordid><startdate>20200113</startdate><enddate>20200113</enddate><creator>Mazoyer, Béatrice</creator><creator>Hervé, Nicolas</creator><creator>Hudelot, Céline</creator><creator>Cage, Julia</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20200113</creationdate><title>Repr\'esentations lexicales pour la d\'etection non supervis\'ee d'\'ev\'enements dans un flux de tweets : \'etude sur des corpus fran\c{c}ais et anglais</title><author>Mazoyer, Béatrice ; Hervé, Nicolas ; Hudelot, Céline ; Cage, Julia</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2001_041393</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Computer Science - Information Retrieval</topic><topic>Computer Science - Social and Information Networks</topic><toplevel>online_resources</toplevel><creatorcontrib>Mazoyer, Béatrice</creatorcontrib><creatorcontrib>Hervé, Nicolas</creatorcontrib><creatorcontrib>Hudelot, Céline</creatorcontrib><creatorcontrib>Cage, Julia</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mazoyer, Béatrice</au><au>Hervé, Nicolas</au><au>Hudelot, Céline</au><au>Cage, Julia</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Repr\'esentations lexicales pour la d\'etection non supervis\'ee d'\'ev\'enements dans un flux de tweets : \'etude sur des corpus fran\c{c}ais et anglais</atitle><date>2020-01-13</date><risdate>2020</risdate><abstract>In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem.Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French annotated by our team. We show that recent techniques based on deep neural networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising on many applications, are not very suitable for this task. We also experiment with different types of fine-tuning to improve these results on French data. Finally, we propose a detailed analysis of the results obtained, showing the superiority of tf-idf approaches for this task.</abstract><doi>10.48550/arxiv.2001.04139</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2001.04139
language	eng
recordid	cdi_arxiv_primary_2001_04139
source	arXiv.org
subjects	Computer Science - Information Retrieval; Computer Science - Social and Information Networks
title	Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T21%3A18%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Repr%5C'esentations%20lexicales%20pour%20la%20d%5C'etection%20non%20supervis%5C'ee%20d'%5C'ev%5C'enements%20dans%20un%20flux%20de%20tweets%20:%20%5C'etude%20sur%20des%20corpus%20fran%5Cc%7Bc%7Dais%20et%20anglais&rft.au=Mazoyer,%20B%C3%A9atrice&rft.date=2020-01-13&rft_id=info:doi/10.48550/arxiv.2001.04139&rft_dat=%3Carxiv_GOX%3E2001_04139%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true