Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais
In this work, we evaluate the performance of recent text embeddings for the automatic detection of events in a stream of tweets. We model this task as a dynamic clustering problem. Our experiments are conducted on a publicly available corpus of tweets in English and on a similar dataset in French ann...
Main authors: | Mazoyer, Béatrice ; Hervé, Nicolas ; Hudelot, Céline ; Cage, Julia |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Mazoyer, Béatrice ; Hervé, Nicolas ; Hudelot, Céline ; Cage, Julia |
description | In this work, we evaluate the performance of recent text embeddings for the
automatic detection of events in a stream of tweets. We model this task as a
dynamic clustering problem. Our experiments are conducted on a publicly
available corpus of tweets in English and on a similar dataset in French
annotated by our team. We show that recent techniques based on deep neural
networks (ELMo, Universal Sentence Encoder, BERT, SBERT), although promising on
many applications, are not very suitable for this task. We also experiment with
different types of fine-tuning to improve these results on French data.
Finally, we propose a detailed analysis of the results obtained, showing the
superiority of tf-idf approaches for this task. |
doi_str_mv | 10.48550/arxiv.2001.04139 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2001.04139 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2001_04139 |
source | arXiv.org |
subjects | Computer Science - Information Retrieval ; Computer Science - Social and Information Networks |
title | Représentations lexicales pour la détection non supervisée d'événements dans un flux de tweets : étude sur des corpus français et anglais |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T21%3A18%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Repr%5C'esentations%20lexicales%20pour%20la%20d%5C'etection%20non%20supervis%5C'ee%20d'%5C'ev%5C'enements%20dans%20un%20flux%20de%20tweets%20:%20%5C'etude%20sur%20des%20corpus%20fran%5Cc%7Bc%7Dais%20et%20anglais&rft.au=Mazoyer,%20B%C3%A9atrice&rft.date=2020-01-13&rft_id=info:doi/10.48550/arxiv.2001.04139&rft_dat=%3Carxiv_GOX%3E2001_04139%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |