EmojiLM: Modeling the New Emoji Language
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | With the rapid development of the internet, online social media welcomes
people from different backgrounds through its diverse content. The increasing
use of emoji has become a noticeable trend because emoji convey rich information
across cultural and linguistic borders. However, current research on emoji is
limited to single-emoji prediction, and few data resources are available for
further study of this interesting linguistic phenomenon. To this
end, we synthesize a large text-emoji parallel corpus, Text2Emoji, from a large
language model. Based on the parallel corpus, we distill a sequence-to-sequence
model, EmojiLM, which specializes in bidirectional text-emoji
translation. Extensive experiments on public benchmarks and human evaluation
demonstrate that our proposed model outperforms strong baselines and that the
parallel corpus benefits emoji-related downstream tasks. |
---|---|
DOI: | 10.48550/arxiv.2311.01751 |