A SMS-like language analyzer for Spanish
Published in: | Linguamática (Braga, Portugal), 2013-07, Vol.5 (1), p.31-39 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | Spanish (spa) |
Abstract: | The use of specific language codes in chat and SMS-like messages is a major trend in electronic communications. This makes Natural Language Processing quite hard, even at its simplest step, text message tokenization, because of the widespread use of non-alphanumeric symbols, frequent typos and non-standard word separators. In this work we present a new approach to text message tokenization, specific to the Spanish language as used in Social Networks and in electronic communications. Our system has been integrated into a more general application for age detection in Social Networks, developed in the research and development project WENDY. It has been quantitatively evaluated both directly and indirectly, through its impact on the general age-detection application, showing very promising results. Adapted from the source document |
---|---|
ISSN: | 1647-0818 |