Early author profiling on Twitter using profile features with multi-resolution

Bibliographic details
Published in: Expert Systems with Applications, 2020-02, Vol. 140, p. 112909, Article 112909
Main authors: López-Monroy, A. Pastor; González, Fabio A.; Solorio, Thamar
Format: Article
Language: English
Online access: Full text
Description
Summary:
• The Early Author Profiling task is feasible in social media documents.
• Very small amounts of textual evidence can be modeled from early stages.
• The proposed meta-words capture discriminative information of target profiles.
• The proposed multi-resolution representation boosted the early recognition.

The Author Profiling (AP) task aims to predict demographic characteristics of authors (e.g., age, gender, native language) from the documents they write. Research so far has focused only on forensic scenarios, performing post-analysis with all the available textual evidence. This paper introduces the task of Early Author Profiling (EAP) on Twitter. The goal is to recognize profiles effectively using as few tweets as possible from the user history. The task is highly relevant to social media analysis and to security and marketing problems, where prevention and anticipation are crucial. This work proposes a novel strategy that combines a state-of-the-art representation for early text classification with specialized word vectors for author profiling tasks. In this strategy we build prototypical features called Profile based Meta-Words, which allow us to model AP information at different levels of granularity. Our evaluation shows that the proposed methodology is well suited to profiling from little textual evidence (e.g., a handful of tweets) in early stages, whereas other granularities better encode the larger amounts of text that become available in late stages. We evaluated the proposed ideas on gender and language variety identification for English and Spanish, and showed that the proposal outperforms state-of-the-art methodologies.
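The early-profiling setting described in the summary can be illustrated with a minimal sketch: a classifier sees only the first k tweets of each user, and accuracy is tracked as k grows. This is not the authors' method. The per-class word-count "profiles" below are a crude stand-in for the Profile based Meta-Words, and the function names (`bag_of_words`, `train_profiles`, `predict`, `early_accuracy`), the labels, and the toy tweets are all assumptions made for illustration.

```python
from collections import Counter


def bag_of_words(tweets):
    """Count lowercase whitespace-separated tokens over a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts


def train_profiles(users):
    """Sum token counts per label: one crude word profile per class (assumption)."""
    profiles = {}
    for tweets, label in users:
        profiles.setdefault(label, Counter()).update(bag_of_words(tweets))
    return profiles


def predict(profiles, tweets):
    """Return the label whose normalized profile overlaps most with the tweets."""
    words = bag_of_words(tweets)

    def score(label):
        profile = profiles[label]
        total = sum(profile.values()) or 1
        return sum(count * profile[word] / total for word, count in words.items())

    # Sorting the labels makes ties deterministic (the first label wins).
    return max(sorted(profiles), key=score)


def early_accuracy(profiles, test_users, k):
    """Accuracy when only the first k tweets of each test user are visible."""
    hits = sum(predict(profiles, tweets[:k]) == label for tweets, label in test_users)
    return hits / len(test_users)


# Hypothetical toy data: two classes, "a" and "b".
train_users = [
    (["football match tonight", "great goal scored"], "a"),
    (["baking bread today", "new soup recipe"], "b"),
]
test_users = [
    (["hello world", "what a goal"], "a"),
    (["good morning", "soup for lunch"], "b"),
]

profiles = train_profiles(train_users)
for k in (1, 2):
    print(k, early_accuracy(profiles, test_users, k))
```

With this toy data the first tweet of each test user carries no class evidence, so the prediction at k = 1 falls back to the tie-breaking rule, and the second tweet is what separates the two users. Real evaluations would use held-out user histories and a learned representation.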
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2019.112909