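Identificación de ideología política mediante un modelo Transformer para estilometría y Clasificación por votos en Machine Learning [Identification of political ideology using a Transformer model for stylometry and voting classification in Machine Learning]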
Published in: | Polo del Conocimiento: Revista científico - profesional 2022, Vol.7 (9), p.1457-1474 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | spa |
Summary: | The main objective of this article is to determine the ideological inclination of Twitter
users in Ecuador. The data were collected from the Twitter platform, stored in datasets,
processed, and labeled to feed the classifiers, which were trained to predict political
ideology using Transformer and Voting Classifier models in Machine Learning. Cross-validation
is used to strengthen and evaluate, during training, classifier models such as Logistic
Regression, Random Forest, Decision Tree, Multilayer Perceptron, and Gradient Boosting. The
pre-trained Spanish Transformer model Roberta-large-bne is used to extract stylometric
features from the texts. These are combined with phraseological features such as MeanWordLen,
LexicalDiversity, MeanSentenceLen, StdevSentenceLen, MeanParagraphLen, and DocumentLen, and
with frequently used words taken from the Spanish corpus CREA, to form a final feature vector
used for training. The aim is to classify political ideology from short texts taken from
Twitter and to analyze the results of each classifier to determine which is most suitable for
the classification and prediction task. These results will serve as a feasibility indicator
for similar future studies. |
---|---|
ISSN: | 2550-682X |
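
The phraseological features named in the summary can be illustrated with a short sketch. The record does not give the article's exact definitions, so the formulas below (word-count sentence lengths, character-count document length, type/token ratio for lexical diversity) and the function name `phraseological_features` are assumptions made for illustration only, not the authors' implementation.

```python
import re
import statistics


def phraseological_features(text: str) -> dict:
    """Compute simple phraseological features for one document.

    The definitions here are illustrative assumptions; the article
    may define these features differently.
    """
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences are assumed to end with '.', '!' or '?'.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"\w+", text.lower())

    word_lens = [len(w) for w in words]
    sent_lens = [len(re.findall(r"\w+", s)) for s in sentences]
    para_lens = [len(re.findall(r"\w+", p)) for p in paragraphs]

    return {
        "MeanWordLen": statistics.fmean(word_lens) if word_lens else 0.0,
        # Type/token ratio: distinct words divided by total words.
        "LexicalDiversity": len(set(words)) / len(words) if words else 0.0,
        "MeanSentenceLen": statistics.fmean(sent_lens) if sent_lens else 0.0,
        "StdevSentenceLen": statistics.pstdev(sent_lens) if sent_lens else 0.0,
        "MeanParagraphLen": statistics.fmean(para_lens) if para_lens else 0.0,
        # Document length measured in characters (an assumption).
        "DocumentLen": len(text),
    }
```

In a pipeline like the one the summary describes, a dictionary of this kind would be computed per tweet and concatenated with the Transformer-derived features before training; how the article actually combines them is not stated in this record.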