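Identificación de ideología política mediante un modelo Transformer para estilometría y Clasificación por votos en Machine Learning [Identification of political ideology using a Transformer model for stylometry and voting classification in Machine Learning]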

Bibliographic details
Published in: Polo del Conocimiento: Revista científico - profesional, 2022, Vol. 7 (9), pp. 1457-1474
Main authors: Mendoza Morán, Verónica; Ferruzola Sánchez, William; Aspiazu Torres, Abel; Espin Riofrio, César Humberto
Format: Article
Language: Spanish (spa)
Description
Summary: The main objective of this article is to determine the ideological inclination of Twitter users in Ecuador. Data collected from the Twitter platform were stored in datasets, processed, and labeled to feed the classifiers, which were trained to predict political ideology using Transformer and Voting Classifier models in Machine Learning. Cross-validation is used to strengthen and evaluate, during training, classifier models such as Logistic Regression, Random Forest, Decision Tree, Multilayer Perceptron, and Gradient Boosting. The pre-trained Spanish Transformer model Roberta-large-bne is used to extract stylometric features from the texts. These are combined with phraseological features (MeanWordLen, LexicalDiversity, MeanSentenceLen, StdevSentenceLen, MeanParagraphLen, DocumentLen) and with frequently used words taken from the Spanish corpus CREA to form the final feature vector used for training. The aim is to classify political ideology from short texts taken from Twitter and to analyze the results of each classifier to determine which is most suitable for the classification and prediction task. These results will serve as a feasibility indicator for similar studies in the future.
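The phraseological features named in the summary can be illustrated with a minimal Python sketch. The record does not give the authors' definitions, so everything below is an assumption made for illustration: the function name `phraseological_features` is hypothetical, words are taken to be runs of word characters, sentences end at `.`, `!` or `?`, paragraphs are separated by blank lines, and DocumentLen is counted in characters.

```python
import re
import statistics


def phraseological_features(text: str) -> dict:
    """Compute six simple phraseological features for one document.

    All definitions here are assumptions, not the article's own.
    """
    # Paragraphs: blocks separated by one or more blank lines (assumed).
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: split after ., ! or ? followed by whitespace (rough heuristic).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words: runs of word characters, lowercased. In Python 3, \w matches
    # accented Spanish letters, and also digits and underscores.
    words = re.findall(r"\w+", text.lower())

    def word_count(chunk: str) -> int:
        return len(re.findall(r"\w+", chunk))

    sentence_lens = [word_count(s) for s in sentences]
    paragraph_lens = [word_count(p) for p in paragraphs]
    word_lens = [len(w) for w in words]

    return {
        "MeanWordLen": statistics.mean(word_lens) if word_lens else 0.0,
        # Type-token ratio: distinct words divided by total words.
        "LexicalDiversity": len(set(words)) / len(words) if words else 0.0,
        "MeanSentenceLen": statistics.mean(sentence_lens) if sentence_lens else 0.0,
        # Population standard deviation, so a single sentence yields 0.0.
        "StdevSentenceLen": statistics.pstdev(sentence_lens) if sentence_lens else 0.0,
        "MeanParagraphLen": statistics.mean(paragraph_lens) if paragraph_lens else 0.0,
        # Document length in characters (assumed unit).
        "DocumentLen": len(text),
    }


if __name__ == "__main__":
    sample = "Hola mundo. Esto es una prueba corta!\n\nOtro párrafo aquí."
    for name, value in phraseological_features(sample).items():
        print(name, value)
```

In a pipeline like the one the summary describes, a dictionary of this kind would be flattened and concatenated with the Transformer-derived features and the frequent-word counts before training. How the authors actually combine them is not stated in this record.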
ISSN:2550-682X