ProSOUL: A Framework to Identify Propaganda From Online Urdu Content

Bibliographic details
Published in: IEEE Access, 2020, Vol. 8, pp. 186039-186054
Main authors: Kausar, Soufia; Tahir, Bilal; Mehmood, Muhammad Amir
Format: Article
Language: English
Description
Abstract: Today, the rapid dissemination of information on digital platforms has seen the emergence of information pollution such as misinformation, disinformation, fake news, and different types of propaganda. Information pollution has become a serious threat to the online digital world and has posed several challenges to social media platforms and governments around the world. In this article, we propose Propaganda Spotting in Online Urdu Language (ProSOUL), a framework to identify content and sources of propaganda spread in the Urdu language. First, we develop a labelled dataset of 11,574 Urdu news items to train the machine learning classifiers. Next, we develop a Linguistic Inquiry and Word Count (LIWC) dictionary to extract psycho-linguistic features of Urdu text. We evaluate the performance of different classifiers by varying n-gram, News Landscape (NELA), Word2Vec, and Bidirectional Encoder Representations from Transformers (BERT) features. Our results show that the combination of NELA, word n-gram, and character n-gram features performs best, reaching 0.91 accuracy for Urdu text classification. In addition, Word2Vec embeddings outperform BERT features in classification of Urdu text, with 0.87 accuracy. Moreover, we develop and classify large-scale Urdu content repositories to identify web sources spreading propaganda. Our results show that the ProSOUL framework detects propaganda more effectively in online Urdu news content than in general web content. To the best of our knowledge, this is the first study on the detection of propaganda content in the Urdu language.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3028131