Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision

Bibliographic details
Published in:	Information systems research 2024-03, Vol.35 (1), p.203-225
Main author:	Etudo, Ugochukwu
Format:	Article
Language:	eng
Subjects:	Collective Action Framing Theory; distant supervision; Ideology; Knowledge representation; named entity recognition; Ontology; Propaganda; Radicalism; relation extraction; Social networks; terrorism; Websites
Online access:	Full text

Description
Abstract:	Social media companies dedicate significant resources to creating machine-learning models that label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve to keep up with a dynamic threat landscape. As new violent ideologies emerge, existing models will fail to detect them. Training fresh models for the task is risky (there are new model biases to understand), time-consuming (you will need to see many examples to predict new examples), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. We show how this program uses state-of-the-art deep-learning models to create human- and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogeneous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape.
Radical, terroristic organizations pose threats to business, government, and society. The ubiquity of the modern Web and its participatory architecture have enabled such groups to become full-blown online propaganda machines. Today, radicalization that eventually leads to acts of terror occurs predominantly on the Web. Radical ideologies can be spread, in many cases unchecked, by malicious actors who take advantage of the frequently lax surveillance apparatus of online social platforms.
This paper argues that an overlooked, essential first step to interdicting this threat is the large-scale, structured collection of knowledge regarding these ideologies in open machine-readable formats. Using Collective Action Framing Theory, this study develops a trio of design artifacts: the Terror Beliefs Ontology (TBO) for a general ontology of terroristic ideology, the Frame Discovery System (FDS) to automatically populate this ontology, and the Frame Resonance Detection System (FRDS) to accurately identify online personae or postings that espouse a radical ideology known to TBO. With a comprehensive evaluation, we demonstrate how these three ins
ISSN:	1047-7047, 1526-5536
DOI:	10.1287/isre.2023.1223