Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision

Social media companies dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ide...

Bibliographic details
Published in: Information systems research 2024-03, Vol.35 (1), p.203-225
Main author: Etudo, Ugochukwu
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 225
container_issue 1
container_start_page 203
container_title Information systems research
container_volume 35
creator Etudo, Ugochukwu
description Social media companies dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ideologies emerge, existing models will fail to detect them. Training fresh models for the task is risky (there are new model biases to understand), time consuming (you will need to see many examples to predict new examples), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. We show how this program uses state-of-the-art deep-learning models to create human- and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogeneous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape. Radical, terroristic organizations pose threats to business, government, and society. The ubiquity of the modern Web and its participatory architecture have enabled such groups to become full-blown online propaganda machines. Today, radicalization that eventually leads to acts of terror occurs predominantly on the Web. Radical ideologies can be spread, in many cases unchecked, by malicious actors who take advantage of the frequently lax surveillance apparatus of online social platforms.
This paper argues that an overlooked, essential first step to interdicting this threat is the large-scale, structured collection of knowledge regarding these ideologies in open machine-readable formats. Using Collective Action Framing Theory, this study develops a trio of design artifacts: the Terror Beliefs Ontology (TBO) for a general ontology of terroristic ideology, the Frame Discovery System (FDS) to automatically populate this ontology, and the Frame Resonance Detection System (FRDS) to accurately identify online personae or postings that espouse a radical ideology known to TBO. With a comprehensive evaluation, we demonstrate how these three ins
doi_str_mv 10.1287/isre.2023.1223
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3059956097</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3059956097</sourcerecordid><originalsourceid>FETCH-LOGICAL-c362t-eeb5f19a085b0ffce09b72351938964ac391f4260298b79b5d52339afd4e67513</originalsourceid><addsrcrecordid>eNqFkEtPAyEUhYnRxFrduiZxPZXHMDMstVZt0qSJ2jVhGGhoplCBGvvvZVoTl27gHjjnXvgAuMVogklT39sY9IQgQrMk9AyMMCNVwRitznONyrqo83IJrmLcIIQo5XQE1NIl3_v1oXiUUXdw7owPW5msd3D2nYJUxzIfwoVsdW_dGr7JzirZw6XLUsOpd0m7BFdxuHyyMcms3vc7Hb5szOlrcGFkH_XN7z4Gq-fZx_S1WCxf5tOHRaFoRVKhdcsM5hI1rEXGKI14WxPKMKcNr0qpKMemJBUivGlr3rKOkfwJabpSVzXDdAzuTn13wX_udUxi4_fB5ZGCIsY5qxCvs2tycqngY0ZmxC7YrQwHgZEYQIoBpBhAigFkDsBTQCvvbPyzNw2nuET5jWNQnCz2iC_-1_IHWpeATw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3059956097</pqid></control><display><type>article</type><title>Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision</title><source>Informs</source><creator>Etudo, Ugochukwu</creator><creatorcontrib>Etudo, Ugochukwu</creatorcontrib><description>Social media companies dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ideologies emerge, existing models will fail to detect them. Training fresh models for the task is risky (there are new model biases to understand), time consuming (you will need to see many examples to predict new examples), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. 
We show how this program uses state-of-the-art deep-learning models to create human- and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogeneous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape. Radical, terroristic organizations pose threats to business, government, and society. The ubiquity of the modern Web and its participatory architecture have enabled such groups to become full-blown online propaganda machines. Today, radicalization that eventually leads to acts of terror occurs predominantly on the Web. Radical ideologies can be spread, in many cases unchecked, by malicious actors who take advantage of the frequently lax surveillance apparatus of online social platforms. This paper argues that an overlooked, essential first step to interdicting this threat is the large-scale, structured collection of knowledge regarding these ideologies in open machine-readable formats. Using Collective Action Framing Theory, this study develops a trio of design artifacts: the Terror Beliefs Ontology (TBO) for a general ontology of terroristic ideology, the Frame Discovery System (FDS) to automatically populate this ontology, and the Frame Resonance Detection System (FRDS) to accurately identify online personae or postings that espouse a radical ideology known to TBO. With a comprehensive evaluation, we demonstrate how these three instantiated design artifacts, working in concert, can automatically construct a knowledge representation of heterogeneous terroristic ideologies and accurately detect radical online postings. 
We offer the first design that can assign Web text to any radical ideology without the use of a hand-labeled training corpus. History: Olivia Sheng, Senior editor; Huimin Zhao, Associate Editor. Funding: This work was partially supported by the Virginia Commonwealth University Presidential Research Quest (PeRQ) Fund. Supplemental Material: The e-companion is available at https://doi.org/10.1287/isre.2023.1223 .</description><identifier>ISSN: 1047-7047</identifier><identifier>EISSN: 1526-5536</identifier><identifier>DOI: 10.1287/isre.2023.1223</identifier><language>eng</language><publisher>Linthicum: INFORMS</publisher><subject>Collective Action Framing Theory ; distant supervision ; Ideology ; Knowledge representation ; named entity recognition ; Ontology ; Propaganda ; Radicalism ; relation extraction ; Social networks ; terrorism ; Websites</subject><ispartof>Information systems research, 2024-03, Vol.35 (1), p.203-225</ispartof><rights>Copyright Institute for Operations Research and the Management Sciences Mar 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c362t-eeb5f19a085b0ffce09b72351938964ac391f4260298b79b5d52339afd4e67513</citedby><cites>FETCH-LOGICAL-c362t-eeb5f19a085b0ffce09b72351938964ac391f4260298b79b5d52339afd4e67513</cites><orcidid>0000-0002-5690-4282 ; 0000-0002-1706-5383</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://pubsonline.informs.org/doi/full/10.1287/isre.2023.1223$$EHTML$$P50$$Ginforms$$H</linktohtml><link.rule.ids>314,780,784,3692,27924,27925,62616</link.rule.ids></links><search><creatorcontrib>Etudo, Ugochukwu</creatorcontrib><title>Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision</title><title>Information systems research</title><description>Social media companies 
dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ideologies emerge, existing models will fail to detect them. Training fresh models for the task is risky (there are new model biases to understand), time consuming (you will need to see many examples to predict new examples), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. We show how this program uses state-of-the-art deep-learning models to create human- and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogeneous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape. Radical, terroristic organizations pose threats to business, government, and society. The ubiquity of the modern Web and its participatory architecture have enabled such groups to become full-blown online propaganda machines. Today, radicalization that eventually leads to acts of terror occurs predominantly on the Web. Radical ideologies can be spread, in many cases unchecked, by malicious actors who take advantage of the frequently lax surveillance apparatus of online social platforms. 
This paper argues that an overlooked, essential first step to interdicting this threat is the large-scale, structured collection of knowledge regarding these ideologies in open machine-readable formats. Using Collective Action Framing Theory, this study develops a trio of design artifacts: the Terror Beliefs Ontology (TBO) for a general ontology of terroristic ideology, the Frame Discovery System (FDS) to automatically populate this ontology, and the Frame Resonance Detection System (FRDS) to accurately identify online personae or postings that espouse a radical ideology known to TBO. With a comprehensive evaluation, we demonstrate how these three instantiated design artifacts, working in concert, can automatically construct a knowledge representation of heterogeneous terroristic ideologies and accurately detect radical online postings. We offer the first design that can assign Web text to any radical ideology without the use of a hand-labeled training corpus. History: Olivia Sheng, Senior editor; Huimin Zhao, Associate Editor. Funding: This work was partially supported by the Virginia Commonwealth University Presidential Research Quest (PeRQ) Fund. 
Supplemental Material: The e-companion is available at https://doi.org/10.1287/isre.2023.1223 .</description><subject>Collective Action Framing Theory</subject><subject>distant supervision</subject><subject>Ideology</subject><subject>Knowledge representation</subject><subject>named entity recognition</subject><subject>Ontology</subject><subject>Propaganda</subject><subject>Radicalism</subject><subject>relation extraction</subject><subject>Social networks</subject><subject>terrorism</subject><subject>Websites</subject><issn>1047-7047</issn><issn>1526-5536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNqFkEtPAyEUhYnRxFrduiZxPZXHMDMstVZt0qSJ2jVhGGhoplCBGvvvZVoTl27gHjjnXvgAuMVogklT39sY9IQgQrMk9AyMMCNVwRitznONyrqo83IJrmLcIIQo5XQE1NIl3_v1oXiUUXdw7owPW5msd3D2nYJUxzIfwoVsdW_dGr7JzirZw6XLUsOpd0m7BFdxuHyyMcms3vc7Hb5szOlrcGFkH_XN7z4Gq-fZx_S1WCxf5tOHRaFoRVKhdcsM5hI1rEXGKI14WxPKMKcNr0qpKMemJBUivGlr3rKOkfwJabpSVzXDdAzuTn13wX_udUxi4_fB5ZGCIsY5qxCvs2tycqngY0ZmxC7YrQwHgZEYQIoBpBhAigFkDsBTQCvvbPyzNw2nuET5jWNQnCz2iC_-1_IHWpeATw</recordid><startdate>20240301</startdate><enddate>20240301</enddate><creator>Etudo, Ugochukwu</creator><general>INFORMS</general><general>Institute for Operations Research and the Management Sciences</general><scope>OQ6</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><orcidid>https://orcid.org/0000-0002-5690-4282</orcidid><orcidid>https://orcid.org/0000-0002-1706-5383</orcidid></search><sort><creationdate>20240301</creationdate><title>Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision</title><author>Etudo, 
Ugochukwu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c362t-eeb5f19a085b0ffce09b72351938964ac391f4260298b79b5d52339afd4e67513</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Collective Action Framing Theory</topic><topic>distant supervision</topic><topic>Ideology</topic><topic>Knowledge representation</topic><topic>named entity recognition</topic><topic>Ontology</topic><topic>Propaganda</topic><topic>Radicalism</topic><topic>relation extraction</topic><topic>Social networks</topic><topic>terrorism</topic><topic>Websites</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Etudo, Ugochukwu</creatorcontrib><collection>ECONIS</collection><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><jtitle>Information systems research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Etudo, Ugochukwu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision</atitle><jtitle>Information systems research</jtitle><date>2024-03-01</date><risdate>2024</risdate><volume>35</volume><issue>1</issue><spage>203</spage><epage>225</epage><pages>203-225</pages><issn>1047-7047</issn><eissn>1526-5536</eissn><abstract>Social media companies dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ideologies emerge, existing models will fail to detect them. 
Training fresh models for the task is risky (there are new model biases to understand), time consuming (you will need to see many examples to predict new examples), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. We show how this program uses state-of-the-art deep-learning models to create human- and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogeneous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape. Radical, terroristic organizations pose threats to business, government, and society. The ubiquity of the modern Web and its participatory architecture have enabled such groups to become full-blown online propaganda machines. Today, radicalization that eventually leads to acts of terror occurs predominantly on the Web. Radical ideologies can be spread, in many cases unchecked, by malicious actors who take advantage of the frequently lax surveillance apparatus of online social platforms. This paper argues that an overlooked, essential first step to interdicting this threat is the large-scale, structured collection of knowledge regarding these ideologies in open machine-readable formats. 
Using Collective Action Framing Theory, this study develops a trio of design artifacts: the Terror Beliefs Ontology (TBO) for a general ontology of terroristic ideology, the Frame Discovery System (FDS) to automatically populate this ontology, and the Frame Resonance Detection System (FRDS) to accurately identify online personae or postings that espouse a radical ideology known to TBO. With a comprehensive evaluation, we demonstrate how these three instantiated design artifacts, working in concert, can automatically construct a knowledge representation of heterogeneous terroristic ideologies and accurately detect radical online postings. We offer the first design that can assign Web text to any radical ideology without the use of a hand-labeled training corpus. History: Olivia Sheng, Senior editor; Huimin Zhao, Associate Editor. Funding: This work was partially supported by the Virginia Commonwealth University Presidential Research Quest (PeRQ) Fund. Supplemental Material: The e-companion is available at https://doi.org/10.1287/isre.2023.1223 .</abstract><cop>Linthicum</cop><pub>INFORMS</pub><doi>10.1287/isre.2023.1223</doi><tpages>23</tpages><orcidid>https://orcid.org/0000-0002-5690-4282</orcidid><orcidid>https://orcid.org/0000-0002-1706-5383</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1047-7047
ispartof Information systems research, 2024-03, Vol.35 (1), p.203-225
issn 1047-7047
1526-5536
language eng
recordid cdi_proquest_journals_3059956097
source Informs
subjects Collective Action Framing Theory
distant supervision
Ideology
Knowledge representation
named entity recognition
Ontology
Propaganda
Radicalism
relation extraction
Social networks
terrorism
Websites
title Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T20%3A58%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Ontology-Based%20Information%20Extraction%20for%20Labeling%20Radical%20Online%20Content%20Using%20Distant%20Supervision&rft.jtitle=Information%20systems%20research&rft.au=Etudo,%20Ugochukwu&rft.date=2024-03-01&rft.volume=35&rft.issue=1&rft.spage=203&rft.epage=225&rft.pages=203-225&rft.issn=1047-7047&rft.eissn=1526-5536&rft_id=info:doi/10.1287/isre.2023.1223&rft_dat=%3Cproquest_cross%3E3059956097%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3059956097&rft_id=info:pmid/&rfr_iscdi=true