Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision

Social media companies dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ide...

Bibliographic details
Published in:	Information systems research 2024-03, Vol.35 (1), p.203-225
First author:	Etudo, Ugochukwu
Format:	Article
Language:	eng
Keywords:	Collective Action Framing Theory ; distant supervision ; Ideology ; Knowledge representation ; named entity recognition ; Ontology ; Propaganda ; Radicalism ; relation extraction ; Social networks ; terrorism ; Websites
Online access:	Full text

container_end_page	225
container_issue	1
container_start_page	203
container_title	Information systems research
container_volume	35
creator	Etudo, Ugochukwu
description	Social media companies dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ideologies emerge, existing models will fail to detect them. Training fresh models for the task is risky (there are new model biases to understand), time-consuming (you will need to see many examples to predict new examples), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. We show how this program uses state-of-the-art deep-learning models to create human- and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogeneous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape. Radical, terroristic organizations pose threats to business, government, and society. The ubiquity of the modern Web and its participatory architecture have enabled such groups to become full-blown online propaganda machines. Today, radicalization that eventually leads to acts of terror occurs predominantly on the Web. Radical ideologies can be spread, in many cases unchecked, by malicious actors who take advantage of the frequently lax surveillance apparatus of online social platforms. 
This paper argues that an overlooked, essential first step to interdicting this threat is the large-scale, structured collection of knowledge regarding these ideologies in open machine-readable formats. Using Collective Action Framing Theory, this study develops a trio of design artifacts: the Terror Beliefs Ontology (TBO) for a general ontology of terroristic ideology, the Frame Discovery System (FDS) to automatically populate this ontology, and the Frame Resonance Detection System (FRDS) to accurately identify online personae or postings that espouse a radical ideology known to TBO. With a comprehensive evaluation, we demonstrate how these three ins
doi_str_mv	10.1287/isre.2023.1223
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3059956097</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3059956097</sourcerecordid><originalsourceid>FETCH-LOGICAL-c362t-eeb5f19a085b0ffce09b72351938964ac391f4260298b79b5d52339afd4e67513</originalsourceid><addsrcrecordid>eNqFkEtPAyEUhYnRxFrduiZxPZXHMDMstVZt0qSJ2jVhGGhoplCBGvvvZVoTl27gHjjnXvgAuMVogklT39sY9IQgQrMk9AyMMCNVwRitznONyrqo83IJrmLcIIQo5XQE1NIl3_v1oXiUUXdw7owPW5msd3D2nYJUxzIfwoVsdW_dGr7JzirZw6XLUsOpd0m7BFdxuHyyMcms3vc7Hb5szOlrcGFkH_XN7z4Gq-fZx_S1WCxf5tOHRaFoRVKhdcsM5hI1rEXGKI14WxPKMKcNr0qpKMemJBUivGlr3rKOkfwJabpSVzXDdAzuTn13wX_udUxi4_fB5ZGCIsY5qxCvs2tycqngY0ZmxC7YrQwHgZEYQIoBpBhAigFkDsBTQCvvbPyzNw2nuET5jWNQnCz2iC_-1_IHWpeATw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3059956097</pqid></control><display><type>article</type><title>Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision</title><source>Informs</source><creator>Etudo, Ugochukwu</creator><creatorcontrib>Etudo, Ugochukwu</creatorcontrib><description>Social media companies dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ideologies emerge, existing models will fail to detect them. Training fresh models for the task is risky (there are new model biases to understand), time consuming (you will need to see many examples to predict new examples), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. 
We show how this program uses state-of-the art deep-learning models to create human and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogenous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape. Radical, terroristic organizations pose threats to business, government, and society. The ubiquity of the modern Web and its participatory architecture have enabled such groups to become full-blown online propaganda machines. Today, radicalization that eventually leads to acts of terror occurs predominantly on the Web. Radical ideologies can be spread, in many cases unchecked, by malicious actors who take advantage of the frequently lax surveillance apparatus of online social platforms. This paper argues that an overlooked, essential first step to interdicting this threat is the large-scale, structured collection of knowledge regarding these ideologies in open machine-readable formats. Using Collective Action Framing Theory, this study develops a trio of design artifacts: the Terror Beliefs Ontology (TBO) for a general ontology of terroristic ideology, the Frame Discovery System (FDS) to automatically populate this ontology, and the Frame Resonance Detection System (FRDS) to accurately identify online personae or postings that espouse a radical ideology known to TBO. With a comprehensive evaluation, we demonstrate how these three instantiated design artifacts, working in concert, can automatically construct a knowledge representation of heterogeneous terroristic ideologies and accurately detect radical online postings. 
We offer the first design that can assign Web text to any radical ideology without the use of a hand-labeled training corpus. History: Olivia Sheng, Senior editor; Huimin Zhao, Associate Editor. Funding: This work was partially supported by the Virginia Commonwealth University Presidential Research Quest (PeRQ) Fund. Supplemental Material: The e-companion is available at https://doi.org/10.1287/isre.2023.1223 .</description><identifier>ISSN: 1047-7047</identifier><identifier>EISSN: 1526-5536</identifier><identifier>DOI: 10.1287/isre.2023.1223</identifier><language>eng</language><publisher>Linthicum: INFORMS</publisher><subject>Collective Action Framing Theory ; distant supervision ; Ideology ; Knowledge representation ; named entity recognition ; Ontology ; Propaganda ; Radicalism ; relation extraction ; Social networks ; terrorism ; Websites</subject><ispartof>Information systems research, 2024-03, Vol.35 (1), p.203-225</ispartof><rights>Copyright Institute for Operations Research and the Management Sciences Mar 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c362t-eeb5f19a085b0ffce09b72351938964ac391f4260298b79b5d52339afd4e67513</citedby><cites>FETCH-LOGICAL-c362t-eeb5f19a085b0ffce09b72351938964ac391f4260298b79b5d52339afd4e67513</cites><orcidid>0000-0002-5690-4282 ; 0000-0002-1706-5383</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://pubsonline.informs.org/doi/full/10.1287/isre.2023.1223$$EHTML$$P50$$Ginforms$$H</linktohtml><link.rule.ids>314,780,784,3692,27924,27925,62616</link.rule.ids></links><search><creatorcontrib>Etudo, Ugochukwu</creatorcontrib><title>Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision</title><title>Information systems research</title><description>Social media companies 
dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ideologies emerge, existing models will fail to detect them. Training fresh models for the task is risky (there are new model biases to understand), time consuming (you will need to see many examples to predict new examples), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. We show how this program uses state-of-the art deep-learning models to create human and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogenous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape. Radical, terroristic organizations pose threats to business, government, and society. The ubiquity of the modern Web and its participatory architecture have enabled such groups to become full-blown online propaganda machines. Today, radicalization that eventually leads to acts of terror occurs predominantly on the Web. Radical ideologies can be spread, in many cases unchecked, by malicious actors who take advantage of the frequently lax surveillance apparatus of online social platforms. 
This paper argues that an overlooked, essential first step to interdicting this threat is the large-scale, structured collection of knowledge regarding these ideologies in open machine-readable formats. Using Collective Action Framing Theory, this study develops a trio of design artifacts: the Terror Beliefs Ontology (TBO) for a general ontology of terroristic ideology, the Frame Discovery System (FDS) to automatically populate this ontology, and the Frame Resonance Detection System (FRDS) to accurately identify online personae or postings that espouse a radical ideology known to TBO. With a comprehensive evaluation, we demonstrate how these three instantiated design artifacts, working in concert, can automatically construct a knowledge representation of heterogeneous terroristic ideologies and accurately detect radical online postings. We offer the first design that can assign Web text to any radical ideology without the use of a hand-labeled training corpus. History: Olivia Sheng, Senior editor; Huimin Zhao, Associate Editor. Funding: This work was partially supported by the Virginia Commonwealth University Presidential Research Quest (PeRQ) Fund. 
Supplemental Material: The e-companion is available at https://doi.org/10.1287/isre.2023.1223 .</description><subject>Collective Action Framing Theory</subject><subject>distant supervision</subject><subject>Ideology</subject><subject>Knowledge representation</subject><subject>named entity recognition</subject><subject>Ontology</subject><subject>Propaganda</subject><subject>Radicalism</subject><subject>relation extraction</subject><subject>Social networks</subject><subject>terrorism</subject><subject>Websites</subject><issn>1047-7047</issn><issn>1526-5536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNqFkEtPAyEUhYnRxFrduiZxPZXHMDMstVZt0qSJ2jVhGGhoplCBGvvvZVoTl27gHjjnXvgAuMVogklT39sY9IQgQrMk9AyMMCNVwRitznONyrqo83IJrmLcIIQo5XQE1NIl3_v1oXiUUXdw7owPW5msd3D2nYJUxzIfwoVsdW_dGr7JzirZw6XLUsOpd0m7BFdxuHyyMcms3vc7Hb5szOlrcGFkH_XN7z4Gq-fZx_S1WCxf5tOHRaFoRVKhdcsM5hI1rEXGKI14WxPKMKcNr0qpKMemJBUivGlr3rKOkfwJabpSVzXDdAzuTn13wX_udUxi4_fB5ZGCIsY5qxCvs2tycqngY0ZmxC7YrQwHgZEYQIoBpBhAigFkDsBTQCvvbPyzNw2nuET5jWNQnCz2iC_-1_IHWpeATw</recordid><startdate>20240301</startdate><enddate>20240301</enddate><creator>Etudo, Ugochukwu</creator><general>INFORMS</general><general>Institute for Operations Research and the Management Sciences</general><scope>OQ6</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><orcidid>https://orcid.org/0000-0002-5690-4282</orcidid><orcidid>https://orcid.org/0000-0002-1706-5383</orcidid></search><sort><creationdate>20240301</creationdate><title>Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision</title><author>Etudo, 
Ugochukwu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c362t-eeb5f19a085b0ffce09b72351938964ac391f4260298b79b5d52339afd4e67513</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Collective Action Framing Theory</topic><topic>distant supervision</topic><topic>Ideology</topic><topic>Knowledge representation</topic><topic>named entity recognition</topic><topic>Ontology</topic><topic>Propaganda</topic><topic>Radicalism</topic><topic>relation extraction</topic><topic>Social networks</topic><topic>terrorism</topic><topic>Websites</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Etudo, Ugochukwu</creatorcontrib><collection>ECONIS</collection><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><jtitle>Information systems research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Etudo, Ugochukwu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision</atitle><jtitle>Information systems research</jtitle><date>2024-03-01</date><risdate>2024</risdate><volume>35</volume><issue>1</issue><spage>203</spage><epage>225</epage><pages>203-225</pages><issn>1047-7047</issn><eissn>1526-5536</eissn><abstract>Social media companies dedicate significant resources to create machine-learning models to label harmful content on their platforms, including content promoting violent, extremist beliefs. These models have to evolve over time to keep up with a dynamic threat landscape. Over time, as new violent ideologies emerge, existing models will fail to detect them. 
Training fresh models for the task is risky (there are new model biases to understand), time consuming (you will need to see many examples to predict new examples), and cost-ineffective. We propose an approach that prioritizes the evolution and representation of radical ideas by creating a computer program to explicitly keep track of ideologies. We show how this program uses state-of-the art deep-learning models to create human and machine-readable representations of radical ideologies by automatically consuming content symbolic of those ideologies. Our approach validates the notion that violent ideologies differ in content but are homogenous in structure. With just a few examples of content, the program creates powerful representations that can be used to automatically detect additional content with surprising accuracy. This process greatly reduces the time and resources necessary to adapt existing content-labeling models to the changing ideological and rhetorical landscape. Radical, terroristic organizations pose threats to business, government, and society. The ubiquity of the modern Web and its participatory architecture have enabled such groups to become full-blown online propaganda machines. Today, radicalization that eventually leads to acts of terror occurs predominantly on the Web. Radical ideologies can be spread, in many cases unchecked, by malicious actors who take advantage of the frequently lax surveillance apparatus of online social platforms. This paper argues that an overlooked, essential first step to interdicting this threat is the large-scale, structured collection of knowledge regarding these ideologies in open machine-readable formats. 
Using Collective Action Framing Theory, this study develops a trio of design artifacts: the Terror Beliefs Ontology (TBO) for a general ontology of terroristic ideology, the Frame Discovery System (FDS) to automatically populate this ontology, and the Frame Resonance Detection System (FRDS) to accurately identify online personae or postings that espouse a radical ideology known to TBO. With a comprehensive evaluation, we demonstrate how these three instantiated design artifacts, working in concert, can automatically construct a knowledge representation of heterogeneous terroristic ideologies and accurately detect radical online postings. We offer the first design that can assign Web text to any radical ideology without the use of a hand-labeled training corpus. History: Olivia Sheng, Senior editor; Huimin Zhao, Associate Editor. Funding: This work was partially supported by the Virginia Commonwealth University Presidential Research Quest (PeRQ) Fund. Supplemental Material: The e-companion is available at https://doi.org/10.1287/isre.2023.1223 .</abstract><cop>Linthicum</cop><pub>INFORMS</pub><doi>10.1287/isre.2023.1223</doi><tpages>23</tpages><orcidid>https://orcid.org/0000-0002-5690-4282</orcidid><orcidid>https://orcid.org/0000-0002-1706-5383</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1047-7047
ispartof	Information systems research, 2024-03, Vol.35 (1), p.203-225
issn	1047-7047 1526-5536
language	eng
recordid	cdi_proquest_journals_3059956097
source	Informs
subjects	Collective Action Framing Theory ; distant supervision ; Ideology ; Knowledge representation ; named entity recognition ; Ontology ; Propaganda ; Radicalism ; relation extraction ; Social networks ; terrorism ; Websites
title	Ontology-Based Information Extraction for Labeling Radical Online Content Using Distant Supervision
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T20%3A58%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Ontology-Based%20Information%20Extraction%20for%20Labeling%20Radical%20Online%20Content%20Using%20Distant%20Supervision&rft.jtitle=Information%20systems%20research&rft.au=Etudo,%20Ugochukwu&rft.date=2024-03-01&rft.volume=35&rft.issue=1&rft.spage=203&rft.epage=225&rft.pages=203-225&rft.issn=1047-7047&rft.eissn=1526-5536&rft_id=info:doi/10.1287/isre.2023.1223&rft_dat=%3Cproquest_cross%3E3059956097%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3059956097&rft_id=info:pmid/&rfr_iscdi=true