Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes


Bibliographic details
Main authors: Garg, Rahul; Padhi, Trilok; Jain, Hemang; Kursuncu, Ugur; Kumaraguru, Ponnurangam
Format: Article
Language: eng
Subjects:
Online access: order full text
description Toxicity identification in online multimodal environments remains a challenging task due to the complexity of contextual connections across modalities (e.g., textual and visual). In this paper, we propose a novel framework that integrates Knowledge Distillation (KD) from Large Visual Language Models (LVLMs) and knowledge infusion to enhance the performance of toxicity detection in hateful memes. Our approach extracts sub-knowledge graphs from ConceptNet, a large-scale commonsense Knowledge Graph (KG) to be infused within a compact VLM framework. The relational context between toxic phrases in captions and memes, as well as visual concepts in memes enhance the model's reasoning capabilities. Experimental results from our study on two hate speech benchmark datasets demonstrate superior performance over the state-of-the-art baselines across AU-ROC, F1, and Recall with improvements of 1.1%, 7%, and 35%, respectively. Given the contextual complexity of the toxicity detection task, our approach showcases the significance of learning from both explicit (i.e. KG) as well as implicit (i.e. LVLMs) contextual cues incorporated through a hybrid neurosymbolic approach. This is crucial for real-world applications where accurate and scalable recognition of toxic content is critical for creating safer online environments.
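The paper itself ships no code in this record; as a rough illustration of the sub-knowledge-graph extraction step described in the abstract, the sketch below collects the k-hop ConceptNet neighborhood around seed concepts. The toy triples, the seed phrases, and the two-hop depth are illustrative assumptions, not the authors' actual data or hyperparameters (the real pipeline would query ConceptNet itself).

```python
from collections import defaultdict

# Toy stand-in for ConceptNet triples (start, relation, end).
# These edges are invented for illustration only.
TRIPLES = [
    ("insult", "RelatedTo", "offense"),
    ("offense", "IsA", "attack"),
    ("slur", "IsA", "insult"),
    ("meme", "UsedFor", "humor"),
    ("humor", "RelatedTo", "joke"),
]

def extract_subgraph(seed_concepts, triples, hops=2):
    """Collect the k-hop neighborhood of the seed concepts as edge triples."""
    adjacency = defaultdict(list)
    for start, rel, end in triples:
        adjacency[start].append((rel, end))
        adjacency[end].append((rel, start))  # traverse edges in both directions

    frontier = set(seed_concepts)
    visited = set(frontier)
    subgraph = []
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for rel, neighbor in adjacency[node]:
                subgraph.append((node, rel, neighbor))
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return subgraph

# In the paper's setting, seeds would come from toxic phrases detected
# in the meme caption and from visual concepts extracted from the image.
edges = extract_subgraph(["insult"], TRIPLES, hops=2)
```

The resulting edge list is what a knowledge-infusion step would then encode alongside the caption and image features; unrelated parts of the graph (here, the "meme"/"humor" cluster) stay outside the extracted subgraph.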
doi_str_mv 10.48550/arxiv.2411.12174
recordid cdi_arxiv_primary_2411_12174
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T14%3A37%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Just%20KIDDIN:%20Knowledge%20Infusion%20and%20Distillation%20for%20Detection%20of%20INdecent%20Memes&rft.au=Garg,%20Rahul&rft.date=2024-11-18&rft_id=info:doi/10.48550/arxiv.2411.12174&rft_dat=%3Carxiv_GOX%3E2411_12174%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true