Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes


Bibliographic details
Main authors: Garg, Rahul; Padhi, Trilok; Jain, Hemang; Kursuncu, Ugur; Kumaraguru, Ponnurangam
Format: Article
Language: eng
Subjects:
Online access: order full text
description Toxicity identification in online multimodal environments remains a challenging task due to the complexity of contextual connections across modalities (e.g., textual and visual). In this paper, we propose a novel framework that integrates Knowledge Distillation (KD) from Large Visual Language Models (LVLMs) and knowledge infusion to enhance the performance of toxicity detection in hateful memes. Our approach extracts sub-knowledge graphs from ConceptNet, a large-scale commonsense Knowledge Graph (KG) to be infused within a compact VLM framework. The relational context between toxic phrases in captions and memes, as well as visual concepts in memes enhance the model's reasoning capabilities. Experimental results from our study on two hate speech benchmark datasets demonstrate superior performance over the state-of-the-art baselines across AU-ROC, F1, and Recall with improvements of 1.1%, 7%, and 35%, respectively. Given the contextual complexity of the toxicity detection task, our approach showcases the significance of learning from both explicit (i.e. KG) as well as implicit (i.e. LVLMs) contextual cues incorporated through a hybrid neurosymbolic approach. This is crucial for real-world applications where accurate and scalable recognition of toxic content is critical for creating safer online environments.
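The paper itself ships no code in this record; as a rough illustration of the sub-knowledge-graph extraction step described in the abstract, the sketch below collects the k-hop ConceptNet neighborhood around seed concepts. The toy triples, the seed phrases, and the two-hop depth are illustrative assumptions, not the authors' actual data or hyperparameters (the real pipeline would query ConceptNet itself).

```python
from collections import defaultdict

# Toy stand-in for ConceptNet triples (start, relation, end).
# These edges are invented for illustration only.
TRIPLES = [
    ("insult", "RelatedTo", "offense"),
    ("offense", "IsA", "attack"),
    ("slur", "IsA", "insult"),
    ("meme", "UsedFor", "humor"),
    ("humor", "RelatedTo", "joke"),
]

def extract_subgraph(seed_concepts, triples, hops=2):
    """Collect the k-hop neighborhood of the seed concepts as edge triples."""
    adjacency = defaultdict(list)
    for start, rel, end in triples:
        adjacency[start].append((rel, end))
        adjacency[end].append((rel, start))  # traverse edges in both directions

    frontier = set(seed_concepts)
    visited = set(frontier)
    subgraph = []
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for rel, neighbor in adjacency[node]:
                subgraph.append((node, rel, neighbor))
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        frontier = next_frontier
    return subgraph

# In the paper's setting, seeds would come from toxic phrases detected
# in the meme caption and from visual concepts extracted from the image.
edges = extract_subgraph(["insult"], TRIPLES, hops=2)
```

The resulting edge list is what a knowledge-infusion step would then encode alongside the caption and image features; unrelated parts of the graph (here, the "meme"/"humor" cluster) stay outside the extracted subgraph.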
doi_str_mv 10.48550/arxiv.2411.12174
recordid cdi_arxiv_primary_2411_12174
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Computer Vision and Pattern Recognition
Computer Science - Learning
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T14%3A37%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Just%20KIDDIN:%20Knowledge%20Infusion%20and%20Distillation%20for%20Detection%20of%20INdecent%20Memes&rft.au=Garg,%20Rahul&rft.date=2024-11-18&rft_id=info:doi/10.48550/arxiv.2411.12174&rft_dat=%3Carxiv_GOX%3E2411_12174%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true