Creating a Multimodal Dataset of Images and Text to Study Abusive Language

In order to study online hate speech, the availability of datasets containing the linguistic phenomena of interest is of crucial importance. However, when it comes to specific target groups, for example teenagers, collecting such data may be problematic due to issues with consent and privacy restrictions. Furthermore, while text-only datasets of this kind have been widely used, limitations set by image-based social media platforms like Instagram make it difficult for researchers to experiment with multimodal hate speech data. We therefore developed CREENDER, an annotation tool that has been used in school classes to create a multimodal dataset of images and abusive comments, which we make freely available under the Apache 2.0 license. The corpus, with Italian comments, has been analysed from different perspectives to investigate whether the subject of the images plays a role in triggering a comment. We find that users judge the same images in different ways, although the presence of a person in the picture increases the probability of receiving an offensive comment.

Bibliographic Details
Main Authors: Aprosio, Alessio Palmero; Menini, Stefano; Tonelli, Sara
Format: Article
Language: English (eng)
Online Access: order full text
DOI: 10.48550/arxiv.2005.02235
Source: arXiv.org
Subjects: Computer Science - Computation and Language