Creating a Multimodal Dataset of Images and Text to Study Abusive Language

In order to study online hate speech, the availability of datasets containing the linguistic phenomena of interest is of crucial importance. However, when it comes to specific target groups, for example teenagers, collecting such data may be problematic due to issues with consent and privacy restrictions. Furthermore, while text-only datasets of this kind have been widely used, limitations set by image-based social media platforms like Instagram make it difficult for researchers to experiment with multimodal hate speech data. We therefore developed CREENDER, an annotation tool that has been used in school classes to create a multimodal dataset of images and abusive comments, which we make freely available under the Apache 2.0 license. The corpus, with Italian comments, has been analysed from different perspectives to investigate whether the subject of the images plays a role in triggering a comment. We find that users judge the same images in different ways, although the presence of a person in the picture increases the probability of receiving an offensive comment.

Bibliographic Details
Main Authors: Aprosio, Alessio Palmero; Menini, Stefano; Tonelli, Sara
Format: Article
Language: English (eng)
Online Access: order full text
DOI: 10.48550/arxiv.2005.02235
Source: arXiv.org
Subjects: Computer Science - Computation and Language