Leveraging Self-Supervised Learning for Scene Classification in Child Sexual Abuse Imagery
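Main authors: | H. V. Valois, Pedro ; Macedo, João ; Sampaio Ferraz Ribeiro, Leo ; dos Santos, Jefersson ; Avila, Sandra |
---|---|
Format: | Dataset |
Language: | eng |
Keywords: | image classification ; scene recognition ; scene classification |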
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | H. V. Valois, Pedro ; Macedo, João ; Sampaio Ferraz Ribeiro, Leo ; dos Santos, Jefersson ; Avila, Sandra |
description | Places8. We introduce a new subset of Places, called Places8, whose classes are selected to highlight the environments most common in Child Sexual Abuse Imagery (CSAI). This dataset is smaller than the ones used for the pretext task; it represents our downstream task and is used for fine-tuning the model after self-supervised learning.
Places365-Challenge indoor classes were first grouped from 159 into 62 new categories following WordNet synonyms and, in some cases, direct hyponyms or related words. For example, bedroom and bedchamber were merged, while child's room was kept as a separate category given its importance in CSAI investigation. Next, we filtered the remapped dataset down to 8 final classes drawn from 23 different Places365-Challenge scenes. The scenes were selected in consultation with partner agents of the Brazilian Federal Police and with experts in CSAI investigation and labeling. Places365-Challenge already provides training and validation splits, which are mapped accordingly. Because the remapping and filtering produced a highly imbalanced dataset, the test split was generated as a stratified 10% split of the training set. The complete remapping is listed in the table under "Original Categories", along with further details of the new subset.
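The stratified 10% hold-out described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' actual procedure: the function name `stratified_holdout`, the seed, and the toy label counts are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, fraction=0.10, seed=0):
    """Split indices so that about `fraction` of every class is held out.

    Hypothetical helper for illustration; the record does not describe
    the tooling that was actually used to build the Places8 test split.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out the same fraction of each class, so the class
        # imbalance of the full set is preserved in both splits.
        n_test = round(len(indices) * fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy, imbalanced label list (counts are made up for the example).
labels = ["bedroom"] * 100 + ["studio"] * 20
train_idx, test_idx = stratified_holdout(labels)
```

Sampling within each class, rather than over the whole set at once, is what keeps rare classes such as studio represented in the held-out split.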
Table. Description of the Places8 dataset. The class represents the final label used, while the original categories stand for the original Places365 labels. Places365 already provides training and validation splits mapped accordingly. The test set comes from a stratified 10% split from the training set.
Class | Test | Train | Val | % | Original Categories |
---|---|---|---|---|---|
bathroom | 5,740 | 51,655 | 200 | 13.4 | bathroom, shower |
bedroom | 11,112 | 100,012 | 600 | 25.9 | bedchamber, bedroom, hotel room, berth, dorm room, youth hostel |
child's room | 4,650 | 41,849 | 300 | 10.8 | child's room, nursery, playroom |
classroom | 3,751 | 33,763 | 200 | 8.7 | classroom, kindergarden classroom |
dressing room | 2,432 | 21,889 | 200 | 5.7 | closet, dressing room |
living room | 9,940 | 89,458 | 500 | 28.7 | home theater, living room, recreation room, television room, waiting room |
studio | 1,404 | 12,633 | 100 | 3.3 | television studio |
swimming pool | 1,505 | 13,547 | 200 | 3.5 | jacuzzi, swimming pool |
Total | 40,534 | 364,806 | 2,300 | 100 | |
As it is not possible to provide the images from the Places8 dataset, we provide the original image names, class names, and splits (training, validation, and test). To use Places8, you must download the images from the Places365-Challenge.
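The remapping from original Places365 categories to Places8 classes can be expressed as a simple lookup. The sketch below copies the category names exactly as they are spelled in the table; the label strings in the actual Places365-Challenge files may be formatted differently, and the layout of the released name and split lists is not described here, so the constant and function names are assumptions for illustration only.

```python
# Places8 class -> original Places365 categories, as listed in the table.
# Spellings follow the table; real Places365 label strings may differ.
PLACES8_CLASSES = {
    "bathroom": ["bathroom", "shower"],
    "bedroom": ["bedchamber", "bedroom", "hotel room", "berth",
                "dorm room", "youth hostel"],
    "child's room": ["child's room", "nursery", "playroom"],
    "classroom": ["classroom", "kindergarden classroom"],
    "dressing room": ["closet", "dressing room"],
    "living room": ["home theater", "living room", "recreation room",
                    "television room", "waiting room"],
    "studio": ["television studio"],
    "swimming pool": ["jacuzzi", "swimming pool"],
}

# Invert the table: original category -> Places8 class.
ORIGINAL_TO_PLACES8 = {
    original: places8_class
    for places8_class, originals in PLACES8_CLASSES.items()
    for original in originals
}


def to_places8(original_category):
    """Return the Places8 class for a Places365 category.

    Returns None when the category is not one of the 23 retained scenes.
    Hypothetical helper, not part of the released dataset.
    """
    return ORIGINAL_TO_PLACES8.get(original_category)
```

For example, `to_places8("nursery")` yields `"child's room"`, while a scene outside the 23 retained categories yields `None` and would be filtered out.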
Out-of-Distribution (OOD) Scenes. While Places8 already includes a test set, we created an additional evaluation set to better understand our approach's limitations when exposed to a domain gap. This is especially necessary because CSAI is known to come from diverse demographics and social backgrounds.
Thus, we designed a small "custom dataset" from online images to check model performance outside the controlled setting of Places8. The dataset comprises 80 images, 10 per class for each of the 8 Places8 classes: bathroom, bedroom, child's room, classroom, dressing room, living room, studio, and swimming pool.
The OOD Scenes set is a sample of images taken from Google Images, Bing Images, and the Dollar Street dataset in a 4:3:3 ratio. All images are free to share, modify, and use, including those from Dollar Street, which is licensed under CC-BY 4.0 and so permits commercial use.
Dollar Street is an annotated image dataset of 289 everyday household items photographed in 404 homes in 63 countries worldwide. It contains 38,479 pictures, split among abstractions (image answers to abstract questions), objects, and places within a home. This dataset explicitly depicts underrepresented populations and is grouped by country and income. Not all countries are present, but the number of pictures per region is balanced, and most images come from families who live on less than USD 1,000 per month.
These sources were chosen to contrast the web-sourced data of the Places dataset with data from underrepresented populations. |
doi_str_mv | 10.5281/zenodo.13910525 |
format | Dataset |
publisher | Zenodo |
creationdate | 2024-10-10 |
orcidid | https://orcid.org/0000-0001-9068-938X ; https://orcid.org/0000-0003-1781-2630 ; https://orcid.org/0000-0002-8889-1586 |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.5281/zenodo.13910525 |
ispartof | |
issn | |
language | eng |
recordid | cdi_datacite_primary_10_5281_zenodo_13910525 |
source | DataCite |
subjects | image classification ; scene recognition ; scene classification |
title | Leveraging Self-Supervised Learning for Scene Classification in Child Sexual Abuse Imagery |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T16%3A22%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-datacite_PQ8&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=unknown&rft.au=H.%20V.%20Valois,%20Pedro&rft.date=2024-10-10&rft_id=info:doi/10.5281/zenodo.13910525&rft_dat=%3Cdatacite_PQ8%3E10_5281_zenodo_13910525%3C/datacite_PQ8%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |