Leveraging Self-Supervised Learning for Scene Classification in Child Sexual Abuse Imagery


Bibliographic details
Main authors: H. V. Valois, Pedro; Macedo, João; Sampaio Ferraz Ribeiro, Leo; dos Santos, Jefersson; Avila, Sandra
Format: Dataset
Language: eng
Description
Summary: Places8. We introduce a new subset of Places, called Places8, whose classes are selected to highlight the environments most common in Child Sexual Abuse Imagery (CSAI). This dataset is smaller than the ones used for the pretext task; it represents our downstream task and is used to fine-tune the model after self-supervised learning.

The indoor classes of Places365-Challenge were first grouped from 159 into 62 new categories following WordNet synonyms and, in some cases, direct hyponyms or related words. For example, bedroom and bedchamber were merged, while child's room was kept as a separate category given its importance in CSAI investigation. Next, we filtered the remapped dataset into 8 final classes drawn from 23 different Places365-Challenge scenes. The scenes were selected in consultation with the partner Brazilian Federal Police agents and with experts in CSAI investigation and labeling.

Places365-Challenge already provides training and validation splits, which are mapped accordingly. The test split was then generated as a stratified 10% split of the training set, given that the remapping and filtering produced a highly imbalanced dataset. The complete remapping is listed in the "Original Categories" column of the table below, together with further details of the new subset.

Table. Description of the Places8 dataset. The class is the final label used, while the original categories are the original Places365 labels. Places365 already provides training and validation splits, which are mapped accordingly. The test set comes from a stratified 10% split of the training set.

Class           Test     Train     Val     %      Original Categories
bathroom        5,740    51,655    200     13.4   bathroom, shower
bedroom         11,112   100,012   600     25.9   bedchamber, bedroom, hotel room, berth, dorm room, youth hostel
child's room    4,650    41,849    300     10.8   child's room, nursery, playroom
classroom       3,751    33,763    200     8.7    classroom, kindergarden classroom
dressing room   2,432    21,889    200     5.7    closet, dressing room
living room     9,940    89,458    500     28.7   home theater, living room, recreation room, television room, waiting room
studio          1,404    12,633    100     3.3    television studio
swimming pool   1,505    13,547    200     3.5    jacuzzi, swimming pool
Total           40,534   364,806   2,300   100

As it is not possible to provide the images from the Places8 dataset, we provide the original image names, class names, and splits (training, validation, and test). To use Places8, you must download the images from Places365-Challenge.

Out-of-Distribution (OOD) Scenes. While the introduced Places8 already
DOI:10.5281/zenodo.13910525
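
The remapping and the stratified 10% test split described in the summary can be illustrated with a minimal sketch. It is not the authors' code: the label strings are copied from the "Original Categories" listing as printed in this record (the actual Places365-Challenge files may spell them differently, for example with underscores), and the function names, the `(image_name, label)` sample format, the seed, and the rounding rule are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Places8 class -> original categories, as listed in the record's table.
# The exact label spelling in the Places365-Challenge files is an assumption.
PLACES8_MAP = {
    "bathroom": ["bathroom", "shower"],
    "bedroom": ["bedchamber", "bedroom", "hotel room", "berth",
                "dorm room", "youth hostel"],
    "child's room": ["child's room", "nursery", "playroom"],
    "classroom": ["classroom", "kindergarden classroom"],
    "dressing room": ["closet", "dressing room"],
    "living room": ["home theater", "living room", "recreation room",
                    "television room", "waiting room"],
    "studio": ["television studio"],
    "swimming pool": ["jacuzzi", "swimming pool"],
}

# Inverted lookup: original category -> Places8 class.
ORIGINAL_TO_PLACES8 = {
    original: cls
    for cls, originals in PLACES8_MAP.items()
    for original in originals
}


def remap(samples):
    """Relabel (image_name, original_label) pairs and drop unmapped scenes."""
    return [
        (name, ORIGINAL_TO_PLACES8[label])
        for name, label in samples
        if label in ORIGINAL_TO_PLACES8
    ]


def stratified_split(samples, test_fraction=0.10, seed=0):
    """Hold out test_fraction of each class (hypothetical helper).

    Splitting per class keeps the class proportions of the imbalanced
    dataset roughly equal in the train and test sets.
    """
    by_class = defaultdict(list)
    for name, cls in samples:
        by_class[cls].append(name)

    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(by_class):
        names = sorted(by_class[cls])  # sort first so the shuffle is reproducible
        rng.shuffle(names)
        n_test = round(len(names) * test_fraction)
        test += [(name, cls) for name in names[:n_test]]
        train += [(name, cls) for name in names[n_test:]]
    return train, test
```

A call such as `train, test = stratified_split(remap(samples))` would then yield the two lists; in practice the published image names, class names, and splits should be used directly rather than regenerated.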