EyeOnWater training dataset for assessing the inclusion of water images

Bibliographic details
Main author: Krijger, Tjerk
Format: Dataset
Language: English
Description
Summary: Training dataset

The EyeOnWater app is designed to assess the ocean's water quality using images captured by regular citizens. To provide an extra helping hand in determining whether an image meets the criteria for inclusion in the app, a YOLOv8 image classification model is employed. All uploaded pictures are assessed with this model; if the model deems a water image unsuitable, it is excluded from the app's online database. Training this model requires a dataset containing a large pool of different images. The training dataset includes 12,357 'good' and 10,019 'bad' water quality images that were submitted to the EyeOnWater app.

Technical details

Data preprocessing
To create a larger training dataset, the set of original images (1,700 in total) is augmented by rotating, displacing and resizing them, using the following settings (sketched in code at the end of this record):
- Maximum rotation of 45 degrees in both directions
- Maximum displacement of 20% of the width or height
- Horizontal and vertical flip
- Maximum shear range of 20% of the width
- Pixel range of 10 units

Data splitting
The training dataset is split into 80% for training, 10% for validation and 10% for prediction (see the split sketch below).

Classes, labels and annotations
The training dataset contains two classes with the labels 'good' and 'bad'. The 'good' images are water images that are suited to determining the water quality using the Forel-Ule scale. The 'bad' images can, for example, show too much water reflection, a visible bottom surface or objects, or contain no water at all.

Parameters
From the images the water quality can be obtained by comparing the water color to the 21 colors of the Forel-Ule scale. Parameter: http://vocab.nerc.ac.uk/collection/P01/current/CLFORULE/

Data sources
The images are taken by citizen scientists, often with a smartphone.

Data quality
As the images are taken with smartphones, the image quality can be low. In addition, the images are taken outside, in a non-confined space, which means that bad lighting, reflections and other problems can occur. The images therefore first need to be checked before they can be included in the app.

Image resolution
Larger images are resized to 256 px by 256 px; smaller images are excluded from the training dataset (see the resizing sketch below).

Spatial coverage
Images are taken on a global scale.

Contact information
For more information on the training dataset and/or the app, contact tjerk@maris.nl.
DOI:10.5281/zenodo.10777440
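
The augmentation settings under 'Data preprocessing' are not tied to a named library in the description, but they map closely onto the parameters of Keras' ImageDataGenerator. A minimal sketch under that assumption; the folder layout and the reading of 'pixel range' as a channel shift are illustrative, not documented:

```python
# Augmentation sketch, assuming a Keras-style ImageDataGenerator; the dataset
# description does not name the library, so this mapping is an interpretation.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=45,       # maximum rotation of 45 degrees in both directions
    width_shift_range=0.2,   # maximum displacement of 20% of the width
    height_shift_range=0.2,  # maximum displacement of 20% of the height
    horizontal_flip=True,    # random horizontal flip
    vertical_flip=True,      # random vertical flip
    shear_range=0.2,         # maximum shear of 20%
    channel_shift_range=10,  # 'pixel range of 10 units', read as a channel shift
)

# Stream augmented batches from a hypothetical layout with 'good/' and 'bad/'
# subdirectories, resizing everything to 256x256 on the fly.
batches = augmenter.flow_from_directory(
    "original_images/", target_size=(256, 256), class_mode="binary"
)
```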
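The description does not say how the 80/10/10 split was produced. One common approach is two passes of scikit-learn's train_test_split, as in this sketch (paths and variable names are hypothetical):

```python
# Sketch of an 80/10/10 train/validation/prediction split using scikit-learn.
from pathlib import Path
from sklearn.model_selection import train_test_split

# Hypothetical layout: dataset/good/*.jpg and dataset/bad/*.jpg
paths = sorted(Path("dataset").glob("*/*.jpg"))
labels = [p.parent.name for p in paths]  # 'good' or 'bad'

# First split off 20%, then halve it into validation and prediction sets.
train_p, rest_p, train_y, rest_y = train_test_split(
    paths, labels, test_size=0.2, stratify=labels, random_state=42
)
val_p, pred_p, val_y, pred_y = train_test_split(
    rest_p, rest_y, test_size=0.5, stratify=rest_y, random_state=42
)
```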
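The record states that a YOLOv8 image classification model assesses the uploaded pictures. With the Ultralytics package that provides YOLOv8, training and filtering could look like the sketch below; the checkpoint size, epoch count and file paths are assumptions, not values from the description:

```python
# Sketch of training a YOLOv8 classification model and using it to filter
# uploads; model size, epochs and paths are illustrative assumptions.
from ultralytics import YOLO

model = YOLO("yolov8n-cls.pt")  # pretrained classification checkpoint
model.train(data="eyeonwater_dataset", epochs=50, imgsz=256)

# Assess a newly uploaded picture; a 'bad' prediction keeps it out of the
# app's online database.
result = model("uploaded_image.jpg")[0]
if result.names[result.probs.top1] == "bad":
    print("Image excluded from the online database.")
```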
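Finally, the rule under 'Image resolution' (resize larger images to 256 px by 256 px, exclude smaller ones) could be implemented with Pillow as follows. How non-square images are handled is not specified, so a direct resize is assumed:

```python
# Sketch of the stated resolution rule using Pillow: images smaller than
# 256x256 are excluded, larger ones resized. Aspect-ratio handling is an
# assumption; the description does not specify it.
from pathlib import Path
from PIL import Image

def prepare_image(path: Path, out_dir: Path, size: int = 256) -> bool:
    img = Image.open(path)
    if img.width < size or img.height < size:
        return False                    # too small: exclude from the dataset
    img.resize((size, size)).save(out_dir / path.name)
    return True
```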