EyeOnWater training dataset for assessing the inclusion of water images
Format: Dataset
Language: eng
Online access: Order full text
Abstract
Training dataset
The EyeOnWater app is designed to assess the ocean's water quality using images captured by ordinary citizens. To help determine whether an image meets the criteria for inclusion in the app, a YOLOv8 image-classification model is employed. This model assesses every uploaded picture; if it deems a water image unsuitable, the image is excluded from the app's online database. Training this model requires a dataset containing a large pool of different images. The dataset contains a total of 13,766 images, categorized into three distinct classes: “water_good,” “water_bad,” and “other.” The “water_good” class includes images that meet the requirements of EyeOnWater. The “water_bad” class comprises images of water that do not fulfill these requirements. Finally, the “other” class consists of miscellaneous user-submitted images that do not depict water. This categorization enables precise filtering and analysis of images relevant to water quality assessment.
Technical details
Data preprocessing
To create a larger training dataset, the set of original images (1,700 in total) is augmented by rotating, displacing, and resizing them, using the following settings:
Maximum rotation of 45 degrees in both directions
Maximum displacement of 20% times the width or height
Horizontal and vertical flip
Maximum shear range of 20% times the width
Pixel range of 10 units
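The settings listed above map closely onto the keyword arguments of Keras's `ImageDataGenerator`; whether that tool was actually used is not stated in this record, so the following is a sketch under that assumption:

```python
# Augmentation settings from the list above, expressed as the (assumed)
# keyword arguments of Keras's ImageDataGenerator.
augmentation_settings = {
    "rotation_range": 45,       # max rotation of 45 degrees in both directions
    "width_shift_range": 0.2,   # max horizontal displacement: 20% of width
    "height_shift_range": 0.2,  # max vertical displacement: 20% of height
    "horizontal_flip": True,    # random horizontal flip
    "vertical_flip": True,      # random vertical flip
    "shear_range": 0.2,         # max shear: 20% of width
    "channel_shift_range": 10,  # pixel (channel) range of 10 units
}

# Hypothetical usage (requires TensorFlow/Keras, not imported here):
# from tensorflow.keras.preprocessing.image import ImageDataGenerator
# datagen = ImageDataGenerator(**augmentation_settings)
```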
Data splitting
The dataset is split 80% for training, 10% for validation, and 10% for prediction (testing).
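A deterministic 80/10/10 split of this kind can be sketched as follows; the seed and the file names are illustrative assumptions, not part of the record:

```python
import random

def split_dataset(paths, seed=42):
    """Shuffle image paths reproducibly and split them 80/10/10
    into training, validation, and prediction (test) sets."""
    paths = sorted(paths)       # stable order before shuffling
    rng = random.Random(seed)   # fixed seed for reproducibility
    rng.shuffle(paths)
    n = len(paths)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return {
        "train": paths[:n_train],
        "val": paths[n_train:n_train + n_val],
        "test": paths[n_train + n_val:],
    }

# Illustrative: one placeholder path per image in the 13,766-image dataset.
splits = split_dataset([f"img_{i:05d}.jpg" for i in range(13766)])
```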
Classes, labels and annotations
The dataset is divided into three classes, as previously mentioned. Initially, the model was trained on just two classes: “water” and “nonWater.” However, it struggled to distinguish between images of acceptable water and those that did not meet the required standards. To address this, a third class, “water_bad,” was introduced. This class includes images of water that either show the ocean floor or in which a significant portion of the water is obscured by objects such as boats or docks. With the addition of the “water_bad” class, the model's ability to differentiate between acceptable and non-compliant water images improved.
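YOLOv8's classification mode reads one subfolder per class within each split; a sketch of that (assumed) directory layout, using the three class names from the text:

```python
from pathlib import Path
import tempfile

CLASSES = ["water_good", "water_bad", "other"]
SPLITS = ["train", "val", "test"]

def make_layout(root: Path) -> list:
    """Create the split/class folder structure that YOLOv8
    classification training reads (e.g. root/train/water_good/)."""
    dirs = []
    for split in SPLITS:
        for cls in CLASSES:
            d = root / split / cls
            d.mkdir(parents=True, exist_ok=True)
            dirs.append(d)
    return dirs

root = Path(tempfile.mkdtemp())
dirs = make_layout(root)

# Hypothetical training call (requires the ultralytics package):
# from ultralytics import YOLO
# YOLO("yolov8n-cls.pt").train(data=str(root), epochs=50)
```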
Parameters
From the images the water quality can be obtained by comparing the water color to the 21 colors in the Forel-Ule scale.
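Matching against the scale amounts to finding the nearest of the 21 reference colors. A sketch of that lookup follows; the palette used here is a hypothetical blue-to-brown gradient, since the calibrated Forel-Ule reference colors are not given in this record:

```python
def nearest_forel_ule(pixel, scale):
    """Return the 1-based index of the scale color closest to `pixel`
    by squared Euclidean distance in RGB space."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(scale)), key=lambda i: dist2(pixel, scale[i]))
    return best + 1  # Forel-Ule indices run from 1 (blue) to 21 (brown)

# Hypothetical stand-in for the 21 Forel-Ule reference colors:
# a simple blue-to-brown gradient, NOT the calibrated values.
toy_scale = [(33 + 6 * i, 88 + 2 * i, 188 - 8 * i) for i in range(21)]

fu_index = nearest_forel_ule((40, 90, 180), toy_scale)
```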
Parameter: http://vocab.nerc.ac.uk/collection/P01/current/CLFORULE/
Data sources
DOI: 10.5281/zenodo.10777440