EyeOnWater training dataset for assessing the inclusion of water images

Bibliographic details
Author: Krijger, Tjerk
Format: Dataset
Language: English

Description
Training dataset

The EyeOnWater app is designed to assess the ocean's water quality using images captured by ordinary citizens. To help determine whether an image meets the criteria for inclusion in the app, a YOLOv8 image-classification model is employed. The model assesses every uploaded picture; if it deems a water image unsuitable, the image is excluded from the app's online database. Training this model requires a dataset containing a large pool of different images.

The dataset contains a total of 13,766 images, categorized into three distinct classes: “water_good,” “water_bad,” and “other.” The “water_good” class includes images that meet the requirements of EyeOnWater. The “water_bad” class comprises images of water that do not fulfill these requirements. Finally, the “other” class consists of miscellaneous user-submitted images that do not depict water. This categorization enables precise filtering and analysis of images relevant to water quality assessment.

Technical details

Data preprocessing
To create a larger training dataset, the set of original images (1,700 in total) is augmented by rotating, displacing, and resizing them, using the following settings:
- Maximum rotation of 45 degrees in both directions
- Maximum displacement of 20% of the width or height
- Horizontal and vertical flip
- Maximum shear range of 20% of the width
- Pixel range of 10 units

Data splitting
The dataset is split 80% for training, 10% for validation, and 10% for prediction.

Classes, labels and annotations
The dataset is divided into three classes, as described above. Initially, the model was trained on just two classes: “water” and “nonWater.” However, it struggled to distinguish between images of acceptable water and those that did not meet the required standards. To address this, a third class, “water_bad,” was introduced.
This class includes images of water that either show the ocean floor or where a significant portion of the water is obscured by objects such as boats or docks. With the addition of the “water_bad” class, the model's ability to differentiate between acceptable and non-compliant water improved.

Parameters
The water quality can be obtained from the images by comparing the water color to the 21 colors of the Forel-Ule scale.
Parameter: http://vocab.nerc.ac.uk/collection/P01/current/CLFORULE/

Data sources The
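The augmentation settings listed under “Data preprocessing” can be expressed as a parameter sampler. The record does not name the augmentation tool used (the parameter set resembles what common image-augmentation utilities expose), so the following is a minimal pure-NumPy sketch that only draws one uniformly random value per stated range; the function name is illustrative:

```python
import numpy as np

def sample_augmentation(rng: np.random.Generator) -> dict:
    """Draw one set of augmentation parameters matching the dataset's settings."""
    return {
        "rotation_deg": rng.uniform(-45.0, 45.0),     # max 45 degrees in both directions
        "shift_x_frac": rng.uniform(-0.2, 0.2),       # max displacement: 20% of width
        "shift_y_frac": rng.uniform(-0.2, 0.2),       # max displacement: 20% of height
        "flip_horizontal": bool(rng.random() < 0.5),  # horizontal flip
        "flip_vertical": bool(rng.random() < 0.5),    # vertical flip
        "shear_frac": rng.uniform(-0.2, 0.2),         # max shear: 20% of width
        "pixel_shift": rng.uniform(-10.0, 10.0),      # pixel range of 10 units
    }

params = sample_augmentation(np.random.default_rng(42))
```

Applying the sampled rotation, shift, and shear to an actual image would additionally require an image-processing library; the sketch only captures the configured ranges.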
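The 80/10/10 split described above can be sketched as a deterministic shuffle-and-slice; the seed and file names are illustrative:

```python
import random

def split_dataset(paths, seed=0):
    """Shuffle file paths and split them 80% / 10% / 10%
    (training / validation / prediction)."""
    paths = sorted(paths)               # fixed order before shuffling, for reproducibility
    random.Random(seed).shuffle(paths)
    n_train = int(len(paths) * 0.8)
    n_val = int(len(paths) * 0.1)
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    pred = paths[n_train + n_val:]
    return train, val, pred
```

With this rounding, splitting the full list of 13,766 images yields 11,012 training, 1,376 validation, and 1,378 prediction images.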
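The record names YOLOv8 for classification but gives no training recipe. Below is a minimal sketch using the Ultralytics YOLOv8 classification API; the checkpoint, epoch count, image size, and folder layout are all assumptions, and the acceptance rule is factored into a small helper so the upload pipeline can reuse it. The training function is defined but not invoked here, since it needs the `ultralytics` package and the dataset on disk:

```python
def should_include(names: dict, top1: int, good_label: str = "water_good") -> bool:
    """Return True if the top-1 predicted class means the image
    may enter the app's online database."""
    return names[top1] == good_label

def train_and_classify():
    # Illustrative fine-tuning run (not executed here); requires `ultralytics`
    # and a dataset laid out as dataset/{train,val,test}/{water_good,water_bad,other}/.
    from ultralytics import YOLO

    model = YOLO("yolov8n-cls.pt")                      # pretrained checkpoint (assumed)
    model.train(data="dataset", epochs=50, imgsz=224)   # hyperparameters are illustrative

    result = model("upload.jpg")[0]                     # classify one uploaded image
    return should_include(result.names, result.probs.top1)
```

Only images whose top-1 class is “water_good” would be kept; both “water_bad” and “other” predictions lead to exclusion.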
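The Forel-Ule comparison mentioned under “Parameters” amounts to a nearest-color lookup against the scale's 21 reference colors. A minimal sketch follows; note that `EXAMPLE_SCALE` holds only three illustrative placeholder triples, not real Forel-Ule values — the 21 calibrated RGB colors must be taken from the literature:

```python
def forel_ule_class(rgb, scale):
    """Return the 1-based index of the scale color nearest to `rgb`
    (squared Euclidean distance in RGB space)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(scale)), key=lambda i: dist2(rgb, scale[i])) + 1

# Placeholder colors only: the true scale runs from deep blue (FU 1)
# to brown (FU 21); these three RGB triples are purely illustrative.
EXAMPLE_SCALE = [(33, 88, 188), (98, 144, 106), (164, 130, 44)]
```

A plain RGB distance is the simplest choice; a production implementation might instead match hue angle in a perceptual color space, which the record does not specify.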
DOI:10.5281/zenodo.10777440