UDE Diatoms in the Wild 2024
Main authors: | , , , , , |
---|---|
Format: | Dataset |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Diatoms are a highly diverse group of microalgae with finely ornamented microscopic silica shells. Alongside their high species diversity, several other factors make this group highly challenging for deep learning-based identification using light microscopy images. These factors include a) an unusually high intra-class variability (due in part to their life cycles) combined with often small between-class differences; b) a markedly different visual appearance of specimens depending on their orientation on the slide; and c) the low availability of diatom experts for accurate taxonomic annotation of training image datasets. In addition, light microscopy imaging of diatoms has its own pitfalls: d) the required high-resolution objectives have a very shallow depth of field, which usually makes it impossible to capture all relevant morphological features within a single focal plane, i.e. a single image; and e) the samples can contain a lot of distracting background material such as sediment, clay, small diatom fragments and remains of other organisms. Most previously published diatom image datasets are relatively small and avoid points d) and e) by presenting images in which the focal plane was manually pre-selected for each specimen and the background is rather clean. Whilst such data are a good starting point for developing automated diatom identification models, we deem them insufficient for developing models robust enough for routine application to real-world data.
Here, we present “UDE DIATOMS in the Wild 2024” (University of Duisburg-Essen – Digital annotated open-source microscope slide scans from real-world samples, version of 2024), the largest diatom image dataset thus far. It is aimed at facilitating the application and benchmarking of innovative deep learning methods for the diatom identification problem on realistic research data. All images originate from ecological and biodiversity research and routine monitoring. Automated slide scanning provided high-throughput, non-selective imaging of diatoms, whilst focus stacking artificially increased the focal depth, so that features from different focal planes can be observed simultaneously within a single image. The dataset contains 83,570 images of 611 diatom taxa, 101 of which are represented by at least 100 samples, and 144 by at least 50 samples each. Of these images, 74,410 were identified to species level (542 species) and the rest to genus level (69 genera). We also complement this dataset with two examp |
---|---|
DOI: | 10.5281/zenodo.10410654 |