The Specimen Data Refinery: Using a scientific workflow approach for information extraction
Main authors: | |
---|---|
Format: | Video |
Language: | eng |
Summary: | Conference: TDWG 2022. Session: SYM12 - Information extraction from digital specimen images using Artificial Intelligence. Accepted abstract: 10.3897/biss.6.93500. Presentation date: 2022-10-18. Location: Sofia, Bulgaria. Abstract: Over the past three years, we have been developing the Specimen Data Refinery (SDR) to automate the extraction of data from specimen images as part of the SYNTHESYS project (Walton et al. 2020). The SDR provides an easy-to-deploy, open-source, web-based interface to multiple workflows that enable a user to create new natural history specimen records or enhance existing ones. The SDR uses the Galaxy workflow platform as the basis for managing data analysis and, where possible, reuses existing Galaxy community tools and approaches (Jalili et al. 2020, Hardisty et al. 2022). We have developed a library of domain-specific tools, including semantic segmentation, optical character recognition, handwritten text recognition, barcode reading and natural language processing. These tools are designed to work on standardised images of specimens, specifically herbarium sheets, pinned insects and microscope slides. In this presentation, we describe our technical approach to developing the SDR, including the Galaxy workflow platform, application deployment, and tool interoperability using FAIR digital objects (e.g., RO-Crates and openDigital Specimen objects; Soiland-Reyes et al. 2022, Addink and Hardisty 2020). We present an evaluation of the tools, including segmentation and text recognition, and discuss the new challenges in using the resulting data from both a technical and a social perspective. |
---|---|
DOI: | 10.6084/m9.figshare.21312345 |