SQLite: A “Frictionless” Solution for Exchange of Biodiversity Data?

Bibliographic Details
Published in: Biodiversity Information Science and Standards 2024-10, Vol.8 (4)
Main authors: Mozzherin, Dmitry; Ower, Geoffrey
Format: Article
Language: English
Description
Abstract: Biodiversity data exchange depends on established standards, such as Darwin Core, Audiovisual Core, Taxon Concept Schema, etc. Standards provide terms with defined semantic meaning and structures for data exchange. Standards simplify interchange of biodiversity data among government agencies, researchers, and engineers. A notable challenge during such data transfers is the complexity involved in processing data for various uses. Most data exchanges are performed via Comma-Separated Values (CSV), Extensible Markup Language (XML), or JavaScript Object Notation (JSON), and the data need to be parsed and imported into a database before they can be queried. For example, to explore the data in a zipped Darwin Core Archive file, it is necessary to decompress it, parse XML-based metadata about the content of the file, parse and read Ecological Metadata Language (EML) to extract provenance metadata, and correctly open the text-delimited files, which use a wide variety of character encodings, delimiters, enclosures, and escape characters. All of this requires non-trivial data management and programming skills from users. We propose a paradigm shift toward using queryable SQLite-based files for more straightforward data interchange. This approach would reduce friction in data processing by directly using Structured Query Language (SQL). SQLite is an open-source, high-performance, lightweight database already installed on most computers (SQLite 2024). It integrates a database into a single file, facilitating straightforward data exchange and compression. Its robust engine can manage terabytes of data. The SQLite developers are committed to maintaining backward compatibility of both binary and SQL text file versions until 2050 (SQLite 2024). Connectivity to SQLite databases is supported by all popular programming languages.
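As a rough illustration of what querying such a file directly with SQL could look like, the following Python sketch uses the standard-library `sqlite3` module. The `occurrence` table, its columns, and the sample rows are hypothetical and only loosely modelled on Darwin Core terms; they are not the schema the authors describe. An in-memory database stands in for a shared SQLite file.

```python
import sqlite3


def build_example(conn):
    """Create a small illustrative table (hypothetical schema, not the authors')."""
    conn.execute(
        "CREATE TABLE occurrence ("
        "occurrence_id TEXT PRIMARY KEY, "
        "scientific_name TEXT NOT NULL, "
        "country_code TEXT)"
    )
    # Made-up sample rows, used only to have something to query.
    conn.executemany(
        "INSERT INTO occurrence VALUES (?, ?, ?)",
        [
            ("occ-1", "Puma concolor", "US"),
            ("occ-2", "Puma concolor", "MX"),
            ("occ-3", "Lynx rufus", "US"),
        ],
    )
    conn.commit()


def count_by_name(conn):
    """Return (scientific_name, record_count) pairs, sorted by name."""
    rows = conn.execute(
        "SELECT scientific_name, COUNT(*) FROM occurrence "
        "GROUP BY scientific_name ORDER BY scientific_name"
    )
    return rows.fetchall()


# ":memory:" stands in for the path of an exchanged SQLite file.
conn = sqlite3.connect(":memory:")
build_example(conn)
print(count_by_name(conn))
```

With a real exchange file, `sqlite3.connect` would take the file path instead of `:memory:`, and the query could run without a separate parsing or import step.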
Furthermore, the United States Library of Congress endorses SQLite alongside XML and JSON for data archives, attesting to its long-term reliability (United States Library of Congress 2024). We at the Species File Group are experimenting with using SQLite to create a universal data converter, in which an SQLite database serves as an intermediate data storage format. This provides several useful advantages:
1. Universal data converter. By learning a standard SQL schema for biodiversity data, users are able to efficiently write importers and exporters to/from the intermediate schema for a variety of other biodiversity data-exchange formats. Any programming languag
ISSN: 2535-0897
DOI: 10.3897/biss.8.138931