SQLite: A “Frictionless” Solution for Exchange of Biodiversity Data?
Published in: | Biodiversity Information Science and Standards 2024-10, Vol.8 (4) |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Biodiversity data exchange depends on established standards such as Darwin Core, Audiovisual Core, and Taxon Concept Schema. Standards provide terms with defined semantic meaning and structures for data exchange. Standards simplify interchange of biodiversity data among government agencies, researchers, and engineers.
A notable challenge during such data transfers is the complexity involved in processing data for various uses. Most data exchanges are performed via Comma-Separated Values (CSV), Extensible Markup Language (XML), or JavaScript Object Notation (JSON) files, which must be parsed and imported into a database before they can be queried. For example, to explore the data in a zipped Darwin Core Archive file, it is necessary to decompress it, parse XML-based metadata about the content of the file, parse and read Ecological Metadata Language (EML) to extract provenance metadata, and correctly open the text-delimited files, which use a wide variety of character encodings, delimiters, enclosures, and escape characters. All of this requires non-trivial data management and programming skills from users. We propose a paradigm shift toward using queryable SQLite-based files for more straightforward data interchange. This approach would reduce friction in data processing by allowing Structured Query Language (SQL) to be used directly on the exchanged file.
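To illustrate what querying an exchanged file directly with SQL could look like, here is a minimal Python sketch. The `occurrence` table, its columns (loosely named after Darwin Core terms), and the sample rows are hypothetical; the abstract does not describe the schema the authors actually use. An in-memory database stands in for a received `.sqlite` file.

```python
import sqlite3


def build_example_db(path=":memory:"):
    """Create a small database with a hypothetical occurrence table.

    The table name, column names, and rows are illustrative assumptions,
    not the schema proposed in the article.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE occurrence ("
        " occurrenceID TEXT PRIMARY KEY,"
        " scientificName TEXT,"
        " country TEXT,"
        " eventDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO occurrence VALUES (?, ?, ?, ?)",
        [
            ("occ-1", "Apis mellifera", "Kenya", "2021-03-14"),
            ("occ-2", "Apis mellifera", "Peru", "2020-11-02"),
            ("occ-3", "Bombus terrestris", "Kenya", "2022-06-30"),
        ],
    )
    conn.commit()
    return conn


def count_by_species(conn):
    """Return (scientificName, record count) pairs, sorted by name."""
    cursor = conn.execute(
        "SELECT scientificName, COUNT(*) FROM occurrence"
        " GROUP BY scientificName ORDER BY scientificName"
    )
    return cursor.fetchall()


conn = build_example_db()
species_counts = count_by_species(conn)
kenya_ids = [
    row[0]
    for row in conn.execute(
        "SELECT occurrenceID FROM occurrence WHERE country = ? ORDER BY occurrenceID",
        ("Kenya",),
    )
]
```

With a real exchanged file, `sqlite3.connect("some_file.sqlite")` would replace the in-memory database, and no decompression, delimiter detection, or XML parsing step would be needed before the first query.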
SQLite is an open source, high performance, lightweight database already installed on most computers (SQLite 2024). It integrates a database into a single file, facilitating straightforward data exchange and compression. Its robust engine is able to manage terabytes of data. The SQLite developers are committed to maintaining backward compatibility of both binary and SQL text file versions until 2050 (SQLite 2024). Connectivity to SQLite databases is supported by all popular programming languages. Furthermore, the United States Library of Congress endorses SQLite alongside XML and JSON for data archives, attesting to its long-term reliability (United States Library of Congress 2024).
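The paragraph above mentions both a binary and an SQL text form of an SQLite database. As a rough sketch of how the two relate, the Python standard library can serialize a database to SQL text and rebuild an equivalent database from it. The `taxon` table and its placeholder names are hypothetical and chosen only for illustration.

```python
import sqlite3


def dump_to_sql_text(conn):
    """Serialize a database to SQL text.

    Connection.iterdump() yields the SQL statements that recreate
    the schema and data of the connected database.
    """
    return "\n".join(conn.iterdump())


def restore_from_sql_text(sql_text):
    """Rebuild an in-memory database from an SQL text dump."""
    restored = sqlite3.connect(":memory:")
    restored.executescript(sql_text)
    return restored


# Hypothetical source database with placeholder taxon names.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO taxon (name) VALUES (?)",
    [("Aus bus",), ("Cus dus",)],
)
source.commit()

sql_text = dump_to_sql_text(source)
restored = restore_from_sql_text(sql_text)
restored_names = [
    row[0] for row in restored.execute("SELECT name FROM taxon ORDER BY id")
]
```

The text dump is larger than the binary file but is human-readable and easy to diff, which may matter for archiving; the binary file is the form that can be queried immediately.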
We at the Species File Group are experimenting with using SQLite to create a universal data converter, in which an SQLite database serves as an intermediate data storage format. This provides several useful advantages:
1. Universal data converter
By learning a standard SQL schema for biodiversity data, users are able to efficiently write importers and exporters to/from the intermediate schema for a variety of other biodiversity data-exchange formats. Any programming languag |
---|---|
ISSN: | 2535-0897 |
DOI: | 10.3897/biss.8.138931 |