Synthetic data for reef modelling
Published in: | Ecological Informatics 2024-09, Vol. 82, p. 102698, Article 102698 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | English |
Abstract: | Synthetic data mimics the statistical properties of real-world datasets while removing references to sensitive or confidential information in the original dataset (Quintana, 2020). Synthetic data is also useful for general model testing and development, with many methods available for generating data from machine learning models (Raghunathan, 2021). Although not widely used in ecological and environmental modelling, synthetic data can support and accelerate model testing and analyses where rightsholders are sensitive to data disclosure for study areas, or where data collection is expensive.
In the context of reef modelling, synthetic data can be used to support model analyses that can be published without referring to specific sites, reefs, or study areas. This is desirable in the context of decision support for restoration of the Great Barrier Reef. The Reef has many stakeholders and release of early modelling results for intervention scenarios for specific areas would be premature until management or intervention strategy options have been discussed with stakeholders and/or rightsholders. Synthetic data allows a path to publish model and method demonstrations to share knowledge with the reef decision support community without prematurely suggesting policy recommendations for reefs which are sensitive to rightsholders or stakeholders.
We showcase a synthetic data pipeline developed for the reef decision-support system ADRIA (Adaptive Dynamic Reef Intervention Algorithms), using methods from the Python package Synthetic Data Vault (Patki et al., 2016) and others. The synthetic data models are developed to emulate the statistics of case-study reefs for publishing decision-support tool demonstrations, testing and method validation without revealing sensitive reef site information. This pipeline includes developing models for tabular (benthic/compositional reef data), spatial-temporal (wave and heat stress data) and spatial network data (coral larval connectivity). Conditional sampling methods which connect spatial relationships across datasets are used to develop synthetic reef data packages which mimic the statistical properties of the original dataset. The utility of the synthetic data is demonstrated on a sample reef data package, and methods used for anonymizing the data are detailed. The results are discussed in the context of formalizing synthetic data for reef modelling. All synthetic data code is available at ADRIA-synthetic-data/README.md at |
---|---|
ISSN: | 1574-9541 |
DOI: | 10.1016/j.ecoinf.2024.102698 |
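
The abstract describes fitting models that reproduce the statistics of a confidential table and then sampling from them conditionally. The sketch below illustrates that general idea only. It is not the ADRIA pipeline and does not use the Synthetic Data Vault API. It swaps in a plain bivariate Gaussian fitted with the Python standard library. The column meanings (depth, coral cover), the stand-in "real" table, and all function names are hypothetical and chosen purely for illustration.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Estimate column means and the sample covariance of two-column rows."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sample(model, n, rng):
    """Draw n synthetic rows from the fitted bivariate Gaussian."""
    (mx, my), (sxx, sxy, syy) = model
    # Cholesky factor of the 2x2 covariance matrix.
    a = math.sqrt(sxx)
    b = sxy / a
    c = math.sqrt(max(syy - b * b, 0.0))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        rows.append((mx + a * z1, my + b * z1 + c * z2))
    return rows


def sample_conditional(model, x, n, rng):
    """Draw n rows with the first column fixed at x (conditional Gaussian)."""
    (mx, my), (sxx, sxy, syy) = model
    cond_mean = my + (sxy / sxx) * (x - mx)
    cond_sd = math.sqrt(max(syy - sxy * sxy / sxx, 0.0))
    return [(x, rng.gauss(cond_mean, cond_sd)) for _ in range(n)]


rng = random.Random(42)

# Hypothetical stand-in for a confidential table: (depth in m, coral cover in %).
# These values are generated here for illustration and are not from the paper.
real = []
for _ in range(200):
    depth = rng.uniform(2.0, 15.0)
    cover = 60.0 - 2.0 * depth + rng.gauss(0.0, 5.0)
    real.append((depth, cover))

model = fit_gaussian(real)                           # learn summary statistics only
synthetic = sample(model, 5000, rng)                 # unconditional synthetic rows
shallow = sample_conditional(model, 5.0, 1000, rng)  # rows conditioned on depth = 5 m
```

Only the fitted means and covariance are needed to generate `synthetic`, so no original row has to be shared. A Gaussian model can produce physically implausible values (for example, negative cover), which is one reason a real pipeline would likely use richer models and constraints.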