Lessons learned while building the Deepwater Horizon Database: Toward improved data sharing in coastal science

Process studies and coupled-model validation efforts in geosciences often require integration of multiple data types across time and space. For example, improved prediction of hydrocarbon fate and transport is an important societal need which fundamentally relies upon synthesis of oceanography and h...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Computers & geosciences 2016-02, Vol.87, p.84-90
Hauptverfasser: Thessen, Anne E., McGinnis, Sean, North, Elizabeth W.
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Process studies and coupled-model validation efforts in geosciences often require integration of multiple data types across time and space. For example, improved prediction of hydrocarbon fate and transport is an important societal need which fundamentally relies upon synthesis of oceanography and hydrocarbon chemistry. Yet, there are no publically accessible databases which integrate these diverse data types in a georeferenced format, nor are there guidelines for developing such a database. The objective of this research was to analyze the process of building one such database to provide baseline information on data sources and data sharing and to document the challenges and solutions that arose during this major undertaking. The resulting Deepwater Horizon Database was approximately 2.4GB in size and contained over 8 million georeferenced data points collected from industry, government databases, volunteer networks, and individual researchers. The major technical challenges that were overcome were reconciliation of terms, units, and quality flags which were necessary to effectively integrate the disparate data sets. Assembling this database required the development of relationships with individual researchers and data managers which often involved extensive e-mail contacts. The average number of emails exchanged per data set was 7.8. Of the 95 relevant data sets that were discovered, 38 (40%) were obtained, either in whole or in part. Over one third (36%) of the requests for data went unanswered. The majority of responses were received after the first request (64%) and within the first week of the first request (67%). Although fewer than half of the potentially relevant datasets were incorporated into the database, the level of sharing (40%) was high compared to some other disciplines where sharing can be as low as 10%. Our suggestions for building integrated databases include budgeting significant time for e-mail exchanges, being cognizant of the cost versus benefits of pursuing reticent data providers, and building trust through clear, respectful communication and with flexible and appropriate attributions. [Display omitted] •The Deepwater Horizon Database integrates 8 million georeferenced data points.•40% of data sets were obtained; 36% of our requests for data went unanswered.•Most responses were received after the first request and within the first week.•Major challenges overcome were reconciliation of terms, units, and quality flags.•Significant t
ISSN:0098-3004
1873-7803
DOI:10.1016/j.cageo.2015.12.001