Using PySpark to accelerate batch data point rotation for paleogeographic reconstruction
Published in: | International journal of digital earth 2024-12, Vol.17 (1) |
---|---|
Main authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Batch paleogeographic point rotation (BPPR) is an extensible, PySpark-based method that accelerates the rotation of data points during paleogeographic reconstruction. Data point rotation is an important part of paleogeographic reconstruction and a significant tool for exploring the co-evolution of Earth and life. However, current point rotation techniques are slow when processing extensive paleogeographic datasets. This study therefore introduces a parallel-computing framework to construct BPPR. The method combines PySpark and PyGPlates to partition the points and compute their rotations simultaneously in multiple threads. BPPR rotated 232,277 fossil occurrences from the Cretaceous Period in the Paleobiology Database (PBDB) within 26 s, whereas an alternative GPlates method completed the same task within 96 s. BPPR supports CSV, Excel, SHP, and other data formats, which avoids the switching between software that GPlates-based methods can require. In experiments on synthetic and real paleontological datasets, BPPR proved to be nine times more efficient than GPlates when rotating 900,000 points. This efficiency improvement significantly enhances data-driven paleogeographic analysis, and the parallel strategy can be applied broadly to massive data analysis in geoscience. |
---|---|
ISSN: | 1753-8947 1753-8955 |
DOI: | 10.1080/17538947.2024.2428699 |
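
The summary describes partitioning the points and rotating the partitions in parallel, but this record does not show the implementation. The following is only a minimal sketch of that idea under stated assumptions: it uses the Python standard library, with a thread pool standing in for PySpark partitions and a hand-written Euler-pole rotation (Rodrigues' formula) standing in for PyGPlates. All function names, the `(lat, lon, plate_id)` point layout, and the `poles` lookup table are hypothetical and are not taken from the article.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def _to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def rotate_point(lat, lon, pole_lat, pole_lon, angle_deg):
    """Rotate one point about an Euler pole using Rodrigues' formula."""
    v = _to_xyz(lat, lon)
    k = _to_xyz(pole_lat, pole_lon)
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    dot = sum(a * b for a, b in zip(k, v))
    cross = (k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0])
    r = [v[i] * c + cross[i] * s + k[i] * dot * (1.0 - c) for i in range(3)]
    z = max(-1.0, min(1.0, r[2]))  # guard asin against rounding error
    return (math.degrees(math.asin(z)), math.degrees(math.atan2(r[1], r[0])))


def partition(points, n_parts):
    """Split the point list into at most n_parts contiguous chunks."""
    size = max(1, math.ceil(len(points) / n_parts))
    return [points[i:i + size] for i in range(0, len(points), size)]


def rotate_partition(part, poles):
    """Rotate every (lat, lon, plate_id) point in one partition.

    `poles` is a hypothetical table: plate_id -> (pole_lat, pole_lon, angle).
    """
    return [rotate_point(lat, lon, *poles[plate_id])
            for lat, lon, plate_id in part]


def rotate_batch(points, poles, n_parts=4):
    """Rotate all points, one worker per partition, keeping input order."""
    parts = partition(points, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        rotated = pool.map(lambda p: rotate_partition(p, poles), parts)
        return [pt for part in rotated for pt in part]


if __name__ == "__main__":
    demo_points = [(0.0, 0.0, 101), (10.0, 20.0, 101)]
    demo_poles = {101: (90.0, 0.0, 90.0)}  # illustrative values only
    print(rotate_batch(demo_points, demo_poles, n_parts=2))
```

In a PySpark setting, a per-partition function such as `rotate_partition` would typically be passed to `RDD.mapPartitions`, and the rotation itself would come from a plate rotation model rather than a fixed pole table; whether BPPR is structured this way is an assumption.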