Using PySpark to accelerate batch data point rotation for paleogeographic reconstruction

Bibliographic details
Published in: International journal of digital earth 2024-12, Vol.17 (1)
Main authors: Xu, Shuyan; Hu, Linshu; Li, Haipeng; Qin, Mengjiao; Wu, Sensen; Du, Zhenhong
Format: Article
Language: English
Online access: Full text
Description
Summary: Batch paleogeographic point rotation (BPPR) is an extensible, PySpark-based method for rotating data points in batches, designed to accelerate the rotation step of paleogeographic reconstruction. Data point rotation is an important part of paleogeographic reconstruction and a significant tool for exploring the co-evolution of Earth and life. However, current point rotation techniques are slow when handling extensive paleogeographic data. This study therefore introduces a parallel-computing framework to construct BPPR. The method combines PySpark and PyGPlates to partition the points and rotate the partitions simultaneously in multiple threads. BPPR rotated 232,277 Cretaceous fossil occurrences from the Paleobiology Database (PBDB) within 26 s; an alternative GPlates method completed the same task within 96 s. The method supports CSV, Excel, SHP, and other data formats, which avoids the software switching that GPlates-associated methods may require. In experiments with synthetic and real paleontological datasets, BPPR proved to be nine times more efficient than GPlates when rotating 900,000 points. This efficiency improvement significantly enhances data-driven paleogeographic analysis, and the parallel strategy can be broadly applied to massive data analysis in geoscience.
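
The partition-then-rotate idea in the summary can be illustrated with a minimal sketch. This is not the BPPR implementation and uses neither PySpark nor PyGPlates: it swaps in Python's standard-library thread pool for the Spark partitions and a plain Rodrigues rotation about a single Euler pole for the plate-model lookup. All function names, the pole, and the angle are hypothetical, chosen only for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def to_latlon(v):
    """Convert a unit vector back to (lat, lon) in degrees."""
    x, y, z = v
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


def rotate_point(point, pole, angle_deg):
    """Rotate one (lat, lon) point about an Euler pole (Rodrigues' formula)."""
    v = to_xyz(*point)
    k = to_xyz(*pole)
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    rotated = tuple(v[i] * cos_a + k_cross_v[i] * sin_a
                    + k[i] * k_dot_v * (1.0 - cos_a) for i in range(3))
    return to_latlon(rotated)


def rotate_partition(partition, pole, angle_deg):
    """Rotate every point in one partition (the per-worker task)."""
    return [rotate_point(p, pole, angle_deg) for p in partition]


def rotate_batch(points, pole, angle_deg, n_partitions=4):
    """Split points into partitions, rotate them concurrently, keep input order."""
    size = max(1, math.ceil(len(points) / n_partitions))
    partitions = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # pool.map yields results in the order the partitions were submitted
        results = pool.map(
            lambda part: rotate_partition(part, pole, angle_deg), partitions)
        return [p for part in results for p in part]


if __name__ == "__main__":
    # Hypothetical input: ten equatorial points, rotated 90 degrees about the north pole
    sample = [(0.0, float(i)) for i in range(10)]
    print(rotate_batch(sample, pole=(90.0, 0.0), angle_deg=90.0, n_partitions=3))
```

In CPython, threads running pure-Python arithmetic do not execute in parallel, so this sketch shows the structure of the approach rather than its speed-up. A real reconstruction would also assign each point to a plate and look up a time-dependent rotation from a plate model, which is the part a library such as PyGPlates provides.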
ISSN: 1753-8947, 1753-8955
DOI: 10.1080/17538947.2024.2428699