Africanus I. Scalable, distributed and efficient radio data processing with Dask-MS and Codex Africanus
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Summary: | New radio interferometers such as MeerKAT, SKA, ngVLA, and DSA-2000 drive
advancements in software for two key reasons. First, handling the vast data
from these instruments requires subdivision and multi-node processing. Second,
their improved sensitivity, achieved through better engineering and larger data
volumes, demands new techniques to fully exploit it. This creates a critical
challenge in radio astronomy software: pipelines must be optimized to process
data efficiently, but unforeseen artefacts from increased sensitivity require
ongoing development of new techniques. This leads to a trade-off among (1)
performance, (2) flexibility, and (3) ease-of-development. Rigid designs often
miss the full scope of the problem, while temporary research code is unsuitable
for production. This work introduces a framework for developing radio astronomy
techniques while balancing the above trade-offs. It prioritizes flexibility and
ease-of-development alongside acceptable performance by leveraging Open Source
data formats and software. To manage growing data volumes, data is distributed
across multiple processors and nodes for parallel processing, utilizing HPC and
cloud infrastructure. We present two Python libraries, Dask-MS and Codex
Africanus, which enable distributed, high-performance radio astronomy software
with Dask. Dask is a lightweight parallelization and distribution framework
that integrates with the PyData ecosystem, addressing the "Big Data" challenges
of radio astronomy. |
DOI: | 10.48550/arxiv.2412.12052 |