A Static Analysis Approach for Detecting Array Shape Errors in Python

Bibliographic details
Published in: Journal of Information Science and Engineering, 2025-01, Vol. 41 (1), p. 97-119
Main authors: YUNGYU ZHUANG, CHIEN-WEN KAO, WEI-HSIN YEN
Format: Article
Language: English
Online access: Full text
Description
Abstract: In recent years, Python has become a widely used language for data processing in machine learning and deep learning. However, Python's dynamic typing can lead to errors caused by array shape mismatches that are detected only at runtime. To increase development efficiency, we propose a static method for checking Python code that can detect shape errors before execution. Existing tools such as Pytropos allow developers to manually annotate the shape types of arrays and external datasets using Python's type hint feature. However, manual annotation reduces code flexibility and can cause problems such as incorrect annotations due to human error and wasted labor and time. To address these issues, we propose a method that leverages abstract interpretation and abstract syntax tree analysis to statically check the shape types of arrays and datasets in code, thus reducing the need for manual annotation and improving code flexibility. As an example, we give an implementation of the proposed method, named ShapeChecker, for the widely used library NumPy. ShapeChecker extracts the shape types of NumPy arrays and automatically reads external datasets to obtain shape type information, which accelerates the checking process, and it reports the cause of a shape error when one is detected. We compared ShapeChecker with existing solutions in various scenarios and obtained promising results, demonstrating the tool's usefulness in improving efficiency and reducing runtime errors. To further enhance its functionality, we plan to extend ShapeChecker's support to more packages and address known issues. Overall, the proposed method and the ShapeChecker tool provide a static approach to detecting array shape errors that can improve code quality and development efficiency.
ISSN: 1016-2364
DOI: 10.6688/JISE.202501_41(1).0006
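
The abstract describes combining abstract syntax tree analysis with abstract interpretation to find shape errors before a program runs. The following is a minimal sketch of that general idea, not ShapeChecker's actual implementation: the function name check_shapes, the restriction to np.zeros / np.ones with literal shapes, and the handling of 2-D matrix multiplication only are all assumptions made for illustration.

```python
import ast


def check_shapes(source):
    """Return (line, message) pairs for shape errors found without running the code.

    Hypothetical helper for illustration; it tracks shapes only for
    np.zeros / np.ones with literal arguments and for 2-D `@` products.
    """
    env = {}     # variable name -> abstract shape (tuple of ints) or None if unknown
    errors = []

    def shape_of(node):
        # Constructor call such as np.zeros((2, 3)): the shape is the literal argument.
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in ("zeros", "ones")
                and node.args):
            try:
                value = ast.literal_eval(node.args[0])
            except (ValueError, TypeError):
                return None  # shape is not a literal, so it is unknown here
            if isinstance(value, int):
                return (value,)
            if isinstance(value, tuple) and all(isinstance(d, int) for d in value):
                return value
            return None
        # Variable reference: look up the shape recorded at its assignment.
        if isinstance(node, ast.Name):
            return env.get(node.id)
        # Matrix multiplication: inner dimensions must agree.
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.MatMult):
            left, right = shape_of(node.left), shape_of(node.right)
            if left is None or right is None:
                return None
            if len(left) != 2 or len(right) != 2:
                return None  # this sketch only models 2-D operands
            if left[1] != right[0]:
                errors.append((node.lineno, f"matmul shape mismatch: {left} @ {right}"))
                return None
            return (left[0], right[1])
        return None

    # Walk top-level statements in order, updating the abstract environment.
    for stmt in ast.parse(source).body:
        if (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            env[stmt.targets[0].id] = shape_of(stmt.value)
        elif isinstance(stmt, ast.Expr):
            shape_of(stmt.value)
    return errors


program = (
    "import numpy as np\n"
    "a = np.zeros((2, 3))\n"
    "b = np.ones((4, 5))\n"
    "c = a @ b\n"
)
for line, message in check_shapes(program):
    print(f"line {line}: {message}")
```

The analyzed program is never executed and NumPy is never imported by the checker; shapes are propagated symbolically through assignments. A realistic tool would additionally need to model broadcasting, control flow, function calls, and shapes read from external datasets, none of which this sketch attempts.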