The Performance of the LSTM-based Code Generated by Large Language Models (LLMs) in Forecasting Time Series Data
Saved in:
Main authors: (not listed)
Format: Article
Language: English
Keywords: (none listed)
Online access: Order full text
Abstract: An intriguing case is the quality of the machine and deep learning models
generated by these LLMs for automated scientific data analysis, where a data
analyst may lack the expertise to manually code and optimize complex deep
learning models and may thus opt to leverage LLMs to generate the required
models. This paper investigates and compares the performance of mainstream
LLMs, such as ChatGPT, PaLM, LLaMA, and Falcon, in generating deep learning
models for analyzing time series data, an important and popular data type
prevalent in many application domains, including finance and the stock market.
This research conducts a set of controlled experiments in which the prompts for
generating deep learning-based models are varied with respect to sensitivity
levels of four criteria: 1) Clarity and Specificity, 2) Objective and Intent,
3) Contextual Information, and 4) Format and Style. While the results are
relatively mixed, we observe some distinct patterns. We notice that, using
LLMs, we are able to generate deep learning-based models with executable code
for each dataset separately whose performance is comparable with that of
manually crafted and optimized LSTM models for predicting the whole time series
dataset. We also notice that ChatGPT outperforms the other LLMs in generating
more accurate models. Furthermore, we observe that the quality of the generated
models varies with respect to the "temperature" parameter used in configuring
the LLMs. The results can be beneficial for data analysts and practitioners who
would like to leverage generative AI to produce good prediction models of
acceptable quality.
DOI: 10.48550/arxiv.2411.18731
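The abstract compares LLM-generated LSTM code against manually crafted and optimized LSTM forecasters. As a minimal sketch of the preprocessing step such generated code typically performs before training an LSTM (the function name, the synthetic series, and the lookback value are illustrative, not taken from the paper), a sliding-window split of a univariate series into supervised (X, y) pairs might look like:

```python
import numpy as np

def make_windows(series, lookback):
    """Slice a 1-D series into (X, y) pairs for a sequence model:
    each row of X holds `lookback` consecutive values, and the
    corresponding entry of y holds the next value to be predicted."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X), np.array(y)

# Hypothetical example: a short synthetic series 0, 1, ..., 9
series = np.arange(10, dtype=float)
X, y = make_windows(series, lookback=3)
# X has shape (7, 3); y has shape (7,); e.g. X[0] = [0, 1, 2], y[0] = 3
```

An LSTM would then consume X reshaped to (samples, timesteps, features) and be fit against y; the paper's experiments vary the prompts (and the LLM "temperature") used to generate this kind of code, not the windowing itself.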