PreBit — A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin

Bibliographic details
Published in: Expert systems with applications, 2023-12, Vol. 233, p. 120838, Article 120838
Main authors: Zou, Yanzhao; Herremans, Dorien
Format: Article
Language: English
Online access: Full text
Description
Summary: Bitcoin, with its ever-growing popularity, has demonstrated extreme price volatility since its origin. Extreme price fluctuations have been known to occur due to tweets from Elon Musk, Michael Saylor, and others. In this paper, we aim to investigate whether we can leverage Twitter data to predict these extreme price movements. Existing social media models often take a shortcut and include sentiment extracted from tweets. In this work, however, we want to embed the actual tweets in a domain-informed way, and investigate whether they have an impact. Hence, we propose a multimodal deep learning model for predicting extreme price fluctuations that takes as input candlestick data, prices of a variety of correlated assets, technical indicators, as well as Twitter content. To train the model, a new dataset of 5,000 tweets per day containing the keyword ‘Bitcoin’ was collected from 2015 to 2021. This dataset, called PreBit, is made available online (https://www.kaggle.com/datasets/zyz5557585/prebit-multimodal-dataset-for-bitcoin-price), as is our model (https://github.com/AMAAI-Lab/PreBit). Our proposed hybrid multimodal model consists of an SVM model based on price data, which is fused with a text-based Convolutional Neural Network. In the text-based model, we use sentence-level FinBERT embeddings, pretrained on financial lexicons, so as to capture the full contents of the tweets and feed them to the model in an understandable way. In an ablation study, we explore whether adding social media data from the general public on Bitcoin improves the model’s ability to predict extreme price movements. Finally, we propose and backtest a trading strategy based on the predictions of our models with varying prediction thresholds and show that it can be used to build a profitable trading strategy with a reduced risk over a ‘hold’ or moving average strategy.
• A multimodal model for BTC extreme price movement prediction using Twitter.
• Ablation study of the impact of different modalities on accuracy.
• New publicly available dataset of 9,435,437 tweets related to Bitcoin.
• A profitable trading strategy with reduced risk exposure for Bitcoin trading.
• Demonstrates the influence of predictive thresholds on risk of a trading strategy.
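
The summary mentions backtesting a trading strategy with a varying prediction threshold. The following is only a minimal sketch of that general idea and not the authors' strategy: it assumes a hypothetical per-day model score for an extreme upward move, holds the asset on days where the score reaches the threshold, and stays in cash otherwise. The function name, its inputs, and the sample numbers are all illustrative assumptions; the authors' own code is at the GitHub address given in the summary.

```python
from typing import List, Tuple


def backtest_threshold_strategy(
    probs: List[float],
    returns: List[float],
    threshold: float,
) -> Tuple[float, float, int]:
    """Hold the asset on days where probs[i] >= threshold, else stay in cash.

    probs[i]   : hypothetical model score for an extreme upward move on day i
    returns[i] : simple return of the asset on day i (0.05 means +5%)
    Returns (final equity, maximum drawdown, number of days in the market).
    """
    if len(probs) != len(returns):
        raise ValueError("probs and returns must have the same length")

    equity = 1.0          # start with one unit of capital
    peak = 1.0            # highest equity seen so far
    max_drawdown = 0.0    # largest relative drop from a peak
    days_in_market = 0

    for p, r in zip(probs, returns):
        if p >= threshold:
            equity *= 1.0 + r
            days_in_market += 1
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, 1.0 - equity / peak)

    return equity, max_drawdown, days_in_market


if __name__ == "__main__":
    # Made-up scores and returns, for illustration only.
    probs = [0.9, 0.2, 0.7, 0.4]
    returns = [0.10, -0.20, 0.05, -0.10]
    for threshold in (0.0, 0.5, 0.8):
        print(threshold, backtest_threshold_strategy(probs, returns, threshold))
```

A threshold of 0.0 reduces this sketch to a plain 'hold' baseline. Raising the threshold keeps the position open on fewer days, which is one simple way a prediction threshold can trade off return against risk exposure.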
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.120838