Borsa Istanbul (BIST) daily prediction using financial news and balanced feature selection

Bibliographic details
Published in: Expert Systems with Applications, 2015-12, Vol. 42 (22), p. 9001-9011
Main authors: Gunduz, Hakan; Cataltepe, Zehra
Format: Article
Language: English
Description
Summary:
• The direction of Borsa Istanbul 100 Index (BIST100) open prices is predicted.
• A feature selection method called Balanced Mutual Information (BMI) is proposed.
• BMI is able to deal with the class imbalance problem through oversampling.
• BMI is compared with Mutual Information and Chi-square based feature selection.
• BMI achieves a higher macro-averaged F-measure than the other methods while using fewer features.

In this paper, a novel method is proposed to predict the direction of Borsa Istanbul (BIST) 100 Index (BIST100) open prices using the news articles released on the previous day together with that day's price data. Although English news articles have been used for market prediction before, to the best of our knowledge, Turkish news articles together with prices have not yet been used to predict the Turkish markets. Turkish text mining techniques are applied to news articles to form a feature vector for each trading day. Each feature vector is assigned one of three labels based on the direction of the price change from the previous day's closing price and on whether the change is significant. News articles are represented using high-dimensional features, some of which could be noisy or irrelevant for prediction. Training data is also scarce. Therefore, this study incorporates feature selection methods to select features that could improve classification performance. Significant positive or negative changes in stock price occur much less often than non-significant changes, which results in an imbalanced data set. Most feature selection methods in the literature aim to maximize classification accuracy. However, for imbalanced data sets, other measures, such as the macro-averaged F-measure, need to be considered. The paper proposes a feature selection method that is able to deal with the class imbalance problem through oversampling of the minority classes and consideration of an ensemble of selected features.
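As a rough illustration of the labeling scheme and of why accuracy can mislead on imbalanced data, the following Python sketch assigns one of three labels from the change between the previous close and the open, then compares accuracy with the macro-averaged F-measure for a classifier that always predicts the majority class. The function names, the label names and the 0.5% `threshold` are assumptions made for this example; the abstract does not state the significance cut-off used in the paper.

```python
def label_day(prev_close, open_price, threshold=0.005):
    """Label a trading day from the relative change between the previous
    close and the open. `threshold` is a hypothetical significance
    cut-off, not a value taken from the paper."""
    change = (open_price - prev_close) / prev_close
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "flat"


def accuracy(y_true, y_pred):
    """Fraction of days whose label was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores, over classes in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy imbalanced sample: 8 non-significant days, 1 significant rise, 1 fall.
y_true = ["flat"] * 8 + ["up", "down"]
# A classifier that always predicts the majority class.
y_pred = ["flat"] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

On this toy sample the majority-class predictor is right on 8 of 10 days, yet its macro-averaged F-measure is low because the two minority classes each contribute an F1 of zero.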
To rank features, the proposed methodology uses mutual information as the relevance criterion, because it can detect nonlinear dependencies between variables. The proposed method is therefore called the Balanced Mutual Information (BMI) feature selection method. Experiments were performed on news articles provided by two different news sources: the Public Disclosure Platform of BIST and financial news websites. It was shown that, using Balanced Mutual Information feature selection method, the significant
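The idea of combining oversampling, mutual information and an ensemble of selected features can be sketched in a few lines of Python. This is only a minimal reading of the abstract, not the authors' procedure: the random oversampling scheme, the voting rule across rounds, and all names (`mutual_information`, `oversample`, `balanced_mi_select`, `rounds`, `seed`) are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    fx, fy = Counter(feature), Counter(labels)
    return sum((c / n) * math.log(c * n / (fx[x] * fy[y]))
               for (x, y), c in joint.items())


def oversample(rows, labels, rng):
    """Randomly resample each minority class (with replacement) until
    every class is as large as the largest one."""
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for lab, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([lab] * target)
    return out_rows, out_labels


def balanced_mi_select(rows, labels, k, rounds=5, seed=0):
    """Hypothetical BMI-style selection: in each round, balance the data,
    rank features by mutual information, and vote for the top k.
    Returns the k feature indices with the most votes."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    votes = Counter()
    for _ in range(rounds):
        bal_rows, bal_labels = oversample(rows, labels, rng)
        scores = [mutual_information([r[j] for r in bal_rows], bal_labels)
                  for j in range(n_features)]
        ranked = sorted(range(n_features), key=lambda j: scores[j],
                        reverse=True)
        votes.update(ranked[:k])
    return [j for j, _ in votes.most_common(k)]


# Toy term-presence data: feature 0 marks "up" days, feature 1 marks
# "down" days, feature 2 is a constant term that carries no information.
rows = [[0, 0, 1]] * 6 + [[1, 0, 1]] * 2 + [[0, 1, 1]] * 2
labels = ["flat"] * 6 + ["up"] * 2 + ["down"] * 2
selected = balanced_mi_select(rows, labels, k=2)
print(sorted(selected))
```

In this toy data the constant feature has zero mutual information with the labels, so the two informative features are the ones selected.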
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2015.07.058