Spoiler detection in TV program tweets



Bibliographic details
Published in: Information Sciences, February 2016, Vol. 329, pp. 220-235
Authors: Jeon, Sungho; Kim, Sungchul; Yu, Hwanjo
Format: Article
Language: English
Description
Abstract:
• It shows why we chose the support vector machine (SVM) for spoiler detection by comparing it with five well-known classifiers.
• It verifies the features proposed in previous work and explains how those features were selected and used to enable the SVM to classify spoilers in TV program tweets.
• It shows that our method can be combined with semi-supervised learning (self-training) to reduce the amount of labeled data while achieving a similar level of performance.
• We collected tweets about the 2014 World Cup and conducted an experiment to show that our method works in general.

Watching TV programs at their scheduled airtime is difficult because of time differences between countries or personal circumstances. To avoid becoming victims of spoilers, people sometimes cut themselves off from civilization, for example by staying away from the Internet, until they have seen their favorite program. However, smartphones lead people to habitually check the SNS messages their friends post in order to maintain their relationships, which exposes them to spoilers about their favorite TV programs. To spare people this self-imposed isolation from their friends, an automatic method for detecting spoilers in TV program tweets is needed. To the best of our knowledge, two previous works have addressed the spoiler detection task: (1) a keyword matching method and (2) a machine-learning method based on Latent Dirichlet Allocation (LDA). However, neither was designed for short texts or for real-world systems. The keyword matching method incorrectly predicts most tweets as spoilers, and although the LDA-based method works well on large bodies of text, it fails to detect spoilers accurately in short texts such as tweets. In this work, we introduce a simple and powerful spoiler detection method based on four representative features that are significant indicators of spoilers.
To identify and exploit these four features, we conduct a detailed analysis of real-world tweet data and build an SVM-based prediction model from the results. Using tweets about Dancing with the Stars and the final of the 2014 World Cup, we evaluate the effectiveness of the proposed method on spoiler detection tasks. The results show that our method achieves higher precision than the competing methods while maintaining comparable recall. Our method also outperforms the competitors in processing time, showing that it is sufficiently lightw
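The abstract describes an SVM classifier over four tweet features that can optionally be wrapped in self-training to reduce labeling effort. The Python sketch below illustrates those two ideas under loudly stated assumptions. The abstract does not name the four features, so the feature values are made-up placeholders. The SVM is a linear one trained with Pegasos (stochastic subgradient descent on the L2-regularized hinge loss), a stand-in that is not necessarily the authors' training procedure. The self-training confidence threshold and all function and variable names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed SVM score; the appended 1.0 acts as the bias feature."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_linear_svm(X: Sequence[Vector], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 200, seed: int = 0) -> Vector:
    """Linear SVM trained with Pegasos (a stand-in, not the paper's solver).

    Labels: +1 = spoiler, -1 = non-spoiler. For simplicity the bias is folded
    into w and regularized together with the other weights.
    """
    rng = random.Random(seed)
    data = list(zip(X, y))
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                    # Pegasos step size
            violated = label * decision(w, x) < 1.0  # margin checked with current w
            # Gradient step on the regularizer shrinks every weight ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and the hinge-loss subgradient is added only on a margin violation.
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, list(x) + [1.0])]
    return w


def predict(w: Sequence[float], X: Sequence[Vector]) -> List[int]:
    return [1 if decision(w, x) >= 0 else -1 for x in X]


def self_train(X_lab: Sequence[Vector], y_lab: Sequence[int],
               X_unlab: Sequence[Vector], threshold: float = 0.5,
               max_rounds: int = 5) -> Vector:
    """Self-training: pseudo-label the unlabeled tweets the current SVM is
    confident about (|score| >= threshold, a hypothetical setting) and retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    w = train_linear_svm(X_lab, y_lab)
    for _ in range(max_rounds):
        scores = [decision(w, x) for x in pool]
        confident = {i for i, s in enumerate(scores) if abs(s) >= threshold}
        if not confident:
            break
        for i in sorted(confident):
            X_lab.append(pool[i])
            y_lab.append(1 if scores[i] > 0 else -1)
        pool = [x for i, x in enumerate(pool) if i not in confident]
        w = train_linear_svm(X_lab, y_lab)
    return w


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]):
    """Precision and recall for the spoiler class (+1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, hypothetical data: each row holds four made-up feature values in [0, 1];
# they are NOT the paper's four features, which the abstract does not name.
X_train = [[1.0, 0.9, 1.0, 0.8], [0.9, 1.0, 0.8, 1.0],
           [0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.2, 0.0]]
y_train = [1, 1, -1, -1]
X_unlabeled = [[0.95, 0.85, 0.9, 0.9], [0.05, 0.15, 0.1, 0.1]]
X_test = [[0.8, 0.9, 0.9, 0.7], [0.2, 0.1, 0.0, 0.1]]
y_test = [1, -1]

if __name__ == "__main__":
    w = self_train(X_train, y_train, X_unlabeled)
    print(precision_recall(y_test, predict(w, X_test)))
```

Self-training only adds pseudo-labels that the current model scores confidently. The threshold trades off how much unlabeled data is absorbed against the risk of the model reinforcing its own mistakes.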
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2015.09.005