Twitter mining for ontology-based domain discovery incorporating machine learning

Bibliographic details
Published in:	Journal of knowledge management 2018-06, Vol.22 (5), p.949-981
Main authors:	Abu-Salih, Bilal; Wongthongtham, Pornpit; Yan Kit, Chan
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial intelligence; Business intelligence; Classification; Customer services; Customers; Data analysis; Data management; Data mining; Distillation; False information; Knowledge management; Linguistics; Machine learning; Ontology; Performance evaluation; Politics; Recommender systems; Semantic analysis; Semantic web; Semantics; Sentiment analysis; Social networks; Taxonomy; Trust; Trustworthiness; Uncertainty
Online access:	Full text

Description
Summary:	Purpose: This paper aims to identify the domain of the textual content generated by users of online social network (OSN) platforms. Understanding a user's domain(s) of interest is a significant step towards addressing their domain-based trustworthiness through an accurate understanding of their content in their OSNs.
Design/methodology/approach: This study uses a Twitter mining approach for domain-based classification of users and their textual content. The proposed approach incorporates machine learning modules and comprises two analysis phases. The first phase is a time-aware semantic analysis of users' historical content incorporating five commonly used machine learning classifiers; it classifies users into two main categories: politics-related and non-politics-related. In the second phase, the likelihood predictions obtained in the first phase are used to predict the domain of users' future tweets.
Findings: Experiments were conducted to validate the mechanism proposed in the study framework. They verify the applicability of the framework to effective domain-based classification of Twitter users and their content; the authors report strong results on several performance evaluation metrics.
Research limitations/implications: This study is limited to an on/off (binary) domain classification of OSN content. The politics domain was selected because of Twitter's popularity as a rich source of political deliberation; such data abundance facilitates data aggregation and improves the results of the data analysis. Furthermore, the currently implemented machine learning approaches assume that uncertainty and incompleteness do not affect the accuracy of the Twitter classification, although data uncertainty and incompleteness may in fact exist. In the future, the authors will formulate data uncertainty and incompleteness as fuzzy numbers, which can be used to address imprecise, uncertain and vague data.
Practical implications: This study proposes a practical framework with significant implications for a variety of business-related applications, such as voice of customer/voice of market, recommendation systems, the discovery of domain-based influencers and opinion mining through tracking and simulation. In particular, the factual grasp of the domains of interest extracted at the user level or post level enhances the cu
ISSN:	1367-3270 1758-7484
DOI:	10.1108/JKM-11-2016-0489
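
Illustrative sketch (not part of the catalogued record): the two-phase idea described in the summary can be pictured roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation. The summary does not name the five classifiers, the features, or the ontology, so a small hypothetical keyword lexicon (`POLITICS_TERMS`) stands in for the semantic analysis and the classifiers, and the `decay` weighting and the 50/50 blend in phase two are invented purely for illustration.

```python
from __future__ import annotations

# Hypothetical stand-in for ontology-derived politics terms (an assumption,
# not taken from the paper).
POLITICS_TERMS = {"election", "senate", "parliament", "vote", "minister", "policy"}


def tweet_score(text: str) -> float:
    """Return the fraction of tokens that match the toy politics lexicon."""
    tokens = [t.strip(".,!?#@").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in POLITICS_TERMS for t in tokens) / len(tokens)


def user_likelihood(tweets_by_period: list[list[str]], decay: float = 0.5) -> float:
    """Phase 1 (sketch): time-weighted share of politics-related tweets.

    Periods are ordered oldest to newest; each step back in time is
    down-weighted by `decay` (an assumed weighting scheme).
    """
    weighted, total = 0.0, 0.0
    n = len(tweets_by_period)
    for i, tweets in enumerate(tweets_by_period):
        if not tweets:
            continue  # skip periods with no activity
        weight = decay ** (n - 1 - i)
        share = sum(tweet_score(t) > 0 for t in tweets) / len(tweets)
        weighted += weight * share
        total += weight
    return weighted / total if total else 0.0


def predict_tweet_domain(text: str, prior: float, threshold: float = 0.5) -> str:
    """Phase 2 (sketch): blend the user's phase-1 likelihood with the
    new tweet's own evidence and apply a threshold."""
    evidence = 1.0 if tweet_score(text) > 0 else 0.0
    combined = 0.5 * prior + 0.5 * evidence
    return "politics" if combined >= threshold else "non-politics"


if __name__ == "__main__":
    history = [
        ["Great match last night"],                               # older period
        ["Vote early in the election", "Senate debate tonight"],  # recent period
    ]
    prior = user_likelihood(history)
    print(predict_tweet_domain("New policy announced", prior))
    print(predict_tweet_domain("Lovely weather", prior))
```

In a fuller treatment, `tweet_score` would be replaced by the probability output of trained classifiers; the sketch only shows how a per-user likelihood from historical content could feed a per-tweet prediction.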