Scientific Text Sentiment Analysis using Machine Learning Techniques

Bibliographic Details
Published in:	International journal of advanced computer science & applications 2019, Vol.10 (12)
Main authors:	Raza, Hassan; Faizan, M.; Hamza, Ahsan; Mushtaq, Ahmed; Akhtar, Naeem
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Data mining; Decision trees; Machine learning; Natural language processing; Scientific papers; Sentences; Sentiment analysis; Support vector machines
Online access:	Full text

Description
Abstract:	Over time, textual information on the World Wide Web (WWW) has increased exponentially, creating research opportunities in machine learning (ML) and natural language processing (NLP). Sentiment analysis of scientific articles is currently an active research topic. The main purpose of this research is to help researchers identify quality research papers based on sentiment analysis. In this research, sentiment analysis of scientific articles is carried out on citation sentences, using an existing annotated corpus of 8,736 citation sentences. Noise was removed from the data using different data normalization rules in order to clean the corpus. To classify this data set, we developed a system that implements six machine learning algorithms: Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbor (KNN), and Random Forest (RF). The accuracy of the system is then evaluated using different evaluation metrics, e.g. F-score and accuracy score. To improve the system's accuracy, additional feature selection techniques such as lemmatization, n-grams, tokenization, and stop-word removal were applied, and the system performed significantly better than the base system in every case. Our method achieved a maximum improvement of about 9% over the base system.
ISSN:	2158-107X 2156-5570
DOI:	10.14569/IJACSA.2019.0101222
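
Illustration:	The preprocessing and evaluation steps named in the abstract (tokenization, stop-word removal, n-grams, accuracy, F-score) can be sketched in plain Python as follows. This is a minimal sketch and not the authors' implementation: the function names, the stop-word list, and the "pos"/"neg" labels are hypothetical, and lemmatization and the six classifiers are omitted because the record gives no detail on how they were configured.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the paper's list is not given.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "to", "and", "this", "that"}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(sentence, max_n=2):
    """Count all 1-grams up to max_n-grams after stop-word removal."""
    tokens = remove_stop_words(tokenize(sentence))
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """F-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up citation sentence and labels, for demonstration only.
    print(extract_features("This method is an improvement of the baseline"))
    gold = ["pos", "neg", "pos", "neg"]
    predicted = ["pos", "pos", "pos", "neg"]
    print(accuracy(gold, predicted), f1_score(gold, predicted, "pos"))
```

The feature counts returned by extract_features could be fed to any of the classifiers listed in the abstract. Comparing accuracy and F-score with and without each preprocessing step is one way to measure an improvement over a base system.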