Beyond opinion classification: Extracting facts, opinions and experiences from health forums

Bibliographic details
Published in:	PloS one 2019-01, Vol.14 (1), p.e0209961-e0209961
Main authors:	Carrillo-de-Albornoz, Jorge, Aker, Ahmet, Kurtic, Emina, Plaza, Laura
Format:	Article
Language:	English
Subjects:	Algorithms; Allergies; Artificial intelligence; Biology and Life Sciences; Breast cancer; Care and treatment; Chronic conditions; Chronic diseases; Chronic illnesses; Classification; Computer and Information Sciences; Diagnosis; Feasibility studies; Health; Health Information Exchange; Humans; International conferences; Internet; Language; Learning algorithms; Linguistics; Machine Learning; Medicine and Health Sciences; Online health care information services; Patients; Physical Sciences; Polarity; Research and Analysis Methods; Semantic Web; Semantics; Sentences; Sentiment analysis; Social networks; Social organization; Social Sciences; Support Vector Machine; Support vector machines
Online access:	Full text

Description
Abstract:	Surveys indicate that patients, particularly those suffering from chronic conditions, strongly benefit from the information found in social networks and online forums. One challenge in accessing online health information is to differentiate between factual and more subjective information. In this work, we evaluate the feasibility of exploiting lexical, syntactic, semantic, network-based and emotional properties of texts to automatically classify patient-generated content into three types: "experiences", "facts" and "opinions", using machine learning algorithms. In this context, our goal is to develop automatic methods that will make online health information more easily accessible and useful for patients, professionals and researchers. We work with a set of 3000 posts from online health forums on breast cancer, Crohn's disease and various allergies. Each sentence in a post is manually labeled as "experience", "fact" or "opinion". Using this data, we train a support vector machine algorithm to perform classification. The results are evaluated in a 10-fold cross-validation procedure. Overall, we find that it is possible to predict the type of information contained in a forum post with very high accuracy (over 80 percent) using simple text representations such as word embeddings and bags of words. We also analyze more complex features, such as those based on network properties, the polarity of words and the verb tense of the sentences, and show that, when combined with the simpler representations, they can boost the results.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0209961
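
The abstract names three ingredients: a bag-of-words representation, a support vector machine, and k-fold cross-validation. The sketch below shows how such a pipeline might fit together. It is not the authors' implementation. The sentences are invented for illustration and do not come from the paper's corpus, the SVM is a small Pegasos-style hinge-loss trainer in a one-vs-rest setup (chosen only to keep the example free of dependencies), and the toy run uses 4 folds rather than 10 because there are only 12 sentences. All function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy sentences, NOT taken from the paper's corpus.
DATA = [
    ("i started the new treatment last week", "experience"),
    ("i had a rash after my second dose", "experience"),
    ("my doctor changed my diet in march", "experience"),
    ("i was diagnosed two years ago", "experience"),
    ("the drug is approved for adults", "fact"),
    ("the condition affects the digestive tract", "fact"),
    ("pollen counts are highest in spring", "fact"),
    ("the test measures antibody levels", "fact"),
    ("i think this diet is a bad idea", "opinion"),
    ("i believe the clinic is the best", "opinion"),
    ("in my view the side effects are worth it", "opinion"),
    ("i think you should get a second opinion", "opinion"),
]


def bag_of_words(sentence):
    """Map a sentence to sparse word counts (bag-of-words features)."""
    return Counter(sentence.lower().split())


def train_linear_svm(X, y, epochs=50, lam=0.01, seed=0):
    """Binary linear SVM via Pegasos-style SGD on the hinge loss; y in {-1, +1}."""
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(w.get(f, 0.0) * v for f, v in X[i].items())
            for f in w:  # shrink weights (L2 regularization)
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # example violates the margin: move towards it
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def train_one_vs_rest(X, labels):
    """Train one binary SVM per class (that class against all others)."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}


def predict(models, x):
    """Return the class whose SVM gives the highest score for x."""
    def score(w):
        return sum(w.get(f, 0.0) * v for f, v in x.items())
    return max(models, key=lambda c: score(models[c]))


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k disjoint folds (round-robin)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validate(data, k):
    """Return the fraction of sentences classified correctly when held out."""
    X = [bag_of_words(s) for s, _ in data]
    y = [label for _, label in data]
    correct = 0
    for test_idx in k_fold_indices(len(data), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(data)) if i not in held_out]
        models = train_one_vs_rest([X[i] for i in train_idx],
                                   [y[i] for i in train_idx])
        correct += sum(predict(models, X[i]) == y[i] for i in test_idx)
    return correct / len(data)


accuracy = cross_validate(DATA, k=4)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Because the toy data is ordered by class, the round-robin split places one sentence of each class in every fold, so each training set still covers all three labels. The printed accuracy says nothing about the figures reported in the abstract. It only illustrates the evaluation loop.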