Detecting science-based health disinformation: a stylometric machine learning approach

Bibliographic details
Published in: Journal of computational social science, 2023-10, Vol. 6 (2), pp. 817-843
Main authors: Williams, Jason A., Aleroud, Ahmed, Zimmerman, Danielle
Format: Article
Language: English
Description
Summary: The COVID-19 pandemic showed that misleading scientific health information has become widespread and is challenging to counteract. Some of this disinformation comes from modification of medical research results. This paper investigates how humans create health disinformation through controlled changes of text from abstracts of peer-reviewed COVID-19 research papers. We also developed a machine learning model that used statement embeddings, readability, and text quality features to create datasets that contain falsified scientific statements. We then created machine learning classification models to identify statements containing disinformation. Our results reveal the importance of readability metrics and information quality features in identifying which statements were falsified. We show that text embeddings and semantic similarity do not yield a high detection rate of true/falsified statements compared to using information quality and readability features.
ISSN: 2432-2717, 2432-2725
DOI: 10.1007/s42001-023-00213-y
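
Illustrative note: the summary names readability metrics as informative features but does not say which metrics the authors used. The sketch below is only an assumption about what such a feature could look like. It computes the standard Flesch Reading Ease score with a crude vowel-group syllable heuristic. The function names `count_syllables` and `readability_features` are hypothetical and do not come from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the paper's method):
    count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability_features(text: str) -> dict:
    """Return simple readability features for one statement."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": syllables_per_word,
        # Standard Flesch Reading Ease formula; higher means easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
    }


simple = readability_features("The cat sat. The dog ran.")
dense = readability_features(
    "Immunological heterogeneity complicates epidemiological interpretation."
)
print(simple["flesch_reading_ease"] > dense["flesch_reading_ease"])
```

Feature dictionaries of this kind could be passed to any standard classifier. Which classifier and which exact feature set the authors used is not stated in this record.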