Using machine learning to extract information and predict outcomes from reports of randomised trials of smoking cessation interventions in the Human Behaviour-Change Project

Bibliographic details
Published in: Wellcome Open Research 2023, Vol. 8, p. 452
Main authors: West, Robert; Bonin, Francesca; Thomas, James; Wright, Alison J; Mac Aonghusa, Pol; Gleize, Martin; Hou, Yufang; O'Mara-Eves, Alison; Hastings, Janna; Johnston, Marie; Michie, Susan
Format: Article
Language: English
Description
Summary: Using reports of randomised trials of smoking cessation interventions as a test case, this study aimed to develop and evaluate machine learning (ML) algorithms for extracting information from study reports and predicting outcomes as part of the Human Behaviour-Change Project. It is the first of two linked papers, with the second paper reporting on further development of a prediction system. Researchers manually annotated 70 items of information ('entities') in 512 reports of randomised trials of smoking cessation interventions, covering intervention content and delivery, population, setting, outcome and study methodology, using the Behaviour Change Intervention Ontology. These entities were used to train ML algorithms to extract the information automatically. The information extraction ML algorithm involved a named-entity recognition system using the 'FLAIR' framework. The manually annotated intervention, population, setting and study entities were used to develop a deep-learning algorithm using multiple layers of long short-term memory (LSTM) components to predict smoking cessation outcomes. The F1 evaluation score, derived from the false positive and false negative rates (range 0-1), for the information extraction algorithm averaged 0.42 across different types of entity (SD=0.22, range 0.05-0.88), compared with an average human annotator's score of 0.75 (SD=0.15, range 0.38-1.00). The algorithm for assigning entities to study arms (e.g., intervention or control) was not successful. This initial ML outcome prediction algorithm did not outperform prediction based just on the mean outcome value or a linear regression model. While some success was achieved in using ML to extract information from reports of randomised trials of smoking cessation interventions, we identified major challenges that could be addressed by greater standardisation in the way that studies are reported. Outcome prediction from smoking cessation studies may benefit from development of novel algorithms, e.g., using ontological information to inform ML (as reported in the linked paper).
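Note on the F1 score: the summary does not state how the per-entity scores were computed or averaged. The sketch below is an illustration only. It assumes the standard F1 definition from true positive, false positive and false negative counts, and an unweighted mean across entity types. The function name, entity names and counts are hypothetical and are not taken from the study.

```python
from statistics import mean


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: 2*TP / (2*TP + FP + FN).

    Returns 0.0 when there are no predictions and no gold annotations,
    so an empty entity type does not raise a division error.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (TP, FP, FN) counts per entity type; NOT data from the study.
counts = {
    "entity_a": (8, 2, 2),
    "entity_b": (1, 3, 5),
}

# One F1 value per entity type, then an unweighted (macro) average.
per_entity = {name: f1_score(*c) for name, c in counts.items()}
macro_f1 = mean(per_entity.values())

for name, score in per_entity.items():
    print(f"{name}: F1 = {score:.2f}")
print(f"mean across entity types: {macro_f1:.2f}")
```

An unweighted mean treats rare and frequent entity types equally, which is one plausible reading of "averaged across different types of entity"; a frequency-weighted mean would generally give a different figure.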
ISSN: 2398-502X
DOI: 10.12688/wellcomeopenres.20000.1