Machine Learning-based Flu Forecasting Study Using the Official Data from the Centers for Disease Control and Prevention and Twitter Data

Aim/Purpose: In the United States, the Centers for Disease Control and Prevention (CDC) tracks the disease activity using data collected from medical practices on a weekly basis. Collection of data by CDC from medical practices on a weekly basis leads to a lag time of approximately 2 weeks bef...

Bibliographic Details
Published in:	Issues in informing science & information technology education 2021-01, Vol.18, p.63-81
Main Authors:	Wahid, Ali; Munkeby, Steven H; Sambasivam, Samuel
Format:	Article
Language:	eng
Subjects:	Analysis; Big data; Data mining; Forecasts and trends; Influenza; Machine learning; Medicine; Neural networks; Practice
Online Access:	Full text

Description
Summary:	Aim/Purpose: In the United States, the Centers for Disease Control and Prevention (CDC) tracks disease activity using data collected from medical practices on a weekly basis. This weekly collection leads to a lag of approximately 2 weeks before any viable action can be planned. The study addressed this 2-week delay by creating machine learning models to predict flu outbreaks.
Background: The study addressed the 2-week delay by correlating flu trends identified from Twitter data with official flu data from the CDC, and by creating a machine learning model that uses both data sources to predict flu outbreaks.
Methodology: A quantitative correlational study was performed using a quasi-experimental design. Flu trends from the CDC portal and tweets mentioning flu and influenza from the state of Georgia were used over a period of 22 weeks, from December 29, 2019 to May 30, 2020.
Contribution: This research contributed to the body of knowledge by using a simple bag-of-words method for sentiment analysis, followed by the combination of CDC and Twitter data to generate a flu prediction model with higher accuracy than one using CDC data only.
Findings: The study found that (a) there is no correlation between official flu data from the CDC and tweets mentioning flu and (b) the performance of a machine learning-based flu forecasting model improves when it uses both official flu data from the CDC and tweets mentioning flu.
Recommendations for Practitioners: The study found no correlation between the official flu data from the CDC and the count of tweets mentioning flu, which is why tweets alone should be used with caution to predict a flu outbreak. Based on the findings of this study, social media data can be used as an additional variable to improve the accuracy of flu prediction models. It was also found that fourth-order polynomial and support vector regression models offered the best accuracy among the flu prediction models.
Recommendations for Researchers: Open-source data, such as Twitter feeds, can be mined for useful intelligence benefiting society. Machine learning-based prediction models can be improved by adding open-source data to the primary data set.
Impact on Society: A key implication of this study for practitioners in the field was to use social media
ISSN:	1547-5840 1547-5867
DOI:	10.28945/4796