Tecnología de Big Data en el análisis del estado de la pandemia por covid-19 en Colombia

Bibliographic details
Published in: Publicaciones e Investigación (En línea), 2021-12, Vol. 15 (4)
Main authors: Quintero López, Jorge Luis; Arismendi Ramírez, Andrés; Pérez Rendón, Ángela Liceth
Format: Article
Language: Spanish
Description
Abstract: During the pandemic, there is a need to process the large volumes of information generated by reported positive cases in order to identify patterns that support timely contingency measures. This study proposes processing a data set covering the general population of Colombia, with information from March and April 2021, in order to characterize, georeference, and predict, extracting value from the data and improving understanding of the dynamics of the virus. Three models were used (Naive Bayes, Random Forest, and the J-48 decision tree) to identify the one with the highest precision. Using the Weka application, the study concludes that the best-fitting model for prediction is the J-48 tree classification algorithm, which classified 99.24% of instances correctly with a Kappa value of 0.9266, indicating agreement in class assignment approaching 100%. The case study used 221,583 classes, with prediction over 30 classes taken from the original database of approximately 2,774,465 data points. Statistical tests identify the correlation between attributes, which helps ensure correct modeling for prediction. This process is a potential input to support society's management processes and to inform public health decisions.
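The abstract reports two related figures, the share of correctly classified instances and a Kappa value. As a minimal sketch of how both can be derived from a confusion matrix, the Python below computes accuracy and Cohen's kappa. It is not the authors' Weka workflow: the function name `accuracy_and_kappa` and the two-class matrix `toy` are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def accuracy_and_kappa(confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no instances")

    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    # Kappa rescales observed agreement relative to chance agreement.
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix, not data from the study.
toy = [[45, 5],
       [10, 40]]
acc, kappa = accuracy_and_kappa(toy)
print(f"accuracy={acc:.4f} kappa={kappa:.4f}")
```

Because kappa discounts the agreement that would occur by chance, it is normally lower than raw accuracy, which is consistent with the abstract pairing 99.24% correct instances with a Kappa of 0.9266.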
ISSN: 1900-6608, 2539-4088
DOI: 10.22490/25394088.5612