Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index

In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-...

Bibliographic details
Published in: Journal of personalized medicine 2023-07, Vol.13 (7), p.1141
Main authors: Rosero Perez, Paula Andrea; Realpe Gonzalez, Juan Sebastián; Salazar-Cabrera, Ricardo; Restrepo, David; López, Diego M; Blobel, Bernd
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page
container_issue 7
container_start_page 1141
container_title Journal of personalized medicine
container_volume 13
creator Rosero Perez, Paula Andrea
Realpe Gonzalez, Juan Sebastián
Salazar-Cabrera, Ricardo
Restrepo, David
López, Diego M
Blobel, Bernd
description In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-19 vulnerability index, which estimates the vulnerability (per city block) of being infected with COVID-19. Unfortunately, DANE did not consider multiple factors that could increase the risk of COVID-19 (in addition to demographic and health), such as environmental and mobility data (found in the related literature). The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information) in which the incidence of COVID-19 is calculated and compared with the incidence of the COVID-19 vulnerability index provided by DANE. The collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining methodology (CRISP-DM) were considered for constructing the index. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model to predict the incidence of COVID-19 in Colombia is the Extra Trees Regressor algorithm, obtaining an R-squared of 0.829. This work is the first step toward a multidimensional analysis of COVID-19 risk factors, which has the potential to support decision making in public health programs. The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue.
doi_str_mv 10.3390/jpm13071141
format Article
fullrecord <record><control><sourceid>gale_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_10381838</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A759154511</galeid><sourcerecordid>A759154511</sourcerecordid><originalsourceid>FETCH-LOGICAL-c435t-2bf5b9b180d253375ad60874a209d3b8041039b15fe268448e3e3d6c0b82c7b73</originalsourceid><addsrcrecordid>eNptkstr3DAQh0VpaUKaU-9F0EuhONHTlk8lbF8Lu-TQNlchS-ONFlnaWnZp_vtqmwebEOmgQfPNb6SZQegtJWect-R8uxsoJw2lgr5Ax4w0shKC1S8P7CN0mvOWlKUkYzV5jY54IyltpDhGP9ZzmLzzA8TsUzQBr4299hHwCswYfdzgdXIQ8JTwwgQ7BzMBNnhxebX8XNEWX80hwmg6H_x0g5fRwd836FVvQobTu_ME_fr65efie7W6_LZcXKwqK7icKtb1sms7qohjkpcnGVcT1QjDSOt4p4ighBe_7IHVSggFHLirLekUs03X8BP06VZ3N3cDOAtxGk3Qu9EPZrzRyXj92BP9td6kP7roKqq4Kgof7hTG9HuGPOnBZwshmAhpzpqVtKStCdkne_8E3aZ5LAX7T5UWCCoPqI0JoH3sU0ls96L6opEtlaIUvlBnz1BlOxi8TRF6X-4fBXy8DbBjynmE_uGTlOj9HOiDOSj0u8O6PLD3Xef_AKpsqjM</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2843074157</pqid></control><display><type>article</type><title>Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index</title><source>MDPI - Multidisciplinary Digital Publishing Institute</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>PubMed Central Open Access</source><creator>Rosero Perez, Paula Andrea ; Realpe Gonzalez, Juan Sebastián ; Salazar-Cabrera, Ricardo ; Restrepo, David ; López, Diego M ; Blobel, Bernd</creator><creatorcontrib>Rosero Perez, Paula Andrea ; Realpe Gonzalez, Juan Sebastián ; Salazar-Cabrera, Ricardo ; Restrepo, David ; López, Diego M ; Blobel, Bernd</creatorcontrib><description>In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. 
The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-19 vulnerability index, which estimates the vulnerability (per city block) of being infected with COVID-19. Unfortunately, DANE did not consider multiple factors that could increase the risk of COVID-19 (in addition to demographic and health), such as environmental and mobility data (found in the related literature). The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information) in which the incidence of COVID-19 is calculated and compared with the incidence of the COVID-19 vulnerability index provided by DANE. The collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining methodology (CRISP-DM) were considered for constructing the index. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model to predict the incidence of COVID-19 in Colombia is the Extra Trees Regressor algorithm, obtaining an R-squared of 0.829. This work is the first step toward a multidimensional analysis of COVID-19 risk factors, which has the potential to support decision making in public health programs. 
The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue.</description><identifier>ISSN: 2075-4426</identifier><identifier>EISSN: 2075-4426</identifier><identifier>DOI: 10.3390/jpm13071141</identifier><identifier>PMID: 37511754</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Analysis ; Cluster analysis ; COVID-19 ; COVID-19 vaccines ; Data mining ; Decision making ; Diabetes ; Disease susceptibility ; Hypertension ; Immunization ; Infections ; Learning algorithms ; Machine learning ; Medical research ; Mobility ; Mortality ; Precision medicine ; Product information ; Public health ; Risk factors ; Sociodemographics ; Socioeconomic factors ; Statistical analysis ; Vaccination ; Viral diseases ; Virus diseases</subject><ispartof>Journal of personalized medicine, 2023-07, Vol.13 (7), p.1141</ispartof><rights>COPYRIGHT 2023 MDPI AG</rights><rights>2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2023 by the authors. 
2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c435t-2bf5b9b180d253375ad60874a209d3b8041039b15fe268448e3e3d6c0b82c7b73</cites><orcidid>0000-0002-3789-1957 ; 0009-0003-4148-1138 ; 0000-0002-7552-1383 ; 0000-0001-9425-4375 ; 0009-0009-1982-7116</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10381838/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10381838/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37511754$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Rosero Perez, Paula Andrea</creatorcontrib><creatorcontrib>Realpe Gonzalez, Juan Sebastián</creatorcontrib><creatorcontrib>Salazar-Cabrera, Ricardo</creatorcontrib><creatorcontrib>Restrepo, David</creatorcontrib><creatorcontrib>López, Diego M</creatorcontrib><creatorcontrib>Blobel, Bernd</creatorcontrib><title>Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index</title><title>Journal of personalized medicine</title><addtitle>J Pers Med</addtitle><description>In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-19 vulnerability index, which estimates the vulnerability (per city block) of being infected with COVID-19. 
Unfortunately, DANE did not consider multiple factors that could increase the risk of COVID-19 (in addition to demographic and health), such as environmental and mobility data (found in the related literature). The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information) in which the incidence of COVID-19 is calculated and compared with the incidence of the COVID-19 vulnerability index provided by DANE. The collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining methodology (CRISP-DM) were considered for constructing the index. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model to predict the incidence of COVID-19 in Colombia is the Extra Trees Regressor algorithm, obtaining an R-squared of 0.829. This work is the first step toward a multidimensional analysis of COVID-19 risk factors, which has the potential to support decision making in public health programs. 
The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue.</description><subject>Analysis</subject><subject>Cluster analysis</subject><subject>COVID-19</subject><subject>COVID-19 vaccines</subject><subject>Data mining</subject><subject>Decision making</subject><subject>Diabetes</subject><subject>Disease susceptibility</subject><subject>Hypertension</subject><subject>Immunization</subject><subject>Infections</subject><subject>Learning algorithms</subject><subject>Machine learning</subject><subject>Medical research</subject><subject>Mobility</subject><subject>Mortality</subject><subject>Precision medicine</subject><subject>Product information</subject><subject>Public health</subject><subject>Risk factors</subject><subject>Sociodemographics</subject><subject>Socioeconomic factors</subject><subject>Statistical analysis</subject><subject>Vaccination</subject><subject>Viral diseases</subject><subject>Virus diseases</subject><issn>2075-4426</issn><issn>2075-4426</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNptkstr3DAQh0VpaUKaU-9F0EuhONHTlk8lbF8Lu-TQNlchS-ONFlnaWnZp_vtqmwebEOmgQfPNb6SZQegtJWect-R8uxsoJw2lgr5Ax4w0shKC1S8P7CN0mvOWlKUkYzV5jY54IyltpDhGP9ZzmLzzA8TsUzQBr4299hHwCswYfdzgdXIQ8JTwwgQ7BzMBNnhxebX8XNEWX80hwmg6H_x0g5fRwd836FVvQobTu_ME_fr65efie7W6_LZcXKwqK7icKtb1sms7qohjkpcnGVcT1QjDSOt4p4ighBe_7IHVSggFHLirLekUs03X8BP06VZ3N3cDOAtxGk3Qu9EPZrzRyXj92BP9td6kP7roKqq4Kgof7hTG9HuGPOnBZwshmAhpzpqVtKStCdkne_8E3aZ5LAX7T5UWCCoPqI0JoH3sU0ls96L6opEtlaIUvlBnz1BlOxi8TRF6X-4fBXy8DbBjynmE_uGTlOj9HOiDOSj0u8O6PLD3Xef_AKpsqjM</recordid><startdate>20230715</startdate><enddate>20230715</enddate><creator>Rosero Perez, Paula Andrea</creator><creator>Realpe Gonzalez, Juan Sebastián</creator><creator>Salazar-Cabrera, Ricardo</creator><creator>Restrepo, David</creator><creator>López, Diego M</creator><creator>Blobel, 
Bernd</creator><general>MDPI AG</general><general>MDPI</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FH</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>LK8</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-3789-1957</orcidid><orcidid>https://orcid.org/0009-0003-4148-1138</orcidid><orcidid>https://orcid.org/0000-0002-7552-1383</orcidid><orcidid>https://orcid.org/0000-0001-9425-4375</orcidid><orcidid>https://orcid.org/0009-0009-1982-7116</orcidid></search><sort><creationdate>20230715</creationdate><title>Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index</title><author>Rosero Perez, Paula Andrea ; Realpe Gonzalez, Juan Sebastián ; Salazar-Cabrera, Ricardo ; Restrepo, David ; López, Diego M ; Blobel, Bernd</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c435t-2bf5b9b180d253375ad60874a209d3b8041039b15fe268448e3e3d6c0b82c7b73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Analysis</topic><topic>Cluster analysis</topic><topic>COVID-19</topic><topic>COVID-19 vaccines</topic><topic>Data mining</topic><topic>Decision making</topic><topic>Diabetes</topic><topic>Disease susceptibility</topic><topic>Hypertension</topic><topic>Immunization</topic><topic>Infections</topic><topic>Learning algorithms</topic><topic>Machine learning</topic><topic>Medical research</topic><topic>Mobility</topic><topic>Mortality</topic><topic>Precision medicine</topic><topic>Product information</topic><topic>Public health</topic><topic>Risk factors</topic><topic>Sociodemographics</topic><topic>Socioeconomic 
factors</topic><topic>Statistical analysis</topic><topic>Vaccination</topic><topic>Viral diseases</topic><topic>Virus diseases</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rosero Perez, Paula Andrea</creatorcontrib><creatorcontrib>Realpe Gonzalez, Juan Sebastián</creatorcontrib><creatorcontrib>Salazar-Cabrera, Ricardo</creatorcontrib><creatorcontrib>Restrepo, David</creatorcontrib><creatorcontrib>López, Diego M</creatorcontrib><creatorcontrib>Blobel, Bernd</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of personalized medicine</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rosero Perez, Paula Andrea</au><au>Realpe Gonzalez, Juan Sebastián</au><au>Salazar-Cabrera, Ricardo</au><au>Restrepo, David</au><au>López, Diego 
M</au><au>Blobel, Bernd</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index</atitle><jtitle>Journal of personalized medicine</jtitle><addtitle>J Pers Med</addtitle><date>2023-07-15</date><risdate>2023</risdate><volume>13</volume><issue>7</issue><spage>1141</spage><pages>1141-</pages><issn>2075-4426</issn><eissn>2075-4426</eissn><abstract>In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-19 vulnerability index, which estimates the vulnerability (per city block) of being infected with COVID-19. Unfortunately, DANE did not consider multiple factors that could increase the risk of COVID-19 (in addition to demographic and health), such as environmental and mobility data (found in the related literature). The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information) in which the incidence of COVID-19 is calculated and compared with the incidence of the COVID-19 vulnerability index provided by DANE. The collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining methodology (CRISP-DM) were considered for constructing the index. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model to predict the incidence of COVID-19 in Colombia is the Extra Trees Regressor algorithm, obtaining an R-squared of 0.829. 
This work is the first step toward a multidimensional analysis of COVID-19 risk factors, which has the potential to support decision making in public health programs. The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>37511754</pmid><doi>10.3390/jpm13071141</doi><orcidid>https://orcid.org/0000-0002-3789-1957</orcidid><orcidid>https://orcid.org/0009-0003-4148-1138</orcidid><orcidid>https://orcid.org/0000-0002-7552-1383</orcidid><orcidid>https://orcid.org/0000-0001-9425-4375</orcidid><orcidid>https://orcid.org/0009-0009-1982-7116</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2075-4426
ispartof Journal of personalized medicine, 2023-07, Vol.13 (7), p.1141
issn 2075-4426
2075-4426
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_10381838
source MDPI - Multidisciplinary Digital Publishing Institute; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; PubMed Central Open Access
subjects Analysis
Cluster analysis
COVID-19
COVID-19 vaccines
Data mining
Decision making
Diabetes
Disease susceptibility
Hypertension
Immunization
Infections
Learning algorithms
Machine learning
Medical research
Mobility
Mortality
Precision medicine
Product information
Public health
Risk factors
Sociodemographics
Socioeconomic factors
Statistical analysis
Vaccination
Viral diseases
Virus diseases
title Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T12%3A26%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multidimensional%20Machine%20Learning%20Model%20to%20Calculate%20a%20COVID-19%20Vulnerability%20Index&rft.jtitle=Journal%20of%20personalized%20medicine&rft.au=Rosero%20Perez,%20Paula%20Andrea&rft.date=2023-07-15&rft.volume=13&rft.issue=7&rft.spage=1141&rft.pages=1141-&rft.issn=2075-4426&rft.eissn=2075-4426&rft_id=info:doi/10.3390/jpm13071141&rft_dat=%3Cgale_pubme%3EA759154511%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2843074157&rft_id=info:pmid/37511754&rft_galeid=A759154511&rfr_iscdi=true
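
The description field above reports an R-squared of 0.829 for the Extra Trees Regressor. As a rough illustration of what that metric measures, the sketch below computes the coefficient of determination under its standard definition, 1 - SS_res / SS_tot. The function name `r_squared` and the sample values are hypothetical and chosen only for illustration; they are not data or code from the article, and this record does not describe the authors' exact evaluation procedure.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (standard definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical incidence values, for illustration only (not from the paper).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
score = r_squared(observed, predicted)
```

A value of 1.0 means the predictions match the observations exactly, and 0.0 means the model does no better than always predicting the observed mean. Read this way, 0.829 indicates that the model accounts for roughly 83% of the variance in the incidence values it was evaluated on.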