Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index
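Published in: | Journal of personalized medicine 2023-07, Vol.13 (7), p.1141 |
---|---|
Main authors: | Rosero Perez, Paula Andrea; Realpe Gonzalez, Juan Sebastián; Salazar-Cabrera, Ricardo; Restrepo, David; López, Diego M; Blobel, Bernd |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |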
container_end_page | |
---|---|
container_issue | 7 |
container_start_page | 1141 |
container_title | Journal of personalized medicine |
container_volume | 13 |
creator | Rosero Perez, Paula Andrea; Realpe Gonzalez, Juan Sebastián; Salazar-Cabrera, Ricardo; Restrepo, David; López, Diego M; Blobel, Bernd |
description | In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-19 vulnerability index, which estimates the vulnerability (per city block) of being infected with COVID-19. Unfortunately, DANE did not consider multiple factors that could increase the risk of COVID-19 (in addition to demographic and health), such as environmental and mobility data (found in the related literature). The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information) in which the incidence of COVID-19 is calculated and compared with the incidence of the COVID-19 vulnerability index provided by DANE. The collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining methodology (CRISP-DM) were considered for constructing the index. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model to predict the incidence of COVID-19 in Colombia is the Extra Trees Regressor algorithm, obtaining an R-squared of 0.829. This work is the first step toward a multidimensional analysis of COVID-19 risk factors, which has the potential to support decision making in public health programs. The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue. |
doi_str_mv | 10.3390/jpm13071141 |
format | Article |
fullrecord | <record><control><sourceid>gale_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_10381838</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A759154511</galeid><sourcerecordid>A759154511</sourcerecordid><originalsourceid>FETCH-LOGICAL-c435t-2bf5b9b180d253375ad60874a209d3b8041039b15fe268448e3e3d6c0b82c7b73</originalsourceid><addsrcrecordid>eNptkstr3DAQh0VpaUKaU-9F0EuhONHTlk8lbF8Lu-TQNlchS-ONFlnaWnZp_vtqmwebEOmgQfPNb6SZQegtJWect-R8uxsoJw2lgr5Ax4w0shKC1S8P7CN0mvOWlKUkYzV5jY54IyltpDhGP9ZzmLzzA8TsUzQBr4299hHwCswYfdzgdXIQ8JTwwgQ7BzMBNnhxebX8XNEWX80hwmg6H_x0g5fRwd836FVvQobTu_ME_fr65efie7W6_LZcXKwqK7icKtb1sms7qohjkpcnGVcT1QjDSOt4p4ighBe_7IHVSggFHLirLekUs03X8BP06VZ3N3cDOAtxGk3Qu9EPZrzRyXj92BP9td6kP7roKqq4Kgof7hTG9HuGPOnBZwshmAhpzpqVtKStCdkne_8E3aZ5LAX7T5UWCCoPqI0JoH3sU0ls96L6opEtlaIUvlBnz1BlOxi8TRF6X-4fBXy8DbBjynmE_uGTlOj9HOiDOSj0u8O6PLD3Xef_AKpsqjM</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2843074157</pqid></control><display><type>article</type><title>Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index</title><source>MDPI - Multidisciplinary Digital Publishing Institute</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><source>PubMed Central</source><source>PubMed Central Open Access</source><creator>Rosero Perez, Paula Andrea ; Realpe Gonzalez, Juan Sebastián ; Salazar-Cabrera, Ricardo ; Restrepo, David ; López, Diego M ; Blobel, Bernd</creator><creatorcontrib>Rosero Perez, Paula Andrea ; Realpe Gonzalez, Juan Sebastián ; Salazar-Cabrera, Ricardo ; Restrepo, David ; López, Diego M ; Blobel, Bernd</creatorcontrib><description>In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. 
The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-19 vulnerability index, which estimates the vulnerability (per city block) of being infected with COVID-19. Unfortunately, DANE did not consider multiple factors that could increase the risk of COVID-19 (in addition to demographic and health), such as environmental and mobility data (found in the related literature). The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information) in which the incidence of COVID-19 is calculated and compared with the incidence of the COVID-19 vulnerability index provided by DANE. The collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining methodology (CRISP-DM) were considered for constructing the index. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model to predict the incidence of COVID-19 in Colombia is the Extra Trees Regressor algorithm, obtaining an R-squared of 0.829. This work is the first step toward a multidimensional analysis of COVID-19 risk factors, which has the potential to support decision making in public health programs. 
The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue.</description><identifier>ISSN: 2075-4426</identifier><identifier>EISSN: 2075-4426</identifier><identifier>DOI: 10.3390/jpm13071141</identifier><identifier>PMID: 37511754</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Analysis ; Cluster analysis ; COVID-19 ; COVID-19 vaccines ; Data mining ; Decision making ; Diabetes ; Disease susceptibility ; Hypertension ; Immunization ; Infections ; Learning algorithms ; Machine learning ; Medical research ; Mobility ; Mortality ; Precision medicine ; Product information ; Public health ; Risk factors ; Sociodemographics ; Socioeconomic factors ; Statistical analysis ; Vaccination ; Viral diseases ; Virus diseases</subject><ispartof>Journal of personalized medicine, 2023-07, Vol.13 (7), p.1141</ispartof><rights>COPYRIGHT 2023 MDPI AG</rights><rights>2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>2023 by the authors. 
2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c435t-2bf5b9b180d253375ad60874a209d3b8041039b15fe268448e3e3d6c0b82c7b73</cites><orcidid>0000-0002-3789-1957 ; 0009-0003-4148-1138 ; 0000-0002-7552-1383 ; 0000-0001-9425-4375 ; 0009-0009-1982-7116</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10381838/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10381838/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37511754$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Rosero Perez, Paula Andrea</creatorcontrib><creatorcontrib>Realpe Gonzalez, Juan Sebastián</creatorcontrib><creatorcontrib>Salazar-Cabrera, Ricardo</creatorcontrib><creatorcontrib>Restrepo, David</creatorcontrib><creatorcontrib>López, Diego M</creatorcontrib><creatorcontrib>Blobel, Bernd</creatorcontrib><title>Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index</title><title>Journal of personalized medicine</title><addtitle>J Pers Med</addtitle><description>In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-19 vulnerability index, which estimates the vulnerability (per city block) of being infected with COVID-19. 
Unfortunately, DANE did not consider multiple factors that could increase the risk of COVID-19 (in addition to demographic and health), such as environmental and mobility data (found in the related literature). The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information) in which the incidence of COVID-19 is calculated and compared with the incidence of the COVID-19 vulnerability index provided by DANE. The collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining methodology (CRISP-DM) were considered for constructing the index. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model to predict the incidence of COVID-19 in Colombia is the Extra Trees Regressor algorithm, obtaining an R-squared of 0.829. This work is the first step toward a multidimensional analysis of COVID-19 risk factors, which has the potential to support decision making in public health programs. 
The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue.</description><subject>Analysis</subject><subject>Cluster analysis</subject><subject>COVID-19</subject><subject>COVID-19 vaccines</subject><subject>Data mining</subject><subject>Decision making</subject><subject>Diabetes</subject><subject>Disease susceptibility</subject><subject>Hypertension</subject><subject>Immunization</subject><subject>Infections</subject><subject>Learning algorithms</subject><subject>Machine learning</subject><subject>Medical research</subject><subject>Mobility</subject><subject>Mortality</subject><subject>Precision medicine</subject><subject>Product information</subject><subject>Public health</subject><subject>Risk factors</subject><subject>Sociodemographics</subject><subject>Socioeconomic factors</subject><subject>Statistical analysis</subject><subject>Vaccination</subject><subject>Viral diseases</subject><subject>Virus diseases</subject><issn>2075-4426</issn><issn>2075-4426</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNptkstr3DAQh0VpaUKaU-9F0EuhONHTlk8lbF8Lu-TQNlchS-ONFlnaWnZp_vtqmwebEOmgQfPNb6SZQegtJWect-R8uxsoJw2lgr5Ax4w0shKC1S8P7CN0mvOWlKUkYzV5jY54IyltpDhGP9ZzmLzzA8TsUzQBr4299hHwCswYfdzgdXIQ8JTwwgQ7BzMBNnhxebX8XNEWX80hwmg6H_x0g5fRwd836FVvQobTu_ME_fr65efie7W6_LZcXKwqK7icKtb1sms7qohjkpcnGVcT1QjDSOt4p4ighBe_7IHVSggFHLirLekUs03X8BP06VZ3N3cDOAtxGk3Qu9EPZrzRyXj92BP9td6kP7roKqq4Kgof7hTG9HuGPOnBZwshmAhpzpqVtKStCdkne_8E3aZ5LAX7T5UWCCoPqI0JoH3sU0ls96L6opEtlaIUvlBnz1BlOxi8TRF6X-4fBXy8DbBjynmE_uGTlOj9HOiDOSj0u8O6PLD3Xef_AKpsqjM</recordid><startdate>20230715</startdate><enddate>20230715</enddate><creator>Rosero Perez, Paula Andrea</creator><creator>Realpe Gonzalez, Juan Sebastián</creator><creator>Salazar-Cabrera, Ricardo</creator><creator>Restrepo, David</creator><creator>López, Diego M</creator><creator>Blobel, 
Bernd</creator><general>MDPI AG</general><general>MDPI</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FH</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>LK8</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-3789-1957</orcidid><orcidid>https://orcid.org/0009-0003-4148-1138</orcidid><orcidid>https://orcid.org/0000-0002-7552-1383</orcidid><orcidid>https://orcid.org/0000-0001-9425-4375</orcidid><orcidid>https://orcid.org/0009-0009-1982-7116</orcidid></search><sort><creationdate>20230715</creationdate><title>Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index</title><author>Rosero Perez, Paula Andrea ; Realpe Gonzalez, Juan Sebastián ; Salazar-Cabrera, Ricardo ; Restrepo, David ; López, Diego M ; Blobel, Bernd</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c435t-2bf5b9b180d253375ad60874a209d3b8041039b15fe268448e3e3d6c0b82c7b73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Analysis</topic><topic>Cluster analysis</topic><topic>COVID-19</topic><topic>COVID-19 vaccines</topic><topic>Data mining</topic><topic>Decision making</topic><topic>Diabetes</topic><topic>Disease susceptibility</topic><topic>Hypertension</topic><topic>Immunization</topic><topic>Infections</topic><topic>Learning algorithms</topic><topic>Machine learning</topic><topic>Medical research</topic><topic>Mobility</topic><topic>Mortality</topic><topic>Precision medicine</topic><topic>Product information</topic><topic>Public health</topic><topic>Risk factors</topic><topic>Sociodemographics</topic><topic>Socioeconomic 
factors</topic><topic>Statistical analysis</topic><topic>Vaccination</topic><topic>Viral diseases</topic><topic>Virus diseases</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rosero Perez, Paula Andrea</creatorcontrib><creatorcontrib>Realpe Gonzalez, Juan Sebastián</creatorcontrib><creatorcontrib>Salazar-Cabrera, Ricardo</creatorcontrib><creatorcontrib>Restrepo, David</creatorcontrib><creatorcontrib>López, Diego M</creatorcontrib><creatorcontrib>Blobel, Bernd</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Biological Science Collection</collection><collection>Biological Science Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Journal of personalized medicine</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rosero Perez, Paula Andrea</au><au>Realpe Gonzalez, Juan Sebastián</au><au>Salazar-Cabrera, Ricardo</au><au>Restrepo, David</au><au>López, Diego 
M</au><au>Blobel, Bernd</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index</atitle><jtitle>Journal of personalized medicine</jtitle><addtitle>J Pers Med</addtitle><date>2023-07-15</date><risdate>2023</risdate><volume>13</volume><issue>7</issue><spage>1141</spage><pages>1141-</pages><issn>2075-4426</issn><eissn>2075-4426</eissn><abstract>In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-19 vulnerability index, which estimates the vulnerability (per city block) of being infected with COVID-19. Unfortunately, DANE did not consider multiple factors that could increase the risk of COVID-19 (in addition to demographic and health), such as environmental and mobility data (found in the related literature). The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information) in which the incidence of COVID-19 is calculated and compared with the incidence of the COVID-19 vulnerability index provided by DANE. The collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining methodology (CRISP-DM) were considered for constructing the index. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model to predict the incidence of COVID-19 in Colombia is the Extra Trees Regressor algorithm, obtaining an R-squared of 0.829. 
This work is the first step toward a multidimensional analysis of COVID-19 risk factors, which has the potential to support decision making in public health programs. The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>37511754</pmid><doi>10.3390/jpm13071141</doi><orcidid>https://orcid.org/0000-0002-3789-1957</orcidid><orcidid>https://orcid.org/0009-0003-4148-1138</orcidid><orcidid>https://orcid.org/0000-0002-7552-1383</orcidid><orcidid>https://orcid.org/0000-0001-9425-4375</orcidid><orcidid>https://orcid.org/0009-0009-1982-7116</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2075-4426 |
ispartof | Journal of personalized medicine, 2023-07, Vol.13 (7), p.1141 |
issn | 2075-4426 2075-4426 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_10381838 |
source | MDPI - Multidisciplinary Digital Publishing Institute; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals; PubMed Central; PubMed Central Open Access |
subjects | Analysis; Cluster analysis; COVID-19; COVID-19 vaccines; Data mining; Decision making; Diabetes; Disease susceptibility; Hypertension; Immunization; Infections; Learning algorithms; Machine learning; Medical research; Mobility; Mortality; Precision medicine; Product information; Public health; Risk factors; Sociodemographics; Socioeconomic factors; Statistical analysis; Vaccination; Viral diseases; Virus diseases |
title | Multidimensional Machine Learning Model to Calculate a COVID-19 Vulnerability Index |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-11T12%3A26%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multidimensional%20Machine%20Learning%20Model%20to%20Calculate%20a%20COVID-19%20Vulnerability%20Index&rft.jtitle=Journal%20of%20personalized%20medicine&rft.au=Rosero%20Perez,%20Paula%20Andrea&rft.date=2023-07-15&rft.volume=13&rft.issue=7&rft.spage=1141&rft.pages=1141-&rft.issn=2075-4426&rft.eissn=2075-4426&rft_id=info:doi/10.3390/jpm13071141&rft_dat=%3Cgale_pubme%3EA759154511%3C/gale_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2843074157&rft_id=info:pmid/37511754&rft_galeid=A759154511&rfr_iscdi=true |
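
The url row packs the citation into an OpenURL query string: `rft.*` keys describe the article and repeated `rft_id` keys carry its identifiers. As a minimal sketch of how such a link can be unpacked with the Python standard library, the example below works on a shortened copy of that URL (only some keys are kept). The function name `decode_openurl` and the constant `OPENURL` are hypothetical, chosen for this illustration; they are not part of the catalog or the resolver.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the resolver link from the url row; only some of the
# original keys are kept so the example stays readable.
OPENURL = (
    "https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=Journal%20of%20personalized%20medicine"
    "&rft.au=Rosero%20Perez,%20Paula%20Andrea"
    "&rft.date=2023-07-15&rft.volume=13&rft.issue=7&rft.spage=1141"
    "&rft.issn=2075-4426"
    "&rft_id=info:doi/10.3390/jpm13071141"
    "&rft_id=info:oai/"
    "&rft_id=info:pmid/37511754"
)


def decode_openurl(url):
    """Return (referent fields, identifiers) parsed from an OpenURL query."""
    query = parse_qs(urlsplit(url).query)
    # rft.* keys describe the cited item; keep the first value of each key.
    fields = {k[len("rft."):]: v[0] for k, v in query.items() if k.startswith("rft.")}
    # rft_id may repeat; each value has the shape "info:<scheme>/<value>".
    ids = {}
    for raw in query.get("rft_id", []):
        body = raw[len("info:"):] if raw.startswith("info:") else raw
        scheme, _, value = body.partition("/")
        if value:  # skip empty identifiers such as "info:oai/"
            ids[scheme] = value
    return fields, ids


fields, ids = decode_openurl(OPENURL)
print(fields["jtitle"], fields["volume"], fields["issue"], fields["spage"])
print(ids)
```

Splitting on the first `/` only keeps the DOI intact, since a DOI contains a slash of its own.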