Discovering Mathematical Patterns Behind HIV-1 Genetic Recombination: A New Methodology to Identify Viral Features

In this article, we introduce a novel methodology for characterizing viral genetic features: the Unified Methodology of recombinant virus Identification (UMI). Our methodology converts genomic sequences into spectrograms, applies transfer learning using a pre-trained Convolutional Neural Network (CN...

Bibliographic details
Published in: IEEE access 2023, Vol.11, p.95796-95812
Main authors: Guerrero-Tamayo, Ana, Urquijo, Borja Sanz, Casado, Concepcion, Tosantos, Maria-Dolores Moragues, Olivares, Isabel, Pastor-Lopez, Iker
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 95812
container_issue
container_start_page 95796
container_title IEEE access
container_volume 11
creator Guerrero-Tamayo, Ana
Urquijo, Borja Sanz
Casado, Concepcion
Tosantos, Maria-Dolores Moragues
Olivares, Isabel
Pastor-Lopez, Iker
description In this article, we introduce a novel methodology for characterizing viral genetic features: the Unified Methodology of recombinant virus Identification (UMI). Our methodology converts genomic sequences into spectrograms, applies transfer learning using a pre-trained Convolutional Neural Network (CNN), and employs interpretability tools to identify the genomic regions relevant for characterizing a viral sequence as recombinant. The UMI methodology does not necessitate multiple sequence alignment or manual adjustments. As a result, it operates much faster, has low computational demands, and is capable of handling substantial amounts of data. To validate this, we applied UMI to one extensively studied and documented case: HIV-1 genetic recombination. We worked with all identified HIV-1 complete sequences (13554 sequences up to 2020), searching for mathematical patterns, signatures, that characterize an HIV-1 sequence as recombinant. CNN's hit rate (test accuracy) is 94%, with consistent and differentiated decision areas in each category. Using interpretability tools, we verified that the hot zones were similar for sequences of the same subtype and phylogenetic proximity. The leading areas for classifying a sequence as recombinant or non-recombinant are coincident with genomic regions that play a key role in genetic recombination processes. By applying UMI methodology we found that there is indeed a genome mathematical pattern that assesses an HIV-1 sequence as recombinant. In addition, we located its position. Considering expert knowledge, our results showed a substantial, robust and biologically-consistent hit rate. This type of solution can successfully guide the location and subsequent characterization of relevant areas, avoiding the heavy analysis of multiple sequence alignment and manual adjustments.
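The first step named in the description, converting a genomic sequence into a spectrogram, can be illustrated with a minimal sketch. The description does not state which numeric encoding or window settings the authors used, so everything below is an assumption: the four-track 0/1 indicator encoding, the function names `indicator_tracks`, `dft_power` and `spectrogram`, and the `window` and `step` defaults are all hypothetical and chosen only for illustration. The CNN and interpretability stages are not covered.

```python
import cmath
import math

# Hypothetical encoding (not taken from the record): one 0/1 indicator
# track per nucleotide. The article's actual encoding is not stated here.
NUCLEOTIDES = "ACGT"


def indicator_tracks(sequence):
    """Return a dict of four 0/1 lists, one per nucleotide."""
    seq = sequence.upper()
    return {n: [1.0 if base == n else 0.0 for base in seq] for n in NUCLEOTIDES}


def dft_power(frame):
    """Power spectrum of one frame, non-negative frequencies only."""
    size = len(frame)
    powers = []
    for k in range(size // 2 + 1):
        coeff = sum(
            value * cmath.exp(-2j * math.pi * k * i / size)
            for i, value in enumerate(frame)
        )
        powers.append(abs(coeff) ** 2)
    return powers


def spectrogram(sequence, window=12, step=6):
    """Sliding-window power spectrum, summed over the four tracks.

    Returns a list of frames (positions along the genome); each frame
    holds window // 2 + 1 frequency bins.
    """
    tracks = indicator_tracks(sequence)
    frames = []
    for start in range(0, len(sequence) - window + 1, step):
        total = [0.0] * (window // 2 + 1)
        for track in tracks.values():
            for k, power in enumerate(dft_power(track[start:start + window])):
                total[k] += power
        frames.append(total)
    return frames


if __name__ == "__main__":
    # Toy fragment for illustration only, not a real HIV-1 record.
    toy = "ATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGAT"
    image = spectrogram(toy)
    print(len(image), "frames x", len(image[0]), "frequency bins")
```

Because each frame is computed from a fixed-size window of the raw sequence, no alignment against other sequences is required, which is consistent with the alignment-free claim in the description. A real pipeline would likely use an FFT routine and render the frames as an image before passing them to a pre-trained network.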
doi_str_mv 10.1109/ACCESS.2023.3311752
format Article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_10238470</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10238470</ieee_id><doaj_id>oai_doaj_org_article_b5d233b03e1845e58268b31da28e6cb3</doaj_id><sourcerecordid>2864343860</sourcerecordid><originalsourceid>FETCH-LOGICAL-c359t-d0db3e22f712deed3ef065343fcef09000852ea1e851b4eb751b1091dc81bbe3</originalsourceid><addsrcrecordid>eNpNUcFOGzEQXaEiEVG-AA6WOG9qe9a7Drc0JSESKVVBXC17PZs4StbUdlrl7-t0UcVcZjR-73lmXlFcMzpmjE6-TGez--fnMaccxgCMNYKfFSPO6kkJAupPH-qL4irGLc0hc0s0oyJ8c7H1vzG4fk1WOm1wr5Nr9Y780Clh6CP5ihvXW_KwfC0ZWWCP-Z38xNbvjesz2Pd3ZEq-4x-ywrTx1u_8-kiSJ0uLfXLdkby6kAXnqNMhYPxcnHd6F_HqPV8WL_P7l9lD-fi0WM6mj2ULYpJKS60B5LxrGLeIFrCjtYAKujZXk9MOgqNmKAUzFZomp3wOZlvJjEG4LJaDrPV6q96C2-twVF479a_hw1rpkFfZoTLCcgBDAZmsBArJa2mAWc0l1q2BrHU7aL0F_-uAMamtP4Q-T6-4rKs8lKxpRsGAaoOPMWD3_1dG1ckqNVilTlapd6sy62ZgOUT8wOAgq4bCX9pPj50</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2864343860</pqid></control><display><type>article</type><title>Discovering Mathematical Patterns Behind HIV-1 Genetic Recombination: A New Methodology to Identify Viral Features</title><source>IEEE Open Access Journals</source><source>DOAJ Directory of Open Access Journals</source><source>Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals</source><creator>Guerrero-Tamayo, Ana ; Urquijo, Borja Sanz ; Casado, Concepcion ; Tosantos, Maria-Dolores Moragues ; Olivares, Isabel ; Pastor-Lopez, Iker</creator><creatorcontrib>Guerrero-Tamayo, Ana ; Urquijo, Borja Sanz ; Casado, Concepcion ; Tosantos, Maria-Dolores Moragues ; Olivares, Isabel ; Pastor-Lopez, Iker</creatorcontrib><description>In this article, we introduce a novel methodology for characterizing viral genetic features: the Unified Methodology of recombinant virus Identification (UMI). 
Our methodology converts genomic sequences into spectrograms, applies transfer learning using a pre-trained Convolutional Neural Network (CNN), and employs interpretability tools to identify the genomic regions relevant for characterizing a viral sequence as recombinant. The UMI methodology does not necessitate multiple sequence alignment or manual adjustments. As a result, it operates much faster, has low computational demands, and is capable of handling substantial amounts of data. To validate this, we applied UMI to one extensively studied and documented case: HIV-1 genetic recombination. We worked with all identified HIV-1 complete sequences (13554 sequences up to 2020), searching for mathematical patterns, signatures, that characterize an HIV-1 sequence as recombinant. CNN's hit rate (test accuracy) is 94%, with consistent and differentiated decision areas in each category. Using interpretability tools, we verified that the hot zones were similar for sequences of the same subtype and phylogenetic proximity. The leading areas for classifying a sequence as recombinant or non-recombinant are coincident with genomic regions that play a key role in genetic recombination processes. By applying UMI methodology we found that there is indeed a genome mathematical pattern that assesses an HIV-1 sequence as recombinant. In addition, we located its position. Considering expert knowledge, our results showed a substantial, robust and biologically-consistent hit rate. 
This type of solution can successfully guide the location and subsequent characterization of relevant areas, avoiding the heavy analysis of multiple sequence alignment and manual adjustments.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2023.3311752</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Alignment ; Artificial neural networks ; Bioinformatics ; Computer viruses ; Convolutional neural network ; Convolutional neural networks ; Coronaviruses ; deep learning ; genetic recombination ; Genetics ; genome mathematical pattern ; genome mathematical signature ; Genomics ; HIV-1 ; Mathematical analysis ; Mathematical models ; Methodology ; Pattern classification ; RNA ; Robustness (mathematics) ; Sequences ; Spectrogram ; Spectrograms</subject><ispartof>IEEE access, 2023, Vol.11, p.95796-95812</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c359t-d0db3e22f712deed3ef065343fcef09000852ea1e851b4eb751b1091dc81bbe3</cites><orcidid>0000-0002-7272-8610 ; 0000-0002-2347-5717 ; 0000-0003-2039-7773 ; 0000-0002-1828-350X ; 0000-0002-3068-6248 ; 0000-0003-3412-2877</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10238470$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,864,2100,4022,27632,27922,27923,27924,54932</link.rule.ids></links><search><creatorcontrib>Guerrero-Tamayo, Ana</creatorcontrib><creatorcontrib>Urquijo, Borja Sanz</creatorcontrib><creatorcontrib>Casado, Concepcion</creatorcontrib><creatorcontrib>Tosantos, Maria-Dolores Moragues</creatorcontrib><creatorcontrib>Olivares, Isabel</creatorcontrib><creatorcontrib>Pastor-Lopez, Iker</creatorcontrib><title>Discovering Mathematical Patterns Behind HIV-1 Genetic Recombination: A New Methodology to Identify Viral Features</title><title>IEEE access</title><addtitle>Access</addtitle><description>In this article, we introduce a novel methodology for characterizing viral genetic features: the Unified Methodology of recombinant virus Identification (UMI). Our methodology converts genomic sequences into spectrograms, applies transfer learning using a pre-trained Convolutional Neural Network (CNN), and employs interpretability tools to identify the genomic regions relevant for characterizing a viral sequence as recombinant. The UMI methodology does not necessitate multiple sequence alignment or manual adjustments. As a result, it operates much faster, has low computational demands, and is capable of handling substantial amounts of data. 
To validate this, we applied UMI to one extensively studied and documented case: HIV-1 genetic recombination. We worked with all identified HIV-1 complete sequences (13554 sequences up to 2020), searching for mathematical patterns, signatures, that characterize an HIV-1 sequence as recombinant. CNN's hit rate (test accuracy) is 94%, with consistent and differentiated decision areas in each category. Using interpretability tools, we verified that the hot zones were similar for sequences of the same subtype and phylogenetic proximity. The leading areas for classifying a sequence as recombinant or non-recombinant are coincident with genomic regions that play a key role in genetic recombination processes. By applying UMI methodology we found that there is indeed a genome mathematical pattern that assesses an HIV-1 sequence as recombinant. In addition, we located its position. Considering expert knowledge, our results showed a substantial, robust and biologically-consistent hit rate. This type of solution can successfully guide the location and subsequent characterization of relevant areas, avoiding the heavy analysis of multiple sequence alignment and manual adjustments.</description><subject>Alignment</subject><subject>Artificial neural networks</subject><subject>Bioinformatics</subject><subject>Computer viruses</subject><subject>Convolutional neural network</subject><subject>Convolutional neural networks</subject><subject>Coronaviruses</subject><subject>deep learning</subject><subject>genetic recombination</subject><subject>Genetics</subject><subject>genome mathematical pattern</subject><subject>genome mathematical signature</subject><subject>Genomics</subject><subject>HIV-1</subject><subject>Mathematical analysis</subject><subject>Mathematical models</subject><subject>Methodology</subject><subject>Pattern classification</subject><subject>RNA</subject><subject>Robustness 
(mathematics)</subject><subject>Sequences</subject><subject>Spectrogram</subject><subject>Spectrograms</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>RIE</sourceid><sourceid>DOA</sourceid><recordid>eNpNUcFOGzEQXaEiEVG-AA6WOG9qe9a7Drc0JSESKVVBXC17PZs4StbUdlrl7-t0UcVcZjR-73lmXlFcMzpmjE6-TGez--fnMaccxgCMNYKfFSPO6kkJAupPH-qL4irGLc0hc0s0oyJ8c7H1vzG4fk1WOm1wr5Nr9Y780Clh6CP5ihvXW_KwfC0ZWWCP-Z38xNbvjesz2Pd3ZEq-4x-ywrTx1u_8-kiSJ0uLfXLdkby6kAXnqNMhYPxcnHd6F_HqPV8WL_P7l9lD-fi0WM6mj2ULYpJKS60B5LxrGLeIFrCjtYAKujZXk9MOgqNmKAUzFZomp3wOZlvJjEG4LJaDrPV6q96C2-twVF479a_hw1rpkFfZoTLCcgBDAZmsBArJa2mAWc0l1q2BrHU7aL0F_-uAMamtP4Q-T6-4rKs8lKxpRsGAaoOPMWD3_1dG1ckqNVilTlapd6sy62ZgOUT8wOAgq4bCX9pPj50</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Guerrero-Tamayo, Ana</creator><creator>Urquijo, Borja Sanz</creator><creator>Casado, Concepcion</creator><creator>Tosantos, Maria-Dolores Moragues</creator><creator>Olivares, Isabel</creator><creator>Pastor-Lopez, Iker</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-7272-8610</orcidid><orcidid>https://orcid.org/0000-0002-2347-5717</orcidid><orcidid>https://orcid.org/0000-0003-2039-7773</orcidid><orcidid>https://orcid.org/0000-0002-1828-350X</orcidid><orcidid>https://orcid.org/0000-0002-3068-6248</orcidid><orcidid>https://orcid.org/0000-0003-3412-2877</orcidid></search><sort><creationdate>2023</creationdate><title>Discovering Mathematical Patterns Behind HIV-1 Genetic Recombination: A New Methodology to Identify Viral Features</title><author>Guerrero-Tamayo, Ana ; Urquijo, Borja Sanz ; Casado, Concepcion ; Tosantos, Maria-Dolores Moragues ; Olivares, Isabel ; Pastor-Lopez, Iker</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c359t-d0db3e22f712deed3ef065343fcef09000852ea1e851b4eb751b1091dc81bbe3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Alignment</topic><topic>Artificial neural networks</topic><topic>Bioinformatics</topic><topic>Computer viruses</topic><topic>Convolutional neural network</topic><topic>Convolutional neural networks</topic><topic>Coronaviruses</topic><topic>deep learning</topic><topic>genetic recombination</topic><topic>Genetics</topic><topic>genome mathematical pattern</topic><topic>genome mathematical signature</topic><topic>Genomics</topic><topic>HIV-1</topic><topic>Mathematical analysis</topic><topic>Mathematical models</topic><topic>Methodology</topic><topic>Pattern classification</topic><topic>RNA</topic><topic>Robustness 
(mathematics)</topic><topic>Sequences</topic><topic>Spectrogram</topic><topic>Spectrograms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Guerrero-Tamayo, Ana</creatorcontrib><creatorcontrib>Urquijo, Borja Sanz</creatorcontrib><creatorcontrib>Casado, Concepcion</creatorcontrib><creatorcontrib>Tosantos, Maria-Dolores Moragues</creatorcontrib><creatorcontrib>Olivares, Isabel</creatorcontrib><creatorcontrib>Pastor-Lopez, Iker</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Guerrero-Tamayo, Ana</au><au>Urquijo, Borja Sanz</au><au>Casado, Concepcion</au><au>Tosantos, Maria-Dolores Moragues</au><au>Olivares, Isabel</au><au>Pastor-Lopez, Iker</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Discovering Mathematical Patterns Behind HIV-1 Genetic Recombination: A New Methodology to Identify Viral 
Features</atitle><jtitle>IEEE access</jtitle><stitle>Access</stitle><date>2023</date><risdate>2023</risdate><volume>11</volume><spage>95796</spage><epage>95812</epage><pages>95796-95812</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>In this article, we introduce a novel methodology for characterizing viral genetic features: the Unified Methodology of recombinant virus Identification (UMI). Our methodology converts genomic sequences into spectrograms, applies transfer learning using a pre-trained Convolutional Neural Network (CNN), and employs interpretability tools to identify the genomic regions relevant for characterizing a viral sequence as recombinant. The UMI methodology does not necessitate multiple sequence alignment or manual adjustments. As a result, it operates much faster, has low computational demands, and is capable of handling substantial amounts of data. To validate this, we applied UMI to one extensively studied and documented case: HIV-1 genetic recombination. We worked with all identified HIV-1 complete sequences (13554 sequences up to 2020), searching for mathematical patterns, signatures, that characterize an HIV-1 sequence as recombinant. CNN's hit rate (test accuracy) is 94%, with consistent and differentiated decision areas in each category. Using interpretability tools, we verified that the hot zones were similar for sequences of the same subtype and phylogenetic proximity. The leading areas for classifying a sequence as recombinant or non-recombinant are coincident with genomic regions that play a key role in genetic recombination processes. By applying UMI methodology we found that there is indeed a genome mathematical pattern that assesses an HIV-1 sequence as recombinant. In addition, we located its position. Considering expert knowledge, our results showed a substantial, robust and biologically-consistent hit rate. 
This type of solution can successfully guide the location and subsequent characterization of relevant areas, avoiding the heavy analysis of multiple sequence alignment and manual adjustments.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2023.3311752</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0002-7272-8610</orcidid><orcidid>https://orcid.org/0000-0002-2347-5717</orcidid><orcidid>https://orcid.org/0000-0003-2039-7773</orcidid><orcidid>https://orcid.org/0000-0002-1828-350X</orcidid><orcidid>https://orcid.org/0000-0002-3068-6248</orcidid><orcidid>https://orcid.org/0000-0003-3412-2877</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE access, 2023, Vol.11, p.95796-95812
issn 2169-3536
2169-3536
language eng
recordid cdi_ieee_primary_10238470
source IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals
subjects Alignment
Artificial neural networks
Bioinformatics
Computer viruses
Convolutional neural network
Convolutional neural networks
Coronaviruses
deep learning
genetic recombination
Genetics
genome mathematical pattern
genome mathematical signature
Genomics
HIV-1
Mathematical analysis
Mathematical models
Methodology
Pattern classification
RNA
Robustness (mathematics)
Sequences
Spectrogram
Spectrograms
title Discovering Mathematical Patterns Behind HIV-1 Genetic Recombination: A New Methodology to Identify Viral Features
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-12T06%3A55%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Discovering%20Mathematical%20Patterns%20Behind%20HIV-1%20Genetic%20Recombination:%20A%20New%20Methodology%20to%20Identify%20Viral%20Features&rft.jtitle=IEEE%20access&rft.au=Guerrero-Tamayo,%20Ana&rft.date=2023&rft.volume=11&rft.spage=95796&rft.epage=95812&rft.pages=95796-95812&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2023.3311752&rft_dat=%3Cproquest_ieee_%3E2864343860%3C/proquest_ieee_%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2864343860&rft_id=info:pmid/&rft_ieee_id=10238470&rft_doaj_id=oai_doaj_org_article_b5d233b03e1845e58268b31da28e6cb3&rfr_iscdi=true
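The link above is an OpenURL (ctx_ver Z39.88-2004) whose `rft.*` query parameters repeat the citation fields of this record. A minimal sketch of reading them back with the Python standard library follows; the helper name `openurl_fields` and the shortened example URL are illustrative assumptions, not part of the catalog system.

```python
from urllib.parse import parse_qs, urlsplit


def openurl_fields(url):
    """Collect the 'rft.' referent fields of an OpenURL query string.

    Keys are returned without the 'rft.' prefix; values are lists,
    because a query parameter may legitimately repeat.
    """
    query = parse_qs(urlsplit(url).query)
    return {
        key[len("rft."):]: values
        for key, values in query.items()
        if key.startswith("rft.")
    }


if __name__ == "__main__":
    # Shortened, hypothetical variant of the resolver link for illustration.
    example = (
        "https://sfx.bib-bvb.de/sfx_tum?url_ver=Z39.88-2004"
        "&rft.jtitle=IEEE%20access&rft.volume=11"
        "&rft.spage=95796&rft.epage=95812&rft.issn=2169-3536"
    )
    for name, values in openurl_fields(example).items():
        print(name, "=", values[0])
```

Parameters that use an underscore after `rft` (for example `rft_id` or `rft_val_fmt`) are deliberately skipped by this filter, since they describe the referent's identifiers and metadata format and are not citation fields.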