NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification

Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among such classes, Y RNAs have been gaining attention, being essential factors for the initiation of DNA replication in vertebrates, as well as potential tumor biomarkers. Homologs...

Bibliographic details
Published in: IEEE/ACM transactions on computational biology and bioinformatics 2023-01, Vol.20 (1), p.557-565
Main authors: Lima, Diego de S., Amichi, Luiz J. A., Fernandez, Maria A., Constantino, Ademir A., Seixas, Flavio A. V.
Format: Article
Language: eng
container_end_page 565
container_issue 1
container_start_page 557
container_title IEEE/ACM transactions on computational biology and bioinformatics
container_volume 20
creator Lima, Diego de S.
Amichi, Luiz J. A.
Fernandez, Maria A.
Constantino, Ademir A.
Seixas, Flavio A. V.
description Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among such classes, Y RNAs have been gaining attention, being essential factors for the initiation of DNA replication in vertebrates, as well as potential tumor biomarkers. Homologs have also been described in nematodes and insects, as well as related sequences in bacteria. Methods capable of accurately predicting Y RNA transcripts are lacking. In this work, we developed an attention-based LSTM network and built a classification model able to classify sncRNAs (including Y RNA) directly from nucleotide sequences. A dataset consisting of 45,447 sncRNA sequences, from a wide range of organisms, obtained from Rfam 14.3 was built. Performance evaluation demonstrated that our proposed method, NCYPred (Non-Coding/Y RNA Prediction), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable with state-of-the-art methods. We also demonstrate that applying t-SNE on learned sequence representations could be useful for sequence analysis. Our model is freely available as a web server (https://www.gpea.uem.br/ncypred/).
doi_str_mv 10.1109/TCBB.2021.3131136
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_proquest_journals_2773453662</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9627779</ieee_id><sourcerecordid>2604025311</sourcerecordid><originalsourceid>FETCH-LOGICAL-c349t-11868c6007f58fedaf5a98be873a3670fd99b9ab16ca08eb87ab187eb3a5a5e93</originalsourceid><addsrcrecordid>eNpdkU1P4zAQhi3Eio_CD0BIyBIXLil2HH9xayNgV-oWBEWIU-QkYzCkMdipVvvvSbZdDpxmpPd5ZzTzInREyZhSos8X-XQ6TklKx4wySpnYQnuUc5loLbLtoc94wrVgu2g_xldC0kyTbAftskylItVyD9l5_nQboL7AEzx1tQtQdc63psGz-8VvPIfujw9v-NF1L3jSddAOKrY-4Cd8N59g09b4_sWHDs99m-S-du3zPyFvTIzOusoMjgP0w5omwuGmjtDD1eUi_5nMbq5_5ZNZUrFMdwmlSqhKECItVxZqY7nRqgQlmWFCEltrXWpTUlEZoqBUsu-VhJIZbjhoNkJn67nvwX-sIHbF0sUKmsa04FexSAXJSMqHZ43Q6Tf01a9Cf3lPSckyzoRIe4quqSr4GAPY4j24pQl_C0qKIYRiCKEYQig2IfSek83kVbmE-svx_-s9cLwGHAB8yVr0e6VmnxRQiMs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2773453662</pqid></control><display><type>article</type><title>NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification</title><source>IEEE Electronic Library (IEL)</source><creator>Lima, Diego de S. ; Amichi, Luiz J. A. ; Fernandez, Maria A. ; Constantino, Ademir A. ; Seixas, Flavio A. V.</creator><creatorcontrib>Lima, Diego de S. ; Amichi, Luiz J. A. ; Fernandez, Maria A. ; Constantino, Ademir A. ; Seixas, Flavio A. V.</creatorcontrib><description>Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among such classes, Y RNAs have been gaining attention, being essential factors for the initiation of DNA replication on vertebrates, as well as potential tumor biomarkers. Homologs have also been described in nematodes and insects, as well as related sequences in bacteria. Methods capable of accurately predicting Y RNA transcripts are lacking. 
In this work, we developed an attention-based LSTM network and built a classification model able to classify sncRNAs (including Y RNA) directly from nucleotide sequences. A dataset consisting of 45,447 sncRNA sequences, from a wide range of organisms, obtained from Rfam 14.3 was built. Performance evaluation demonstrated that our proposed method, NCYPred ( N on -C oding/ Y RNA Pred iction ), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable with state-of-the-art methods. We also demonstrate that applying t-SNE on learned sequence representations could be useful for sequence analysis. Our model is freely available as a web-server ( https://www.gpea.uem.br/ncypred/ ).</description><identifier>ISSN: 1545-5963</identifier><identifier>EISSN: 1557-9964</identifier><identifier>DOI: 10.1109/TCBB.2021.3131136</identifier><identifier>PMID: 34826297</identifier><identifier>CODEN: ITCBCY</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Animals ; Bacteria - genetics ; Biological system modeling ; Biomarkers ; Classification ; Classification algorithms ; Computers ; DNA biosynthesis ; Encoding ; Feature extraction ; Gene sequencing ; Homology ; Insects ; Nematodes ; Non-coding RNA ; Nucleotides ; Performance evaluation ; Predictive models ; recurrent neural network ; Replication initiation ; Ribonucleic acid ; RNA ; RNA, Small Untranslated - genetics ; Sequence analysis ; Sequence Analysis, RNA ; sequence classification ; Training ; Vertebrates ; web server ; Y RNA</subject><ispartof>IEEE/ACM transactions on computational biology and bioinformatics, 2023-01, Vol.20 (1), p.557-565</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c349t-11868c6007f58fedaf5a98be873a3670fd99b9ab16ca08eb87ab187eb3a5a5e93</citedby><cites>FETCH-LOGICAL-c349t-11868c6007f58fedaf5a98be873a3670fd99b9ab16ca08eb87ab187eb3a5a5e93</cites><orcidid>0000-0002-0660-2390 ; 0000-0002-7696-5680 ; 0000-0002-0117-6919</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9627779$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,778,782,794,27907,27908,54741</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9627779$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34826297$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Lima, Diego de S.</creatorcontrib><creatorcontrib>Amichi, Luiz J. A.</creatorcontrib><creatorcontrib>Fernandez, Maria A.</creatorcontrib><creatorcontrib>Constantino, Ademir A.</creatorcontrib><creatorcontrib>Seixas, Flavio A. V.</creatorcontrib><title>NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification</title><title>IEEE/ACM transactions on computational biology and bioinformatics</title><addtitle>TCBB</addtitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><description>Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among such classes, Y RNAs have been gaining attention, being essential factors for the initiation of DNA replication on vertebrates, as well as potential tumor biomarkers. Homologs have also been described in nematodes and insects, as well as related sequences in bacteria. 
Methods capable of accurately predicting Y RNA transcripts are lacking. In this work, we developed an attention-based LSTM network and built a classification model able to classify sncRNAs (including Y RNA) directly from nucleotide sequences. A dataset consisting of 45,447 sncRNA sequences, from a wide range of organisms, obtained from Rfam 14.3 was built. Performance evaluation demonstrated that our proposed method, NCYPred ( N on -C oding/ Y RNA Pred iction ), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable with state-of-the-art methods. We also demonstrate that applying t-SNE on learned sequence representations could be useful for sequence analysis. Our model is freely available as a web-server ( https://www.gpea.uem.br/ncypred/ ).</description><subject>Animals</subject><subject>Bacteria - genetics</subject><subject>Biological system modeling</subject><subject>Biomarkers</subject><subject>Classification</subject><subject>Classification algorithms</subject><subject>Computers</subject><subject>DNA biosynthesis</subject><subject>Encoding</subject><subject>Feature extraction</subject><subject>Gene sequencing</subject><subject>Homology</subject><subject>Insects</subject><subject>Nematodes</subject><subject>Non-coding RNA</subject><subject>Nucleotides</subject><subject>Performance evaluation</subject><subject>Predictive models</subject><subject>recurrent neural network</subject><subject>Replication initiation</subject><subject>Ribonucleic acid</subject><subject>RNA</subject><subject>RNA, Small Untranslated - genetics</subject><subject>Sequence analysis</subject><subject>Sequence Analysis, RNA</subject><subject>sequence classification</subject><subject>Training</subject><subject>Vertebrates</subject><subject>web server</subject><subject>Y 
RNA</subject><issn>1545-5963</issn><issn>1557-9964</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><sourceid>EIF</sourceid><recordid>eNpdkU1P4zAQhi3Eio_CD0BIyBIXLil2HH9xayNgV-oWBEWIU-QkYzCkMdipVvvvSbZdDpxmpPd5ZzTzInREyZhSos8X-XQ6TklKx4wySpnYQnuUc5loLbLtoc94wrVgu2g_xldC0kyTbAftskylItVyD9l5_nQboL7AEzx1tQtQdc63psGz-8VvPIfujw9v-NF1L3jSddAOKrY-4Cd8N59g09b4_sWHDs99m-S-du3zPyFvTIzOusoMjgP0w5omwuGmjtDD1eUi_5nMbq5_5ZNZUrFMdwmlSqhKECItVxZqY7nRqgQlmWFCEltrXWpTUlEZoqBUsu-VhJIZbjhoNkJn67nvwX-sIHbF0sUKmsa04FexSAXJSMqHZ43Q6Tf01a9Cf3lPSckyzoRIe4quqSr4GAPY4j24pQl_C0qKIYRiCKEYQig2IfSek83kVbmE-svx_-s9cLwGHAB8yVr0e6VmnxRQiMs</recordid><startdate>202301</startdate><enddate>202301</enddate><creator>Lima, Diego de S.</creator><creator>Amichi, Luiz J. A.</creator><creator>Fernandez, Maria A.</creator><creator>Constantino, Ademir A.</creator><creator>Seixas, Flavio A. V.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QF</scope><scope>7QO</scope><scope>7QQ</scope><scope>7SC</scope><scope>7SE</scope><scope>7SP</scope><scope>7SR</scope><scope>7TA</scope><scope>7TB</scope><scope>7U5</scope><scope>8BQ</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JG9</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-0660-2390</orcidid><orcidid>https://orcid.org/0000-0002-7696-5680</orcidid><orcidid>https://orcid.org/0000-0002-0117-6919</orcidid></search><sort><creationdate>202301</creationdate><title>NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification</title><author>Lima, Diego de S. ; Amichi, Luiz J. A. ; Fernandez, Maria A. ; Constantino, Ademir A. ; Seixas, Flavio A. 
V.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c349t-11868c6007f58fedaf5a98be873a3670fd99b9ab16ca08eb87ab187eb3a5a5e93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Animals</topic><topic>Bacteria - genetics</topic><topic>Biological system modeling</topic><topic>Biomarkers</topic><topic>Classification</topic><topic>Classification algorithms</topic><topic>Computers</topic><topic>DNA biosynthesis</topic><topic>Encoding</topic><topic>Feature extraction</topic><topic>Gene sequencing</topic><topic>Homology</topic><topic>Insects</topic><topic>Nematodes</topic><topic>Non-coding RNA</topic><topic>Nucleotides</topic><topic>Performance evaluation</topic><topic>Predictive models</topic><topic>recurrent neural network</topic><topic>Replication initiation</topic><topic>Ribonucleic acid</topic><topic>RNA</topic><topic>RNA, Small Untranslated - genetics</topic><topic>Sequence analysis</topic><topic>Sequence Analysis, RNA</topic><topic>sequence classification</topic><topic>Training</topic><topic>Vertebrates</topic><topic>web server</topic><topic>Y RNA</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lima, Diego de S.</creatorcontrib><creatorcontrib>Amichi, Luiz J. A.</creatorcontrib><creatorcontrib>Fernandez, Maria A.</creatorcontrib><creatorcontrib>Constantino, Ademir A.</creatorcontrib><creatorcontrib>Seixas, Flavio A. 
V.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Aluminium Industry Abstracts</collection><collection>Biotechnology Research Abstracts</collection><collection>Ceramic Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Corrosion Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>Materials Business File</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Solid State and Superconductivity Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lima, Diego de S.</au><au>Amichi, Luiz J. A.</au><au>Fernandez, Maria A.</au><au>Constantino, Ademir A.</au><au>Seixas, Flavio A. V.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification</atitle><jtitle>IEEE/ACM transactions on computational biology and bioinformatics</jtitle><stitle>TCBB</stitle><addtitle>IEEE/ACM Trans Comput Biol Bioinform</addtitle><date>2023-01</date><risdate>2023</risdate><volume>20</volume><issue>1</issue><spage>557</spage><epage>565</epage><pages>557-565</pages><issn>1545-5963</issn><eissn>1557-9964</eissn><coden>ITCBCY</coden><abstract>Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among such classes, Y RNAs have been gaining attention, being essential factors for the initiation of DNA replication on vertebrates, as well as potential tumor biomarkers. Homologs have also been described in nematodes and insects, as well as related sequences in bacteria. Methods capable of accurately predicting Y RNA transcripts are lacking. In this work, we developed an attention-based LSTM network and built a classification model able to classify sncRNAs (including Y RNA) directly from nucleotide sequences. A dataset consisting of 45,447 sncRNA sequences, from a wide range of organisms, obtained from Rfam 14.3 was built. Performance evaluation demonstrated that our proposed method, NCYPred ( N on -C oding/ Y RNA Pred iction ), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable with state-of-the-art methods. We also demonstrate that applying t-SNE on learned sequence representations could be useful for sequence analysis. 
Our model is freely available as a web-server ( https://www.gpea.uem.br/ncypred/ ).</abstract><cop>United States</cop><pub>IEEE</pub><pmid>34826297</pmid><doi>10.1109/TCBB.2021.3131136</doi><tpages>9</tpages><orcidid>https://orcid.org/0000-0002-0660-2390</orcidid><orcidid>https://orcid.org/0000-0002-7696-5680</orcidid><orcidid>https://orcid.org/0000-0002-0117-6919</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1545-5963
ispartof IEEE/ACM transactions on computational biology and bioinformatics, 2023-01, Vol.20 (1), p.557-565
issn 1545-5963
1557-9964
language eng
recordid cdi_proquest_journals_2773453662
source IEEE Electronic Library (IEL)
subjects Animals
Bacteria - genetics
Biological system modeling
Biomarkers
Classification
Classification algorithms
Computers
DNA biosynthesis
Encoding
Feature extraction
Gene sequencing
Homology
Insects
Nematodes
Non-coding RNA
Nucleotides
Performance evaluation
Predictive models
recurrent neural network
Replication initiation
Ribonucleic acid
RNA
RNA, Small Untranslated - genetics
Sequence analysis
Sequence Analysis, RNA
sequence classification
Training
Vertebrates
web server
Y RNA
title NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T23%3A18%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=NCYPred:%20A%20Bidirectional%20LSTM%20Network%20With%20Attention%20for%20Y%20RNA%20and%20Short%20Non-Coding%20RNA%20Classification&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Lima,%20Diego%20de%20S.&rft.date=2023-01&rft.volume=20&rft.issue=1&rft.spage=557&rft.epage=565&rft.pages=557-565&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2021.3131136&rft_dat=%3Cproquest_RIE%3E2604025311%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2773453662&rft_id=info:pmid/34826297&rft_ieee_id=9627779&rfr_iscdi=true