RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization

Abstract Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contribu...

Bibliographic details
Published in: Briefings in bioinformatics 2023-01, Vol.24 (1)
Main authors: Yuan, Guo-Hua; Wang, Ying; Wang, Guang-Zhong; Yang, Li
Format: Article
Language: English
Online access: Full text
container_end_page
container_issue 1
container_start_page
container_title Briefings in bioinformatics
container_volume 24
creator Yuan, Guo-Hua
Wang, Ying
Wang, Guang-Zhong
Yang, Li
description Abstract Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to the subcellular localizations of mRNAs and lncRNAs. With the Tree SHAP algorithm, RNAlight extracts nucleotide features for cytoplasmic or nuclear localization of RNAs, indicating the sequence basis for distinct RNA subcellular localizations. By assembling k-mers to sequence features and subsequently mapping to known RBP-associated motifs, different types of sequence features and their associated RBPs were additionally uncovered for lncRNAs and mRNAs with distinct subcellular localizations. Finally, we extended RNAlight to precisely predict the subcellular localizations of other types of RNAs, including snRNAs, snoRNAs and different circular RNA transcripts, suggesting the generality of using RNAlight for RNA subcellular localization prediction.
doi_str_mv 10.1093/bib/bbac509
format Article
fullrecord <record>
  <control>
    <sourceid>proquest_cross</sourceid>
    <recordid>TN_cdi_proquest_miscellaneous_2747005342</recordid>
    <sourceformat>XML</sourceformat>
    <sourcesystem>PC</sourcesystem>
    <oup_id>10.1093/bib/bbac509</oup_id>
    <sourcerecordid>3113462080</sourcerecordid>
    <sourcetype>Aggregation Database</sourcetype>
    <iscdi>true</iscdi>
    <recordtype>article</recordtype>
    <pqid>3113462080</pqid>
  </control>
  <display>
    <type>article</type>
    <title>RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization</title>
    <identifier>ISSN: 1467-5463</identifier>
    <identifier>EISSN: 1477-4054</identifier>
    <identifier>DOI: 10.1093/bib/bbac509</identifier>
    <identifier>PMID: 36464487</identifier>
    <language>eng</language>
    <publisher>England: Oxford University Press</publisher>
    <ispartof>Briefings in bioinformatics, 2023-01, Vol.24 (1)</ispartof>
    <rights>The Author(s) 2022. Published by Oxford University Press.</rights>
    <lds50>peer_reviewed</lds50>
    <oa>free_for_read</oa>
    <orcidid>0000-0001-8833-7473 ; 0000-0001-6432-8310 ; 0000-0002-8459-3784 ; 0000-0001-5230-4119</orcidid>
  </display>
  <links>
    <backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36464487$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink>
  </links>
  <addata>
    <format>journal</format>
    <genre>article</genre>
    <ristype>JOUR</ristype>
    <addtitle>Brief Bioinform</addtitle>
    <date>2023-01-19</date>
    <risdate>2023</risdate>
    <cop>England</cop>
    <pub>Oxford University Press</pub>
    <pmid>36464487</pmid>
  </addata>
</record>
fulltext fulltext
identifier ISSN: 1467-5463
ispartof Briefings in bioinformatics, 2023-01, Vol.24 (1)
issn 1467-5463
1477-4054
language eng
recordid cdi_proquest_miscellaneous_2747005342
source MEDLINE; Business Source Complete; Freely available e-journals; Open Access: Oxford University Press Open Journals; PubMed Central
subjects Algorithms
Circular RNA
Learning algorithms
Localization
Machine Learning
Non-coding RNA
Nucleotide sequence
Nucleotides
Ribonucleic acid
RNA
RNA, Long Noncoding - genetics
RNA, Messenger - genetics
title RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T14%3A33%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RNAlight:%20a%20machine%20learning%20model%20to%20identify%20nucleotide%20features%20determining%20RNA%20subcellular%20localization&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Yuan,%20Guo-Hua&rft.date=2023-01-19&rft.volume=24&rft.issue=1&rft.issn=1467-5463&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbac509&rft_dat=%3Cproquest_cross%3E3113462080%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3113462080&rft_id=info:pmid/36464487&rft_oup_id=10.1093/bib/bbac509&rfr_iscdi=true
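
The description field above says that RNAlight works from nucleotide k-mers and a LightGBM model. As a rough illustration of the k-mer step only, the sketch below turns one sequence into a fixed-length feature vector. This record does not state how the authors encode k-mers, so the normalized-frequency encoding, the default `k = 3`, the U-to-T mapping and the helper name `kmer_frequencies` are all assumptions made for this example and are not taken from the RNAlight code.

```python
from collections import Counter
from itertools import product
from typing import Dict

ALPHABET = "ACGT"  # assumption: U is mapped to T; ambiguous bases are skipped


def kmer_frequencies(seq: str, k: int = 3) -> Dict[str, float]:
    """Return the normalized frequency of every possible k-mer in seq.

    Hypothetical helper for illustration; not part of RNAlight itself.
    """
    seq = seq.upper().replace("U", "T")
    # Slide a window of length k over the sequence, one base at a time.
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    # Keep only windows made entirely of unambiguous bases.
    counts = Counter(w for w in windows if set(w) <= set(ALPHABET))
    total = sum(counts.values())
    # Emit all 4**k k-mers in a fixed order so vectors are comparable
    # across transcripts; unseen k-mers get a frequency of 0.0.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(ALPHABET, repeat=k)
    }


features = kmer_frequencies("ACGUACGU", k=2)
```

Vectors of this kind, one per transcript, could then be passed to a gradient-boosting classifier such as LightGBM and inspected with a tree-based SHAP explainer, which is the general workflow the abstract outlines. Those steps are left out here because the record gives no training details.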