RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization
Abstract Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contribu...
Published in: | Briefings in bioinformatics 2023-01, Vol.24 (1) |
---|---|
Main authors: | Yuan, Guo-Hua; Wang, Ying; Wang, Guang-Zhong; Yang, Li |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | 1 |
container_start_page | |
container_title | Briefings in bioinformatics |
container_volume | 24 |
creator | Yuan, Guo-Hua; Wang, Ying; Wang, Guang-Zhong; Yang, Li |
description | Abstract
Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to the subcellular localizations of mRNAs and lncRNAs. With the Tree SHAP algorithm, RNAlight extracts nucleotide features for cytoplasmic or nuclear localization of RNAs, indicating the sequence basis for distinct RNA subcellular localizations. By assembling k-mers to sequence features and subsequently mapping to known RBP-associated motifs, different types of sequence features and their associated RBPs were additionally uncovered for lncRNAs and mRNAs with distinct subcellular localizations. Finally, we extended RNAlight to precisely predict the subcellular localizations of other types of RNAs, including snRNAs, snoRNAs and different circular RNA transcripts, suggesting the generality of using RNAlight for RNA subcellular localization prediction. |
doi_str_mv | 10.1093/bib/bbac509 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2747005342</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bib/bbac509</oup_id><sourcerecordid>3113462080</sourcerecordid><originalsourceid>FETCH-LOGICAL-c385t-448962ad959aee98b059495af824ec1ba7a533a7124485938b2abcb509ffe2d43</originalsourceid><addsrcrecordid>eNp9kUuLFTEQhYMozji6ci8BQQRpJ51Hp-NuGHzBoCC6birp6pmM6c41j8X46831Xl24cFVV8NXhVB1Cnvbsdc-MOLfenlsLTjFzj5z2UutOMiXv7_tBd0oO4oQ8yvmWMc702D8kJ2KQg5SjPiXfv3y6CP76pryhQFdwN35DGhDS5rdrusYZAy2R-hm34pc7ulUXMJY20wWh1ISZzlgwrf73RpOjuVqHIdQAiYboIPifUHzcHpMHC4SMT471jHx79_br5Yfu6vP7j5cXV50ToypdM2YGDrNRBhDNaJky0ihYRi7R9RY0KCFA97yRyojRcrDOtvOXBfksxRl5edDdpfijYi7T6vPeEWwYa564lpoxJSRv6PN_0NtY09bcTaLvhRw4G1mjXh0ol2LOCZdpl_wK6W7q2bTPYGoZTMcMGv3sqFntivNf9s_TG_DiAMS6-6_SL3OkkDo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3113462080</pqid></control><display><type>article</type><title>RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization</title><source>MEDLINE</source><source>Business Source Complete</source><source>Freely available e-journals</source><source>Open Access: Oxford University Press Open Journals</source><source>PubMed Central</source><creator>Yuan, Guo-Hua ; Wang, Ying ; Wang, Guang-Zhong ; Yang, Li</creator><creatorcontrib>Yuan, Guo-Hua ; Wang, Ying ; Wang, Guang-Zhong ; Yang, Li</creatorcontrib><description>Abstract
Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to the subcellular localizations of mRNAs and lncRNAs. With the Tree SHAP algorithm, RNAlight extracts nucleotide features for cytoplasmic or nuclear localization of RNAs, indicating the sequence basis for distinct RNA subcellular localizations. By assembling k-mers to sequence features and subsequently mapping to known RBP-associated motifs, different types of sequence features and their associated RBPs were additionally uncovered for lncRNAs and mRNAs with distinct subcellular localizations. Finally, we extended RNAlight to precisely predict the subcellular localizations of other types of RNAs, including snRNAs, snoRNAs and different circular RNA transcripts, suggesting the generality of using RNAlight for RNA subcellular localization prediction.</description><identifier>ISSN: 1467-5463</identifier><identifier>EISSN: 1477-4054</identifier><identifier>DOI: 10.1093/bib/bbac509</identifier><identifier>PMID: 36464487</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Circular RNA ; Learning algorithms ; Localization ; Machine Learning ; Non-coding RNA ; Nucleotide sequence ; Nucleotides ; Ribonucleic acid ; RNA ; RNA, Long Noncoding - genetics ; RNA, Messenger - genetics</subject><ispartof>Briefings in bioinformatics, 2023-01, Vol.24 (1)</ispartof><rights>The Author(s) 2022. Published by Oxford University Press. 2022</rights><rights>The Author(s) 2022. 
Published by Oxford University Press.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c385t-448962ad959aee98b059495af824ec1ba7a533a7124485938b2abcb509ffe2d43</citedby><cites>FETCH-LOGICAL-c385t-448962ad959aee98b059495af824ec1ba7a533a7124485938b2abcb509ffe2d43</cites><orcidid>0000-0001-8833-7473 ; 0000-0001-6432-8310 ; 0000-0002-8459-3784 ; 0000-0001-5230-4119</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,1598,27903,27904</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36464487$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Yuan, Guo-Hua</creatorcontrib><creatorcontrib>Wang, Ying</creatorcontrib><creatorcontrib>Wang, Guang-Zhong</creatorcontrib><creatorcontrib>Yang, Li</creatorcontrib><title>RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization</title><title>Briefings in bioinformatics</title><addtitle>Brief Bioinform</addtitle><description>Abstract
Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to the subcellular localizations of mRNAs and lncRNAs. With the Tree SHAP algorithm, RNAlight extracts nucleotide features for cytoplasmic or nuclear localization of RNAs, indicating the sequence basis for distinct RNA subcellular localizations. By assembling k-mers to sequence features and subsequently mapping to known RBP-associated motifs, different types of sequence features and their associated RBPs were additionally uncovered for lncRNAs and mRNAs with distinct subcellular localizations. Finally, we extended RNAlight to precisely predict the subcellular localizations of other types of RNAs, including snRNAs, snoRNAs and different circular RNA transcripts, suggesting the generality of using RNAlight for RNA subcellular localization prediction.</description><subject>Algorithms</subject><subject>Circular RNA</subject><subject>Learning algorithms</subject><subject>Localization</subject><subject>Machine Learning</subject><subject>Non-coding RNA</subject><subject>Nucleotide sequence</subject><subject>Nucleotides</subject><subject>Ribonucleic acid</subject><subject>RNA</subject><subject>RNA, Long Noncoding - genetics</subject><subject>RNA, Messenger - 
genetics</subject><issn>1467-5463</issn><issn>1477-4054</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>TOX</sourceid><sourceid>EIF</sourceid><recordid>eNp9kUuLFTEQhYMozji6ci8BQQRpJ51Hp-NuGHzBoCC6birp6pmM6c41j8X46831Xl24cFVV8NXhVB1Cnvbsdc-MOLfenlsLTjFzj5z2UutOMiXv7_tBd0oO4oQ8yvmWMc702D8kJ2KQg5SjPiXfv3y6CP76pryhQFdwN35DGhDS5rdrusYZAy2R-hm34pc7ulUXMJY20wWh1ISZzlgwrf73RpOjuVqHIdQAiYboIPifUHzcHpMHC4SMT471jHx79_br5Yfu6vP7j5cXV50ToypdM2YGDrNRBhDNaJky0ihYRi7R9RY0KCFA97yRyojRcrDOtvOXBfksxRl5edDdpfijYi7T6vPeEWwYa564lpoxJSRv6PN_0NtY09bcTaLvhRw4G1mjXh0ol2LOCZdpl_wK6W7q2bTPYGoZTMcMGv3sqFntivNf9s_TG_DiAMS6-6_SL3OkkDo</recordid><startdate>20230119</startdate><enddate>20230119</enddate><creator>Yuan, Guo-Hua</creator><creator>Wang, Ying</creator><creator>Wang, Guang-Zhong</creator><creator>Yang, Li</creator><general>Oxford University Press</general><general>Oxford Publishing Limited (England)</general><scope>TOX</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7SC</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>K9.</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-8833-7473</orcidid><orcidid>https://orcid.org/0000-0001-6432-8310</orcidid><orcidid>https://orcid.org/0000-0002-8459-3784</orcidid><orcidid>https://orcid.org/0000-0001-5230-4119</orcidid></search><sort><creationdate>20230119</creationdate><title>RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization</title><author>Yuan, Guo-Hua ; Wang, Ying ; Wang, Guang-Zhong ; Yang, 
Li</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c385t-448962ad959aee98b059495af824ec1ba7a533a7124485938b2abcb509ffe2d43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Circular RNA</topic><topic>Learning algorithms</topic><topic>Localization</topic><topic>Machine Learning</topic><topic>Non-coding RNA</topic><topic>Nucleotide sequence</topic><topic>Nucleotides</topic><topic>Ribonucleic acid</topic><topic>RNA</topic><topic>RNA, Long Noncoding - genetics</topic><topic>RNA, Messenger - genetics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yuan, Guo-Hua</creatorcontrib><creatorcontrib>Wang, Ying</creatorcontrib><creatorcontrib>Wang, Guang-Zhong</creatorcontrib><creatorcontrib>Yang, Li</creatorcontrib><collection>Open Access: Oxford University Press Open Journals</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Briefings in 
bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yuan, Guo-Hua</au><au>Wang, Ying</au><au>Wang, Guang-Zhong</au><au>Yang, Li</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization</atitle><jtitle>Briefings in bioinformatics</jtitle><addtitle>Brief Bioinform</addtitle><date>2023-01-19</date><risdate>2023</risdate><volume>24</volume><issue>1</issue><issn>1467-5463</issn><eissn>1477-4054</eissn><abstract>Abstract
Different RNAs have distinct subcellular localizations. However, nucleotide features that determine these distinct distributions of lncRNAs and mRNAs have yet to be fully addressed. Here, we develop RNAlight, a machine learning model based on LightGBM, to identify nucleotide k-mers contributing to the subcellular localizations of mRNAs and lncRNAs. With the Tree SHAP algorithm, RNAlight extracts nucleotide features for cytoplasmic or nuclear localization of RNAs, indicating the sequence basis for distinct RNA subcellular localizations. By assembling k-mers to sequence features and subsequently mapping to known RBP-associated motifs, different types of sequence features and their associated RBPs were additionally uncovered for lncRNAs and mRNAs with distinct subcellular localizations. Finally, we extended RNAlight to precisely predict the subcellular localizations of other types of RNAs, including snRNAs, snoRNAs and different circular RNA transcripts, suggesting the generality of using RNAlight for RNA subcellular localization prediction.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>36464487</pmid><doi>10.1093/bib/bbac509</doi><orcidid>https://orcid.org/0000-0001-8833-7473</orcidid><orcidid>https://orcid.org/0000-0001-6432-8310</orcidid><orcidid>https://orcid.org/0000-0002-8459-3784</orcidid><orcidid>https://orcid.org/0000-0001-5230-4119</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1467-5463 |
ispartof | Briefings in bioinformatics, 2023-01, Vol.24 (1) |
issn | 1467-5463; 1477-4054 |
language | eng |
recordid | cdi_proquest_miscellaneous_2747005342 |
source | MEDLINE; Business Source Complete; Freely available e-journals; Open Access: Oxford University Press Open Journals; PubMed Central |
subjects | Algorithms; Circular RNA; Learning algorithms; Localization; Machine Learning; Non-coding RNA; Nucleotide sequence; Nucleotides; Ribonucleic acid; RNA; RNA, Long Noncoding - genetics; RNA, Messenger - genetics |
title | RNAlight: a machine learning model to identify nucleotide features determining RNA subcellular localization |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T14%3A33%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RNAlight:%20a%20machine%20learning%20model%20to%20identify%20nucleotide%20features%20determining%20RNA%20subcellular%20localization&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Yuan,%20Guo-Hua&rft.date=2023-01-19&rft.volume=24&rft.issue=1&rft.issn=1467-5463&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbac509&rft_dat=%3Cproquest_cross%3E3113462080%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3113462080&rft_id=info:pmid/36464487&rft_oup_id=10.1093/bib/bbac509&rfr_iscdi=true |
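
The abstract describes a LightGBM model that takes nucleotide k-mers as its input features. As a rough illustration of what such a featurization step might look like, the sketch below turns an RNA sequence into a fixed-length vector of normalized k-mer frequencies. It is not the authors' code: the function names `kmer_vocabulary` and `kmer_frequencies` and the default k values (3, 4, 5) are assumptions made for this example, since the record does not state which k values RNAlight uses.

```python
from collections import Counter
from itertools import product


def kmer_vocabulary(k_values=(3, 4, 5)):
    """Enumerate every possible k-mer over A/C/G/T for the given k values.

    The default k values are an assumption for illustration only.
    """
    return ["".join(p) for k in k_values for p in product("ACGT", repeat=k)]


def kmer_frequencies(sequence, k_values=(3, 4, 5)):
    """Return normalized k-mer frequencies in the order of kmer_vocabulary().

    U is mapped to T so RNA and cDNA sequences share one vocabulary.
    Windows containing other symbols (for example N) count toward the
    total but do not contribute to any feature.
    """
    seq = sequence.upper().replace("U", "T")
    features = []
    for k in k_values:
        # Slide a window of width k across the sequence.
        windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        counts = Counter(windows)
        total = max(len(windows), 1)  # avoid division by zero on short input
        features.extend(
            counts["".join(p)] / total for p in product("ACGT", repeat=k)
        )
    return features
```

A vector built this way could then be passed to a gradient-boosted tree classifier, and a tree-based attribution method such as Tree SHAP could rank individual k-mers by their contribution to a nuclear or cytoplasmic prediction, which is the workflow the abstract outlines. Those later steps are not shown here.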