AtLASS: A Scheme for End-to-End Prediction of Splice Sites Using Attention-based Bi-LSTM

Eukaryotic genomes contain exons and introns, and it is necessary to accurately identify exon-intron boundaries, i.e., splice sites, to annotate genomes. To address this problem, many previous works have proposed annotation methods/tools based on RNA-seq evidence. Many recent works exploit neural ne...

Detailed description

Bibliographic details
Published in: IPSJ Transactions on Bioinformatics, 2023, Vol. 16, pp. 20-27
Main authors: Harada, Ryo; Kume, Keitaro; Horie, Kazumasa; Nakayama, Takuro; Inagaki, Yuji; Amagasa, Toshiyuki
Format: Article
Language: English
Online access: Full text
container_end_page 27
container_issue
container_start_page 20
container_title IPSJ Transactions on Bioinformatics
container_volume 16
creator Harada, Ryo
Kume, Keitaro
Horie, Kazumasa
Nakayama, Takuro
Inagaki, Yuji
Amagasa, Toshiyuki
description Eukaryotic genomes contain exons and introns, and accurately identifying exon-intron boundaries, i.e., splice sites, is necessary to annotate genomes. To address this problem, many previous works have proposed annotation methods/tools based on RNA-seq evidence. Many recent works exploit neural networks (NNs) as their prediction models, but only a few can be used to generate new genome annotations in practice. In this study, we propose AtLASS, a fully automated method for predicting splice sites from genomic and RNA-seq data using attention-based Bi-LSTM (Bidirectional Long Short-Term Memory). We exploit two-stage training on RNA-seq data to address the biased-label problem, thereby reducing false positives. Experiments on the genomes of three species show that the performance of the proposed method alone is comparable to that of existing methods, and that better performance can be achieved by combining the outputs of the proposed method and an existing method. The proposed method is the first program specialized in end-to-end splice site prediction using NNs.
doi_str_mv 10.2197/ipsjtbio.16.20
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2957074553</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2957074553</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3330-2c4d3d7a40c4d0f7c52601d9caaf710169e9cded0e2257ba410187fc2901e2353</originalsourceid><addsrcrecordid>eNpVkN1LwzAUxYsoOKevPgd8bk3Sjyy-1TE_oKLQDXwLWXK7pWxNTbIH_3s7qqJP53DP79wLN4quCU4o4ezW9L4Na2MTUiQUn0QTMpvRuCgYP_3jz6ML71uMC45pNoney1CVdX2HSlSrLewBNdahRafjYONB0JsDbVQwtkO2QXW_MwpQbQJ4tPKm26AyBOiOebyWHjS6N3FVL18uo7NG7jxcfes0Wj0slvOnuHp9fJ6XVazSNMUxVZlONZMZHgxumMppgYnmSsqGEUwKDlxp0BgozdlaZsNsxhpFOSZA0zydRjfj3t7ZjwP4IFp7cN1wUlCeM8yyPE8HKhkp5az3DhrRO7OX7lMQLI7fEz_fE6QQFA-Fciy0PsgN_OLSBaN28A_HY-c3U1vpBHTpF3vkepc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2957074553</pqid></control><display><type>article</type><title>AtLASS: A Scheme for End-to-End Prediction of Splice Sites Using Attention-based Bi-LSTM</title><source>J-STAGE Free</source><source>Freely Accessible Japanese Titles</source><source>EZB-FREE-00999 freely available EZB journals</source><creator>Harada, Ryo ; Kume, Keitaro ; Horie, Kazumasa ; Nakayama, Takuro ; Inagaki, Yuji ; Amagasa, Toshiyuki</creator><creatorcontrib>Harada, Ryo ; Kume, Keitaro ; Horie, Kazumasa ; Nakayama, Takuro ; Inagaki, Yuji ; Amagasa, Toshiyuki</creatorcontrib><description>Eukaryotic genomes contain exons and introns, and it is necessary to accurately identify exon-intron boundaries, i.e., splice sites, to annotate genomes. To address this problem, many previous works have proposed annotation methods/tools based on RNA-seq evidence. Many recent works exploit neural networks (NNs) as their prediction models, but only a few can be used to generate new genome annotation in practice. 
In this study, we propose AtLASS, a fully automated method for predicting splice sites from genomic and RNA-seq data using attention-based Bi-LSTM (Bidirectional Long Short-Term Memory). We exploit two-stage training on RNA-seq data to address the problem of biased label problem, thereby reducing the false positives. The experiments on the genomes of three species show that the performance of the proposed method itself is comparable to that of existing methods, but we can achieve better performance by combining the outputs of the proposed method and the existing method. The proposed method is the first program specialized in end-to-end splice site prediction using NNs.</description><identifier>ISSN: 1882-6679</identifier><identifier>EISSN: 1882-6679</identifier><identifier>DOI: 10.2197/ipsjtbio.16.20</identifier><language>eng</language><publisher>Tokyo: Information Processing Society of Japan</publisher><subject>Annotations ; AtLASS ; deep learning ; Exons ; genome annotation ; Genomes ; intron ; Introns ; Long short-term memory ; Neural networks ; Prediction models ; Ribonucleic acid ; RNA ; splice site</subject><ispartof>IPSJ Transactions on Bioinformatics, 2023, Vol.16, pp.20-27</ispartof><rights>2023 by the Information Processing Society of Japan</rights><rights>Copyright Japan Science and Technology Agency 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3330-2c4d3d7a40c4d0f7c52601d9caaf710169e9cded0e2257ba410187fc2901e2353</citedby><cites>FETCH-LOGICAL-c3330-2c4d3d7a40c4d0f7c52601d9caaf710169e9cded0e2257ba410187fc2901e2353</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,1877,4010,27900,27901,27902</link.rule.ids></links><search><creatorcontrib>Harada, Ryo</creatorcontrib><creatorcontrib>Kume, 
Keitaro</creatorcontrib><creatorcontrib>Horie, Kazumasa</creatorcontrib><creatorcontrib>Nakayama, Takuro</creatorcontrib><creatorcontrib>Inagaki, Yuji</creatorcontrib><creatorcontrib>Amagasa, Toshiyuki</creatorcontrib><title>AtLASS: A Scheme for End-to-End Prediction of Splice Sites Using Attention-based Bi-LSTM</title><title>IPSJ Transactions on Bioinformatics</title><addtitle>IPSJ Transactions on Bioinformatics</addtitle><description>Eukaryotic genomes contain exons and introns, and it is necessary to accurately identify exon-intron boundaries, i.e., splice sites, to annotate genomes. To address this problem, many previous works have proposed annotation methods/tools based on RNA-seq evidence. Many recent works exploit neural networks (NNs) as their prediction models, but only a few can be used to generate new genome annotation in practice. In this study, we propose AtLASS, a fully automated method for predicting splice sites from genomic and RNA-seq data using attention-based Bi-LSTM (Bidirectional Long Short-Term Memory). We exploit two-stage training on RNA-seq data to address the problem of biased label problem, thereby reducing the false positives. The experiments on the genomes of three species show that the performance of the proposed method itself is comparable to that of existing methods, but we can achieve better performance by combining the outputs of the proposed method and the existing method. 
The proposed method is the first program specialized in end-to-end splice site prediction using NNs.</description><subject>Annotations</subject><subject>AtLASS</subject><subject>deep learning</subject><subject>Exons</subject><subject>genome annotation</subject><subject>Genomes</subject><subject>intron</subject><subject>Introns</subject><subject>Long short-term memory</subject><subject>Neural networks</subject><subject>Prediction models</subject><subject>Ribonucleic acid</subject><subject>RNA</subject><subject>splice site</subject><issn>1882-6679</issn><issn>1882-6679</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNpVkN1LwzAUxYsoOKevPgd8bk3Sjyy-1TE_oKLQDXwLWXK7pWxNTbIH_3s7qqJP53DP79wLN4quCU4o4ezW9L4Na2MTUiQUn0QTMpvRuCgYP_3jz6ML71uMC45pNoney1CVdX2HSlSrLewBNdahRafjYONB0JsDbVQwtkO2QXW_MwpQbQJ4tPKm26AyBOiOebyWHjS6N3FVL18uo7NG7jxcfes0Wj0slvOnuHp9fJ6XVazSNMUxVZlONZMZHgxumMppgYnmSsqGEUwKDlxp0BgozdlaZsNsxhpFOSZA0zydRjfj3t7ZjwP4IFp7cN1wUlCeM8yyPE8HKhkp5az3DhrRO7OX7lMQLI7fEz_fE6QQFA-Fciy0PsgN_OLSBaN28A_HY-c3U1vpBHTpF3vkepc</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Harada, Ryo</creator><creator>Kume, Keitaro</creator><creator>Horie, Kazumasa</creator><creator>Nakayama, Takuro</creator><creator>Inagaki, Yuji</creator><creator>Amagasa, Toshiyuki</creator><general>Information Processing Society of Japan</general><general>Japan Science and Technology Agency</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope></search><sort><creationdate>2023</creationdate><title>AtLASS: A Scheme for End-to-End Prediction of Splice Sites Using Attention-based Bi-LSTM</title><author>Harada, Ryo ; Kume, Keitaro ; Horie, Kazumasa ; Nakayama, Takuro ; Inagaki, Yuji ; Amagasa, 
Toshiyuki</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3330-2c4d3d7a40c4d0f7c52601d9caaf710169e9cded0e2257ba410187fc2901e2353</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Annotations</topic><topic>AtLASS</topic><topic>deep learning</topic><topic>Exons</topic><topic>genome annotation</topic><topic>Genomes</topic><topic>intron</topic><topic>Introns</topic><topic>Long short-term memory</topic><topic>Neural networks</topic><topic>Prediction models</topic><topic>Ribonucleic acid</topic><topic>RNA</topic><topic>splice site</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Harada, Ryo</creatorcontrib><creatorcontrib>Kume, Keitaro</creatorcontrib><creatorcontrib>Horie, Kazumasa</creatorcontrib><creatorcontrib>Nakayama, Takuro</creatorcontrib><creatorcontrib>Inagaki, Yuji</creatorcontrib><creatorcontrib>Amagasa, Toshiyuki</creatorcontrib><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><jtitle>IPSJ Transactions on Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Harada, Ryo</au><au>Kume, Keitaro</au><au>Horie, Kazumasa</au><au>Nakayama, Takuro</au><au>Inagaki, Yuji</au><au>Amagasa, Toshiyuki</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AtLASS: A Scheme for End-to-End Prediction of Splice Sites Using Attention-based Bi-LSTM</atitle><jtitle>IPSJ Transactions on Bioinformatics</jtitle><addtitle>IPSJ Transactions on 
Bioinformatics</addtitle><date>2023</date><risdate>2023</risdate><volume>16</volume><spage>20</spage><epage>27</epage><pages>20-27</pages><issn>1882-6679</issn><eissn>1882-6679</eissn><abstract>Eukaryotic genomes contain exons and introns, and it is necessary to accurately identify exon-intron boundaries, i.e., splice sites, to annotate genomes. To address this problem, many previous works have proposed annotation methods/tools based on RNA-seq evidence. Many recent works exploit neural networks (NNs) as their prediction models, but only a few can be used to generate new genome annotation in practice. In this study, we propose AtLASS, a fully automated method for predicting splice sites from genomic and RNA-seq data using attention-based Bi-LSTM (Bidirectional Long Short-Term Memory). We exploit two-stage training on RNA-seq data to address the problem of biased label problem, thereby reducing the false positives. The experiments on the genomes of three species show that the performance of the proposed method itself is comparable to that of existing methods, but we can achieve better performance by combining the outputs of the proposed method and the existing method. The proposed method is the first program specialized in end-to-end splice site prediction using NNs.</abstract><cop>Tokyo</cop><pub>Information Processing Society of Japan</pub><doi>10.2197/ipsjtbio.16.20</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1882-6679
ispartof IPSJ Transactions on Bioinformatics, 2023, Vol.16, pp.20-27
issn 1882-6679
1882-6679
language eng
recordid cdi_proquest_journals_2957074553
source J-STAGE Free; Freely Accessible Japanese Titles; EZB-FREE-00999 freely available EZB journals
subjects Annotations
AtLASS
deep learning
Exons
genome annotation
Genomes
intron
Introns
Long short-term memory
Neural networks
Prediction models
Ribonucleic acid
RNA
splice site
title AtLASS: A Scheme for End-to-End Prediction of Splice Sites Using Attention-based Bi-LSTM
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T22%3A50%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AtLASS:%20A%20Scheme%20for%20End-to-End%20Prediction%20of%20Splice%20Sites%20Using%20Attention-based%20Bi-LSTM&rft.jtitle=IPSJ%20Transactions%20on%20Bioinformatics&rft.au=Harada,%20Ryo&rft.date=2023&rft.volume=16&rft.spage=20&rft.epage=27&rft.pages=20-27&rft.issn=1882-6679&rft.eissn=1882-6679&rft_id=info:doi/10.2197/ipsjtbio.16.20&rft_dat=%3Cproquest_cross%3E2957074553%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2957074553&rft_id=info:pmid/&rfr_iscdi=true
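
The description above names an attention-based Bi-LSTM but gives no architectural detail. As a rough illustration of what "attention" over per-position hidden states generally means, here is a minimal sketch of dot-product attention pooling. It is not the AtLASS implementation: the function names `softmax` and `attention_pool`, the `query` vector, and the toy hidden states (standing in for Bi-LSTM outputs) are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Generic dot-product attention (hypothetical helper, not from the paper).

    hidden_states: one vector per sequence position, e.g. the concatenated
                   forward/backward outputs of a Bi-LSTM (assumed here).
    query:         a vector of the same dimension; learned in a real model.
    Returns the weighted sum of the hidden states and the attention weights.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


# Toy input: three sequence positions with 2-dimensional hidden vectors.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(hidden, query=[1.0, 1.0])
print(weights)
print(context)
```

In this toy input the third position has the largest dot product with the query, so it receives the largest weight. In a trained model the query and the hidden states would come from learned parameters rather than fixed numbers.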