AtLASS: A Scheme for End-to-End Prediction of Splice Sites Using Attention-based Bi-LSTM
Eukaryotic genomes contain exons and introns, and it is necessary to accurately identify exon-intron boundaries, i.e., splice sites, to annotate genomes. To address this problem, many previous works have proposed annotation methods/tools based on RNA-seq evidence. Many recent works exploit neural ne...
Published in: | IPSJ Transactions on Bioinformatics 2023, Vol.16, pp.20-27 |
---|---|
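Main authors: | Harada, Ryo; Kume, Keitaro; Horie, Kazumasa; Nakayama, Takuro; Inagaki, Yuji; Amagasa, Toshiyuki |
Format: | Article |
Language: | eng |
Subjects: | Annotations; AtLASS; deep learning; Exons; genome annotation; Genomes; intron; Introns; Long short-term memory; Neural networks; Prediction models; Ribonucleic acid; RNA; splice site |
Online access: | Full text |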
container_end_page | 27 |
---|---|
container_issue | |
container_start_page | 20 |
container_title | IPSJ Transactions on Bioinformatics |
container_volume | 16 |
creator | Harada, Ryo; Kume, Keitaro; Horie, Kazumasa; Nakayama, Takuro; Inagaki, Yuji; Amagasa, Toshiyuki |
description | Eukaryotic genomes contain exons and introns, and exon-intron boundaries, i.e., splice sites, must be identified accurately to annotate genomes. To address this problem, many previous works have proposed annotation methods and tools based on RNA-seq evidence. Many recent works use neural networks (NNs) as their prediction models, but only a few can be used to generate new genome annotations in practice. In this study, we propose AtLASS, a fully automated method for predicting splice sites from genomic and RNA-seq data using an attention-based Bi-LSTM (Bidirectional Long Short-Term Memory). We use two-stage training on RNA-seq data to address the biased-label problem, thereby reducing false positives. Experiments on the genomes of three species show that the performance of the proposed method alone is comparable to that of existing methods, and that better performance can be achieved by combining the outputs of the proposed method with those of the existing method. The proposed method is the first program specialized in end-to-end splice site prediction using NNs. |
doi_str_mv | 10.2197/ipsjtbio.16.20 |
format | Article |
fullrecord | sourceid: proquest_cross; recordid: TN_cdi_proquest_journals_2957074553; sourcerecordid / pqid: 2957074553; sourcetype: Aggregation Database; recordtype: article; ISSN: 1882-6679; EISSN: 1882-6679; DOI: 10.2197/ipsjtbio.16.20; publisher: Tokyo: Information Processing Society of Japan; rights: 2023 by the Information Processing Society of Japan, Copyright Japan Science and Technology Agency 2023; peer_reviewed; open access: free_for_read; collections: CrossRef, Biotechnology Research Abstracts, Technology Research Database, Engineering Research Database, Biotechnology and BioEngineering Abstracts; delivery category: Remote Search Resource; genre: article; ristype: JOUR; tpages: 8 |
fulltext | fulltext |
identifier | ISSN: 1882-6679 |
ispartof | IPSJ Transactions on Bioinformatics, 2023, Vol.16, pp.20-27 |
issn | 1882-6679 1882-6679 |
language | eng |
recordid | cdi_proquest_journals_2957074553 |
source | J-STAGE Free; Freely Accessible Japanese Titles; EZB-FREE-00999 freely available EZB journals |
subjects | Annotations; AtLASS; deep learning; Exons; genome annotation; Genomes; intron; Introns; Long short-term memory; Neural networks; Prediction models; Ribonucleic acid; RNA; splice site |
title | AtLASS: A Scheme for End-to-End Prediction of Splice Sites Using Attention-based Bi-LSTM |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T22%3A50%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AtLASS:%20A%20Scheme%20for%20End-to-End%20Prediction%20of%20Splice%20Sites%20Using%20Attention-based%20Bi-LSTM&rft.jtitle=IPSJ%20Transactions%20on%20Bioinformatics&rft.au=Harada,%20Ryo&rft.date=2023&rft.volume=16&rft.spage=20&rft.epage=27&rft.pages=20-27&rft.issn=1882-6679&rft.eissn=1882-6679&rft_id=info:doi/10.2197/ipsjtbio.16.20&rft_dat=%3Cproquest_cross%3E2957074553%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2957074553&rft_id=info:pmid/&rfr_iscdi=true |