pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks

Summary: Convolutional neural networks (CNNs) have been shown to perform exceptionally well in a variety of tasks, including biological sequence classification. Available implementations, however, are usually optimized for a particular task and difficult to reuse. To enable researchers to ut...

Detailed description

Bibliographic details
Published in: Bioinformatics 2018-09, Vol.34 (17), p.3035-3037
Main authors: Budach, Stefan; Marsico, Annalisa
Format: Article
Language: English
Subjects:
Online access: Full text
container_end_page 3037
container_issue 17
container_start_page 3035
container_title Bioinformatics
container_volume 34
creator Budach, Stefan
Marsico, Annalisa
description Summary: Convolutional neural networks (CNNs) have been shown to perform exceptionally well in a variety of tasks, including biological sequence classification. Available implementations, however, are usually optimized for a particular task and difficult to reuse. To enable researchers to utilize these networks more easily, we implemented pysster, a Python package for training CNNs on biological sequence data. Sequences are classified by learning sequence and structure motifs, and the package offers an automated hyper-parameter optimization procedure and options to visualize learned motifs along with information about their positional and class enrichment. The package runs seamlessly on CPU and GPU and provides a simple interface to train and evaluate a network with a handful of lines of code. Using an RNA A-to-I editing dataset and cross-linking immunoprecipitation (CLIP)-seq binding site sequences, we demonstrate that pysster classifies sequences with higher accuracy than previous methods, such as GraphProt or ssHMM, and is able to recover known sequence and structure motifs. Availability and implementation: pysster is freely available at https://github.com/budach/pysster. Supplementary information: Supplementary data are available at Bioinformatics online.
doi_str_mv 10.1093/bioinformatics/bty222
format Article
fullrecord (condensed from the raw XML record; fields repeated elsewhere in this listing and encoded link data are omitted)
contributor Hancock, John
publisher England: Oxford University Press
pmid 29659719
rights The Author(s) 2018. Published by Oxford University Press.
peer_reviewed yes
oa free_for_read
tpages 3
date 2018-09-01
sourcerecordid 2026410205
sourcetype Open Access Repository
collection Access via Oxford University Press (Open Access Collection); PubMed; CrossRef; MEDLINE - Academic; PubMed Central (Full Participant titles)
linktopdf https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129303/pdf/
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129303/
backlink https://www.ncbi.nlm.nih.gov/pubmed/29659719
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2018-09, Vol.34 (17), p.3035-3037
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6129303
source Access via Oxford University Press (Open Access Collection); EZB-FREE-00999 freely available EZB journals; PubMed Central; Alma/SFX Local Collection
subjects Applications Notes
title pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-16T16%3A01%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=pysster:%20classification%20of%20biological%20sequences%20by%20learning%20sequence%20and%20structure%20motifs%20with%20convolutional%20neural%20networks&rft.jtitle=Bioinformatics&rft.au=Budach,%20Stefan&rft.date=2018-09-01&rft.volume=34&rft.issue=17&rft.spage=3035&rft.epage=3037&rft.pages=3035-3037&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/bty222&rft_dat=%3Cproquest_pubme%3E2026410205%3C/proquest_pubme%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2026410205&rft_id=info:pmid/29659719&rft_oup_id=10.1093/bioinformatics/bty222&rfr_iscdi=true
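
The description field above says that sequences are classified by learning sequence motifs with convolutional neural networks. As a rough illustration of that general idea, the sketch below one-hot encodes an RNA sequence and slides a single motif filter over it, which is what one convolutional kernel followed by global max pooling computes. This is not pysster's API and not the authors' implementation: the function names, the `ACGU` alphabet, the hand-made kernel, and the example sequence are all assumptions chosen for illustration, and a trained network would learn its kernel weights from data.

```python
# Illustrative sketch only: not pysster's API or the authors' code.
ALPHABET = "ACGU"  # assumed RNA alphabet


def one_hot(seq, alphabet=ALPHABET):
    """Encode a sequence as one row per position, one column per letter."""
    return [[1.0 if ch == letter else 0.0 for letter in alphabet] for ch in seq]


def motif_kernel(motif, alphabet=ALPHABET):
    """Hypothetical hand-made kernel: weight 1 for each motif base, else 0.

    A trained CNN would learn real-valued weights instead.
    """
    return one_hot(motif, alphabet)


def scan(seq, kernel, alphabet=ALPHABET):
    """Score every window of the sequence against the kernel.

    Equivalent to a 1D convolution (cross-correlation) without padding.
    """
    encoded = one_hot(seq, alphabet)
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            kernel[i][j] * encoded[start + i][j]
            for i in range(width)
            for j in range(len(alphabet))
        )
        scores.append(score)
    return scores


seq = "AAUGGACUAA"  # made-up example sequence
scores = scan(seq, motif_kernel("GGACU"))

# Global max pooling keeps the best window; its index is the motif position.
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores[best])
```

The position of the best-scoring window is the kind of information that a positional enrichment plot of a learned motif would summarize across many sequences.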