A hybrid method to cluster protein sequences based on statistics and artificial neural networks

We have recently proposed a method, based on artificial neural networks (ANNs), to cluster protein sequences into families according to their degree of sequence similarity. The network was trained with an unsupervised learning algorithm, using, as inputs, matrix patterns derived from the bipeptide c...

Detailed description

Bibliographic details
Published in:	Bioinformatics 1993-12, Vol.9 (6), p.671-680
Main authors:	Ferrán, Edgardo A., Pflugfelder, Bernard
Format:	Article
Language:	eng
Subjects:	Algorithms; Analytical, structural and metabolic biochemistry; Biological and medical sciences; Biometry; Fundamental and applied biological sciences. Psychology; General aspects, investigation methods; Humans; Neural Networks (Computer); Proteins; Proteins - classification; Proteins - genetics; Sequence Alignment - methods; Sequence Alignment - statistics & numerical data; Software
Online access:	Full text

container_end_page	680
container_issue	6
container_start_page	671
container_title	Bioinformatics
container_volume	9
creator	Ferrán, Edgardo A. ; Pflugfelder, Bernard
description	We have recently proposed a method, based on artificial neural networks (ANNs), to cluster protein sequences into families according to their degree of sequence similarity. The network was trained with an unsupervised learning algorithm, using, as inputs, matrix patterns derived from the bipeptide composition of the protein sequences. We describe here some further improvements to that approach. First, we propose a statistical method to cluster a set of bipeptidic matrices into families. It consists of three stages: (i) principal component analysis, (ii) determination of the optimal number M of clusters and (iii) final classification of the bipeptidic matrices into M clusters. Using a set of 444 protein sequences, we show that the classification given by the statistical method is in agreement with biological knowledge. We also show that the resulting classification is very similar to the one previously obtained with the ANN approach. Finally, we propose a new hybrid method of the statistical and ANN approaches, in which the results of the statistical method are used to choose the number of neurons and inputs of the network. We show that a network built in this way, and fed with a few principal components of the set of bipeptidic matrices as input signals, can be trained in an extremely short computing time. The resulting topological maps do not essentially differ from the ones obtained with the initial ANN approach.
doi_str_mv	10.1093/bioinformatics/9.6.671
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_76265979</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>76265979</sourcerecordid><originalsourceid>FETCH-LOGICAL-c415t-61b9983349a98673fb4fedc7143a9f3d527b61bd70b36a73e270140d312347ab3</originalsourceid><addsrcrecordid>eNpVkFtLHTEUhUNp8VZ_Qkseim9zTGZnksmjSKsWQQpHkL6E3AZTZyaaZLD-e6PncMCnvcn6dtZiIfSdkhUlEk5NiGEeYpp0CTafyhVfcUE_oQPKOGla0snPdQcuGtYT2EeHOf8jpKOMsT2011MGtIMDpM7w_YtJweHJl_vocInYjksuPuHHFIsPM87-afGz9Rkbnb3DsT6VapvfnLGeHdaphCHYoEc8-yW9j_Ic00P-ir4Mesz-eDuP0O2vn-vzy-b65uLq_Oy6sYx2peHUSNkDMKllzwUMhg3eWVFjajmA61phKuMEMcC1AN8KQhlxQFtgQhs4Qiebf2vomjYXNYVs_Tjq2cclK8Fb3kkhK8g3oE0x5-QH9ZjCpNOLokS9Nas-Nquk4qo2Ww-_bR0WM3m3O9tWWfUfW11nq8ch6dmGvMOgZ52EvmLNBqv1-f87WaeH6gKiU5d3f5WQjPwmf9ZqDa-M9pYJ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>76265979</pqid></control><display><type>article</type><title>A hybrid method to cluster protein sequences based on statistics and artificial neural networks</title><source>Oxford Journals Open Access Collection</source><source>MEDLINE</source><source>Alma/SFX Local Collection</source><source>Oxford University Press Journals Digital Archive Legacy</source><creator>Ferrán, Edgardo A. ; Pflugfelder, Bernard</creator><creatorcontrib>Ferrán, Edgardo A. ; Pflugfelder, Bernard</creatorcontrib><description>We have recently proposed a method, based on artificial neural networks (ANNs), to cluster protein sequences into families according to their degree of sequence similarity. The network was trained with an unsupervised learning algorithm, using, as inputs, matrix patterns derived from the bipeptide composition of the protein sequences. We describe here some further improvements to that approach. First, we propose a statistical method to cluster a set of bipeptidic matrices into families. 
It consists of three stages: (i) principal component analysis, (ii) determination of the optimal number M of clusters and (iii) final classification of the bipeptidic matrices into M clusters. Using a set of 444 protein sequences, we show that the classification given by the statistical method is in agreement with biological knowledge. We also show that the resulting classification is very similar to the one previously obtained with the ANN approach. Finally, we propose a new hybrid method of the statistical and ANN approaches, in which the results of the statistical method are used to choose the number of neurons and inputs of the network. We show that a network built in this way, and fed with a few principal components of the set of bipeptidic matrices as input signals, can be trained in an extremely short computing time. The resulting topological maps do not essentially differ from the ones obtained with the initial ANN approach.</description><identifier>ISSN: 1367-4803</identifier><identifier>ISSN: 0266-7061</identifier><identifier>EISSN: 1460-2059</identifier><identifier>DOI: 10.1093/bioinformatics/9.6.671</identifier><identifier>PMID: 8143153</identifier><identifier>CODEN: COABER</identifier><language>eng</language><publisher>Washington, DC: Oxford University Press</publisher><subject>Algorithms ; Analytical, structural and metabolic biochemistry ; Biological and medical sciences ; Biometry ; Fundamental and applied biological sciences. 
Psychology ; General aspects, investigation methods ; Humans ; Neural Networks (Computer) ; Proteins ; Proteins - classification ; Proteins - genetics ; Sequence Alignment - methods ; Sequence Alignment - statistics & numerical data ; Software</subject><ispartof>Bioinformatics, 1993-12, Vol.9 (6), p.671-680</ispartof><rights>1994 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=3845938$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/8143153$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Ferrán, Edgardo A.</creatorcontrib><creatorcontrib>Pflugfelder, Bernard</creatorcontrib><title>A hybrid method to cluster protein sequences based on statistics and artificial neural networks</title><title>Bioinformatics</title><addtitle>Comput Appl Biosci</addtitle><description>We have recently proposed a method, based on artificial neural networks (ANNs), to cluster protein sequences into families according to their degree of sequence similarity. The network was trained with an unsupervised learning algorithm, using, as inputs, matrix patterns derived from the bipeptide composition of the protein sequences. We describe here some further improvements to that approach. First, we propose a statistical method to cluster a set of bipeptidic matrices into families. It consists of three stages: (i) principal component analysis, (ii) determination of the optimal number M of clusters and (iii) final classification of the bipeptidic matrices into M clusters. 
Using a set of 444 protein sequences, we show that the classification given by the statistical method is in agreement with biological knowledge. We also show that the resulting classification is very similar to the one previously obtained with the ANN approach. Finally, we propose a new hybrid method of the statistical and ANN approaches, in which the results of the statistical method are used to choose the number of neurons and inputs of the network. We show that a network built in this way, and fed with a few principal components of the set of bipeptidic matrices as input signals, can be trained in an extremely short computing time. The resulting topological maps do not essentially differ from the ones obtained with the initial ANN approach.</description><subject>Algorithms</subject><subject>Analytical, structural and metabolic biochemistry</subject><subject>Biological and medical sciences</subject><subject>Biometry</subject><subject>Fundamental and applied biological sciences. Psychology</subject><subject>General aspects, investigation methods</subject><subject>Humans</subject><subject>Neural Networks (Computer)</subject><subject>Proteins</subject><subject>Proteins - classification</subject><subject>Proteins - genetics</subject><subject>Sequence Alignment - methods</subject><subject>Sequence Alignment - statistics & numerical 
data</subject><subject>Software</subject><issn>1367-4803</issn><issn>0266-7061</issn><issn>1460-2059</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>1993</creationdate><recordtype>article</recordtype><sourceid>EIF</sourceid><recordid>eNpVkFtLHTEUhUNp8VZ_Qkseim9zTGZnksmjSKsWQQpHkL6E3AZTZyaaZLD-e6PncMCnvcn6dtZiIfSdkhUlEk5NiGEeYpp0CTafyhVfcUE_oQPKOGla0snPdQcuGtYT2EeHOf8jpKOMsT2011MGtIMDpM7w_YtJweHJl_vocInYjksuPuHHFIsPM87-afGz9Rkbnb3DsT6VapvfnLGeHdaphCHYoEc8-yW9j_Ic00P-ir4Mesz-eDuP0O2vn-vzy-b65uLq_Oy6sYx2peHUSNkDMKllzwUMhg3eWVFjajmA61phKuMEMcC1AN8KQhlxQFtgQhs4Qiebf2vomjYXNYVs_Tjq2cclK8Fb3kkhK8g3oE0x5-QH9ZjCpNOLokS9Nas-Nquk4qo2Ww-_bR0WM3m3O9tWWfUfW11nq8ch6dmGvMOgZ52EvmLNBqv1-f87WaeH6gKiU5d3f5WQjPwmf9ZqDa-M9pYJ</recordid><startdate>19931201</startdate><enddate>19931201</enddate><creator>Ferrán, Edgardo A.</creator><creator>Pflugfelder, Bernard</creator><general>Oxford University Press</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>19931201</creationdate><title>A hybrid method to cluster protein sequences based on statistics and artificial neural networks</title><author>Ferrán, Edgardo A. ; Pflugfelder, Bernard</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c415t-61b9983349a98673fb4fedc7143a9f3d527b61bd70b36a73e270140d312347ab3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>1993</creationdate><topic>Algorithms</topic><topic>Analytical, structural and metabolic biochemistry</topic><topic>Biological and medical sciences</topic><topic>Biometry</topic><topic>Fundamental and applied biological sciences. 
Psychology</topic><topic>General aspects, investigation methods</topic><topic>Humans</topic><topic>Neural Networks (Computer)</topic><topic>Proteins</topic><topic>Proteins - classification</topic><topic>Proteins - genetics</topic><topic>Sequence Alignment - methods</topic><topic>Sequence Alignment - statistics & numerical data</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ferrán, Edgardo A.</creatorcontrib><creatorcontrib>Pflugfelder, Bernard</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ferrán, Edgardo A.</au><au>Pflugfelder, Bernard</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A hybrid method to cluster protein sequences based on statistics and artificial neural networks</atitle><jtitle>Bioinformatics</jtitle><addtitle>Comput Appl Biosci</addtitle><date>1993-12-01</date><risdate>1993</risdate><volume>9</volume><issue>6</issue><spage>671</spage><epage>680</epage><pages>671-680</pages><issn>1367-4803</issn><issn>0266-7061</issn><eissn>1460-2059</eissn><coden>COABER</coden><abstract>We have recently proposed a method, based on artificial neural networks (ANNs), to cluster protein sequences into families according to their degree of sequence similarity. The network was trained with an unsupervised learning algorithm, using, as inputs, matrix patterns derived from the bipeptide composition of the protein sequences. We describe here some further improvements to that approach. 
First, we propose a statistical method to cluster a set of bipeptidic matrices into families. It consists of three stages: (i) principal component analysis, (ii) determination of the optimal number M of clusters and (iii) final classification of the bipeptidic matrices into M clusters. Using a set of 444 protein sequences, we show that the classification given by the statistical method is in agreement with biological knowledge. We also show that the resulting classification is very similar to the one previously obtained with the ANN approach. Finally, we propose a new hybrid method of the statistical and ANN approaches, in which the results of the statistical method are used to choose the number of neurons and inputs of the network. We show that a network built in this way, and fed with a few principal components of the set of bipeptidic matrices as input signals, can be trained in an extremely short computing time. The resulting topological maps do not essentially differ from the ones obtained with the initial ANN approach.</abstract><cop>Washington, DC</cop><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>8143153</pmid><doi>10.1093/bioinformatics/9.6.671</doi><tpages>10</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 1367-4803
ispartof	Bioinformatics, 1993-12, Vol.9 (6), p.671-680
issn	1367-4803 0266-7061 1460-2059
language	eng
recordid	cdi_proquest_miscellaneous_76265979
source	Oxford Journals Open Access Collection; MEDLINE; Alma/SFX Local Collection; Oxford University Press Journals Digital Archive Legacy
subjects	Algorithms ; Analytical, structural and metabolic biochemistry ; Biological and medical sciences ; Biometry ; Fundamental and applied biological sciences. Psychology ; General aspects, investigation methods ; Humans ; Neural Networks (Computer) ; Proteins ; Proteins - classification ; Proteins - genetics ; Sequence Alignment - methods ; Sequence Alignment - statistics & numerical data ; Software
title	A hybrid method to cluster protein sequences based on statistics and artificial neural networks
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T23%3A04%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20hybrid%20method%20to%20cluster%20protein%20sequences%20based%20on%20statistics%20and%20artificial%20neural%20networks&rft.jtitle=Bioinformatics&rft.au=Ferr%C3%A1n,%20Edgardo%20A.&rft.date=1993-12-01&rft.volume=9&rft.issue=6&rft.spage=671&rft.epage=680&rft.pages=671-680&rft.issn=1367-4803&rft.eissn=1460-2059&rft.coden=COABER&rft_id=info:doi/10.1093/bioinformatics/9.6.671&rft_dat=%3Cproquest_cross%3E76265979%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=76265979&rft_id=info:pmid/8143153&rfr_iscdi=true
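
As an illustration of the input representation and the first stage (principal component analysis) named in the description field, the following minimal Python sketch builds a 20x20 bipeptide composition matrix per sequence and projects the flattened matrices onto their leading principal components. The function names, the toy sequences, the normalization by total pair count, and the use of NumPy's SVD are assumptions made for illustration; the record does not specify how the authors computed these quantities.

```python
import numpy as np

# The 20 standard amino acids in one-letter code (ordering is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def bipeptide_matrix(sequence):
    """Return a 20x20 matrix of adjacent-residue pair frequencies.

    Assumption: counts are normalized by the total number of pairs;
    pairs containing a non-standard residue are skipped.
    """
    counts = np.zeros((20, 20))
    for a, b in zip(sequence, sequence[1:]):
        if a in INDEX and b in INDEX:
            counts[INDEX[a], INDEX[b]] += 1
    total = counts.sum()
    return counts / total if total else counts


def principal_components(matrices, n_components):
    """Project flattened matrices onto their first n_components principal axes."""
    X = np.stack([m.ravel() for m in matrices])   # shape: (n_sequences, 400)
    X = X - X.mean(axis=0)                        # center each feature
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T                # shape: (n_sequences, n_components)


# Hypothetical toy sequences, not taken from the 444-sequence data set.
sequences = ["MKVLAAGIVGLL", "MKVLAAGIVALL", "GGSGGSGGSGGS"]
matrices = [bipeptide_matrix(s) for s in sequences]
scores = principal_components(matrices, n_components=2)
print(scores.shape)
```

In a hybrid setup of the kind the abstract describes, low-dimensional scores like these, rather than the full 400-element matrices, would be the inputs fed to the network, which is one plausible reason training becomes much faster.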