Frequent Substring-Based Sequence Classification with an Ensemble of Support Vector Machines Trained Using Reduced Amino Acid Alphabets

We propose a frequent pattern-based algorithm for predicting functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physicochemically similar. Frequent substrings a...

Bibliographic details
Main authors:	Chitraranjan, C. D., Alnemer, L., Al-Azzam, O., Salem, S., Denton, A. M., Iqbal, M. J., Kianian, S. F.
Format:	Conference proceeding
Language:	eng
Subjects:	Amino acids Microorganisms Prediction algorithms Predictive models Proteins Support vector machines Training
Online access:	Order full text

container_end_page	185
container_issue
container_start_page	180
container_title
container_volume	2
creator	Chitraranjan, C. D. Alnemer, L. Al-Azzam, O. Salem, S. Denton, A. M. Iqbal, M. J. Kianian, S. F.
description	We propose a frequent pattern-based algorithm for predicting functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physicochemically similar. Frequent substrings are mined from the training sequences, transformed into different alphabets, and used as features to train an ensemble of SVMs. We evaluate the performance of our algorithm using protein sub-cellular localization and protein function datasets. Pairwise sequence-alignment-based nearest neighbor and basic SVM k-gram classifiers are included as comparison algorithms. Results show that the frequent substring-based SVM classifier outperforms the other classifiers on the sub-cellular localization datasets and performs competitively with the nearest neighbor classifier on the protein function datasets. Our results also show that the use of reduced alphabets provides statistically significant performance improvements for half of the classes studied.
doi_str_mv	10.1109/ICMLA.2011.71
format	Conference Proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6147669</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6147669</ieee_id><sourcerecordid>6147669</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-78e6e3d69d42b05874ff96a569c5833577cf355f926f2ed72c12705aed2acb6a3</originalsourceid><addsrcrecordid>eNotkMFOwzAQRI0QElB65MTFP5BiO7EdH0PUlkqtkGjhWjnOmhqlTohdIb6A38YC5vJ2VtqRdhC6pWRGKVH3q3qzrmaMUDqT9AxdEykULwSR7BxNlSxpwaVkNC_YJZqG8E6ShFCKyiv0vRjh4wQ-4u2pCXF0_i170AFavP3dG8B1p0Nw1hkdXe_xp4sHrD2e-wDHpgPc23Q7DP0Y8SuY2I94o83BeQh4N-rEFr-ElIufoT2Z5Kqj8z2ujEtjNxx0AzHcoAuruwDTf07QbjHf1Y_Z-mm5qqt15hSJmSxBQN4K1RasIbyUhbVKaC6U4WWepzeNzTm3ignLoJXMUCYJ19AybRqh8wm6-4t1ALAfRnfU49de0EKmQvIfFuljMA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Frequent Substring-Based Sequence Classification with an Ensemble of Support Vector Machines Trained Using Reduced Amino Acid Alphabets</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Chitraranjan, C. D. ; Alnemer, L. ; Al-Azzam, O. ; Salem, S. ; Denton, A. M. ; Iqbal, M. J. ; Kianian, S. F.</creator><creatorcontrib>Chitraranjan, C. D. ; Alnemer, L. ; Al-Azzam, O. ; Salem, S. ; Denton, A. M. ; Iqbal, M. J. ; Kianian, S. F.</creatorcontrib><description>We propose a frequent pattern-based algorithm for predicting functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physiochemically similar. Frequent sub strings are mined from the training sequences, transformed into different alphabets, and used as features to train an ensemble of SVMs. We evaluate the performance of our algorithm using protein sub-cellular localization and protein function datasets. 
Pair-wise sequence-alignment-based nearest neighbor and basic SVM k-gram classifiers are included as comparison algorithms. Results show that the frequent sub string-based SVM classifier demonstrates better performance compared with other classifiers on the sub-cellular localization datasets and it performs competitively with the nearest neighbor classifier on the protein function datasets. Our results also show that the use of reduced alphabets provides statistically significant performance improvements for half of the classes studied.</description><identifier>ISBN: 9781457721342</identifier><identifier>ISBN: 1457721341</identifier><identifier>EISBN: 0769546072</identifier><identifier>EISBN: 9780769546070</identifier><identifier>DOI: 10.1109/ICMLA.2011.71</identifier><language>eng</language><publisher>IEEE</publisher><subject>Amino acids ; Microorganisms ; Prediction algorithms ; Predictive models ; Proteins ; Support vector machines ; Training</subject><ispartof>2011 10th International Conference on Machine Learning and Applications and Workshops, 2011, Vol.2, p.180-185</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6147669$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6147669$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Chitraranjan, C. D.</creatorcontrib><creatorcontrib>Alnemer, L.</creatorcontrib><creatorcontrib>Al-Azzam, O.</creatorcontrib><creatorcontrib>Salem, S.</creatorcontrib><creatorcontrib>Denton, A. M.</creatorcontrib><creatorcontrib>Iqbal, M. J.</creatorcontrib><creatorcontrib>Kianian, S. 
F.</creatorcontrib><title>Frequent Substring-Based Sequence Classification with an Ensemble of Support Vector Machines Trained Using Reduced Amino Acid Alphabets</title><title>2011 10th International Conference on Machine Learning and Applications and Workshops</title><addtitle>icmla</addtitle><description>We propose a frequent pattern-based algorithm for predicting functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physiochemically similar. Frequent sub strings are mined from the training sequences, transformed into different alphabets, and used as features to train an ensemble of SVMs. We evaluate the performance of our algorithm using protein sub-cellular localization and protein function datasets. Pair-wise sequence-alignment-based nearest neighbor and basic SVM k-gram classifiers are included as comparison algorithms. Results show that the frequent sub string-based SVM classifier demonstrates better performance compared with other classifiers on the sub-cellular localization datasets and it performs competitively with the nearest neighbor classifier on the protein function datasets. 
Our results also show that the use of reduced alphabets provides statistically significant performance improvements for half of the classes studied.</description><subject>Amino acids</subject><subject>Microorganisms</subject><subject>Prediction algorithms</subject><subject>Predictive models</subject><subject>Proteins</subject><subject>Support vector machines</subject><subject>Training</subject><isbn>9781457721342</isbn><isbn>1457721341</isbn><isbn>0769546072</isbn><isbn>9780769546070</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2011</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><sourceid>RIE</sourceid><recordid>eNotkMFOwzAQRI0QElB65MTFP5BiO7EdH0PUlkqtkGjhWjnOmhqlTohdIb6A38YC5vJ2VtqRdhC6pWRGKVH3q3qzrmaMUDqT9AxdEykULwSR7BxNlSxpwaVkNC_YJZqG8E6ShFCKyiv0vRjh4wQ-4u2pCXF0_i170AFavP3dG8B1p0Nw1hkdXe_xp4sHrD2e-wDHpgPc23Q7DP0Y8SuY2I94o83BeQh4N-rEFr-ElIufoT2Z5Kqj8z2ujEtjNxx0AzHcoAuruwDTf07QbjHf1Y_Z-mm5qqt15hSJmSxBQN4K1RasIbyUhbVKaC6U4WWepzeNzTm3ignLoJXMUCYJ19AybRqh8wm6-4t1ALAfRnfU49de0EKmQvIfFuljMA</recordid><startdate>201112</startdate><enddate>201112</enddate><creator>Chitraranjan, C. D.</creator><creator>Alnemer, L.</creator><creator>Al-Azzam, O.</creator><creator>Salem, S.</creator><creator>Denton, A. M.</creator><creator>Iqbal, M. J.</creator><creator>Kianian, S. F.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>201112</creationdate><title>Frequent Substring-Based Sequence Classification with an Ensemble of Support Vector Machines Trained Using Reduced Amino Acid Alphabets</title><author>Chitraranjan, C. D. ; Alnemer, L. ; Al-Azzam, O. ; Salem, S. ; Denton, A. M. ; Iqbal, M. J. ; Kianian, S. 
F.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-78e6e3d69d42b05874ff96a569c5833577cf355f926f2ed72c12705aed2acb6a3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Amino acids</topic><topic>Microorganisms</topic><topic>Prediction algorithms</topic><topic>Predictive models</topic><topic>Proteins</topic><topic>Support vector machines</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Chitraranjan, C. D.</creatorcontrib><creatorcontrib>Alnemer, L.</creatorcontrib><creatorcontrib>Al-Azzam, O.</creatorcontrib><creatorcontrib>Salem, S.</creatorcontrib><creatorcontrib>Denton, A. M.</creatorcontrib><creatorcontrib>Iqbal, M. J.</creatorcontrib><creatorcontrib>Kianian, S. F.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Chitraranjan, C. D.</au><au>Alnemer, L.</au><au>Al-Azzam, O.</au><au>Salem, S.</au><au>Denton, A. M.</au><au>Iqbal, M. J.</au><au>Kianian, S. 
F.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Frequent Substring-Based Sequence Classification with an Ensemble of Support Vector Machines Trained Using Reduced Amino Acid Alphabets</atitle><btitle>2011 10th International Conference on Machine Learning and Applications and Workshops</btitle><stitle>icmla</stitle><date>2011-12</date><risdate>2011</risdate><volume>2</volume><spage>180</spage><epage>185</epage><pages>180-185</pages><isbn>9781457721342</isbn><isbn>1457721341</isbn><eisbn>0769546072</eisbn><eisbn>9780769546070</eisbn><abstract>We propose a frequent pattern-based algorithm for predicting functions and localizations of proteins from their primary structure (amino acid sequence). We use reduced alphabets that capture the higher rate of substitution between amino acids that are physiochemically similar. Frequent sub strings are mined from the training sequences, transformed into different alphabets, and used as features to train an ensemble of SVMs. We evaluate the performance of our algorithm using protein sub-cellular localization and protein function datasets. Pair-wise sequence-alignment-based nearest neighbor and basic SVM k-gram classifiers are included as comparison algorithms. Results show that the frequent sub string-based SVM classifier demonstrates better performance compared with other classifiers on the sub-cellular localization datasets and it performs competitively with the nearest neighbor classifier on the protein function datasets. Our results also show that the use of reduced alphabets provides statistically significant performance improvements for half of the classes studied.</abstract><pub>IEEE</pub><doi>10.1109/ICMLA.2011.71</doi><tpages>6</tpages></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISBN: 9781457721342
ispartof	2011 10th International Conference on Machine Learning and Applications and Workshops, 2011, Vol.2, p.180-185
issn
language	eng
recordid	cdi_ieee_primary_6147669
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Amino acids Microorganisms Prediction algorithms Predictive models Proteins Support vector machines Training
title	Frequent Substring-Based Sequence Classification with an Ensemble of Support Vector Machines Trained Using Reduced Amino Acid Alphabets
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T13%3A28%3A53IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Frequent%20Substring-Based%20Sequence%20Classification%20with%20an%20Ensemble%20of%20Support%20Vector%20Machines%20Trained%20Using%20Reduced%20Amino%20Acid%20Alphabets&rft.btitle=2011%2010th%20International%20Conference%20on%20Machine%20Learning%20and%20Applications%20and%20Workshops&rft.au=Chitraranjan,%20C.%20D.&rft.date=2011-12&rft.volume=2&rft.spage=180&rft.epage=185&rft.pages=180-185&rft.isbn=9781457721342&rft.isbn_list=1457721341&rft_id=info:doi/10.1109/ICMLA.2011.71&rft_dat=%3Cieee_6IE%3E6147669%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=0769546072&rft.eisbn_list=9780769546070&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6147669&rfr_iscdi=true
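
The pipeline summarized in the description (map sequences to a reduced alphabet, mine frequent substrings, use them as features) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the six amino acid groups in `GROUPS`, the function names, the support threshold, and the binary presence/absence encoding. Training the SVM ensemble itself is left out; each reduced alphabet would yield one feature matrix for one member of the ensemble.

```python
from collections import Counter

# Hypothetical reduced alphabet: six groups of physicochemically similar
# amino acids. The grouping is illustrative only, not the one used in the paper.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Map every amino acid to the first letter of its group.
TO_REDUCED = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet; unknown symbols become 'X'."""
    return "".join(TO_REDUCED.get(aa, "X") for aa in seq)


def frequent_substrings(seqs, k_min, k_max, min_support):
    """Return substrings of length k_min..k_max that occur in at least
    min_support of the given sequences (each sequence counted once)."""
    support = Counter()
    for seq in seqs:
        seen = {
            seq[i:i + k]
            for k in range(k_min, k_max + 1)
            for i in range(len(seq) - k + 1)
        }
        support.update(seen)
    return sorted(s for s, count in support.items() if count >= min_support)


def featurize(seq, patterns):
    """Binary feature vector: 1 if the pattern occurs in the sequence, else 0."""
    return [1 if p in seq else 0 for p in patterns]


if __name__ == "__main__":
    # Toy training sequences (made up for this example).
    training = ["MKVLDE", "MRILDD", "GPSTNQ"]
    reduced = [reduce_sequence(s) for s in training]
    patterns = frequent_substrings(reduced, k_min=2, k_max=3, min_support=2)
    features = [featurize(s, patterns) for s in reduced]
    print(patterns)
    print(features)
```

The resulting feature matrix could then be passed to any SVM implementation. Using several different reduced alphabets, each with its own classifier, would give the ensemble that the description refers to.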