MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training
Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct high-quality MSA. Although various...
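Main authors: | Chen, Bo ; Bei, Zhilei ; Cheng, Xingyi ; Li, Pan ; Tang, Jie ; Song, Le |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence ; Computer Science - Learning ; Quantitative Biology - Biomolecules |
Online access: | Order full text |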
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Chen, Bo ; Bei, Zhilei ; Cheng, Xingyi ; Li, Pan ; Tang, Jie ; Song, Le |
description | Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the
evolutionary trajectories of protein families. The accuracy of protein
structure predictions is often compromised for protein sequences that lack
sufficient homologous information to construct high-quality MSA. Although
various methods have been proposed to generate virtual MSA under these
conditions, they fall short in comprehensively capturing the intricate
coevolutionary patterns within MSA or require guidance from external oracle
models. Here we introduce MSAGPT, a novel approach to prompt protein structure
predictions via MSA generative pre-training in the low-MSA regime. MSAGPT
employs a simple yet effective 2D evolutionary positional encoding scheme to
model complex evolutionary patterns. Endowed with this encoding, its flexible 1D MSA
decoding framework facilitates zero- or few-shot learning. Moreover, we
demonstrate that leveraging feedback from AlphaFold2 can further enhance
the model capacity via Rejective Fine-tuning (RFT) and Reinforcement Learning
from AF2 Feedback (RLAF). Extensive experiments confirm the efficacy of MSAGPT
in generating faithful virtual MSA that improves structure prediction
accuracy. The transfer learning capabilities also highlight its great potential
for facilitating other protein tasks. |
doi_str_mv | 10.48550/arxiv.2406.05347 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2406_05347</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2406_05347</sourcerecordid><originalsourceid>FETCH-LOGICAL-a677-8a5c9caf8a5b0e173dd63e9ec5c6888e5ec799fa0c19c8aa971362ec3a446b073</originalsourceid><addsrcrecordid>eNotj09Lw0AUxPfiQaofwJP7BRI33f-9laJRaLXQeA6vLy9loU3Lugn67U2jpxlmmIEfYw-FyJXTWjxB_A5DPlfC5EJLZW_Z52a3LLfVgr9TH-HIt_F8uqTQHa4uUej4LsUeUx9pTKgJmMK540MAPi55SR1FSGGY2qyKELpxfMduWjh-0f2_zlj18lytXrP1R_m2Wq4zMNZmDjR6hHbUvaDCyqYxkjyhRuOcI01ovW9BYOHRAXhbSDMnlKCU2QsrZ-zx73biqi8xnCD-1Fe-euKTv6PAS1I</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training</title><source>arXiv.org</source><creator>Chen, Bo ; Bei, Zhilei ; Cheng, Xingyi ; Li, Pan ; Tang, Jie ; Song, Le</creator><creatorcontrib>Chen, Bo ; Bei, Zhilei ; Cheng, Xingyi ; Li, Pan ; Tang, Jie ; Song, Le</creatorcontrib><description>Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the
evolutionary trajectories of protein families. The accuracy of protein
structure predictions is often compromised for protein sequences that lack
sufficient homologous information to construct high quality MSA. Although
various methods have been proposed to generate virtual MSA under these
conditions, they fall short in comprehensively capturing the intricate
coevolutionary patterns within MSA or require guidance from external oracle
models. Here we introduce MSAGPT, a novel approach to prompt protein structure
predictions via MSA generative pretraining in the low MSA regime. MSAGPT
employs a simple yet effective 2D evolutionary positional encoding scheme to
model complex evolutionary patterns. Endowed by this, its flexible 1D MSA
decoding framework facilitates zero or few shot learning. Moreover, we
demonstrate that leveraging the feedback from AlphaFold2 can further enhance
the model capacity via Rejective Fine tuning (RFT) and Reinforcement Learning
from AF2 Feedback (RLAF). Extensive experiments confirm the efficacy of MSAGPT
in generating faithful virtual MSA to enhance the structure prediction
accuracy. The transfer learning capabilities also highlight its great potential
for facilitating other protein tasks.</description><identifier>DOI: 10.48550/arxiv.2406.05347</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Learning ; Quantitative Biology - Biomolecules</subject><creationdate>2024-06</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2406.05347$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2406.05347$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Chen, Bo</creatorcontrib><creatorcontrib>Bei, Zhilei</creatorcontrib><creatorcontrib>Cheng, Xingyi</creatorcontrib><creatorcontrib>Li, Pan</creatorcontrib><creatorcontrib>Tang, Jie</creatorcontrib><creatorcontrib>Song, Le</creatorcontrib><title>MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training</title><description>Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the
evolutionary trajectories of protein families. The accuracy of protein
structure predictions is often compromised for protein sequences that lack
sufficient homologous information to construct high quality MSA. Although
various methods have been proposed to generate virtual MSA under these
conditions, they fall short in comprehensively capturing the intricate
coevolutionary patterns within MSA or require guidance from external oracle
models. Here we introduce MSAGPT, a novel approach to prompt protein structure
predictions via MSA generative pretraining in the low MSA regime. MSAGPT
employs a simple yet effective 2D evolutionary positional encoding scheme to
model complex evolutionary patterns. Endowed by this, its flexible 1D MSA
decoding framework facilitates zero or few shot learning. Moreover, we
demonstrate that leveraging the feedback from AlphaFold2 can further enhance
the model capacity via Rejective Fine tuning (RFT) and Reinforcement Learning
from AF2 Feedback (RLAF). Extensive experiments confirm the efficacy of MSAGPT
in generating faithful virtual MSA to enhance the structure prediction
accuracy. The transfer learning capabilities also highlight its great potential
for facilitating other protein tasks.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Learning</subject><subject>Quantitative Biology - Biomolecules</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj09Lw0AUxPfiQaofwJP7BRI33f-9laJRaLXQeA6vLy9loU3Lugn67U2jpxlmmIEfYw-FyJXTWjxB_A5DPlfC5EJLZW_Z52a3LLfVgr9TH-HIt_F8uqTQHa4uUej4LsUeUx9pTKgJmMK540MAPi55SR1FSGGY2qyKELpxfMduWjh-0f2_zlj18lytXrP1R_m2Wq4zMNZmDjR6hHbUvaDCyqYxkjyhRuOcI01ovW9BYOHRAXhbSDMnlKCU2QsrZ-zx73biqi8xnCD-1Fe-euKTv6PAS1I</recordid><startdate>20240608</startdate><enddate>20240608</enddate><creator>Chen, Bo</creator><creator>Bei, Zhilei</creator><creator>Cheng, Xingyi</creator><creator>Li, Pan</creator><creator>Tang, Jie</creator><creator>Song, Le</creator><scope>AKY</scope><scope>ALC</scope><scope>GOX</scope></search><sort><creationdate>20240608</creationdate><title>MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training</title><author>Chen, Bo ; Bei, Zhilei ; Cheng, Xingyi ; Li, Pan ; Tang, Jie ; Song, Le</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a677-8a5c9caf8a5b0e173dd63e9ec5c6888e5ec799fa0c19c8aa971362ec3a446b073</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Learning</topic><topic>Quantitative Biology - Biomolecules</topic><toplevel>online_resources</toplevel><creatorcontrib>Chen, Bo</creatorcontrib><creatorcontrib>Bei, Zhilei</creatorcontrib><creatorcontrib>Cheng, Xingyi</creatorcontrib><creatorcontrib>Li, Pan</creatorcontrib><creatorcontrib>Tang, Jie</creatorcontrib><creatorcontrib>Song, Le</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Quantitative 
Biology</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Chen, Bo</au><au>Bei, Zhilei</au><au>Cheng, Xingyi</au><au>Li, Pan</au><au>Tang, Jie</au><au>Song, Le</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training</atitle><date>2024-06-08</date><risdate>2024</risdate><abstract>Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the
evolutionary trajectories of protein families. The accuracy of protein
structure predictions is often compromised for protein sequences that lack
sufficient homologous information to construct high quality MSA. Although
various methods have been proposed to generate virtual MSA under these
conditions, they fall short in comprehensively capturing the intricate
coevolutionary patterns within MSA or require guidance from external oracle
models. Here we introduce MSAGPT, a novel approach to prompt protein structure
predictions via MSA generative pretraining in the low MSA regime. MSAGPT
employs a simple yet effective 2D evolutionary positional encoding scheme to
model complex evolutionary patterns. Endowed by this, its flexible 1D MSA
decoding framework facilitates zero or few shot learning. Moreover, we
demonstrate that leveraging the feedback from AlphaFold2 can further enhance
the model capacity via Rejective Fine tuning (RFT) and Reinforcement Learning
from AF2 Feedback (RLAF). Extensive experiments confirm the efficacy of MSAGPT
in generating faithful virtual MSA to enhance the structure prediction
accuracy. The transfer learning capabilities also highlight its great potential
for facilitating other protein tasks.</abstract><doi>10.48550/arxiv.2406.05347</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2406.05347 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2406_05347 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Learning ; Quantitative Biology - Biomolecules |
title | MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T07%3A44%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=MSAGPT:%20Neural%20Prompting%20Protein%20Structure%20Prediction%20via%20MSA%20Generative%20Pre-Training&rft.au=Chen,%20Bo&rft.date=2024-06-08&rft_id=info:doi/10.48550/arxiv.2406.05347&rft_dat=%3Carxiv_GOX%3E2406_05347%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
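
The description above mentions a 2D evolutionary positional encoding combined with a 1D MSA decoding framework, but the record gives no implementation details. The following is only a minimal sketch of one way such an idea could look: an aligned MSA is flattened row by row into a single token stream, and each token keeps a (row, column) index pair. The helper name `flatten_msa`, the `|` separator token, and the toy alignment are all hypothetical and are not taken from the paper or its code.

```python
from typing import List, Tuple


def flatten_msa(msa: List[str], sep: str = "|") -> Tuple[List[str], List[Tuple[int, int]]]:
    """Flatten aligned sequences row by row and record a (row, col) index per token.

    Illustrative assumption only: this is not the MSAGPT implementation.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    width = len(msa[0])
    if any(len(seq) != width for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    tokens: List[str] = []
    positions: List[Tuple[int, int]] = []
    for row, seq in enumerate(msa):
        for col, residue in enumerate(seq):
            tokens.append(residue)        # 1D stream consumed by a decoder
            positions.append((row, col))  # 2D index: which sequence, which column
        # Hypothetical separator marking the end of one aligned sequence.
        tokens.append(sep)
        positions.append((row, width))
    return tokens, positions


# Toy alignment: a query and one homolog with a gap in column 1.
toy_msa = ["MKT", "M-T"]
tokens, positions = flatten_msa(toy_msa)
```

In this sketch the row index tracks which homolog a residue belongs to and the column index tracks the aligned site, so a model reading the 1D stream could still relate residues that share a column. How the actual method encodes these indices is not stated in this record.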