A primer on model-guided exploration of fitness landscapes for biological sequence design
Machine learning methods are increasingly employed to address challenges faced by biologists. One area that will greatly benefit from this cross-pollination is the problem of biological sequence design, which has massive potential for therapeutic applications. However, significant inefficiencies rem...
Gespeichert in:
Veröffentlicht in: | arXiv.org 2020-10 |
---|---|
Hauptverfasser: | , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Sinai, Sam Kelsic, Eric D |
description | Machine learning methods are increasingly employed to address challenges faced by biologists. One area that will greatly benefit from this cross-pollination is the problem of biological sequence design, which has massive potential for therapeutic applications. However, significant inefficiencies remain in communication between these fields which result in biologists finding the progress in machine learning inaccessible, and hinder machine learning scientists from contributing to impactful problems in bioengineering. Sequence design can be seen as a search process on a discrete, high-dimensional space, where each sequence is associated with a function. This sequence-to-function map is known as a "Fitness Landscape". Designing a sequence with a particular function is hence a matter of "discovering" such a (often rare) sequence within this space. Today we can build predictive models with good interpolation ability due to impressive progress in the synthesis and testing of biological sequences in large numbers, which enables model training and validation. However, it often remains a challenge to find useful sequences with the properties that we like using these models. In particular, in this primer we highlight that algorithms for experimental design, what we call "exploration strategies", are a related, yet distinct problem from building good models of sequence-to-function maps. We review advances and insights from current literature -- by no means a complete treatment -- while highlighting desirable features of optimal model-guided exploration, and cover potential pitfalls drawn from our own experience. This primer can serve as a starting point for researchers from different domains that are interested in the problem of searching a sequence space with a model, but are perhaps unaware of approaches that originate outside their field. |
format | Article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2453523423</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2453523423</sourcerecordid><originalsourceid>FETCH-proquest_journals_24535234233</originalsourceid><addsrcrecordid>eNqNjkEKwjAQAIMgWLR_WPBcqEmjXkUUH-DFk8RmU1LSbMy24PPtwQd4msPMYRaikErtqmMj5UqUzH1d13J_kFqrQjxOkLIfMANFGMhiqLrJW7SAnxQom9HPghw4P0ZkhmCi5dYkZHCU4eUpUOdbE4DxPWFsESyy7-JGLJ0JjOWPa7G9Xu7nW5UyzSGPz56mHGf1lI1WWqpm_vyv-gIPqEKw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2453523423</pqid></control><display><type>article</type><title>A primer on model-guided exploration of fitness landscapes for biological sequence design</title><source>Freely Accessible Journals</source><creator>Sinai, Sam ; Kelsic, Eric D</creator><creatorcontrib>Sinai, Sam ; Kelsic, Eric D</creatorcontrib><description>Machine learning methods are increasingly employed to address challenges faced by biologists. One area that will greatly benefit from this cross-pollination is the problem of biological sequence design, which has massive potential for therapeutic applications. However, significant inefficiencies remain in communication between these fields which result in biologists finding the progress in machine learning inaccessible, and hinder machine learning scientists from contributing to impactful problems in bioengineering. Sequence design can be seen as a search process on a discrete, high-dimensional space, where each sequence is associated with a function. This sequence-to-function map is known as a "Fitness Landscape". Designing a sequence with a particular function is hence a matter of "discovering" such a (often rare) sequence within this space. Today we can build predictive models with good interpolation ability due to impressive progress in the synthesis and testing of biological sequences in large numbers, which enables model training and validation. However, it often remains a challenge to find useful sequences with the properties that we like using these models. In particular, in this primer we highlight that algorithms for experimental design, what we call "exploration strategies", are a related, yet distinct problem from building good models of sequence-to-function maps. We review advances and insights from current literature -- by no means a complete treatment -- while highlighting desirable features of optimal model-guided exploration, and cover potential pitfalls drawn from our own experience. This primer can serve as a starting point for researchers from different domains that are interested in the problem of searching a sequence space with a model, but are perhaps unaware of approaches that originate outside their field.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Bioengineering ; Biological effects ; Biologists ; Design of experiments ; Exploration ; Fitness ; Interpolation ; Landscape design ; Machine learning ; Prediction models ; Search process</subject><ispartof>arXiv.org, 2020-10</ispartof><rights>2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>780,784</link.rule.ids></links><search><creatorcontrib>Sinai, Sam</creatorcontrib><creatorcontrib>Kelsic, Eric D</creatorcontrib><title>A primer on model-guided exploration of fitness landscapes for biological sequence design</title><title>arXiv.org</title><description>Machine learning methods are increasingly employed to address challenges faced by biologists. One area that will greatly benefit from this cross-pollination is the problem of biological sequence design, which has massive potential for therapeutic applications. However, significant inefficiencies remain in communication between these fields which result in biologists finding the progress in machine learning inaccessible, and hinder machine learning scientists from contributing to impactful problems in bioengineering. Sequence design can be seen as a search process on a discrete, high-dimensional space, where each sequence is associated with a function. This sequence-to-function map is known as a "Fitness Landscape". Designing a sequence with a particular function is hence a matter of "discovering" such a (often rare) sequence within this space. Today we can build predictive models with good interpolation ability due to impressive progress in the synthesis and testing of biological sequences in large numbers, which enables model training and validation. However, it often remains a challenge to find useful sequences with the properties that we like using these models. In particular, in this primer we highlight that algorithms for experimental design, what we call "exploration strategies", are a related, yet distinct problem from building good models of sequence-to-function maps. We review advances and insights from current literature -- by no means a complete treatment -- while highlighting desirable features of optimal model-guided exploration, and cover potential pitfalls drawn from our own experience. This primer can serve as a starting point for researchers from different domains that are interested in the problem of searching a sequence space with a model, but are perhaps unaware of approaches that originate outside their field.</description><subject>Algorithms</subject><subject>Bioengineering</subject><subject>Biological effects</subject><subject>Biologists</subject><subject>Design of experiments</subject><subject>Exploration</subject><subject>Fitness</subject><subject>Interpolation</subject><subject>Landscape design</subject><subject>Machine learning</subject><subject>Prediction models</subject><subject>Search process</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNqNjkEKwjAQAIMgWLR_WPBcqEmjXkUUH-DFk8RmU1LSbMy24PPtwQd4msPMYRaikErtqmMj5UqUzH1d13J_kFqrQjxOkLIfMANFGMhiqLrJW7SAnxQom9HPghw4P0ZkhmCi5dYkZHCU4eUpUOdbE4DxPWFsESyy7-JGLJ0JjOWPa7G9Xu7nW5UyzSGPz56mHGf1lI1WWqpm_vyv-gIPqEKw</recordid><startdate>20201023</startdate><enddate>20201023</enddate><creator>Sinai, Sam</creator><creator>Kelsic, Eric D</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20201023</creationdate><title>A primer on model-guided exploration of fitness landscapes for biological sequence design</title><author>Sinai, Sam ; Kelsic, Eric D</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_24535234233</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Algorithms</topic><topic>Bioengineering</topic><topic>Biological effects</topic><topic>Biologists</topic><topic>Design of experiments</topic><topic>Exploration</topic><topic>Fitness</topic><topic>Interpolation</topic><topic>Landscape design</topic><topic>Machine learning</topic><topic>Prediction models</topic><topic>Search process</topic><toplevel>online_resources</toplevel><creatorcontrib>Sinai, Sam</creatorcontrib><creatorcontrib>Kelsic, Eric D</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sinai, Sam</au><au>Kelsic, Eric D</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>A primer on model-guided exploration of fitness landscapes for biological sequence design</atitle><jtitle>arXiv.org</jtitle><date>2020-10-23</date><risdate>2020</risdate><eissn>2331-8422</eissn><abstract>Machine learning methods are increasingly employed to address challenges faced by biologists. One area that will greatly benefit from this cross-pollination is the problem of biological sequence design, which has massive potential for therapeutic applications. However, significant inefficiencies remain in communication between these fields which result in biologists finding the progress in machine learning inaccessible, and hinder machine learning scientists from contributing to impactful problems in bioengineering. Sequence design can be seen as a search process on a discrete, high-dimensional space, where each sequence is associated with a function. This sequence-to-function map is known as a "Fitness Landscape". Designing a sequence with a particular function is hence a matter of "discovering" such a (often rare) sequence within this space. Today we can build predictive models with good interpolation ability due to impressive progress in the synthesis and testing of biological sequences in large numbers, which enables model training and validation. However, it often remains a challenge to find useful sequences with the properties that we like using these models. In particular, in this primer we highlight that algorithms for experimental design, what we call "exploration strategies", are a related, yet distinct problem from building good models of sequence-to-function maps. We review advances and insights from current literature -- by no means a complete treatment -- while highlighting desirable features of optimal model-guided exploration, and cover potential pitfalls drawn from our own experience. This primer can serve as a starting point for researchers from different domains that are interested in the problem of searching a sequence space with a model, but are perhaps unaware of approaches that originate outside their field.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2020-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2453523423 |
source | Freely Accessible Journals |
subjects | Algorithms Bioengineering Biological effects Biologists Design of experiments Exploration Fitness Interpolation Landscape design Machine learning Prediction models Search process |
title | A primer on model-guided exploration of fitness landscapes for biological sequence design |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T19%3A23%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=A%20primer%20on%20model-guided%20exploration%20of%20fitness%20landscapes%20for%20biological%20sequence%20design&rft.jtitle=arXiv.org&rft.au=Sinai,%20Sam&rft.date=2020-10-23&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2453523423%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2453523423&rft_id=info:pmid/&rfr_iscdi=true |