Dirichlet Diffusion Score Model for Biological Sequence Generation
Designing biological sequences is an important challenge that requires satisfying complex constraints and thus is a natural problem to address with deep generative modeling. Diffusion generative models have achieved considerable success in many applications. Score-based generative stochastic differe...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Designing biological sequences is an important challenge that requires
satisfying complex constraints and thus is a natural problem to address with
deep generative modeling. Diffusion generative models have achieved
considerable success in many applications. Score-based generative stochastic
differential equations (SDE) model is a continuous-time diffusion model
framework that enjoys many benefits, but the originally proposed SDEs are not
naturally designed for modeling discrete data. To develop generative SDE models
for discrete data such as biological sequences, here we introduce a diffusion
process defined in the probability simplex space with stationary distribution
being the Dirichlet distribution. This makes diffusion in continuous space
natural for modeling discrete data. We refer to this approach as Dirchlet
diffusion score model. We demonstrate that this technique can generate samples
that satisfy hard constraints using a Sudoku generation task. This generative
model can also solve Sudoku, including hard puzzles, without additional
training. Finally, we applied this approach to develop the first human promoter
DNA sequence design model and showed that designed sequences share similar
properties with natural promoter sequences. |
---|---|
DOI: | 10.48550/arxiv.2305.10699 |