Evolution-guided diffusion generative model enables large-step exploration of functional protein sequence space from single sequences
Abstract
Protein evolution in nature and in the laboratory proceeds through incremental, largely undirected mutational steps, restricting exploration to local regions of sequence space and limiting access to remote yet potentially functional proteins. We present EvoGUD, a single-sequence–conditioned diffusion framework for large-step exploration of protein sequence space under learned evolutionary constraints. EvoGUD-generated sequences preserve natural-like co-evolutionary structure in representation space despite large sequence divergence. When assembled as virtual multiple sequence alignments, these sequences substantially improve AlphaFold3 single-sequence inference, restoring much of the backbone accuracy and atomic-level side-chain realism for recently deposited protein monomers as well as protein–nucleic-acid complexes, without evolutionary database search. Moreover, EvoGUD enables functional discovery in remote sequence space, yielding active variants of the adenine base-editing enzyme TadA in targeted validation experiments (80% success rate) and large numbers of functional variants of the intrinsically disordered antitoxin CcdA in high-throughput selection assays (19% success rate). Together, these results establish EvoGUD as a single-sequence, evolution-aware generative framework for large-step navigation of protein sequence space, with direct implications for structure modeling and functional protein discovery in previously unexplored sequence space.
Declaration of Competing Interests
The authors declare no competing interests.
Copyright
The copyright holder for this preprint is the author/funder.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.