Preprint / Version 1

Evolution-guided diffusion generative model enables large-step exploration of functional protein sequence space from single sequences

This article is a preprint and has not been certified by peer review.

Authors

    Xing Zhang
    • Institute of Systems and Physical Biology, Shenzhen Bay Laboratory
    Jinle Tang
    • Institute of Systems and Physical Biology, Shenzhen Bay Laboratory
    Tingkai Zhang
    • Institute of Systems and Physical Biology, Shenzhen Bay Laboratory
    • School of Medicine, Southern University of Science and Technology
    Zhihang Chen
    Zhe Zhang
    • Institute of Systems and Physical Biology, Shenzhen Bay Laboratory
    Jian Zhan
    • Institute of Systems and Physical Biology, Shenzhen Bay Laboratory
    • Ribopeutic (Shenzhen) Co., Ltd.
    • Ribopeutic Inc.
    Yaoqi Zhou
    • Institute of Systems and Physical Biology, Shenzhen Bay Laboratory
Keywords
single-sequence protein generation; evolution-guided diffusion model; MSA-free protein structure prediction; remote homolog discovery

Abstract

Protein evolution in nature and in the laboratory proceeds through incremental, largely undirected mutational steps, which restricts exploration to local regions of sequence space and limits access to remote yet potentially functional proteins. We present EvoGUD, a single-sequence–conditioned diffusion framework for large-step exploration of protein sequence space under learned evolutionary constraints. EvoGUD-generated sequences preserve natural-like co-evolutionary structure in representation space despite large sequence divergence. When assembled into virtual multiple sequence alignments, these sequences substantially improve AlphaFold3 single-sequence inference, restoring much of the backbone accuracy and atomic-level side-chain realism for recently deposited protein monomers and protein–nucleic-acid complexes, without any evolutionary database search. Moreover, EvoGUD enables functional discovery in remote sequence space, yielding active variants of the adenine base-editing enzyme TadA in targeted validation experiments (80% success rate) and large numbers of functional variants of the intrinsically disordered antitoxin CcdA in high-throughput selection assays (19% success rate). Together, these results establish EvoGUD as a single-sequence, evolution-aware generative framework for large-step navigation of protein sequence space, with direct implications for structure modeling and for functional protein discovery in previously unexplored regions of sequence space.

Posted

2026-01-30

How to Cite

Zhang, X., Tang, J., Zhang, T., Chen, Z., Zhang, Z., Zhan, J., & Zhou, Y. (2026). Evolution-guided diffusion generative model enables large-step exploration of functional protein sequence space from single sequences. LangTaoSha Preprint Server. https://doi.org/10.65215/LTSpreprints.2026.01.29.000104

Declaration of Competing Interests

The authors declare no competing interests to disclose.