Evolution-guided diffusion generative model enables large-step exploration of functional protein sequence space from single sequences

Xing Zhang; Jinle Tang; Tingkai Zhang; Zhihang Chen; Zhe Zhang; Jian Zhan; Yaoqi Zhou

doi:10.65215/LTSpreprints.2026.01.29.000104

Preprint / Version 1

This article is a preprint and has not yet been certified by peer review.

Category

Life Sciences x Artificial Intelligence

Keywords

single-sequence protein generation; evolution-guided diffusion model; MSA-free protein structure prediction; remote homolog discovery

Abstract

Protein evolution in nature and in the laboratory proceeds through incremental, largely undirected mutational steps, restricting exploration to local regions of sequence space and limiting access to remote yet potentially functional proteins. We present EvoGUD, a single-sequence–conditioned diffusion framework for large-step exploration of protein sequence space under learned evolutionary constraints. EvoGUD-generated sequences preserve natural-like co-evolutionary structure in representation space despite large sequence divergence. When assembled as virtual multiple sequence alignments, these sequences substantially improve AlphaFold3 single-sequence inference, restoring much of the backbone accuracy and atomic-level side-chain realism for recently deposited protein monomers as well as protein–nucleic-acid complexes, without evolutionary database search. Moreover, EvoGUD enables functional discovery in remote sequence space, yielding active variants of the adenine base-editing enzyme TadA in targeted validation experiments (80% success rate) and large numbers of functional variants of the intrinsically disordered antitoxin CcdA in high-throughput selection assays (19% success rate). Together, these results establish EvoGUD as a single-sequence, evolution-aware generative framework for large-step navigation of protein sequence space, with direct implications for structure modeling and functional protein discovery in previously unexplored sequence space.

Submission ID:

104

Additional files

Supplementary files

Supplementary Information.docx (English)

Published

2026-01-30

How to cite

Zhang, X., Tang, J., Zhang, T., Chen, Z., Zhang, Z., Zhan, J., & Zhou, Y. (2026). Evolution-guided diffusion generative model enables large-step exploration of functional protein sequence space from single sequences. 浪淘沙预印本平台. https://doi.org/10.65215/LTSpreprints.2026.01.29.000104

Conflict of interest statement

The authors declare that they have no conflicts of interest to disclose.

Copyright

The copyright holder for this preprint is the author/funder.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.