Preprint / Version 1

Robust enzyme kinetics prediction through pairwise relative learning

This article is a preprint and has not been certified by peer review.

Authors

    Xiongwen Li
    • Tsinghua Shenzhen International Graduate School
    Zhengkai Li
    Jiawei Zou
    Wenjie Chen
    Shujia Liu
    Ke Wu
    Jiahao Luo
    Yu Chen
    Feiran Li
Abstract

Predicting enzyme kinetics is a fundamental challenge, and numerous computational methods have been developed to address it. However, most methods perform absolute regression across heterogeneous datasets. These datasets are noisy, and models trained on them are prone to regression-to-the-mean bias, which limits accurate identification of highly active enzymes. Here we present DeltaKcat, a siamese neural network framework that reformulates kinetics prediction as a pairwise relative learning problem. By learning pairwise differences within consistent experimental contexts, DeltaKcat improves accuracy, robustness, and generalization over state-of-the-art methods. This formulation enables DeltaKcat to (i) mitigate inter-study variability, (ii) generalize to unseen enzymes and substrates, and (iii) prioritize highly active enzymes. We demonstrate these capabilities through systematic benchmarking and validation on an external adenylate kinase dataset. We anticipate that this approach will be broadly useful for enzyme discovery and engineering and, more generally, for learning from noisy biological data.
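
The pairwise idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: a shared linear scorer stands in for the siamese encoder, and the record layout (`study_id`, `features`, `log10_kcat`), the function names, and the synthetic data are all assumptions made for illustration. The sketch shows only why forming pairs within one experimental context can remove a constant per-study offset from the training target.

```python
from collections import defaultdict
from itertools import combinations


def make_pairs(records):
    """Pair up measurements that share a study and return (x_a, x_b, delta) triples.

    `records` holds hypothetical (study_id, features, log10_kcat) tuples.
    Pairs are never formed across studies, so a constant per-study offset
    cancels in delta = y_a - y_b.
    """
    by_study = defaultdict(list)
    for study_id, features, log_kcat in records:
        by_study[study_id].append((features, log_kcat))
    pairs = []
    for items in by_study.values():
        for (xa, ya), (xb, yb) in combinations(items, 2):
            pairs.append((xa, xb, ya - yb))
    return pairs


def score(w, x):
    """Shared scorer applied to both members of a pair (stand-in for an encoder)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_pairwise(pairs, dim, lr=0.1, epochs=500):
    """Gradient descent minimizing the mean squared error of predicted differences."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for xa, xb, delta in pairs:
            # Predicted difference uses the same weights on both inputs.
            err = (score(w, xa) - score(w, xb)) - delta
            for i in range(dim):
                grad[i] += err * (xa[i] - xb[i])
        w = [wi - lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w


# Synthetic data (made up): log10 kcat = 1.0*x0 - 2.0*x1 + a per-study offset.
TRUE_W = [1.0, -2.0]
OFFSETS = {"study_A": 3.0, "study_B": -1.5}
FEATURES = {
    "study_A": [[0, 0], [1, 0], [0, 1], [1, 1]],
    "study_B": [[2, 1], [0, 2], [1, 3]],
}
records = [
    (study, x, score(TRUE_W, x) + OFFSETS[study])
    for study, xs in FEATURES.items()
    for x in xs
]

pairs = make_pairs(records)
w = fit_pairwise(pairs, dim=2)
print(len(pairs), [round(v, 3) for v in w])
```

In this toy setting the fitted weights approach the generating weights even though the two studies differ by a large constant offset, because the offset never enters a within-study difference. Candidates can then be ranked by `score(w, x)` without ever predicting an absolute value. Real inter-study variability is unlikely to be a pure additive constant, so this should be read as an intuition aid, not as evidence for the reported results.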

Published

2026-05-20

How to cite

Li, X., Li, Z., Zou, J., Chen, W., Liu, S., Wu, K., Luo, J., Chen, Y., & Li, F. (2026). Robust enzyme kinetics prediction through pairwise relative learning. 浪淘沙预印本平台. https://doi.org/10.65215/LTSpreprints.2026.05.19.000247

Conflict of interest statement

The authors declare that they have no conflicts of interest to disclose.