CryoAtom2: inverse folding-inspired deep learning enables accurate model building of protein-nucleic acid complexes in cryo-EM
Abstract
Automated model building of protein–nucleic acid complexes in cryo-electron microscopy (cryo-EM) maps is hindered by the highly similar density signatures of chemically related bases at moderate resolutions. Here, we present CryoAtom2, an inverse folding-inspired multimodal deep learning framework for de novo model building of proteins, nucleic acids, and their complexes within a single pipeline. CryoAtom2 jointly integrates three complementary modalities: density, sequence, and, crucially, a diverse set of plausible 3D structural conformations. Rather than generating a single coordinate set for each residue, it produces 28 candidate coordinate sets (20 amino acids, 4 RNA nucleotides, and 4 DNA nucleotides). A dedicated Structure Encoder, following ProteinMPNN, then captures the structural context needed for accurate sequence inference. By reasoning over this expanded structural space, the model discriminates between similar bases and yields globally optimal sequences that best fit the density map. This approach is analogous to inverse folding, in which structural context guides sequence assignment. Benchmarking on 175 non-redundant cryo-EM maps shows that CryoAtom2 substantially outperforms existing state-of-the-art tools, delivering more complete models with a better fit to the density and improved geometric quality across proteins, nucleic acids, and their complexes. For example, under the most stringent criterion, completeness, which considers both atomic position and sequence identity, CryoAtom2 achieves an average of 60.0% for nucleic acids, outperforming EM2NA (40.6%), ModelAngelo (32.6%), CryoREAD (32.4%), and Phenix (11.7%). We applied CryoAtom2 to three challenging, unseen experimental structures: an in situ human 80S ribosome, an archaeal 70S ribosome, and a large RNA-only nanocage. In each case, CryoAtom2 achieved near-expert levels of model completeness, exceptional sequence accuracy, and stereochemically sound geometry. Together, these results position CryoAtom2 as a transformative tool for interpreting complex biological machinery in cryo-EM.
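To make the completeness criterion concrete, here is a minimal sketch of one way such a score could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract says only that completeness considers both atomic position and sequence identity. The 3.0 Å cutoff, the greedy nearest-neighbour matching, the one-point-per-residue representation, and the names `VOCAB` and `completeness` are all hypothetical. The 28-entry vocabulary mirrors the 20 amino acids, 4 RNA nucleotides, and 4 DNA nucleotides named in the abstract.

```python
import math

# Hypothetical 28-letter residue vocabulary: 20 amino acids,
# 4 RNA nucleotides and 4 DNA nucleotides (labels are an assumption).
AMINO_ACIDS = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
               "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"]
RNA = ["A", "C", "G", "U"]
DNA = ["DA", "DC", "DG", "DT"]
VOCAB = AMINO_ACIDS + RNA + DNA


def completeness(reference, built, cutoff=3.0):
    """Fraction of reference residues that have a built residue within
    `cutoff` angstroms AND with the same residue type.

    Each residue is a (type, (x, y, z)) pair, with one representative
    atom per residue. The cutoff value is an assumption.
    """
    if not reference:
        return 0.0
    used = set()   # indices of built residues already paired
    matched = 0
    for ref_type, ref_xyz in reference:
        best, best_d = None, cutoff
        # Greedy search for the nearest unused built residue within the cutoff.
        for i, (_, b_xyz) in enumerate(built):
            if i in used:
                continue
            d = math.dist(ref_xyz, b_xyz)
            if d <= best_d:
                best, best_d = i, d
        if best is not None:
            used.add(best)
            # Position is satisfied; count only if the identity also agrees.
            if built[best][0] == ref_type:
                matched += 1
    return matched / len(reference)


# Toy RNA fragment: four reference residues, three built residues.
reference = [("G", (0.0, 0.0, 0.0)), ("A", (6.0, 0.0, 0.0)),
             ("U", (12.0, 0.0, 0.0)), ("C", (18.0, 0.0, 0.0))]
built = [("G", (0.5, 0.0, 0.0)),   # right place, right base
         ("G", (6.2, 0.0, 0.0)),   # right place, wrong base (A expected)
         ("U", (12.1, 0.0, 0.0))]  # right place, right base

print(len(VOCAB))                      # 28
print(completeness(reference, built))  # 0.5
```

In this toy case three of the four residues are placed correctly, but only two also carry the correct base, so the score is 0.5. This illustrates why a criterion that requires sequence identity is stricter than one based on position alone.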
References
Burley, S.K. et al. Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank. Nucleic Acids Research 53, D564-D574 (2025).
Kretsch, R.C. et al. Assessment of Nucleic Acid Structure Prediction in CASP16. Proteins: Structure, Function, and Bioinformatics 94, 192-217 (2026).
Wang, W. et al. trRosettaRNA: automated prediction of RNA 3D structure with transformer network. Nature Communications 14, 7266 (2023).
Doerr, A. Predicting RNA structures. Nature Methods 22, 2495 (2025).
Wang, W., Su, B., Peng, Z. & Yang, J. Integrated experimental and AI innovations for RNA structure determination. Nature Biotechnology (2026).
Jamali, K. et al. Automated model building and protein identification in cryo-EM maps. Nature 628, 450-457 (2024).
Chen, S. et al. Protein complex structure modeling by cross-modal alignment between cryo-EM maps and protein sequences. Nature Communications 15, 8808 (2024).
Giri, N. & Cheng, J. De novo atomic protein structure modeling for cryoEM density maps using 3D transformer and HMM. Nature Communications 15, 5511 (2024).
Su, B., Huang, K., Peng, Z., Amunts, A. & Yang, J. CryoAtom improves model building for cryo-EM. Nature Structural & Molecular Biology (2025).
Li, T., Chen, J., Li, H., Cao, H. & Huang, S.-Y. EMProt improves structure determination from cryo-EM maps. Nature Structural & Molecular Biology (2025).
Wang, X., Zhu, H., Terashi, G., Taluja, M. & Kihara, D. DiffModeler: large macromolecular structure modeling for cryo-EM maps using a diffusion model. Nature Methods 21, 2307-2317 (2024).
Wang, X., Terashi, G. & Kihara, D. CryoREAD: de novo structure modeling for nucleic acids in cryo-EM maps using deep learning. Nature Methods 20, 1739-1747 (2023).
Li, T., Cao, H., He, J. & Huang, S.-Y. Automated detection and de novo structure modeling of nucleic acids from cryo-EM maps. Nature Communications 15, 9367 (2024).
Li, T. et al. All-atom RNA structure determination from cryo-EM maps. Nature Biotechnology 43, 97-105 (2025).
Wang, W., Yu, K., Hugonot, J., Fua, P.V. & Salzmann, M. Recurrent U-Net for Resource Constrained Segmentation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2142-2151 (2019).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589 (2021).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123-1130 (2023).
Chen, J. et al. Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions. arXiv preprint arXiv:2204.00300 (2022).
Dauparas, J. et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science 378, 49-56 (2022).
Terwilliger, T.C., Adams, P.D., Afonine, P.V. & Sobolev, O.V. A fully automatic method yielding initial models from high-resolution cryo-electron microscopy maps. Nature Methods 15, 905-908 (2018).
Pintilie, G. et al. Measurement of atom resolvability in cryo-EM maps with Q-scores. Nature Methods 17, 328-334 (2020).
Davis, I.W. et al. MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Research 35, W375-W383 (2007).
Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallographica Section D 75, 861-877 (2019).
Afonine, P.V. et al. New tools for the analysis and validation of cryo-EM maps and atomic models. Acta Crystallographica Section D 74, 814-840 (2018).
The wwPDB Consortium. EMDB—the Electron Microscopy Data Bank. Nucleic Acids Research 52, D456-D465 (2024).
Zheng, W. et al. Visualizing the translation landscape in human cells at high resolution. Nature Communications 16, 10757 (2025).
Zhu, L. et al. A dual-factor complex governs archaeal ribosome hibernation by sensing energy status. bioRxiv, 2026.01.19.700304 (2026).
Wang, L. et al. Cryo-EM reveals mechanisms of natural RNA multivalency. Science 388, 545-550 (2025).
Kretsch, R.C. et al. Naturally ornate RNA-only complexes revealed by cryo-EM. Nature 643, 1135-1142 (2025).
Ling, X. et al. Cryo-EM structure of a natural RNA nanocage. Nature 644, 1107-1115 (2025).
Zhang, S. et al. Structural insights into higher-order natural RNA-only multimers. Nature Structural & Molecular Biology 32, 2012-2021 (2025).
Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences 118, e2016239118 (2021).
Lin, T.Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal Loss for Dense Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 318-327 (2020).
Schwerdtfeger, P. & Wales, D.J. 100 Years of the Lennard-Jones Potential. Journal of Chemical Theory and Computation 20, 3379-3405 (2024).
Eddy, S.R. Accelerated Profile HMM Searches. PLOS Computational Biology 7, e1002195 (2011).
Wang, W., Peng, Z. & Yang, J. Predicting RNA 3D structure and conformers using a pretrained secondary structure model and structure-aware attention. bioRxiv, 2025.04.09.647915 (2025).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next generation sequencing data. Bioinformatics 28, 3150-3152 (2012).
Declaration of Competing Interests
The authors declare no competing interests.
Copyright
The copyright holder for this preprint is the author/funder.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.