High-throughput cryo-EM characterization and automated model building of glycofibrils via CryoSeek
Abstract
With CryoSeek, a structure-first paradigm for discovery, we have determined high-resolution 3D structures of a number of glycofibrils, in which well-ordered glycans either form a thick shell coating various protein cores or constitute the entire fibril. To improve the throughput of CryoSeek, we report two methods here. The recursive bisection clustering (RBC) strategy enables high-throughput cryo-EM data processing of fibrils, and EModelG is an AI-facilitated algorithm for automated model building of glycans. Using RBC, we have established a high-throughput workflow for CryoSeek and reconstructed 3D EM maps for hundreds of fibrils, which can be modelled automatically in EModelG. Based on their molecular compositions and structural features, we tentatively propose a unified nomenclature scheme for the fibrils discovered via CryoSeek. These structures will lay the foundation for decoding the principles of glycan folding. Furthermore, to accommodate the high volume of cryo-EM structures obtained rapidly with the CryoSeek strategy, we have established a database of the same name for data archiving and sharing.
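The abstract names RBC without describing its internals. As a rough illustration of the general idea behind recursive bisection, the sketch below repeatedly splits a set of points in two until every group is compact. It is a generic example and not the authors' RBC implementation: the points stand in for hypothetical per-segment feature vectors, and the function names, the 2-means split, and the `max_spread` and `min_size` stopping criteria are all assumptions made for illustration.

```python
from math import dist


def centroid(points):
    """Mean position of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def bisect(points, iterations=20):
    """Split points into two groups with a small, deterministic 2-means."""
    c = centroid(points)
    a = max(points, key=lambda p: dist(p, c))  # seed 1: farthest from the centroid
    b = max(points, key=lambda p: dist(p, a))  # seed 2: farthest from seed 1
    left, right = [], []
    for _ in range(iterations):
        left = [p for p in points if dist(p, a) <= dist(p, b)]
        right = [p for p in points if dist(p, a) > dist(p, b)]
        if not left or not right:
            break
        new_a, new_b = centroid(left), centroid(right)
        if new_a == a and new_b == b:  # assignments have stopped changing
            break
        a, b = new_a, new_b
    return left, right


def recursive_bisection(points, max_spread, min_size=2):
    """Halve a group recursively until it is compact or too small to split.

    max_spread and min_size are hypothetical stopping criteria chosen for
    this sketch; the source does not state which criteria RBC uses.
    """
    c = centroid(points)
    spread = max(dist(p, c) for p in points)
    if spread <= max_spread or len(points) < 2 * min_size:
        return [points]
    left, right = bisect(points)
    if len(left) < min_size or len(right) < min_size:
        return [points]
    return (recursive_bisection(left, max_spread, min_size)
            + recursive_bisection(right, max_spread, min_size))


# Toy input: three well-separated groups of 2D points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (50, 50), (50, 51), (51, 50)]
clusters = recursive_bisection(points, max_spread=2.0)
```

One appeal of this style of clustering is that the number of groups does not have to be fixed in advance; it follows from the stopping criteria.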
Declaration of Competing Interests
The authors declare no competing interests.
Copyright
The copyright holder for this preprint is the author/funder.
All rights reserved. This work is protected by copyright. No part of this work may be reproduced, distributed, or transmitted in any form or by any means without the prior written permission of the copyright holder.