Preprint / Version 1

The expression of satellite DNA-encoded proteins in the human genome

This article is a preprint and has not yet been certified by peer review.

Keywords
satellite DNA; dark matter protein; 6-frame translation; T2T genome; proteomics

Abstract

The discovery of dark matter proteins has relied primarily on predicting open reading frames (ORFs) from known RNA molecules, followed by experimental validation. Although this approach has produced many excellent studies in recent years, its yield of positive results has steadily declined while costs continue to rise. Meanwhile, whether large portions of the genomic dark matter, such as satellite DNA, encode proteins remains unknown. In this study, we used 6-frame in silico translation to predict ORFs (istORFs) and their corresponding amino acid sequences (isteins) directly from genomic DNA sequences, and then searched for signature peptides and matching RNA sequences in proteomics and ribo-seq databases. Unexpectedly, the positive rate was remarkably high. We identified a large number of satellite DNA-derived ORFs; among the 1,000 highest-copy-number isteins, 415 had signature peptides detectable in mass spectrometry datasets via PepQuery, and 401 showed matching translation signals in ribo-seq data. The MESSS gene family, derived from a novel form of Sat2/3, comprises over 10,000 predicted ORFs with a combined coding region exceeding 6 Mb. Using a specific antibody, we detected MESSS expression in multiple cell lines. These findings provide initial evidence that at least some satellite DNA sequences have coding capacity and can express proteins. We also analyzed conserved isteins across multiple primate and mouse T2T genomes and found that, even among highly conserved ORFs, many had not previously been discovered or annotated. Based on these results, we propose a series of hypotheses that await future validation.
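As an illustration of the 6-frame in silico translation step described above, the following is a minimal Python sketch, not the authors' pipeline. It assumes the standard genetic code and treats an ORF as the stretch from the first ATG to the next in-frame stop codon; the abstract does not state how istORFs are delimited, so that definition, the function names (`reverse_complement`, `translate`, `six_frame_orfs`) and the `min_aa` length cutoff are all hypothetical.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order; '*' is a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(seq, min_aa=2):
    """List (strand, frame, peptide) for ATG-to-stop ORFs in all six frames.

    Assumption: an ORF starts at the first Met after a stop (or the frame
    start) and must be closed by an in-frame stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            segments = translate(template[frame:]).split("*")
            # The final segment is not followed by a stop codon, so skip it.
            for segment in segments[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_aa:
                    orfs.append((strand, frame, segment[start:]))
    return orfs
```

Calling `six_frame_orfs("ATGAAATAG")` finds a single forward-strand ORF in frame 0 encoding the dipeptide MK. A genome-scale run would stream each chromosome from a FASTA file and raise `min_aa`; ranking the resulting peptides by copy number, as the abstract describes, is a separate step not shown here.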

Published

2026-07-02

How to cite

Chen, Y., Li, J., Zhou, X., Zhao, Q., Li, J., Wang, Y., Liu, H., Feng, M., Ni, W., Zhang, Y., & Guo, Y. (2026). The expression of satellite DNA-encoded proteins in the human genome. 浪淘沙预印本平台. https://doi.org/10.65215/LTSpreprints.2026.07.01.000280

Conflict of interest statement

The authors declare that they have no conflicts of interest to disclose.