The expression of satellite DNA-encoded proteins in the human genome
Abstract
The discovery of dark matter proteins has relied primarily on predicting open reading frames (ORFs) from known RNA molecules, followed by experimental validation. Although this approach has yielded many excellent studies in recent years, its yield of positive findings has steadily declined while costs continue to rise. Meanwhile, whether large portions of the genomic dark matter, such as satellite DNA, encode proteins remains unknown. In this study, we used six-frame in silico translation to predict ORFs (istORFs) and their corresponding amino acid sequences (isteins) directly from genomic DNA sequences, and then searched for signature peptides and matching RNA sequences in proteomics and Ribo-seq databases. Unexpectedly, the positive rate was high. We identified a large number of satellite DNA-derived ORFs; among the 1,000 highest-copy-number isteins, 415 (41.5%) had detectable signature peptides in mass spectrometry datasets searched with PepQuery, and 401 (40.1%) showed matching translation signals in Ribo-seq data. The MESSS gene family, derived from a novel form of Sat2/3, comprises over 10,000 predicted ORFs with a combined coding region exceeding 6 Mb. Using a specific antibody, we detected MESSS expression in multiple cell lines. These findings provide initial evidence that at least some satellite DNA sequences have coding capacity and can be expressed as proteins. We also analyzed conserved isteins across multiple primate and mouse T2T genomes and found that, even among highly conserved ORFs, many had not previously been discovered or annotated. Based on these results, we propose a series of hypotheses that await future validation.
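The six-frame in silico translation step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the requirement that an ORF begin at an ATG codon, and the `min_aa` length threshold are all assumptions made for illustration, and only the standard genetic code is used.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(dna, min_aa=3):
    """Hypothetical helper: list (strand, frame, peptide) for each ORF found.

    Assumption: an ORF runs from the first ATG in a stop-free stretch to the
    stop codon that ends it. Stretches without a terminating stop are skipped.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            # The final split element is not followed by a stop, so drop it.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_aa:
                    hits.append((strand, frame, segment[start:]))
    return hits


print(six_frame_orfs("CCATGAAATTTTAGGG"))  # [('+', 2, 'MKF')]
```

In this toy input, only frame 2 of the forward strand contains an ATG followed by an in-frame stop codon, so a single peptide is reported. A genome-scale scan would additionally need to record coordinates and handle ambiguous bases, which this sketch leaves out.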
Declaration of Competing Interests
The authors declare no competing interests.
Copyright
The copyright holder for this preprint is the author/funder.

This work is licensed under a Creative Commons Attribution 4.0 International License.