LanguageFold: A Bio-inspired Hierarchical Sparse Attention Mechanism for Large Language Models
Abstract
Large language models predominantly rely on the Transformer architecture, whose self-attention mechanism incurs a quadratic O(N²) computational cost in the input length N, creating severe memory and compute bottlenecks on ultra-long contexts. This work proposes LanguageFold, a hierarchical sparse attention mechanism inspired by the Self-Returning Random Walk model of genome folding (Huang et al. 2020). LanguageFold decomposes global attention into dynamically constructed tree attention with a theoretical O(N log N) scaling. Preliminary experiments on prompt-based generation and the DROP reading-comprehension benchmark indicate that this tree-structured attention enables efficient language processing while preserving accuracy and improving structural interpretability. These results highlight the promise of genome-inspired attention mechanisms for improving the scalability of large language models.
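The abstract describes the mechanism only at a high level; the sketch below is an illustrative reading, not the authors' implementation. It assumes a binary tree of mean-pooled summaries over fixed-size token blocks, so each query attends to its own block plus one summary node per tree level, giving roughly O(N log N) key accesses in total. The function name tree_sparse_attention, the block parameter, and the use of NumPy are hypothetical choices made for illustration only.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tree_sparse_attention(q, k, v, block=4):
    """Hierarchical (tree-structured) sparse attention sketch.

    Each query attends to (a) all keys inside its own local block and
    (b) one mean-pooled summary key per sibling subtree at every level of a
    binary tree built over the blocks, so it touches O(block + log2(N/block))
    keys instead of N. Assumes q, k, v have shape (N, d) with N = block * 2**m.
    """
    n, d = q.shape
    n_blocks = n // block
    # level 0: one pooled key/value per block
    k_lvl = [k.reshape(n_blocks, block, d).mean(axis=1)]
    v_lvl = [v.reshape(n_blocks, block, d).mean(axis=1)]
    # coarser levels: pool pairs of nodes until a single root remains
    while k_lvl[-1].shape[0] > 1:
        k_lvl.append(k_lvl[-1].reshape(-1, 2, d).mean(axis=1))
        v_lvl.append(v_lvl[-1].reshape(-1, 2, d).mean(axis=1))

    out = np.zeros_like(v)
    for b in range(n_blocks):
        sl = slice(b * block, (b + 1) * block)
        keys = [k[sl]]              # fine-grained, token-level keys of the local block
        vals = [v[sl]]
        node = b
        for lvl in range(len(k_lvl) - 1):   # walk up the tree toward the root
            sibling = node ^ 1              # the other child of the current parent
            keys.append(k_lvl[lvl][sibling:sibling + 1])
            vals.append(v_lvl[lvl][sibling:sibling + 1])
            node //= 2
        K = np.concatenate(keys, axis=0)
        V = np.concatenate(vals, axis=0)
        scores = q[sl] @ K.T / np.sqrt(d)
        out[sl] = softmax(scores, axis=-1) @ V
    return out

# toy usage
rng = np.random.default_rng(0)
N, d = 32, 8
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
print(tree_sparse_attention(q, k, v).shape)   # (32, 8)

On 32 toy tokens with block = 4, each query compares against 7 keys (4 local plus 3 pooled siblings) rather than 32; in a trained model the pooled summaries would presumably be learned rather than simple means, and the tree would be constructed dynamically as the abstract indicates.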
Declaration of Competing Interests
The authors declare no competing interests to disclose.
Copyright
The copyright holder for this preprint is the author/funder.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.