LanguageFold: A Bio-inspired Hierarchical Sparse Attention Mechanism for Large Language Models
Abstract
Large language models predominantly rely on the Transformer architecture, whose self-attention mechanism incurs a quadratic computational cost, O(N²), in the input length, creating severe memory and compute bottlenecks when processing ultra-long contexts. This work proposes LanguageFold, a hierarchical sparse attention mechanism inspired by the Self-Returning Random Walk model of genome folding (Huang et al., 2020). LanguageFold decomposes global attention into dynamically constructed tree attention with a theoretical scaling of O(N log N). Preliminary experiments on prompt-based generation and the DROP reading comprehension benchmark indicate that this tree-structured attention enables efficient language processing while preserving accuracy and improving structural interpretability. These results highlight the promise of genome-inspired attention mechanisms for improving the scalability of large language models.
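To make the O(N log N) claim concrete, the sketch below shows one generic way a tree-structured sparse attention can reach that cost: each query attends only to a logarithmic number of positions chosen at exponentially growing strides, so the total work is O(N log N) rather than O(N²). This is a minimal illustrative example under that assumption, not the authors' LanguageFold implementation, whose tree construction is not detailed in the abstract; the function names are hypothetical.

```python
# Minimal sketch of tree-structured sparse attention with O(n log n) cost.
# Hypothetical illustration; not the LanguageFold implementation.
import numpy as np

def tree_neighbors(i, n):
    """Positions token i may attend to: itself plus neighbors at
    exponentially growing strides, giving O(log n) keys per query."""
    idx = {i}
    stride = 1
    while stride < n:
        idx.add(max(i - stride, 0))      # left neighbor at this tree level
        idx.add(min(i + stride, n - 1))  # right neighbor at this tree level
        stride *= 2
    return sorted(idx)

def sparse_tree_attention(Q, K, V):
    """Each query attends only to its O(log n) tree neighbors,
    so the total cost is O(n log n) instead of O(n^2)."""
    n, d = Q.shape
    out = np.zeros_like(V)
    for i in range(n):
        j = tree_neighbors(i, n)
        scores = Q[i] @ K[j].T / np.sqrt(d)   # scores over sparse key set
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()              # softmax over the sparse keys
        out[i] = weights @ V[j]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 256, 32
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    print(sparse_tree_attention(Q, K, V).shape)  # (256, 32)
```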
Conflict of Interest Statement
The authors declare no conflicts of interest to disclose.
Copyright
The copyright holder for this preprint is the author/funder.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.