Paper Readings
A collection of my notes and summaries from reading research papers in AI, machine learning, and related fields.
Transformer Architecture
Attention Residuals
Attention Residuals: A Comprehensive Understanding
March 19, 2026 · 15–20 min read
This paper addresses a fundamental problem in training deep transformer models: uncontrolled hidden-state magnitude growth as model depth increases. The authors propose Attention Residuals (AttnRes),...
More paper readings coming soon!