Paper Readings

A collection of my notes and summaries from reading research papers in AI, machine learning, and related fields.

Transformer Architecture

Attention Residuals

Attention Residuals: A Comprehensive Understanding

March 19, 2026 · 15-20 min read

This paper addresses a fundamental problem in training deep transformer models: uncontrolled hidden-state magnitude growth as model depth increases. The authors propose Attention Residuals (AttnRes),...
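To make "magnitude growth with depth" concrete, here is a toy sketch. It is not the paper's model or its AttnRes method. It assumes a PreNorm-style residual stream where each layer adds an update to the hidden state without renormalizing the running sum. The updates are modeled as independent unit-variance Gaussian vectors, which is a simplifying assumption. Under that assumption the variances add, so the hidden-state norm grows roughly like sqrt(depth). The function name `residual_stream_norms` is hypothetical.

```python
import math
import random

def residual_stream_norms(depth, dim, seed=0):
    """Track ||h_l|| for a toy residual stream h_{l+1} = h_l + u_l.

    Each update u_l is an independent unit-variance Gaussian vector, a
    stand-in assumption for a sublayer's output (not the paper's setup).
    """
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # initial embedding
    norms = [math.sqrt(sum(x * x for x in h))]
    for _ in range(depth):
        update = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        h = [a + b for a, b in zip(h, update)]  # residual add, no renormalization
        norms.append(math.sqrt(sum(x * x for x in h)))
    return norms

norms = residual_stream_norms(depth=64, dim=512)
for layer in (0, 4, 16, 64):
    # Independent updates add variance, so ||h_l|| is close to sqrt((l + 1) * dim).
    print(layer, round(norms[layer], 1), round(math.sqrt((layer + 1) * 512), 1))
```

Real sublayer outputs are neither independent nor unit-variance, so actual growth can be faster or slower. The toy still shows why a plain additive stream drifts in scale as layers are stacked.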


More paper readings coming soon!