
ai-ml · 9 min read
The Mathematics of Self-Attention: Deconstructing the Transformer
Inside the equation that powers modern AI
The Transformer architecture revolutionized AI with a single mechanism: self-attention. We break down the linear...
Akhilesh Yadav
Category: The deep mathematical foundations of artificial intelligence (3 articles)
Why optimization in deep learning is a geometric problem
Gradient descent navigates high-dimensional loss landscapes shaped by curvature and saddle points. We explore the...

The Riemannian manifold of probability distributions
Information geometry equips the space of probability distributions with a Riemannian metric — the Fisher information...