Intro to Attention


A brief introduction to attention in the transformer architecture.
Read more ⟶

Sliding Window Attention


Restricting each token in the input sequence to attend only to a fixed-size window of nearby tokens.
Read more ⟶
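A minimal NumPy sketch of the idea: each query token attends only to itself and the few tokens immediately before it. The function name and the `window` parameter are illustrative, not from the linked article.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # Boolean mask: entry [i, j] is True when query token i may attend
    # to key token j — here, only j itself and the window-1 tokens before it.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=6, window=3)
```

Scores outside the window are masked out before the softmax, so each row of the attention matrix has at most `window` nonzero entries.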

Sparse Attention


Computing attention over only a subset of token pairs to reduce the quadratic cost of full attention.
Read more ⟶
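One common sparse pattern, sketched in NumPy under assumptions not taken from the linked article: each query attends to a local band of recent tokens plus a strided set of earlier tokens, so most query–key pairs are never computed.

```python
import numpy as np

def strided_sparse_mask(seq_len: int, stride: int) -> np.ndarray:
    # Boolean mask: entry [i, j] is True when query i may attend to key j.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i                      # no attending to future tokens
    local = (i - j) < stride             # recent tokens (local band)
    strided = (j % stride) == stride - 1 # every stride-th "summary" token
    return causal & (local | strided)

mask = strided_sparse_mask(seq_len=8, stride=4)
```

Each row now has O(stride + seq_len / stride) allowed positions instead of O(seq_len), which is where the savings in attention computation come from.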