Intro to Attention
A brief introduction to attention in the transformer architecture.
Read more ⟶
Sliding Window Attention
Restricting each token in the input sequence to attend only to a fixed-size window of nearby tokens.
Read more ⟶
Sparse Attention
Reducing the cost of attention by computing scores for only a subset of token pairs.
Read more ⟶