Introduction to Attention

Models like ChatGPT are Large Language Models (LLMs). An important concept in modern LLM architectures is the attention mechanism. Suppose you pass the following input to an LLM: "What is the capital of France?" The first thing the LLM will do is split this input into tokens. A token is just a combination of characters. You can see an example of the tokenization output for this question below. $\colorbox{red}{What}\colorbox{magenta}{ is}\colorbox{green}{ the}\colorbox{orange}{ capital}\colorbox{purple}{ of}\colorbox{brown}{ France}\colorbox{cyan}{?}$...
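As a rough illustration of what this splitting step does, here is a toy sketch in Python. It is not the tokenizer any real LLM uses; production models rely on learned subword schemes such as BPE, and the regex-based `toy_tokenize` helper below is a hypothetical stand-in that just keeps word-like chunks with their leading spaces, mirroring the colored tokens above.

```python
import re

def toy_tokenize(text):
    """Toy word-level split: keep leading spaces with words, punctuation on its own.
    Real LLM tokenizers (e.g. BPE) split into learned subword units instead."""
    return re.findall(r" ?[A-Za-z]+|[^A-Za-z ]", text)

print(toy_tokenize("What is the capital of France?"))
# ['What', ' is', ' the', ' capital', ' of', ' France', '?']
```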

March 30, 2024

Fancy Pants Attention Techniques

Sparse Attention. Sparse Attention introduces sparse factorizations of the attention matrix. To implement this, we introduce a connectivity pattern $S = \{S_1, \dots, S_n\}$, where $S_i$ denotes the set of indices of the input vectors to which the $i$th output vector attends. For instance, in regular $n^2$ attention, every output vector attends to every input vector at or before its position in the sequence, i.e. $S_i = \{j : j \le i\}$. Remember that $d_k$ is the inner dimension of our queries and keys....
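To make the connectivity-pattern idea concrete, here is a minimal sketch assuming NumPy; it is not the Sparse Transformer implementation, and the `causal_pattern`, `strided_pattern`, and `masked_attention` names are illustrative. The strided pattern is just one possible sparse factorization: each position attends to a local window plus every `stride`-th earlier position, rather than all $i$ previous positions.

```python
import numpy as np

def causal_pattern(n):
    """Dense causal attention: S_i = {j : j <= i}."""
    return [set(range(i + 1)) for i in range(n)]

def strided_pattern(n, stride):
    """A simple strided sparse pattern: position i attends to the previous
    `stride` positions plus every `stride`-th earlier position."""
    S = []
    for i in range(n):
        local = set(range(max(0, i - stride + 1), i + 1))
        strided = {j for j in range(i + 1) if (i - j) % stride == 0}
        S.append(local | strided)
    return S

def masked_attention(Q, K, V, S):
    """Scaled dot-product attention restricted to the connectivity pattern S."""
    n, d_k = Q.shape                           # d_k: inner dimension of queries/keys
    scores = Q @ K.T / np.sqrt(d_k)            # (n, n) raw attention scores
    mask = np.full((n, n), -np.inf)
    for i, allowed in enumerate(S):
        mask[i, list(allowed)] = 0.0           # only indices in S_i are kept
    scores = scores + mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

n, d_k = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d_k)) for _ in range(3))
out = masked_attention(Q, K, V, strided_pattern(n, stride=2))
print(out.shape)  # (8, 4)
```

Swapping `strided_pattern(n, stride=2)` for `causal_pattern(n)` recovers ordinary dense causal attention, which is what makes the connectivity pattern a convenient abstraction for comparing the two.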

March 23, 2024