Introduction to Attention
Models like ChatGPT are Large Language Models (LLMs). A key concept in modern LLM architectures is the attention mechanism. Suppose you pass the following input to an LLM: "What is the capital of France?" The first thing the LLM does is split this input into tokens. A token is just some combination of characters. An example tokenization of the question is shown below. $\colorbox{red}{What}\colorbox{magenta}{ is}\colorbox{green}{ the}\colorbox{orange}{ capital}\colorbox{purple}{ of}\colorbox{brown}{ France}\colorbox{cyan}{?}$
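To make tokenization concrete, here is a toy sketch. Real LLM tokenizers use learned subword algorithms such as byte-pair encoding (BPE), so this word-and-punctuation split is only a crude approximation of the coloring above, not the actual method any model uses.

```python
import re

def toy_tokenize(text):
    # Each "token" here is a word (keeping its leading space, as many real
    # tokenizers do) or a single punctuation mark. Real tokenizers instead
    # split text into learned subword units.
    return re.findall(r" ?\w+|[^\w\s]", text)

print(toy_tokenize("What is the capital of France?"))
# ['What', ' is', ' the', ' capital', ' of', ' France', '?']
```

Note that most tokens after the first carry their leading space, which mirrors how common tokenizers treat whitespace as part of the following token.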