Meta's Open-Source Llama Models - What's the Catch?

To understand why Meta has open-sourced the Llama family of models, it is important to understand how Meta makes money. Meta makes money from adverts: almost their entire revenue comes from advertising (1). So, why have Meta invested so much money in the Llama models? Probably, to make more money from adverts. Here are some ways in which the open-source release of the Llama models might help Meta make more money from adverts....

August 23, 2024

Introduction to Attention

Suppose you give an LLM the input What is the capital of France? The first thing the LLM will do is split this input into tokens. A token is just some combination of characters. You can see an example of the tokenization output for this question below. $\colorbox{red}{What}\colorbox{magenta}{ is}\colorbox{green}{ the}\colorbox{orange}{ capital}\colorbox{purple}{ of}\colorbox{brown}{ France}\colorbox{cyan}{?}$ (This tokenization was produced using cl100k_base, the tokenizer used in GPT-3.5-turbo and GPT-4.) In this example we have $n = 7$ tokens....
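
You can reproduce this split yourself; here is a minimal sketch using the tiktoken library (an assumption on my part; the post only names the cl100k_base encoding, which tiktoken happens to expose):

```python
import tiktoken

# cl100k_base is the encoding used by GPT-3.5-turbo and GPT-4
enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("What is the capital of France?")
print(len(token_ids))  # 7
print([enc.decode([t]) for t in token_ids])
# ['What', ' is', ' the', ' capital', ' of', ' France', '?']
```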

March 30, 2024

LoRA

Let’s consider a weight matrix $W$. Typically, the weight matrices in dense neural network layers have full rank. Full rank has several equivalent mathematical characterizations. I think the easiest one, for a square matrix $M \in \mathbb{R}^{d \times d}$, is that $M$ is full-rank when its columns span (can hit every point in) $d$-dimensional space. If you consider $d=3$, a matrix like \begin{equation} M = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \end{equation}...
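
If you want to check the rank of this example matrix numerically, here is a small sketch using NumPy (my addition, not from the post). The second row of $M$ duplicates the first, so the rank comes out below $d$:

```python
import numpy as np

M = np.array([[1, 0, 1],
              [1, 0, 1],
              [1, 1, 0]])

# The first two rows are identical, so the columns cannot
# span all of R^3: the matrix is rank-deficient.
print(np.linalg.matrix_rank(M))  # prints 2, i.e. less than d = 3
```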

March 24, 2024