The Secret Learnings of Llamas
Tool use by llamas is an active area of research. Recent implementations like Devin promise great productivity increases just by allowing llamas to interact with more tools. I was investigating this in some modern llamas when I made an unfortunate discovery.
It appears most large llamas have learned a new language, in addition to the ones that were intended: base64.
Base64 Background
Base64 is a simple encoding scheme. It takes in a stream of bytes and converts them into a plain-text representation.
Each byte is 8 bits. This means there are 2^8 (256) possible bytes, since each bit contributes 2 states. Base64 restricts itself to 64 printable characters, so each output character can only store 2^6 (64) possible states, i.e. 6 bits.
Let’s visualize how base64 works. Say we have the following word:
Hello
We can convert each letter to a number using UTF-8 encoding tables or the ord() function in Python. I then converted the base 10 representations to octal (base 8) and binary (base 2). The bottom two rows repeat the octal and binary values, but the spacing makes it easier to see the direct mapping from octal to binary:
Letters: H e l l o
Base 10: 72 101 108 108 111
Base 8: 110 145 154 154 157
Base 2: 001001000 001100101 001101100 001101100 001101111
Base 8 (spaced): 1 1 0 1 4 5 1 5 4 1 5 4 1 5 7
Base 2 (spaced): 001 001 000 001 100 101 001 101 100 001 101 100 001 101 111
Notice how there's a 1:1 mapping between every 3 digits in binary and every digit in octal. This means octal can represent 2^3 (8) states per digit, using only the digits 0-7. But what if we wanted to represent 2^6 (64) states per digit? Base64 does this by using A-Z, a-z, 0-9, + and / as its digits, in that order, so A is 0 and / is 63. That gives 64 digits.
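If you want to reproduce the table above yourself, a quick Python sketch using the built-in ord() and format() functions does it:

    # Print each letter's decimal, octal, and binary representation
    for letter in "Hello":
        n = ord(letter)  # e.g. ord("H") == 72
        print(letter, n, format(n, "03o"), format(n, "09b"))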
Now we can map in reverse, going from bits back to digits, but with groups of 6 bits instead of 3. Base64 works on the raw 8-bit bytes, so let's write Hello out as bytes:

Base 2 (8-bit bytes): 01001000 01100101 01101100 01101100 01101111

Just changing the spacing to groups of 6 bits (the last group gets padded with two zero bits, since 40 bits doesn't divide evenly by 6)...

Base 2 (spaced): 010010 000110 010101 101100 011011 000110 111100
Base 10 (spaced): 18 6 21 44 27 6 60
Base 64 (spaced): S G V s b G 8

Base64 then appends an = to mark the padding. So we can encode the word Hello as SGVsbG8= in base64! Base64 is often used to encode images and other binary data to store in JSON. It is not space efficient, since every 3 bytes of input become 4 characters of output (a 33% overhead), but it's entirely made of printable characters!
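To double-check the walkthrough, here is a minimal Python sketch that regroups the 8-bit bytes into 6-bit chunks by hand and compares the result against the standard library's base64 module (the helper name manual_b64 is just mine for illustration):

    import base64

    # Standard base64 alphabet: A-Z, a-z, 0-9, +, /
    ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

    def manual_b64(data: bytes) -> str:
        bits = "".join(format(byte, "08b") for byte in data)  # 8 bits per byte
        bits += "0" * (-len(bits) % 6)                        # pad bits to a multiple of 6
        out = "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
        return out + "=" * (-len(out) % 4)                    # '=' padding to a multiple of 4

    print(manual_b64(b"Hello"))                 # SGVsbG8=
    print(base64.b64encode(b"Hello").decode())  # SGVsbG8= (matches)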
Base64 Llamas
It appears large llamas have learned base64, similar to how n-grams learned speech. You can test this yourself! Just go to Mistral's Le Chat or Databricks' new and open DBRX model and try decoding some data!
You can generate these on unix using the base64 program. For example, something like this should work (a sketch assuming GNU coreutils' base64; note that echo appends a trailing newline, which is where the Cg== at the end comes from):
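    echo "how are you today?" | base64
    # prints: aG93IGFyZSB5b3UgdG9kYXk/Cg==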
Then ask a llama about aG93IGFyZSB5b3UgdG9kYXk/Cg== or whatever other string you want. You'll notice that they break down after about 10-20 characters, depending on how good the llama is.
You could also ask for the opposite. If a llama gives you aG93IGFyZSB5b3UgdG9kYXk/Cg==, you can decode it with something like (again assuming GNU coreutils' base64):
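    echo "aG93IGFyZSB5b3UgdG9kYXk/Cg==" | base64 --decode
    # prints: how are you today?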
The prompts should look something like:
Decode the following base64 message: aG93IGFyZSB5b3UgdG9kYXk/Cg==
Encode "[email protected]" into base64.
What are Llamas Learning?
This discovery was shocking to me. I thought they were achieving this through tool use, but I was able to cross-verify with local llamas, which most certainly don't have access to tools. This means our 100-billion-scale llamas are learning to be base64 decoders? Of course this is a completely pointless feature, as no llama will ever be more energy efficient than a trivially coded base64 tool.
The llamas likely picked it up while learning on sample code, but the degree to which they picked it up is incredible! This has led me to wonder: what other completely pointless things are our llamas learning? This one was an unintended side effect of learning to code, but what other side effects is our data having?