The Secret Learnings of Llamas


Tool use by llamas is an active area of research. Recent implementations like Devin promise great productivity increases simply by letting llamas interact with more tools. I was investigating this in some modern llamas when I made an unfortunate discovery.

It appears most large llamas have learned a new language, in addition to the ones that were intended: base64.

Base64 Background

Base64 is a simple encoding scheme. It takes in a stream of bytes and converts them into a plain-text representation.

Each byte is 8 bits. This means there are 2^8 (256) possible bytes, since each bit contributes 2 states. Base64 restricts itself to 64 printable characters, so each output character can only represent 2^6 (64) states.

Let’s visualize how base64 works. Say we have the following word:

Hello

We can convert each letter to a number using an ASCII/UTF-8 table or the ord() function in Python. I then converted the base 10 representations to octal (base 8) and binary (base 2). The bottom two rows repeat the two rows above them, but the spacing makes it easier to see the direct mapping from octal to binary:

Letters:         H         e         l         l         o
Base 10:        72       101       108       108       111
Base 8:        110       145       154       154       157
Base 2:  001001000 001100101 001101100 001101100 001101111
Base 8 (spaced):    1   1   0   1   4   5   1   5   4   1   5   4   1   5   7
Base 2 (spaced):  001 001 000 001 100 101 001 101 100 001 101 100 001 101 111
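If you want to reproduce that table yourself, a few lines of Python will do it (the binary is padded to 9 digits purely so it lines up with the three octal digits):

# Print each letter of "Hello" with its decimal, octal, and binary value.
for letter in "Hello":
    code = ord(letter)                        # e.g. 'H' -> 72
    print(letter, code, format(code, "o"), format(code, "09b"))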

Notice how there’s a 1:1 mapping between every octal digit and every group of 3 binary digits. This means octal can represent 2^3 (8) states per digit. Octal only uses the digits 0-7, but what if we wanted to represent 2^6 states per digit? Base64 does this by using A-Z, a-z, 0-9, + and / as its digits. That gives 64 digits.
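For reference, here is that digit table built in Python, using the conventional ordering (A-Z, then a-z, then 0-9, then + and /):

import string

# The 64 base64 "digits": value 0 is 'A', value 25 is 'Z', value 63 is '/'.
B64_DIGITS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
assert len(B64_DIGITS) == 64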

Now we can map in the other direction. Base64 works on the raw 8-bit bytes: concatenate them into one long bit string, regroup it into 6-bit chunks (zero-padding the final chunk), and look each chunk up in the digit table:

Base 2 (8 bits/byte):   01001000 01100101 01101100 01101100 01101111
Just changing the grouping, plus two zero bits to fill out the last chunk...
Base 2 (6-bit chunks):  010010 000110 010101 101100 011011 000110 111100
Base 8 (spaced):            22     06     25     54     33     06     74
Base 10 (spaced):           18      6     21     44     27      6     60
Base 64 (spaced):            S      G      V      s      b      G      8

So we can encode the word Hello as SGVsbG8= in base64! The trailing = is padding that marks the partially filled final chunk. Base64 is often used to encode images and other binary data for storage in JSON. It is not space efficient, since every 3 bytes of input become 4 characters of output (about 33% overhead), but it’s entirely made of printable characters!
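Here’s a minimal sketch of that whole pipeline in Python: concatenate the 8-bit bytes into one bit string, regroup it into 6-bit chunks, look each chunk up in the digit table, and pad the output with =. It’s only meant to illustrate the idea; the final assert checks it against Python’s standard base64 module, which is what you’d actually use.

import base64
import string

B64_DIGITS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64_by_hand(data: bytes) -> str:
    # 1. Concatenate the 8-bit representation of every byte.
    bits = "".join(format(byte, "08b") for byte in data)
    # 2. Zero-pad to a multiple of 6 bits and split into 6-bit chunks.
    bits += "0" * (-len(bits) % 6)
    chunks = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # 3. Look up each chunk's value in the digit table, then pad the output with '='.
    out = "".join(B64_DIGITS[int(chunk, 2)] for chunk in chunks)
    return out + "=" * (-len(out) % 4)

print(b64_by_hand(b"Hello"))                                   # SGVsbG8=
assert b64_by_hand(b"Hello") == base64.b64encode(b"Hello").decode()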

Base64 Llamas

It appears large llamas have learned base64, similar to how n-grams learned speech. You can test this yourself! Just go to Mistral’s Le Chat or Databricks’ new, open DBRX model and try decoding some data!

You can generate test strings on unix using the base64 program. For example:

echo 'how are you today?' | base64
# Gives aG93IGFyZSB5b3UgdG9kYXk/Cg==

Then ask a llama about aG93IGFyZSB5b3UgdG9kYXk/Cg== or whatever other string you want. You’ll notice that they break down after about 10-20 characters, depending on how good the llama is.
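If you want to be more systematic about finding that break-down point, you can generate test strings of increasing length with Python’s base64 module and paste each one into the chat (the phrases below are just arbitrary examples):

import base64

# Arbitrary test phrases of increasing length; swap in whatever you like.
phrases = ["hi", "how are you", "how are you today?", "how are you doing today, my friend?"]

for phrase in phrases:
    encoded = base64.b64encode(phrase.encode()).decode()
    # Paste `encoded` into the llama's chat and compare its decoding against `phrase`.
    print(f"{len(phrase):2d} chars -> {encoded}")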

You can also ask for the opposite. If a llama gives you aG93IGFyZSB5b3UgdG9kYXk/Cg==, you can decode it with:

echo 'aG93IGFyZSB5b3UgdG9kYXk/Cg==' | base64 -d
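Or, if you’d rather stay in Python than shell out to the unix tool:

import base64

# Prints "how are you today?" (plus the trailing newline that echo added).
print(base64.b64decode("aG93IGFyZSB5b3UgdG9kYXk/Cg==").decode())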

The prompts should look something like:

Decode the following base64 message: aG93IGFyZSB5b3UgdG9kYXk/Cg==

Encode "[email protected]" into base64.

What are Llamas Learning?

This discovery was shocking to me. I thought they were achieving this through tool use, but I can cross-verify on local llamas, which most certainly don’t have access to tools. This means our 100-billion-parameter llamas are learning to be base64 decoders? Of course, this is a completely pointless skill, as no llama will ever be more energy efficient than a trivially coded base64 tool.

The llamas likely picked it up while training on sample code, but the degree to which they picked it up is incredible! This has led me to wonder: what other completely pointless things are our llamas learning? This one was an unintended side effect of learning to code, but what other side effects is our data having?