[Image: encrypted and decrypted text]

A Method for Text to Text Encryption Using LLMs

1 February 2026 · deep learning | math | llm | crypto

In this article, I present a novel (and probably useless but fun) use of language models: encrypting text as text.

Language models can be used to compress text

Given a sequence of tokens, language models predict the probability distribution of the next token (if you’re unfamiliar with tokens, you can roughly think of them as words or word fragments). Because they define a conditional probability model over text, they can be used as the probabilistic component of an arithmetic coder to compress text (see, for example, llama-zip). An arithmetic coder is a compression algorithm that leverages a statistical model to make probable data take up less space than rare data.

When compression is effective, text is transformed into a seemingly unpredictable sequence of bits. Unpredictability is a hallmark of near-optimal compression. The arithmetic decoder then reconstructs the original text exactly. In practice, however, this application is mostly useless: the size of the model parameters typically dwarfs the size of any text worth compressing.
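The interval-narrowing mechanics behind arithmetic coding can be illustrated with a toy coder. The fixed three-symbol distribution below is a hypothetical stand-in for the LLM's next-token predictions, and exact `Fraction`s sidestep the precision bookkeeping a real implementation needs:

```python
from fractions import Fraction

# Toy next-symbol model: fixed probabilities standing in for an LLM's
# conditional distribution (values are hypothetical, for illustration).
MODEL = {"the": Fraction(1, 2), "cat": Fraction(1, 4), "sat": Fraction(1, 4)}

def intervals(model):
    """Partition [0, 1) into one sub-interval per symbol."""
    low, out = Fraction(0), {}
    for sym, p in model.items():
        out[sym] = (low, low + p)
        low += p
    return out

def encode(symbols):
    """Narrow [0, 1) once per symbol; any point inside the final
    interval identifies the whole sequence."""
    low, high = Fraction(0), Fraction(1)
    for sym in symbols:
        a, b = intervals(MODEL)[sym]
        width = high - low
        low, high = low + width * a, low + width * b
    return (low + high) / 2  # a representative point in the interval

def decode(x, n):
    """Recover n symbols by locating x in successive partitions."""
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(n):
        for sym, (a, b) in intervals(MODEL).items():
            lo = low + (high - low) * a
            hi = low + (high - low) * b
            if lo <= x < hi:
                out.append(sym)
                low, high = lo, hi
                break
    return out

msg = ["the", "cat", "sat"]
assert decode(encode(msg), len(msg)) == msg
```

A real coder emits the bits of the final interval rather than a `Fraction`, and re-queries the model at every step; probable symbols get wider intervals and therefore need fewer bits, which is exactly the compression effect described above.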

Language models can be used to encode data as text

This raises a more interesting question: can we do the opposite? That is, can we turn an arbitrary sequence of bits into text and later recover the original bits? If so, we could transmit any message that can be expressed as bits in the form of a plausible-looking text. Naturally, this is even less useful as it is simply a prohibitively expensive way to store binary data.

The main difficulty is that while arithmetic coding guarantees that:

\text{text} \rightarrow \text{bits} \rightarrow \text{text}

recovers the original text, it does not guarantee that

\text{bits} \rightarrow \text{text} \rightarrow \text{bits}

recovers the original bit sequence.

That said, we can design a variant of arithmetic coding that effectively reverses the roles of tokens and bits. At each step, the language model predicts a probability distribution over the next token. These probabilities define a partition of the interval [0,1], where each token corresponds to a sub-interval whose length is proportional to its predicted probability. In standard arithmetic coding, bits are interpreted as choosing between sub-intervals; here, we instead interpret bits as selecting tokens via this probabilistic partition.
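The reversed scheme can be sketched as follows: the bit string is read as a binary fraction in [0, 1), the token whose sub-interval contains it is emitted, and the fraction is rescaled into that sub-interval for the next step. The probability list is a hypothetical stand-in for the model's per-step output:

```python
from fractions import Fraction

def bits_to_fraction(bits):
    """Interpret a bit list as a binary fraction in [0, 1)."""
    x = Fraction(0)
    for i, b in enumerate(bits, start=1):
        if b:
            x += Fraction(1, 2 ** i)
    return x

def select_token(probs, x):
    """Pick the token whose sub-interval of [0, 1) contains x, and
    rescale x into that sub-interval for the next step. `probs` is a
    list of (token, probability) pairs, as a model would provide."""
    low = Fraction(0)
    for token, p in probs:
        if low <= x < low + p:
            # Zoom the chosen sub-interval back out to [0, 1).
            return token, (x - low) / p
        low += p
    raise ValueError("probabilities must sum to 1")

# Hypothetical distribution over three tokens.
probs = [("hello", Fraction(1, 2)), ("world", Fraction(1, 4)), ("!", Fraction(1, 4))]
x = bits_to_fraction([1, 0, 1])       # 0.101 in binary = 5/8
token, x = select_token(probs, x)
assert token == "world"               # 5/8 lies in [1/2, 3/4)
```

Iterating this consumes bits and produces tokens; decoding runs the same partition in reverse, recovering bits from the observed tokens.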

While the theory is straightforward, the implementation is not. The token vocabulary is large, and pathological behaviors start to appear when sampling low-probability tokens. In particular, the encoder sometimes produced a sequence of tokens that, once decoded into text and tokenized again, resulted in a different token sequence.

This is not a bug in the tokenizer, but a fundamental property: tokenization is not a bijection. It only guarantees

\text{text} = \text{decode}(\text{encode}(\text{text}))

but not

\text{tokens} = \text{encode}(\text{decode}(\text{tokens}))

When I first realized this, I nearly gave up. However, there is a crucial observation: the language model is trained to produce token sequences that originate from the tokenizer. As a result, most high-probability tokens satisfy:

\text{tokens} = \text{encode}(\text{decode}(\text{tokens}))

To take advantage of this, I restricted sampling to a relatively conservative regime: top_k = 200, top_p = 0.9, and temperature = 0.9. This eliminates most problematic tokens and yields well-behaved sequences that survive a decode–re-encode round trip.
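The non-bijectivity, and the round-trip check that guards against it, can be demonstrated with a toy greedy longest-match tokenizer (a hypothetical stand-in for a real subword tokenizer):

```python
class ToyTokenizer:
    """Greedy longest-match tokenizer over a tiny vocabulary,
    standing in for a real subword tokenizer."""
    vocab = ["ab", "a", "b"]  # sorted longest-first

    def encode(self, text):
        tokens = []
        while text:
            for piece in self.vocab:
                if text.startswith(piece):
                    tokens.append(piece)
                    text = text[len(piece):]
                    break
        return tokens

    def decode(self, tokens):
        return "".join(tokens)

def survives_roundtrip(tok, tokens):
    """True iff re-encoding the decoded text yields the same tokens."""
    return tok.encode(tok.decode(tokens)) == tokens

tok = ToyTokenizer()
assert survives_roundtrip(tok, ["ab"])          # canonical tokenization
assert not survives_roundtrip(tok, ["a", "b"])  # merges back into "ab"
```

The sequence ["a", "b"] is exactly the kind of valid-but-non-canonical tokenization that a model rarely assigns much probability to, which is why conservative sampling filters most of these cases out.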

Using language models to encrypt text as text

With these two codecs in hand (text → bits → text and bits → text → bits) it was finally time to make my dream come true: encrypting text as text.

At a high level, the pipeline is simply this:

encryption: text → bits → encrypted bits → encrypted text
decryption: encrypted text → encrypted bits → bits → text

And each arrow is perfectly computable and reversible.
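The composition can be sketched end to end with trivial stand-in codecs: UTF-8 plays the text-to-bits codec, hex plays the bits-to-text codec, and a toy XOR cipher replaces AES-CTR. None of these are the real components, only placeholders that make the pipeline itself runnable:

```python
def text_to_bits(text: str) -> bytes:
    return text.encode("utf-8")        # stand-in for the LLM compressor

def bits_to_text(bits: bytes) -> str:
    return bits.decode("utf-8")

def bits_to_cover_text(bits: bytes) -> str:
    return bits.hex()                  # stand-in for the bits-to-text LLM codec

def cover_text_to_bits(text: str) -> bytes:
    return bytes.fromhex(text)

def xor_cipher(bits: bytes, key: bytes) -> bytes:
    # Toy XOR "cipher" in place of AES-CTR; like CTR mode, XOR with a
    # keystream is its own inverse and adds no size overhead.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(bits))

def encrypt_text(text: str, key: bytes) -> str:
    return bits_to_cover_text(xor_cipher(text_to_bits(text), key))

def decrypt_text(cover: str, key: bytes) -> str:
    return bits_to_text(xor_cipher(cover_text_to_bits(cover), key))

key = b"secret"
assert decrypt_text(encrypt_text("hello", key), key) == "hello"
```

Because every arrow is reversible, swapping in the real codecs and AES-CTR changes only the implementations, not the shape of the pipeline.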

Creating an encrypted chat protocol using the method

To demonstrate the idea in a more playful setting, I simulated an encrypted chat application in which encrypted messages resemble ordinary messages. Making this work requires solving a few practical problems.

Choosing a chat format

First, we need a message format. I use the following:

- A: message1
- B: message2
- A: message3
...

Each message is terminated by a \n token, which is explicitly forbidden inside messages. This provides a clear stopping condition for the decoder. Without an end-of-message token, decoding would have no natural termination point and would continue indefinitely.

Making encrypted text finish gracefully

It would be unfortunate if encrypted messages ended abruptly as soon as all bits were consumed. To avoid this, we treat \n as an end-of-sequence token. While there are still bits left to encode, \n is excluded from the token set. Once encoding is complete, \n is reintroduced, and generation continues until it is sampled, producing a clean and natural-looking ending.
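A minimal sketch of this masking, assuming a toy vocabulary in which the `\n` token has a hypothetical id of 2:

```python
import math

NEWLINE_ID = 2  # hypothetical id of the "\n" token in a toy vocabulary

def mask_logits(logits, bits_remaining):
    """While bits remain to encode, forbid the end-of-message token by
    driving its logit to -inf (probability 0 after softmax); once all
    bits are consumed, leave it available so sampling can end the
    message naturally."""
    out = list(logits)
    if bits_remaining > 0:
        out[NEWLINE_ID] = -math.inf
    return out

logits = [1.0, 0.5, 2.0]
assert mask_logits(logits, bits_remaining=8)[NEWLINE_ID] == -math.inf
assert mask_logits(logits, bits_remaining=0)[NEWLINE_ID] == 2.0
```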

Choosing an encryption algorithm

We also need an encryption algorithm that does not significantly inflate the message size. Ideally, the number of encrypted bits should be close to the number of original bits, since bit length is tied to entropy, and more bits often lead to longer sequences. A conversation with unusually high entropy might attract unwanted attention.

I decided to use AES in counter mode (AES-CTR). CTR mode behaves like a stream cipher: it XORs the plaintext with a pseudorandom keystream derived from AES, producing ciphertext that is indistinguishable from random under standard assumptions. Unlike authenticated modes, it introduces no padding or expansion beyond a fixed-size nonce. However, this nonce does not need to be secret; it only needs to be unique. I decided to use the number of previous messages as the nonce, avoiding the need to send it alongside the message. This ultimately reduces the number of bits that need to be encoded as text.
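The counter-mode construction can be sketched with the standard library alone; here SHA-256 substitutes for the AES block cipher as the keystream PRF, purely so the sketch has no dependencies (the actual protocol uses AES-CTR):

```python
import hashlib

def keystream(key: bytes, nonce: int, length: int) -> bytes:
    """Counter-mode keystream. SHA-256 stands in for AES here so the
    sketch runs with the standard library only."""
    out, counter = b"", 0
    while len(out) < length:
        block = hashlib.sha256(
            key + nonce.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        out += block
        counter += 1
    return out[:length]

def ctr_encrypt(key: bytes, nonce: int, data: bytes) -> bytes:
    # XOR with the keystream; encryption and decryption are identical.
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

msg_count = 3  # the running message count serves as the nonce
ct = ctr_encrypt(b"shared key", msg_count, b"meet at noon")
assert len(ct) == len(b"meet at noon")  # no size inflation
assert ctr_encrypt(b"shared key", msg_count, ct) == b"meet at noon"
```

The two properties the article relies on are visible here: the ciphertext is exactly as long as the plaintext, and the nonce never travels with the message because both parties can count messages themselves.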

Detecting whether a message is encrypted

Finally, I needed a way to distinguish encrypted from plain messages. In my experiments, attempting to decrypt a plain message usually fails the bits-to-text conversion, because the decoder encounters tokens that fall outside the top-k/top-p candidate set. For now, I rely on this failure as the signal that a message is unencrypted. An alternative would be to prefix the encrypted message with a special token: during decoding, if the first token is this special token, the message is treated as unencrypted and displayed as-is.
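The detection logic reduces to treating a decoding failure as evidence of plain text. `decode_stub` below is a hypothetical stand-in for the real bits-to-text decoder, which raises when it meets a token outside the top-k/top-p candidate set:

```python
def try_decrypt(message, key, decode_fn):
    """Attempt decryption; on failure, treat the message as plain text.
    Returns (displayed_text, was_encrypted)."""
    try:
        return decode_fn(message, key), True
    except ValueError:
        return message, False

def decode_stub(message, key):
    # Hypothetical stand-in: only messages tagged "enc:" decode cleanly;
    # anything else simulates a token outside the candidate set.
    if not message.startswith("enc:"):
        raise ValueError("token outside top-k/top-p candidate set")
    return message[4:]

assert try_decrypt("enc:hi", None, decode_stub) == ("hi", True)
assert try_decrypt("hello", None, decode_stub) == ("hello", False)
```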

Demo time!

Actions:

  • Switch between normal and encrypted mode with the / command
  • Switch user with the button
  • Delete all messages with the button
  • Reset the default conversation with the button
  • Change the decryption of the conversation with the button
  • Send a message by pressing "Enter" or clicking the button

The colored dot on each message indicates what is known about its encryption status:

  • Message is not encrypted
  • We don't know if the message is encrypted or not
  • Message is encrypted