# Reversing BIP-39 and the Power of Property Testing

I was recently asked how I would go about reversing the BIP-39 encoding algorithm we implemented previous and used to build our BIP-39 haiku miner.

Implementing the reverse of this algorithm seemed straight-forward at first, but it quickly led me down a rabbit hole that showed me just how powerful property-based testing can be.

Read on!

## What is BIP-39 Again?

If you’re asking yourself, “what is BIP-39 again?”, I highly recommend you check out the first article in this series, “From Bytes to Mnemonic using Elixir”, for a full rundown of the BIP-39 encoding algorithm.

As a quick summary, the BIP-39 encoding algorithm is intended to convert an arbitrary set of bytes into an easily memorizable sequence of words. The algorithm goes something like this:

- Have bytes you want to encode.
- Append a partial checksum to the end of your bytes.
- Map every eleven bit chunk of the resulting binary onto a wordlist of your choice.

The devil is in the detail, as we’ll see.

## Laying the Groundwork

Before we write the reverse of our BIP-39 encoding algorithm, we need to lay some initial groundwork.

The `Bip39.Mnemonic`

module we built in the previous article only had a single public function: `generate/0`

. The `generate/0`

function generated a random set of bytes and converted it into a BIP-39-style mnemonic.

```
def generate do
entropy
|> attach_checksum
|> map_onto_wordlist
end
```

Moving forward, we should separate the encoding functionality from the entropy generation so we can test the encoding algorithm independently, with our own data. This will simplify the testing of our final solution.

```
def generate do
encode(generate_entropy)
end
def encode(data) do
data
|> attach_checksum
|> map_onto_wordlist
end
```

For clarity’s sake, we’ve also renamed the `entropy/0`

function to `generate_entropy`

.

Great. Now that we have an `encode/1`

function that encodes a given binary, we’re set up to add a new function, `decode/1`

, that reverses the process and returns the binary data decoded from a given mnemonic.

```
def decode(mnemonic) do
...
end
```

## Decoding the Mnemonic

The high-level process for reversing our BIP-39 algorithm and decoding our mnemonic into a binary looks something like this:

- Maps the words in the mnemonic back into a binary.
- Separate the appended partial checksum from the encoded data.
- Verify that the appended checksum matches the actual checksum of the data.

Sounds like a plan.

The first step of decoding our mnemonic in our `decode/1`

function is to convert our encoded mnemonic wordlist back into a binary.

First, we’ll map each word onto its index in our `@wordlist`

. Next, we’ll convert each index into an eleven-bit binary and reduce that list of binaries down into a single concatenated binary:

```
data_and_checksum =
mnemonic
|> Enum.map(&Enum.find_index(@wordlist, fn w -> w == &1 end))
|> Enum.reduce(<<>>, fn n, acc -> <<acc::bits, n::11>> end)
```

What we’re left with is our originally encoded data concatenated with a variable-length partial checksum.

We know that the variable-length checksum is `1/32`

the length of our originally encoded data. Given that, we also know that the length of our originally encoded data is `32/33`

the length of `data_and_checksum`

. The concatenated checksum will fill the remaining space:

```
total_size = bit_size(data_and_checksum)
data_size = div(total_size * 32, 33)
checksum_size = total_size - data_size
```

Now that we know the structure of `data_and_checksum`

, we can pull out the individual pieces we care about using binary pattern matching:

```
<<data::bits-size(data_size), partial_checksum::bits-size(checksum_size)>> =
data_and_checksum
```

Fantastic.

Now all that’s left to do is verify that the `partial_checksum`

provided matches the calculated checksum of the provided `data`

binary. If the checksums match, we’ll return an `:ok`

tuple containing the decoded `data`

. Otherwise, we’ll return an `:error`

tuple explaining the situation:

```
if <<data::bits, partial_checksum::bits>> == attach_checksum(data) do
{:ok, data}
else
{:error, :bad_checksum}
end
```

That’s it!

We can now `encode/1`

a given binary into a mnemonic wordlist, and then `decode/1`

it to retrieve the original binary.

## Putting our Solution to the Test

Now that we’ve built our `Bip39.Mnemonic.encode/1`

and `Bip39.Mnemonic.decode/1`

functions, we need to test that our encoding and decoding process is working as expected.

Testing an encoder/decoder pair is perfectly suited to property-based testing, so we’ll use the StreamData library to test our solution. We’ll set up a new test module, `Bip39MnemonicTest`

, that scaffolds out a new property test for our mnemonic encoder:

```
defmodule Bip39MnemonicTest do
use ExUnit.Case
use ExUnitProperties
property "encodes and decodes mnemonics" do
end
end
```

The property that we’re trying to test is that a given binary is equal to its encoded mnemonic, decoded back into a binary. We can test this fairly easily with StreamData.

We know that the BIP-39 algorithm only supports encoding data between sixteen and thirty two bits:

The allowed size of ENT is 128-256 bits.

Given that, we’ll generate a stream of random binaries that fall within that size range:

```
check all data <- binary(min_length: 16, max_length: 32) do
end
```

Next, we’ll generate the `mnemonic`

for our randomly generated `data`

binary, and `assert`

that the decoded `mnemonic`

matches our original `data`

:

```
mnemonic = Bip39.Mnemonic.encode(data)
assert Bip39.Mnemonic.decode(mnemonic) == {:ok, data}
```

If all goes well, our test should pass.

## An Under-Specified Encoder

Unfortunately, things rarely go as planned.

Our new test seems to run through several successful iterations of the encode/decode assertion, but ultimately fails. Thankfully, StreamData shrinks the failing test as much as possible and gives us the failing input:

```
1) property encodes and decodes mnemonics (Bip39MnemonicTest)
test/bip39_mnemonic_test.exs:5
Failed with generated values (after 20 successful runs):
* Clause: data <- binary(min_length: 16, max_length: 32)
Generated: <<0, 0, 0, 0, 55, 157, 129, 190, 93, 189, 119, 124, 164, 131, 5, 67, 23, 225, 251, 162, 200>>
Assertion with == failed
code: assert Bip39.Mnemonic.decode(mnemonic) == {:ok, data}
left: {:error, :bad_checksum}
right: {:ok, <<0, 0, 0, 0, 55, 157, 129, 190, 93, 189, 119, 124, 164, 131, 5, 67, 23, 225, 251, 162, 200>>}
```

After an intense debugging session, I realized that there was nothing wrong with my `Bip39.Mnemonic.decode/1`

function. Instead, the problem was with my encoder.

The BIP-39 specification clearly states that in addition to being “128-256 bits” in length, the length of the binary data being encoded must also be a multiple of thirty two bits:

The mnemonic must encode entropy in a multiple of 32 bits.

Ignoring this requirement results in issues when generating and appending the partial checksum, and results in data loss during the decoding procedure.

To accommodate this requirement, let’s update our property test to truncate all generated binaries to the nearest thirty two bits:

```
check all bytes <- binary(min_length: 16, max_length: 32),
bits_to_truncate = bytes |> bit_size |> rem(32),
<<_::size(bits_to_truncate), data::bits>> = bytes do
mnemonic = Bip39.Mnemonic.encode(data)
assert Bip39.Mnemonic.decode(mnemonic) == {:ok, data}
end
```

Now our test passes, as expected!

## Tightening Up our Encoding Process

While our `Bip39.Mnemonic.encode/1`

functions works when passed the correct data, it’s probably not a good idea to assume that the developer knows what constitutes “good data”.

Instead, let’s refactor `Bip39.Mnemonic.encode/1`

to enforce the length requirements outlined in the BIP-39 specification.

Let’s update the function head to assert that `data`

is a binary, assert that its length falls between one hundred twenty eight and two hundred fifty six bits, and assert that its length in bits is a multiple of thirty two:

```
def encode(data)
when is_binary(data) and bit_size(data) >= 128 and bit_size(data) <= 256 and
rem(bit_size(data), 32) == 0 do
{:ok,
data
|> attach_checksum
|> map_onto_wordlist}
end
```

If all of these requirements hold, we’ll return the encoded `data`

wrapped in an `:ok`

tuple. Otherwise, we need to return an `:error`

tuple. We can do this with a second `encode/1`

function head:

```
def encode(_), do: {:error, :invalid_data}
```

Wrapping our `Bip39.Mnemonic.encode/1`

result in an `:ok`

tuple breaks our test. We’ll need to fix that:

```
check all bytes <- binary(min_length: 16, max_length: 32),
bits_to_truncate = bytes |> bit_size |> rem(32),
<<_::size(bits_to_truncate), data::bits>> = bytes do
{:ok, mnemonic} = Bip39.Mnemonic.encode(data)
assert Bip39.Mnemonic.decode(mnemonic) == {:ok, data}
end
```

We should also add property tests to ensure that invalid binaries can’t be encoded by mistake:

First we’ll test that short binaries are rejected:

```
property "rejects short binaries" do
check all bits <- integer(1..8),
<<_::size(bits), data::bits>> <- binary(max_length: 16) do
assert Bip39.Mnemonic.encode(data) == {:error, :invalid_data}
end
end
```

Next, we’ll test that long binaries are rejected:

```
property "rejects long binaries" do
check all bits <- integer(1..8),
bytes <- binary(min_length: 32),
data = <<bytes::binary, 0::size(bits)>> do
assert Bip39.Mnemonic.encode(data) == {:error, :invalid_data}
end
end
```

And finally, we’ll test that all “misaligned” binaries, or binaries who’s lengths don’t align to thirty two bits, are rejected:

```
property "rejects misaligned binaries" do
check all data <- bitstring(min_length: 129, max_length: 256),
data |> bit_size |> rem(32) != 0 do
assert Bip39.Mnemonic.encode(data) == {:error, :invalid_data}
end
end
```

Perfect. Now I’m fully confident in our BIP-39 encode/decode solution.

## Final Thoughts

While this seemingly simple task threw me down a rabbit hole I definitely didn’t expect, I’m grateful for the experience. This showed me in a very hands-on way just how powerful property-based testing can be. Without randomly generated test cases, I don’t think I would have recognized the issues with my `encode`

function.

If you’d like to see the BIP-39 encoder/decoder’s source in its entirity, be sure to check out the entire `Bip39`

project on Github.

I’d like to thank Pierre Martin for bringing up the topic of reversing our BIP-39 algorithm. After talking with me on the Elixir Slack group, he filed a Github issue with his solution to the problem. I highly recommend you check out his approach for a more fleshed out solution.