From Bytes to Mnemonic using Elixir

Written by Pete Corey on Feb 19, 2018.

I’m still working my way through Andreas Antonopoulos’ amazing Mastering Bitcoin book. In the chapter on Bitcoin wallets, he describes how deterministic wallet seeds can be represented as mnemonic groups of words.

Mnemonics are generated by hashing, appending, and chunking up a random series of bytes into a list of numbers that can be evenly mapped onto a given word list.

Creating these mnemonic word sequences seems like the perfect opportunity to flex our Elixir muscles.

Deterministic wallets, mnemonics, and seeds?

This terminology may sound like gibberish, but the underlying ideas are simple.

At its core, a Bitcoin wallet is just a collection of private keys. In the most basic type of wallet, the collected keys are just randomly generated numbers. They’re not related to each other in any way. In more sophisticated wallets, each private key is generated by securely transforming the key that came before it. The initial source of entropy for these “deterministic wallets” is known as the wallet’s “seed”.

The primary benefit of using a deterministic wallet is that you only need to keep track of the wallet’s seed, not every private key contained within it. All of the primary keys used by the wallet can be regenerated from the seed.

BIP-39 attempts to make it easier for humans to remember these initial seeds. It does this by mapping the original source of entropy used to create the wallet’s seed into a sequence of short, easily memorizable words, called a mnemonic.

For example, the following BIP-39 style mnemonic maps to an initial random seed value of 0xEAF9C684F84EACA7C6B0CE08F77A6784:


turtle soda patrol vacuum turn fault
bracket border angry rookie okay anger

Isn’t the mnemonic much easier to remember?

How to Generate a Mnemonic

A a high level, the algorithm for generating a BIP-39 mnemonic looks like this:

  1. Generate sixteen to thirty two random bytes.
  2. Append a partial SHA-256 checksum.
  3. Map the resulting bits onto your word list.

Let’s build a new Elixir module, Bip39.Mnemonic to encapsulate this algorithm:


defmodule Bip39.Mnemonic do
end

Inside our module, we’ll create a generate/0 function that walks through each step of our high-level algorithm:


def generate do
  entropy
  |> attach_checksum
  |> map_onto_wordlist
end

Our generate/0 function will call an entropy/0 function, which will generate our initial random bytes for us. We’ll pass the result into attach_checksum/1, which will (as you’ve probably guessed) compute and append our partial checksum. Finally, we’ll map the resulting bits onto our wordlist with map_onto_wordlist/1.

Now all we have to do is flesh out these three functions!

Generating Entropy

Erlang, and Elixir by proxy, ships with all of the tools we need to generate our cryptographically secure source of entropy.

The BIP-39 algorithm works with an initial source of sixteen to thirty two bytes of random data. We’ll use Erlang’s :crypto.rand_uniform/2 to determine exactly how many bytes we’ll generate, and :crypto.strong_rand_bytes/1 to actually generate the bytes:


defp entropy do
  :crypto.rand_uniform(16, 32 + 1)
  |> :crypto.strong_rand_bytes()
end

You’ll notice that we’re setting the upper range in our call to :crypto.rand_uniform/2 to 32 + 1. This is because the upper limit is non-inclusive, and we want to utilize the full range of sixteen to thirty two bytes.

Attaching our Checksum

Once we’ve generated our source of entropy, we’ll need to calculate its checksum and append a piece of the resulting checksum to the end of our binary. Once again, Elixir ships with all of the tools we need.

Let’s start by sketching out our attach_checksum/1 function:


defp attach_checksum(entropy) do
end

We’ll use Erlang’s :crypto.hash/2 function to create a SHA-256 hash of our newly generated entropy binary:


hash = :crypto.hash(:sha256, entropy)

Mastering Bitcoin explains that we’ll only need to append a portion of this hash to our entropy binary. The exact number of bits we need to append depends on the number of bits of entropy we’re working with.


size =
  entropy
  |> bit_size
  |> div(32)

The size in bits of our partial checksum is the length of entropy, in bits, divided by 32. Now we can pattern match on the first size bits in our hash binary, and assign them to a new checksum variable:


<<checksum::bits-size(size), _::bits>> = hash

Finally, we’ll append the resulting checksum bits onto the end of our entropy binary:


<<entropy::bits, checksum::bits>>

That’s it!

Mapping onto a Wordlist

The real magic of the BIP-39 algorithm happens when we map the bits of our resulting binary sequence onto the two thousand forty eight words specified in the English wordlist described in the BIP-39 document.

Before we actually do the mapping, we’ll need to get this wordlist into our Elixir application. The simplest way of doing this is through a config value.

In our config/config.exs file, we’ll add the entire set of words within a word list sigil:


config :bip39, wordlist: ~w[
  abandon
  ability
  able
  about
  ...
]

Now that the wordlist is available to us, let’s start by defining our map_onto_wordlist/1 function:


defp map_onto_wordlist(entropy) do
end

Within our function, we can grab a reference to the wordlist we just placed in our application’s configuration:


wordlist =
  Application.fetch_env!(
    :bip39,
    :wordlist
  )

The actual mapping process is straight forward. We iterate over the provided entropy binary in chunks of eleven bits. Each chunk of eleven bits represents a number that we use as an index into our wordlist array:


for <<chunk::11 <- entropy>> do
  Enum.at(wordlist, chunk)
end

After iterating over our entropy binary and replacing each chunk of eleven bits with a word in our wordlist array, we’re left with our final mnemonic!

Tying it All Together

Now that our Bip39.Mnemonic is complete, we can take it for a test drive. Let’s call generate/0 in our new module to generate out first mnemonic sequence:


iex(1)> Bip39.Mnemonic.generate
["budget", "album", "fresh", "security", "pear", "water", "weird", "success",
 "ahead", "enrich", "brush", "impact", "ribbon", "board", "spider", "dismiss"]

Perfect!

Be sure to check out the full Bip39.Mnemonic module on Github, and if this kind of thing interests you and you want to dive deeper into the world of Bitcoin development, be sure to check out Mastering Bitcoin.

Property Testing our Base58Check Encoder with an External Oracle

Written by Pete Corey on Feb 12, 2018.

Recently, we wrote a Base58Check encoder to power our Bitcoin private key and public address generator. Being the diligent developers that we are, we added a unit test to ensure that our encoder was working as we expected.

But was that enough?

Call me a coward, but relying on a solitary unit test based on a single example pulled from a wiki article doesn’t instill huge amounts of confidence in our solution.

Let’s thoroughly test our solution with the help of property-based testing tools and an external oracle!

Oracles and Property Testing

The Base58Check encoding algorithm has been implemented many times by many different developers. Wouldn’t it be great if we could automatically check our implementation against theirs?

We can!

In property-based testing vernacular, this is known as using an oracle. An oracle is another implementation of your solution that is known to be correct under some domain of inputs.

Thankfully, we have a perfect oracle in the form of the Bitcoin Explorer’s CLI tools. Bitcoin Explorer ships with a base58check-encode utility that Base58Check encodes any Base16 string with a given version byte:


> bx base58check-encode abc123 --version 0
17WWM7GLKg9

Given this oracle, we can thoroughly and concisely test our implementation with a single property. The primary desired property of our solution is that it should match the output of bx base58check-encode for all valid inputs.

Getting Comfortable with our Tools

Property testing is simple in concept, but more difficult in practice.

It’s easy to say that for any given binary and any given byte, the output of our solution should match the output of my oracle. Actually generating those inputs and coordinating those test executions is a whole different ball game.

Thankfully, the groundwork has already been laid for us, and there are plenty of Elixir-based property testing tools for us to chose from. For this exercise, let’s use StreamData.


To get our feet wet, let’s write a simple property test using StreamData that verifies the associative property of the Kernel.+/2 addition function:


property "addition is associative" do
  check all a <- integer(),
            b <- integer(),
            c <- integer() do
    l = Kernel.+(Kernel.+(a, b), c)
    r = Kernel.+(a, Kernel.+(b, c))
    assert l == r
  end
end

The property keyword defines our new property test with a short description of the property under test.

The check all block lets us define our automatically generated inputs and a function block that will use those inputs to make assertions about our property.

Put simply, we’re telling StreamData that we want three random integers: a, b, and c. For every set of a, b, and c, we want to verify that (a + b) + c equals a + (b + c).

StreamData does this by generating many (one hundred by default) random sets of a, b, and c and checking them against our assertions. If any assertion fails, StreamData will try to “shrink” the input set (a, b, and c, in this case) to the simplest possible failing test case and present it to us.


> mix test
.

Finished in 0.06 seconds
1 property, 0 failures

Thankfully, addition is associative, and our test passes!

Consulting the Oracle

Now let’s take the training wheels off and write a property test for our Base58Check encoder against our external oracle.

First, we’ll define a new test block:


property "gives the same results as bx base58check-encode" do
end

Within our test, we’ll generate two random variables, key and version:


check all key <- binary(min_length: 1),
          version <- byte() do
end

We’re telling StreamData that key can be any non-empty binary, and that version can be any byte.

Now that we have our set of test data, we’ll need to get the result of encoding key with version using our own implementation of the Base58Check encoding algorithm:


result = Base58Check.encode(key, <<version>>)

Next, we’ll use Elixir’s System.cmd to call bx base58check-encode, passing in our Base16-encoded key string and our version byte:


oracle =
  System.cmd("bx", [
    "base58check-encode",
    Base.encode16(key),
    "--version",
    "#{version}"
  ])
  |> elem(0)
  |> String.trim()

Now all that’s left to do is to verify that our result matches the output of our oracle:


assert result == oracle

If StreamData detects any failures in this assertion, it will simplify key and version to the simplest failing case and report the failure to us.

But thankfully, our implementation of the Base58Check encoding algorithm passes the test:


mix test
.

Finished in 1.0 seconds
1 property, 0 failures

Final Thoughts

I won’t pretend to be a property testing expert. I’m just a guy who’s read a few articles and who’s hopped on board the hype train. That said, property testing was the perfect tool for this job, and I can see it being an incredibly useful tool in the future. I’m excited to incorporate it into my testing arsenal.

If you’re interested in property-based testing, I recommend you check out Fred Hebert’s PropEr Testing, and Hillel Wayne’s articles on hypothesis testing with oracle functions and property testing with contracts.

Lastly, if you’re interested in Bitcoin development, I encourage you to check out Andreas Antonopoulos’ Mastering Bitcoin.

Mining for Bitcoin Vanity Addresses with Elixir

Written by Pete Corey on Feb 5, 2018.

We previously worked through the process of generating a Bitcoin private address and translating it into a shareable public address using only the tools and libraries shipped with Elixir and Erlang.

The guiding force behind that article was Andreas Antonopoulos’ excellent Mastering Bitcoin book.

Let’s take another bite out of Mastering Bitcoin and implement the algorithm Andreas describes for “mining for vanity addresses” at the end of chapter four. After we implement the basic algorithm, we’ll add our Elixir special sauce and turn it into a fully parallelized procedure.

What is a Vanity Address?

The concept of a vanity address is simple. It’s a normal Bitcoin public address that contains some sequence of desired characters.

For example, a random Bitcoin public address might look like the following:


1HKz4XU7ENT46ztEzsT83jRezyiDjvnBV8

On the live network, Bitcoin addresses always begin with 1, but the remaining characters are entirely random.

A vanity address might look like this:


1pete7qrCiembh8AEf1zRP2zn6nDsLoHC

You’ll notice that the first five characters of this address are 1pete. This isn’t an accident! I’ve intentionally sought out a public address that begins with my name, Pete, so people know who they’re sending their large sums of Bitcoin to.

While the term “mining” sounds intimidating, the actual process of generating these vanity addresses is laughably simple.

How do you Mine Vanity Addresses?

“Mining,” in this context, is just another term for repeatedly doing something until some condition is met. As in, “keep digging until you find gold!”

We’ll mine our vanity public address by repeatedly generating a private key, transforming it into a public address, and checking if the resulting address matches our desired pattern.

That’s it!

Building that in Elixir should be a walk in the park. We’ll start off by creating a new VanityAddress module and stubbing out a generate_private_key/2 function:


defmodule VanityAddress do
  def generate_private_key(regex, version \\ <<0x00>>)
end

Our generate_private_key/2 function expects a regex which represents the pattern we’re trying to find in a vanity address (like ~r/^1pete/), and a version byte that will used to indicate where this Bitcoin address will be used.

Within our generate_private_key/2 function, we’ll kick off the mining process by generating a random private key and transforming it into a public address:


private_key = PrivateKey.generate
public_address = PrivateKey.to_public_address(private_key)

If the public_address we generated matches the pattern we provided in our regex, we’ve successfully mined a vanity address! In that case, we’ll return the private_key. Otherwise, we’ll repeat the entire process with a recursive call to generate_private_key/2:


case public_address =~ regex do
  true -> private_key
  false -> generate_private_key(regex, version)
end

That’s all there is to it.

We can use our new generate_private_key/2 function in conjunction with the PrivateKey.to_public_address/2 function we built last time to view our newly mined vanity key:


VanityAddress.generate_private_key(~r/^1pete/)
|> PrivateKey.to_public_address

Congratulations; we’re miners!

Thinking in Parallel

The problem with our simple implementation of generate_private_key/2 is that it’s slow.

While it’s true that the mining algorithm is inherently slow, there are many optimizations we could make to the code we’ve written. The most obvious improvement that comes to mind when using a “process-oriented” programming language like Elixir is to parallelize the mining algorithm across multiple processes.

However, parallelizing our mining algorithm presents an interesting set of challenges.

Each individual call to generate_private_key/2 is completely synchronous and sequential. We won’t see much of a benefit by queuing up multiple concurrent calls to generate_private_key/2 on the same CPU core. That said, while we’re running generate_private_key/2 within a single process bound to a single CPU core, any other cores available to us are sitting idle.

Ideally, we could simultaneously run as many instances of our generate_private_key/2 execution as we have cores. The moment any of our parallel executions find a matching key, it would be returned to the caller.

Creating a Stream of Parallel Tasks

Elixir’s little known (to me) Task.async_stream/3 function is the tool we need to implement this functionality.

Task.async_stream/3 expects an enumerable as its first argument and a function to be applied concurrently to each element in the enumerable. Each element in the enumerable will have the provided function applied to it in a new process.

If we squint our eyes a little, we can see that this gives us what we need. The “enumerable” we pass into Task.async_stream/3 will really be an infinite stream of zero-argument anonymous functions. Each of those anonymous functions simply calls generate_private_key/2.

We’ll use Stream.cycle/2 to create an infinite stream of these functions:


[fn -> generate_private_key(regex, version) end]
|> Stream.cycle

The function that we want to run in parallel simply executes each of those passed in anonymous functions, one at a time, each in its own process:


|> Task.async_stream(fn f -> f.() end)

This is where our parallelization happens. Each call to generate_private_key/2 is happening in a new process, and Elixir’s scheduler will spread each new process out over the available cores in the system.

By default, Task.async_stream/3 will run up to System.schedulers_online/0 parallel instances of our generate_private_key/2 execution, and System.schedulers_online/0 defaults to the number of available CPU cores in the system. This means we’ll always have one instance of generate_private_key/2 running on each of our cores.

Perfect!

Filtering Our Stream

Task.async_stream/3 returns a stream that produces either an {:ok, value} tuple on success, or an {:exit, reason} tuple on failure. We don’t anticipate or care about failures in this situation, so we’ll nil them out with Stream.map/2:


|> Stream.map(fn
  {:ok, thing} -> thing
  _ -> nil
end)

Now we can use Stream.reject/2 to filter out any nil values from our mapped stream:


|> Stream.reject(&(&1 == nil))

Let’s wrap what we’ve done in a function called stream_private_keys/2 that accepts a regex and a version:


def stream_private_keys(regex, version \\ <<0x00>>) do
  [fn -> generate_private_key(regex, version) end]
  |> Stream.cycle
  |> Task.async_stream(fn f -> f.() end)
  |> Stream.map(fn
    {:ok, thing} -> thing
    _ -> nil
  end)
  |> Stream.reject(&(&1 == nil))
end

What we’re left with is a stream that will produce any number of valid Bitcoin vanity addresses for a given regex and version, using all of the available CPU cores on our system.

Putting Our Stream to Use

Our stream doesn’t actually do anything until we try to pull values out of it using a function from the Enum module. Let’s use Enum.take/2 to pull out three vanity Bitcoin addresses that match our desired pattern (123):


VanityAddress.stream_private_keys(~r/^123/)
|> Enum.take(3)
|> Enum.map(&PrivateKey.to_public_address/1)

["123avbA76Zk98jik3ymTHkjbjKKftAhJiZ",
 "123aGknGk5F2NPkB6e4e6pehVGL7gBR8az",
 "123hzDB1CxcyDwusfsb8Pfh3Ti2i4NQLGR"]

If we take a look at our CPU usage while our mining pipeline is chugging away, we’ll see that all of the CPUs on our machine are being fully utilized.

Success!

Final Thoughts

Spoiler alert: the process of mining for Bitcoin is nearly identical to mining for vanity addresses. Instead of hashing private keys and looking for a random leading string like 1pete, Bitcoin miners hash transaction data, looking for hashes that begin with some number of leading zeros corresponding to the current block difficulty.

There’s a huge amount of pomp and circumstance around the term “mining”, but at its core, it’s an incredibly simple and approachable idea.

Be sure to check out the VanityAddress module in my hello_bitcoin project on Github, and if this kind of thing is at all interesting to you, I highly recommend you pick up a copy of Andreas Antonopoulos’ Mastering Bitcoin.