An important piece of the process of transforming a Bitcoin private key into a public address, as outlined in the fantastic Mastering Bitcoin book, is the Base58Check encoding algorithm.
This algorithm seems especially well-suited to Elixir, so I thought it’d be a fun and useful exercise to build out
Base58Check modules to use in future Bitcoin and Elixir experiments.
Like Base64, but Less Confusing
Base58 is a binary-to-text encoding algorithm that’s designed to encode a blob of arbitrary binary data into human readable text, much like the more well known Base64 algorithm.
Unlike Base64 encoding, Bitcoin’s Base58 encoding algorithm omits characters that can be potentially confusing or ambiguous to a human reader. For example, the characters
l can look similar or identical to some readers or users of certain fonts.
To avoid that ambiguity, Base58 simply removes those characters from its alphabet.
Shrinking the length of the alphabet we map our binary data onto from sixty four characters down to fifty eight characters means that we can’t simply group our binary into six-bit chunks and map each chunk onto its corresponding letter in our alphabet.
Instead, our Base58 encoding algorithm works by treating our binary as a single large number. We repeatedly divide that number by the size of our alphabet (fifty eight), and use the remainder of that division to map onto a character in our alphabet.
Implementing Base58 in Elixir
This kind of algorithm can neatly be expressed in Elixir. We’ll start by creating a
Base58 module and adding our alphabet as a module attribute:
defmodule Base58 do @alphabet '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz' end
Base58 module, we’ll define an
encode/2 function. If we pass
encode a binary, we want to convert it into a number using Erlang’s
def encode(data, hash \\ "") def encode(data, hash) when is_binary(data) do encode(:binary.decode_unsigned(data), hash) end
Once converted, we pass our binary-come-number into a recursive call to
encode/2 along with the beginning of our hash, an empty string.
For each recursive call to
encode/2, we use
rem to divide our number by
58 and find the reminder. We use that remainder to map into our
@alphabet, and prepend the resulting character onto our
def encode(data, hash) do character = <<Enum.at(@alphabet, rem(data, 58))>> encode(div(data, 58), hash <> character) end
We’ll continue recursing until we’ve divided our
data down to
0. In that case, we’ll return the
hash string we’ve built up:
def encode(0, hash), do: hash
This implementation of our Base58 encoded mostly works. We can encode any text string and receive correct results:
iex(1)> Base58.encode("hello") "Cn8eVZg"
Encoding Leading Zeros
However when we try to encode binaries with leading zero bytes, those bytes vanish from our resulting hash:
iex(1)> Base58.encode(<<0x00>> <> "hello") "Cn8eVZg"
That zero should become a leading
"1" in our resulting hash, but our process of converting the initial binary into a number is truncating those leading bytes. We’ll need to count those leading zeros, encode them manually, and prepend them to our final hash.
Let’s start by writing a function that counts the number of leading zeros in our initial binary:
defp leading_zeros(data) do :binary.bin_to_list(data) |> Enum.find_index(&(&1 != 0)) end
We use Erlang’s
:binary.bin_to_list to convert our binary into a list of bytes, and
Enum.find_index to find the first byte in our list that isn’t zero. This index value is equivalent to the number of leading zero bytes in our binary.
Next, we’ll write a function to manually encode those leading zeros:
defp encode_zeros(data) do <<Enum.at(@alphabet, 0)>> |> String.duplicate(leading_zeros(data)) end
We simply grab the character in our alphabet that maps to a zero byte (
"1"), and duplicate it as many times as we need.
Finally, we’ll update our initial
encode/2 function to prepend these leading zeros onto our resulting hash:
def encode(data, hash) when is_binary(data) do encode_zeros(data) <> encode(:binary.decode_unsigned(data), hash) end
Now we should be able to encode binaries with leading zero bytes and see their resulting
"1" values in our final hash:
iex(1)> Base58.encode(<<0x00>> <> "hello") "1Cn8eVZg"
Base58 + Checksum = Base58Check
Now that we have a working implementation of the Base58 encoding algorithm, we can implement our Base58Check algorithm!
Base58Check encoding is really just Base58 with an added checksum. This checksum is important to in the Bitcoin world to ensure that public addresses aren’t mistyped or corrupted before funds are exchanged.
At a high level, the process of Base58Check encoding a blob of binary data involves hashing that data, taking the first four bytes of the resulting hash and appending them to the end of the binary, and Base58 encoding the result.
We can implement Base58Check fairly easily using our newly written
Base58 module. We’ll start by creating a new
defmodule Base58Check do end
In our module, we’ll define a new
encode/2 function that takes a version byte and the binary we want to encode:
def encode(version, data)
Bitcoin uses the
version byte to specify the type of address being encoded. A version byte of
0x00 means that we’re encoding a regular Bitcoin address to be used on the live Bitcoin network.
The first thing we’ll need to do is generate our checksum from our
version and our
data. We’ll do that in a new function:
defp checksum(version, data) do version <> data |> sha256 |> sha256 |> split end
We concatenate our
data binaries together, hash them twice using a
sha256/1 helper function, and then returning the first four bytes of the resulting hash with a call to
split/1 is a helper function that pulls the first four bytes out of the resulting hash using binary pattern matching:
defp split(<< hash :: bytes-size(4), _ :: bits >>), do: hash
defp sha256(data), do: :crypto.hash(:sha256, data)
We’ve wrapped this in a helper function to facilitate Elixir-style piping.
Now that we have our four-byte checksum, we can flesh out our original
def encode(version, data) do version <> data <> checksum(version, data) |> Base58.encode end
We concatenate our
data, and the result of our
checksum function together, and Base58 encode the result. That’s it!
Base58Check encoding our
"hello" string with a
<<0x00>> should give us a result of
"12L5B5yqsf7vwb". We can go further and verify our implementation with an example pulled from the Bitcoin wiki:
iex(1)> Base58Check.encode(<<0x00>>, <<0x01, 0x09, 0x66, 0x77, 0x60, 0x06, 0x95, 0x3D, 0x55, 0x67, 0x43, 0x9E, 0x5E, 0x39, 0xF8, 0x6A, 0x0D, 0x27, 0x3B, 0xEE>>) "16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM"
If you’d like to see both modules in their full glory, I’ve included them in my
hello_bitcoin repository on Github. Here’s a direct link to the
Base58 module, and the
Base58Check module, along with a simple unit test. If that repository looks familiar, it’s because it was used in a previous article on controlling a Bitcoin node with Elixir.
I highly suggest you read through Andreas Antonopoulos’ Mastering Bitcoin book if you’re at all interested in how the Bitcoin blockchain works, or Bitcoin development in general. His book has been my primary source of inspiration and information for every Bitcoin article I’ve written to date.