Pete Corey Writing Work Contact

Bitcoin's Base58Check in Pure Elixir

Written by Pete Corey on Jan 8, 2018.

An important piece of the process of transforming a Bitcoin private key into a public address, as outlined in the fantastic Mastering Bitcoin book, is the Base58Check encoding algorithm.

The Bitcoin wiki has a great article on Base58Check encoding, and even gives an example implementation of the underlying Base58 encoding algorithm in C.

This algorithm seems especially well-suited to Elixir, so I thought it’d be a fun and useful exercise to build out Base58 and Base58Check modules to use in future Bitcoin and Elixir experiments.

Like Base64, but Less Confusing

Base58 is a binary-to-text encoding algorithm that’s designed to encode a blob of arbitrary binary data into human readable text, much like the more well known Base64 algorithm.

Unlike Base64 encoding, Bitcoin’s Base58 encoding algorithm omits characters that can be potentially confusing or ambiguous to a human reader. For example, the characters O and 0, or I and l can look similar or identical to some readers or users of certain fonts.

To avoid that ambiguity, Base58 simply removes those characters from its alphabet.

Shrinking the length of the alphabet we map our binary data onto from sixty four characters down to fifty eight characters means that we can’t simply group our binary into six-bit chunks and map each chunk onto its corresponding letter in our alphabet.

Instead, our Base58 encoding algorithm works by treating our binary as a single large number. We repeatedly divide that number by the size of our alphabet (fifty eight), and use the remainder of that division to map onto a character in our alphabet.

Implementing Base58 in Elixir

This kind of algorithm can neatly be expressed in Elixir. We’ll start by creating a Base58 module and adding our alphabet as a module attribute:

defmodule Base58 do
  @alphabet '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

Inside our Base58 module, we’ll define an encode/2 function. If we pass encode a binary, we want to convert it into a number using Erlang’s :binary.decode_unsigned:

def encode(data, hash \\ "")
def encode(data, hash) when is_binary(data) do
  encode(:binary.decode_unsigned(data), hash)

Once converted, we pass our binary-come-number into a recursive call to encode/2 along with the beginning of our hash, an empty string.

For each recursive call to encode/2, we use div and rem to divide our number by 58 and find the reminder. We use that remainder to map into our @alphabet, and prepend the resulting character onto our hash:

def encode(data, hash) do
  character = <<, rem(data, 58))>>
  encode(div(data, 58), hash <> character)

We’ll continue recursing until we’ve divided our data down to 0. In that case, we’ll return the hash string we’ve built up:

def encode(0, hash), do: hash

This implementation of our Base58 encoded mostly works. We can encode any text string and receive correct results:

iex(1)> Base58.encode("hello")

Encoding Leading Zeros

However when we try to encode binaries with leading zero bytes, those bytes vanish from our resulting hash:

iex(1)> Base58.encode(<<0x00>> <> "hello")

That zero should become a leading "1" in our resulting hash, but our process of converting the initial binary into a number is truncating those leading bytes. We’ll need to count those leading zeros, encode them manually, and prepend them to our final hash.

Let’s start by writing a function that counts the number of leading zeros in our initial binary:

defp leading_zeros(data) do
  |> Enum.find_index(&(&1 != 0))

We use Erlang’s :binary.bin_to_list to convert our binary into a list of bytes, and Enum.find_index to find the first byte in our list that isn’t zero. This index value is equivalent to the number of leading zero bytes in our binary.

Next, we’ll write a function to manually encode those leading zeros:

defp encode_zeros(data) do
  <<, 0)>>
  |> String.duplicate(leading_zeros(data)) 

We simply grab the character in our alphabet that maps to a zero byte ("1"), and duplicate it as many times as we need.

Finally, we’ll update our initial encode/2 function to prepend these leading zeros onto our resulting hash:

def encode(data, hash) when is_binary(data) do
  encode_zeros(data) <> encode(:binary.decode_unsigned(data), hash)

Now we should be able to encode binaries with leading zero bytes and see their resulting "1" values in our final hash:

iex(1)> Base58.encode(<<0x00>> <> "hello")


Base58 + Checksum = Base58Check

Now that we have a working implementation of the Base58 encoding algorithm, we can implement our Base58Check algorithm!

Base58Check encoding is really just Base58 with an added checksum. This checksum is important to in the Bitcoin world to ensure that public addresses aren’t mistyped or corrupted before funds are exchanged.

At a high level, the process of Base58Check encoding a blob of binary data involves hashing that data, taking the first four bytes of the resulting hash and appending them to the end of the binary, and Base58 encoding the result.

We can implement Base58Check fairly easily using our newly written Base58 module. We’ll start by creating a new Base58Check module:

defmodule Base58Check do

In our module, we’ll define a new encode/2 function that takes a version byte and the binary we want to encode:

def encode(version, data)

Bitcoin uses the version byte to specify the type of address being encoded. A version byte of 0x00 means that we’re encoding a regular Bitcoin address to be used on the live Bitcoin network.

The first thing we’ll need to do is generate our checksum from our version and our data. We’ll do that in a new function:

defp checksum(version, data) do
  version <> data
  |> sha256
  |> sha256
  |> split

We concatenate our version and data binaries together, hash them twice using a sha256/1 helper function, and then returning the first four bytes of the resulting hash with a call to split/1.

split/1 is a helper function that pulls the first four bytes out of the resulting hash using binary pattern matching:

defp split(<< hash :: bytes-size(4), _ :: bits >>), do: hash

Our sha256/1 helper function uses Erlang’s :crypto.hash function to SHA-256 hash its argument:

defp sha256(data), do: :crypto.hash(:sha256, data)

We’ve wrapped this in a helper function to facilitate Elixir-style piping.

Now that we have our four-byte checksum, we can flesh out our original encode/2 function:

def encode(version, data) do
  version <> data <> checksum(version, data)
  |> Base58.encode

We concatenate our version, data, and the result of our checksum function together, and Base58 encode the result. That’s it!

Base58Check encoding our "hello" string with a version of <<0x00>> should give us a result of "12L5B5yqsf7vwb". We can go further and verify our implementation with an example pulled from the Bitcoin wiki:

iex(1)> Base58Check.encode(<<0x00>>, 
    <<0x01, 0x09, 0x66, 0x77, 0x60,
      0x06, 0x95, 0x3D, 0x55, 0x67,
      0x43, 0x9E, 0x5E, 0x39, 0xF8, 
      0x6A, 0x0D, 0x27, 0x3B, 0xEE>>)


Wrapping Up

If you’d like to see both modules in their full glory, I’ve included them in my hello_bitcoin repository on Github. Here’s a direct link to the Base58 module, and the Base58Check module, along with a simple unit test. If that repository looks familiar, it’s because it was used in a previous article on controlling a Bitcoin node with Elixir.

I highly suggest you read through Andreas Antonopoulos’ Mastering Bitcoin book if you’re at all interested in how the Bitcoin blockchain works, or Bitcoin development in general. His book has been my primary source of inspiration and information for every Bitcoin article I’ve written to date.

Things I Learned During the Advent of Code

Written by Pete Corey on Jan 1, 2018.

If you know me, you know that I’m a huge fan of programming challenges and code katas. Sometimes I think I missed my calling as a competitive programmer.

This past month I was able to satisfy my cravings for programming challenges with Eric Wastl’s Advent of Code.

Solving a programming challenge every day for nearly a month taught me interesting lessons about myself and my current programming language of choice: Elixir.

Here’s a quick rundown.

  • You’ll have good days. - There will be days where your mind is running on all cylinders. Solutions to problems will appear fully formed in your mind’s eye, and completing challenges is simply a matter of dictating your envisioned solution.

  • You’ll have bad days. - There will be other days where your mind feels like a tangle of rusty nails stuck together in a bucket of dried cement. Solutions will come slowly, if they come at all.

  • Most days will land somewhere in-between. - It’s important to try to avoid letting your perceived state of mind influence your decisions and determination. When I felt like I was having trouble, falling back to first principles always seemed to help devise a path forward.

  • Elixir’s Stream library is awesome. - Sasa Juric’s solutions to the first few days of Advent of Code inspired me to look into Elixir’s Stream module. I was so impressed with the module that I wrote an article on generating sequences with streams, and I went on to solve a good number of advent challenges with Stream-based generators.

  • Stick to the basics. - I found that my first instinct when sketching out a solution to a problem was to reach for functions shipped in standard modules like Enum, List, and Map. However, studying other people’s solutions made me realize that sticking to the basics of recursion and pattern matching can lead to more elegant, readable, and performant solutions.

  • Erlang ships with everything and the kitchen sink. - A few of this year’s challenges led me to standard modules shipped with Erlang and accessible from Elixir. Modules like :digraph, :digraph_utils, :gb_trees, and :binary are incredibly useful and overlook utilities.

  • You don’t want to split on empty strings. You want a list of graphemes. - Use String.graphemes/1 instead of String.codepoints/1 or String.split(&1, "") to split a string into its component characters. Read all about codepoints and grapheme clusters for more information.

  • Elixir’s “extended” regex modifier helps explain away the black magic. - Regexes are often cesspools of black magic. It often takes just as long, if not longer, to understand a written regex than to write it from scratch. Thankfully, Elixir’s “extended” modifier lets you explain away some of the black magic in your regexes by splitting them across multiple lines and interleaving comments.

  • Studying other people’s solutions can teach powerful lessons. - Nearly every language-specific tip or technique I learned during this year’s Advent of Code challenge came from studying other people’s solutions after I implemented my own. Reading other people’s code is, hands down, the most powerful tool for improving your own code.

I had a lot of fun with this year’s Advent of Code challenges. I learned quite a bit about Elixir and about myself. I even made some good friends in the #adventofcode channel of the Elixir Slack group.

I’ll definitely be doing this again next year.

If you didn’t do this year’s Advent of Code, it’s not too late! Go check out the problems, and if you’re curious, check out my solutions on Github.

Do you know that a man is not dead while his name is still spoken?

Written by Pete Corey on Dec 25, 2017.

In my free time I’ve been making many trips to Discworld over the past few months. I’ve been rereading my favorites like Going Postal, and devouring other classics like Hogfather for the first time.

I thought I’d take a pause this Christmas and offer up my contribution to an ongoing tribute to the late Terry Pratchett.

You know they’ll never really die while the Trunk is alive… It lives while the code is shifted, and they live with it, always Going Home.
— Terry Pratchett, Going Postal

The Clacks, as described in Going Postal, is a system of trans-continental semaphore towers used to send messages across great distances. Each packet of text sent through the Clacks is encoded using a set of standardized codes and encodings.

Sounds familiar, doesn’t it?

The codes G, N, and U, stand for “send the message on”, “do not log the message”, and “turn the message around at the end of the line” respectively. In Going Postal, these codes are used to send a character’s name perpetually up and down the Clacks.

As long as his name is spoken, he will never die.

We can write a Plug function to easily broadcast our own Clacks overhead messages from our Phoenix server:

pipeline :browser do
  plug :gnu_terry_pratchett

def gnu_terry_pratchett(conn, options) do
  Plug.Conn.put_resp_header(conn, "X-Clacks-Overhead", "GNU Terry Pratchett")

Of course, this is all a fantasy. The Clacks aren’t real, Terry’s name won’t live forever in the overhead, and any X-Clacks-Overhead messages we send will simply be ignored by the world at large.

That being said, “humans need fantasy to be human”.

“Tooth fairies? Hogfathers? Little—”
“So we can believe the big ones?”
— Terry Pratchett, Hogfather