Mining for Mnemonic Haiku with Elixir

Written by Pete Corey on Mar 5, 2018.

What are some interesting things we can do with the BIP-39 mnemonic generator we built in a previous article? The most obvious answer would be to create seeds for Bitcoin wallets, but I’m not looking for obvious today.

What if we sift through the generated mnemonics looking for structurally sound haiku? That sounds like a good time and a great excuse to play with Elixir!

Put on your lab coat and grab your Morty, we’re going full mad scientist today!

Mining for Haiku

If you’re not familiar, haiku are poems with a strictly defined structure. Haiku are three lines in length. The first and last lines are five syllables long, and the middle line is seven syllables in length.

Haiku traditionally focus on nature and the juxtaposition of two contrasting ideas. Our haiku… will not.

exhibit horror
make obscure arrive unveil
detail law pig prize

Thinking broadly, our process for generating mnemonic haiku will be very similar to the process we used to mine for Bitcoin vanity addresses. We’ll repeatedly generate mnemonic sequences, filtering out those that don’t satisfy the structural criteria of a haiku.

Let’s create a new module to hold this functionality:


defmodule Bip39Haiku do
end

Inside our new Bip39Haiku module, we’ll create a new stream/0 function that returns an Elixir stream of valid mnemonic haiku, sketching out helper functions as we go:


def stream do
  fn ->
    Bip39Haiku.Mnemonic.generate()
  end
  |> Stream.repeatedly()
  |> Stream.filter(&Bip39Haiku.Haiku.is_valid?/1)
end

We create an anonymous function that generates a BIP-39 mnemonic with a call to our previously implemented Bip39Haiku.Mnemonic.generate/0. Next, we pass our anonymous function into Stream.repeatedly/1 to create an infinite stream of mnemonic sequences. Lastly, we use Stream.filter/2 to filter out mnemonic sequences that aren’t haiku.

All we have to do is implement Bip39Haiku.Haiku.is_valid?/1!
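Once that's in place, mining haiku will just be a matter of taking from the stream. Here's a quick usage sketch, assuming the finished module:


# Pull the first three mnemonic haiku from our infinite stream.
Bip39Haiku.stream()
|> Enum.take(3)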

Validating Haiku

Our Bip39Haiku.Haiku.is_valid?/1 function will return true when given a wordlist that satisfies our structural criteria, and false in all other cases. Let’s start fleshing it out:


defmodule Bip39Haiku.Haiku do
  def is_valid?(wordlist) do
  end
end

The first thing we need to do is associate each word in our wordlist with its number of syllables. We'll do this with a call to an attach_syllables/1 helper function:


wordlist = attach_syllables(wordlist)

For now we’ll wave our hands over the implementation of attach_syllables/1, but let’s assume that the result is a list of tuples:


[{"cat", 1}, {"tomato", 3}, {"pizza", 2}]

The first element of each tuple contains the original word in the wordlist, and the second element of the tuple is the number of syllables in that word.

Filtering Unknown Words

Sometimes our attach_syllables/1 function will be given a word it doesn’t recognize. When that happens, it’ll claim the word has zero syllables. Since we don’t know how many syllables are actually in the word, we can’t use it to construct a haiku, and we’ll have to treat the entire wordlist as invalid.


with nil <- Enum.find(wordlist, &zero_syllables?/1) do
  ...
else
  _ -> false
end

We can use Elixir’s with special form to model our happy path. If we can’t find any words with zero syllables, we’re in the clear and are free to move on with our structural checks. Otherwise, we’ll return false.

The zero_syllables?/1 function is a simple helper to find words with 0 syllables:


defp zero_syllables?({_, 0}), do: true
defp zero_syllables?({_, _}), do: false

Dropping Syllables

In general, our plan of attack for validating that a given wordlist is a haiku is to attempt to remove each line of syllables, one after the other. If we can successfully remove each line of syllables, and there are no words left in the wordlist, we have a haiku. If anything goes wrong along the way, we don’t have a haiku.

We can update our happy path to express this strategy:


with nil <- Enum.find(wordlist, &zero_syllables?/1),
     {:ok, wordlist} <- drop_syllables(wordlist, 5),
     {:ok, wordlist} <- drop_syllables(wordlist, 7),
     {:ok, wordlist} <- drop_syllables(wordlist, 5) do
  Enum.empty?(wordlist)
else
  _ -> false
end
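Taken together with the attach_syllables/1 call from earlier, the complete is_valid?/1 function reads:


def is_valid?(wordlist) do
  wordlist = attach_syllables(wordlist)

  with nil <- Enum.find(wordlist, &zero_syllables?/1),
       {:ok, wordlist} <- drop_syllables(wordlist, 5),
       {:ok, wordlist} <- drop_syllables(wordlist, 7),
       {:ok, wordlist} <- drop_syllables(wordlist, 5) do
    Enum.empty?(wordlist)
  else
    _ -> false
  end
end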

Each call to drop_syllables/2 accepts a wordlist, removes the specified number of syllables from the beginning of the wordlist (if possible), and returns the truncated wordlist in an :ok tuple.

The first step in writing drop_syllables/2 is to walk through our wordlist, accumulating a running total of the syllables we’ve seen at each point. We can do this with a little help from Elixir’s Enum.scan/3:


total_syllables =
  wordlist
  |> Enum.scan(0, fn {_word, syllables}, total ->
    total + syllables
  end)

The total_syllables list for our previous example of [{"cat", 1}, {"tomato", 3}, {"pizza", 2}] would look like this:


[1, 4, 6]

Now all we have to do is find the index of our total_syllables list whose value, the total number of syllables seen up until that word, matches the number of syllables we’re looking to remove from wordlist:


index =
  total_syllables
  |> Enum.find_index(&(&1 == syllables))

If we can’t find an index, that means it’s not possible to split off the desired number of syllables from the beginning of our wordlist. In that case we return an error tuple:


case index do
  nil ->
    {:error, :not_possible}

  index ->
    {:ok, Enum.drop(wordlist, index + 1)}
end

Otherwise, we return our truncated wordlist, dropping every word through our found index.
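Pieced together, our finished drop_syllables/2 function looks like this:


defp drop_syllables(wordlist, syllables) do
  # Running total of syllables seen at each word in the wordlist.
  total_syllables =
    wordlist
    |> Enum.scan(0, fn {_word, count}, total ->
      total + count
    end)

  # The word whose running total exactly matches the line we're dropping.
  index =
    total_syllables
    |> Enum.find_index(&(&1 == syllables))

  case index do
    nil ->
      {:error, :not_possible}

    index ->
      {:ok, Enum.drop(wordlist, index + 1)}
  end
end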

Attaching Syllables

Let’s turn our attention to the previously glossed over attach_syllables/1 function.

Counting the number of syllables in a word is easy for us humans, but much more difficult for computers. To help out our computing allies, we’ll be using the Wordnik API to fetch the syllables in a given word.

Our attach_syllables/1 helper function will asynchronously look up the syllables in each of the words in wordlist with a call to a new Bip39Haiku.Wordnik.get_syllables/1 function:


defp attach_syllables(wordlist) do
  wordlist
  |> Enum.map(
    &Task.async(fn ->
      {&1, Bip39Haiku.Wordnik.get_syllables(&1)}
    end)
  )
  |> Enum.map(&Task.await(&1, :infinity))
end

Once we’ve fetched the syllables from Wordnik, attach_syllables/1 will pair the syllables with their associated words in a tuple, just as we expect.

Fetching Syllables

Because this is a mad science experiment and not a serious software development project, we’ll keep our get_syllables/1 function as simple as possible. To reiterate, get_syllables/1 should accept a word and return the number of syllables in that word:


defmodule Bip39Haiku.Wordnik do
  def get_syllables(word) do
  end
end

The first thing we need to do to fetch syllable information from Wordnik is to load our Wordnik API key:


wordnik_api_key =
  Application.fetch_env!(
    :bip39_haiku,
    :wordnik_api_key
  )

In our application’s configuration, we’ll pull the API key from the system’s environment variables, falling back to Wordnik’s demo API key if one isn’t available:


config :bip39_haiku,
  wordnik_api_key:
    System.get_env("WORDNIK_API_KEY") ||
      "a2a73e7b926c924fad7001ca3111acd55af2ffabf50eb4ae5"

Next, we’ll build the URL for the API endpoint we want to use. In this case, we’ll be using Wordnik’s hyphenation endpoint:


endpoint =
  "http://api.wordnik.com/v4/word.json/#{word}/hyphenation?api_key=#{
    wordnik_api_key
  }"

Lastly, we’ll make our request. If the request is successful, we’ll parse the HTTPotion response object with a call to our parse_response/1 helper. Otherwise, we’ll naively try to fetch the syllables again with a recursive call to get_syllables/1:


case HTTPotion.get(endpoint) do
  %HTTPotion.Response{
    status_code: 200,
    body: body
  } ->
    parse_response(body)

  _ ->
    get_syllables(word)
end

Our parse_response/1 function decodes the resulting JSON string returned by the API and counts the number of unique syllables in the provided word:


defp parse_response(body) do
  body
  |> Poison.decode!()
  |> Enum.map(& &1["seq"])
  |> Enum.uniq()
  |> length
end
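To make that concrete, the decoded body is a list of hyphenation fragments, each tagged with a seq index. For a word like "pizza", it looks something like this (an illustrative sketch based on how parse_response/1 consumes the data, not verbatim API output):


[
  %{"seq" => 0, "text" => "piz"},
  %{"seq" => 1, "text" => "za"}
]

Two unique seq values means parse_response/1 reports two syllables.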

That’s it!

Now that we’ve finished all of the components of our mnemonic haiku miner, let’s put it to the test!

On Being a Respectful API Consumer

It’s worth noting that in its current incarnation, our get_syllables/1 function is incredibly inefficient. It will hit the Wordnik API for every word passed into it, even if it’s seen that word before. This isn’t very respectful, and will surely result in our application running up against the API’s rate limits.

An immediate and significant optimization for this haiku miner would be to add a database layer that stores each word along with its syllable count after receiving each result from the Wordnik API. Subsequent calls to get_syllables/1 could avoid calls to the Wordnik API by returning cached results from the database.
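As a sketch of the idea, here's what a minimal in-process cache might look like, using ETS in place of a real database (the Bip39Haiku.SyllableCache module and its API are hypothetical, not part of the project):


defmodule Bip39Haiku.SyllableCache do
  # Create the cache table once, at application startup.
  def setup do
    :ets.new(:syllables, [:named_table, :public])
  end

  # Return a cached syllable count, or fetch and cache it on a miss.
  def get_syllables(word) do
    case :ets.lookup(:syllables, word) do
      [{^word, syllables}] ->
        syllables

      [] ->
        syllables = Bip39Haiku.Wordnik.get_syllables(word)
        :ets.insert(:syllables, {word, syllables})
        syllables
    end
  end
end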

That said, this article is already too long, and the value of a BIP-39 mnemonic haiku generator is questionable at best, so I’ll leave this improvement as an exercise for the reader.

A Warning - Don’t Use These Seeds!

Before I end this article, I feel that it’s important to mention in big, bold letters that the mnemonics generated with this tool should not be used to manage real Bitcoin wallets.

By restricting your wallet’s seed entropy to just the subset of random bytes that result in the generation of a structurally sound haiku, you’re drastically reducing the practical security of your wallet.

This project is only intended to be an experiment and a learning exercise.

Final Thoughts

Now that our haiku miner is finished, we can revel in the beauty of our cryptographically secure, randomly generated poetry.

useless squeeze topic
blind lawsuit quit tube hamster
reason empower

Beauty is in the eye of the beholder, I guess.

This was definitely a weird tangent, but the idea has been rolling around in the back of my head for weeks. Now that I’ve built my mnemonic haiku miner, maybe I’ll find some peace. Be sure to check out the whole project on GitHub.

If you find this kind of Bitcoin development interesting, I highly recommend you check out Andreas Antonopoulos’ Mastering Bitcoin. Most of the examples in the book are written in Python and C, but as my previous Mastering Bitcoin articles demonstrate, Bitcoin development is perfectly suited for Elixir.

If you have any other ideas for mad science experiments with anything Elixir or Bitcoin related, let me know on Twitter!

Fear is the Mind Killer

Written by Pete Corey on Feb 26, 2018.

I’ve had a very stressful week. Stressful times at work don’t come often for me, but when they hit, they hit hard.

I’m working on a Node.js-based data ingestion service for a client. Everything is going well. The service is nearly finished, and I’m ready to start testing against simulated production load.

Because the requirements are vague, and the rules surrounding the system are in a constant state of flux, I’ve built the system in a way that prioritizes understanding over performance. Given that, I expect the final solution to be somewhat slow. I’m ready for it. “If the ingestion is too slow,” I tell myself, “I’ll throw on a few layers of caching and things will be fine!”

I’m wrong.

I start to test the system against simulated production loads, and I realize that it’s ridiculously slow. Unusably slow. Unfixably slow.

The Fear starts to set in. I begin digging into each layer of my system, adding cache layers, swapping list traversals for map lookups, offloading work to the database, adding indexes, and throwing every trick I have at the problem. I’m not getting anywhere and real panic starts to wash over me.

That kind of fear always brings ugly, often unrelated things to the surface. What if I’m just not good enough to fix this? What if I lose this contract? How will I find another job? What happens when my runway runs out and I can’t pay my bills?

I stop.

I breathe.

I force myself to recognize that right now, my stress is the very thing keeping me from solving this problem. Every question that rushed into my mind while I was in the pit of fear is intangible and irrelevant.

I start to dig into what I’ve built with a new set of eyes. Seeing the forest for the trees, I realize that the architecture of my solution just won’t work. Some rough, back-of-the-napkin math reveals that it’s approximately an O(n^4) solution, with gargantuan values of n.

Rather than give into anguish, I start thinking about other ways to solve the problem. How can I structure the solution to process data in large swaths, rather than sequentially? I start to form a solution. After a day of thinking and sketching, I have a roughly working prototype. Amazingly, the new prototype is orders of magnitude faster than my original solution. It’s working!

I log off on a high note.


The next day I get to work finalizing my new solution.

After another day’s work guided by my existing test suite, I’m back up to feature parity with my original solution. Testing against the simulated production load shows that it’s still several orders of magnitude faster than my original solution.

However, I soon notice that the ingestion process isn’t going as I’d expect. I seem to be skipping over large swaths of data.

Some investigation into my test suite shows that my new solution is broken for a particular data path; the data path used by the production simulation data. The simulated production data fails to satisfy a guard, and a huge portion of the new code I wrote is bypassed. No wonder my new solution is so fast. It isn’t doing anything.

Instantly the panic from yesterday washes over me. It’s even worse this time. Now I’ve actually tried, and I’ve failed. Thoughts about incompetence and existential dread start to well up, but again, I force myself to stop.

I breathe.

Intuitively and intellectually I know that this new solution is a better one. I just need to find the problem and fix it. After an hour or so of debugging, the problem reveals itself. It was a bug in how I was filtering data. A simple fix with massive performance gains.

Bolstered by this fix, I go on to make several more performance optimizations. At the end of the day, my new solution, now functioning, is even faster than it was in the morning.


All this is to say that stress, panic, and fear are all very real and unavoidable aspects of professional software development. At least for me.

It always strikes me as interesting that such an inherently analytical field can foster such intense emotions in people. Maybe it has to do with always working at the edge of your skill and the limit of your knowledge. Once you walk off that edge, it can be terrifying.

I don’t have many tools for coping with stress. My main tactic is to acknowledge it and put it out of my mind. Sometimes that can be easier said than done.

When I’m deep in the pit of fear, I try to remind myself that the pit has never done anything for me. Once you find yourself there, the first step is always to drag yourself out. It’s only once you’re out that you can assess your situation with a calm, rational mind, and begin the real work of fixing your situation.

I must not fear. Fear is the mind-killer. Fear is the little-death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me. And when it has gone past I will turn the inner eye to see its path. Where the fear has gone there will be nothing. Only I will remain.
Frank Herbert, Dune

How do you handle stress as a software developer? How often does it hit you? Is your fear a constant, subtle thing, or does it hit you like a tsunami, wash you out to sea, and force you to swim for shore?

Let me know.

From Bytes to Mnemonic using Elixir

Written by Pete Corey on Feb 19, 2018.

I’m still working my way through Andreas Antonopoulos’ amazing Mastering Bitcoin book. In the chapter on Bitcoin wallets, he describes how deterministic wallet seeds can be represented as mnemonic groups of words.

Mnemonics are generated by hashing, appending, and chunking up a random series of bytes into a list of numbers that can be evenly mapped onto a given word list.

Creating these mnemonic word sequences seems like the perfect opportunity to flex our Elixir muscles.

Deterministic wallets, mnemonics, and seeds?

This terminology may sound like gibberish, but the underlying ideas are simple.

At its core, a Bitcoin wallet is just a collection of private keys. In the most basic type of wallet, the collected keys are just randomly generated numbers. They’re not related to each other in any way. In more sophisticated wallets, each private key is generated by securely transforming the key that came before it. The initial source of entropy for these “deterministic wallets” is known as the wallet’s “seed”.

The primary benefit of using a deterministic wallet is that you only need to keep track of the wallet’s seed, not every private key contained within it. All of the private keys used by the wallet can be regenerated from the seed.

BIP-39 attempts to make it easier for humans to remember these initial seeds. It does this by mapping the original source of entropy used to create the wallet’s seed into a sequence of short, easily memorizable words, called a mnemonic.

For example, the following BIP-39 style mnemonic maps to an initial random seed value of 0xEAF9C684F84EACA7C6B0CE08F77A6784:


turtle soda patrol vacuum turn fault
bracket border angry rookie okay anger

Isn’t the mnemonic much easier to remember?

How to Generate a Mnemonic

At a high level, the algorithm for generating a BIP-39 mnemonic looks like this:

  1. Generate sixteen to thirty-two random bytes.
  2. Append a partial SHA-256 checksum.
  3. Map the resulting bits onto your word list.
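For sixteen bytes of entropy, that works out to 128 bits plus a four-bit checksum (128 divided by 32), and 132 total bits split into eleven-bit chunks gives us a twelve-word mnemonic, exactly the length of the example above.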

Let’s build a new Elixir module, Bip39.Mnemonic, to encapsulate this algorithm:


defmodule Bip39.Mnemonic do
end

Inside our module, we’ll create a generate/0 function that walks through each step of our high-level algorithm:


def generate do
  entropy
  |> attach_checksum
  |> map_onto_wordlist
end

Our generate/0 function will call an entropy/0 function, which will generate our initial random bytes for us. We’ll pass the result into attach_checksum/1, which will (as you’ve probably guessed) compute and append our partial checksum. Finally, we’ll map the resulting bits onto our wordlist with map_onto_wordlist/1.

Now all we have to do is flesh out these three functions!

Generating Entropy

Erlang, and Elixir by proxy, ships with all of the tools we need to generate our cryptographically secure source of entropy.

The BIP-39 algorithm works with an initial source of sixteen to thirty-two bytes of random data. We’ll use Erlang’s :crypto.rand_uniform/2 to determine exactly how many bytes we’ll generate, and :crypto.strong_rand_bytes/1 to actually generate the bytes:


defp entropy do
  :crypto.rand_uniform(16, 32 + 1)
  |> :crypto.strong_rand_bytes()
end

You’ll notice that we’re setting the upper range in our call to :crypto.rand_uniform/2 to 32 + 1. This is because the upper limit is non-inclusive, and we want to utilize the full range of sixteen to thirty-two bytes.

Attaching our Checksum

Once we’ve generated our source of entropy, we’ll need to calculate its checksum and append a piece of the resulting checksum to the end of our binary. Once again, Elixir ships with all of the tools we need.

Let’s start by sketching out our attach_checksum/1 function:


defp attach_checksum(entropy) do
end

We’ll use Erlang’s :crypto.hash/2 function to create a SHA-256 hash of our newly generated entropy binary:


hash = :crypto.hash(:sha256, entropy)

Mastering Bitcoin explains that we’ll only need to append a portion of this hash to our entropy binary. The exact number of bits we need to append depends on the number of bits of entropy we’re working with.


size =
  entropy
  |> bit_size
  |> div(32)

The size in bits of our partial checksum is the length of entropy, in bits, divided by 32. Sixteen bytes of entropy (128 bits), for example, calls for a four-bit checksum. Now we can pattern match on the first size bits in our hash binary, and assign them to a new checksum variable:


<<checksum::bits-size(size), _::bits>> = hash

Finally, we’ll append the resulting checksum bits onto the end of our entropy binary:


<<entropy::bits, checksum::bits>>

That’s it!
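Assembled from the pieces above, the full attach_checksum/1 function reads:


defp attach_checksum(entropy) do
  # SHA-256 hash of our entropy; we'll only keep a small prefix of it.
  hash = :crypto.hash(:sha256, entropy)

  # One checksum bit for every thirty-two bits of entropy.
  size =
    entropy
    |> bit_size
    |> div(32)

  # Take the first `size` bits of the hash as our checksum.
  <<checksum::bits-size(size), _::bits>> = hash

  <<entropy::bits, checksum::bits>>
end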

Mapping onto a Wordlist

The real magic of the BIP-39 algorithm happens when we map the bits of our resulting binary sequence onto the 2,048 words of the English wordlist defined in the BIP-39 document.

Before we actually do the mapping, we’ll need to get this wordlist into our Elixir application. The simplest way of doing this is through a config value.

In our config/config.exs file, we’ll add the entire set of words within a word list sigil:


config :bip39, wordlist: ~w[
  abandon
  ability
  able
  about
  ...
]

Now that the wordlist is available to us, let’s start by defining our map_onto_wordlist/1 function:


defp map_onto_wordlist(entropy) do
end

Within our function, we can grab a reference to the wordlist we just placed in our application’s configuration:


wordlist =
  Application.fetch_env!(
    :bip39,
    :wordlist
  )

The actual mapping process is straightforward. We iterate over the provided entropy binary in chunks of eleven bits. Each eleven-bit chunk represents a number between 0 and 2,047, which we use as an index into our 2,048-word wordlist:


for <<chunk::11 <- entropy>> do
  Enum.at(wordlist, chunk)
end

After iterating over our entropy binary and replacing each chunk of eleven bits with a word in our wordlist array, we’re left with our final mnemonic!
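We can convince ourselves the chunking works by feeding a hand-built bitstring through the same comprehension in iex (assuming the full wordlist is configured; "abandon" and "ability" are the first two words in the BIP-39 English list):


iex(1)> wordlist = Application.fetch_env!(:bip39, :wordlist)
iex(2)> bits = <<0::11, 1::11>>
iex(3)> for <<chunk::11 <- bits>>, do: Enum.at(wordlist, chunk)
["abandon", "ability"]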

Tying it All Together

Now that our Bip39.Mnemonic module is complete, we can take it for a test drive. Let’s call generate/0 to generate our first mnemonic sequence:


iex(1)> Bip39.Mnemonic.generate
["budget", "album", "fresh", "security", "pear", "water", "weird", "success",
 "ahead", "enrich", "brush", "impact", "ribbon", "board", "spider", "dismiss"]

Perfect!

Be sure to check out the full Bip39.Mnemonic module on GitHub, and if this kind of thing interests you and you want to dive deeper into the world of Bitcoin development, take a look at Mastering Bitcoin.