Reversing BIP-39 and the Power of Property Testing

Written by Pete Corey on May 7, 2018.

I was recently asked how I would go about reversing the BIP-39 encoding algorithm we implemented previous and used to build our BIP-39 haiku miner.

Implementing the reverse of this algorithm seemed straight-forward at first, but it quickly led me down a rabbit hole that showed me just how powerful property-based testing can be.

Read on!

What is BIP-39 Again?

If you’re asking yourself, “what is BIP-39 again?”, I highly recommend you check out the first article in this series, “From Bytes to Mnemonic using Elixir”, for a full rundown of the BIP-39 encoding algorithm.

As a quick summary, the BIP-39 encoding algorithm is intended to convert an arbitrary set of bytes into an easily memorizable sequence of words. The algorithm goes something like this:

  1. Have bytes you want to encode.
  2. Append a partial checksum to the end of your bytes.
  3. Map every eleven bit chunk of the resulting binary onto a wordlist of your choice.

The devil is in the detail, as we’ll see.

Laying the Groundwork

Before we write the reverse of our BIP-39 encoding algorithm, we need to lay some initial groundwork.

The Bip39.Mnemonic module we built in the previous article only had a single public function: generate/0. The generate/0 function generated a random set of bytes and converted it into a BIP-39-style mnemonic.


def generate do
  entropy
  |> attach_checksum
  |> map_onto_wordlist
end

Moving forward, we should separate the encoding functionality from the entropy generation so we can test the encoding algorithm independently, with our own data. This will simplify the testing of our final solution.


def generate do
  encode(generate_entropy)
end

def encode(data) do
  data
  |> attach_checksum
  |> map_onto_wordlist
end

For clarity’s sake, we’ve also renamed the entropy/0 function to generate_entropy.

Great. Now that we have an encode/1 function that encodes a given binary, we’re set up to add a new function, decode/1, that reverses the process and returns the binary data decoded from a given mnemonic.


def decode(mnemonic) do
  ...
end

Decoding the Mnemonic

The high-level process for reversing our BIP-39 algorithm and decoding our mnemonic into a binary looks something like this:

  1. Maps the words in the mnemonic back into a binary.
  2. Separate the appended partial checksum from the encoded data.
  3. Verify that the appended checksum matches the actual checksum of the data.

Sounds like a plan.

The first step of decoding our mnemonic in our decode/1 function is to convert our encoded mnemonic wordlist back into a binary.

First, we’ll map each word onto its index in our @wordlist. Next, we’ll convert each index into an eleven-bit binary and reduce that list of binaries down into a single concatenated binary:


data_and_checksum =
  mnemonic
  |> Enum.map(&Enum.find_index(@wordlist, fn w -> w == &1 end))
  |> Enum.reduce(<<>>, fn n, acc -> <<acc::bits, n::11>> end)

What we’re left with is our originally encoded data concatenated with a variable-length partial checksum.

We know that the variable-length checksum is 1/32 the length of our originally encoded data. Given that, we also know that the length of our originally encoded data is 32/33 the length of data_and_checksum. The concatenated checksum will fill the remaining space:


total_size = bit_size(data_and_checksum)
data_size = div(total_size * 32, 33)
checksum_size = total_size - data_size

Now that we know the structure of data_and_checksum, we can pull out the individual pieces we care about using binary pattern matching:


<<data::bits-size(data_size), partial_checksum::bits-size(checksum_size)>> =
  data_and_checksum

Fantastic.

Now all that’s left to do is verify that the partial_checksum provided matches the calculated checksum of the provided data binary. If the checksums match, we’ll return an :ok tuple containing the decoded data. Otherwise, we’ll return an :error tuple explaining the situation:


if <<data::bits, partial_checksum::bits>> == attach_checksum(data) do
  {:ok, data}
else
  {:error, :bad_checksum}
end

That’s it!

We can now encode/1 a given binary into a mnemonic wordlist, and then decode/1 it to retrieve the original binary.

Putting our Solution to the Test

Now that we’ve built our Bip39.Mnemonic.encode/1 and Bip39.Mnemonic.decode/1 functions, we need to test that our encoding and decoding process is working as expected.

Testing an encoder/decoder pair is perfectly suited to property-based testing, so we’ll use the StreamData library to test our solution. We’ll set up a new test module, Bip39MnemonicTest, that scaffolds out a new property test for our mnemonic encoder:


defmodule Bip39MnemonicTest do
  use ExUnit.Case
  use ExUnitProperties

  property "encodes and decodes mnemonics" do
  end
end

The property that we’re trying to test is that a given binary is equal to its encoded mnemonic, decoded back into a binary. We can test this fairly easily with StreamData.

We know that the BIP-39 algorithm only supports encoding data between sixteen and thirty two bits:

The allowed size of ENT is 128-256 bits.

Given that, we’ll generate a stream of random binaries that fall within that size range:


check all data <- binary(min_length: 16, max_length: 32) do
end

Next, we’ll generate the mnemonic for our randomly generated data binary, and assert that the decoded mnemonic matches our original data:


mnemonic = Bip39.Mnemonic.encode(data)
assert Bip39.Mnemonic.decode(mnemonic) == {:ok, data}

If all goes well, our test should pass.

An Under-Specified Encoder

Unfortunately, things rarely go as planned.

Our new test seems to run through several successful iterations of the encode/decode assertion, but ultimately fails. Thankfully, StreamData shrinks the failing test as much as possible and gives us the failing input:


1) property encodes and decodes mnemonics (Bip39MnemonicTest)
   test/bip39_mnemonic_test.exs:5
   Failed with generated values (after 20 successful runs):
     
       * Clause:    data <- binary(min_length: 16, max_length: 32)
         Generated: <<0, 0, 0, 0, 55, 157, 129, 190, 93, 189, 119, 124, 164, 131, 5, 67, 23, 225, 251, 162, 200>>
     
   Assertion with == failed
   code:  assert Bip39.Mnemonic.decode(mnemonic) == {:ok, data}
   left:  {:error, :bad_checksum}
   right: {:ok, <<0, 0, 0, 0, 55, 157, 129, 190, 93, 189, 119, 124, 164, 131, 5, 67, 23, 225, 251, 162, 200>>}

After an intense debugging session, I realized that there was nothing wrong with my Bip39.Mnemonic.decode/1 function. Instead, the problem was with my encoder.

The BIP-39 specification clearly states that in addition to being “128-256 bits” in length, the length of the binary data being encoded must also be a multiple of thirty two bits:

The mnemonic must encode entropy in a multiple of 32 bits.

Ignoring this requirement results in issues when generating and appending the partial checksum, and results in data loss during the decoding procedure.

To accommodate this requirement, let’s update our property test to truncate all generated binaries to the nearest thirty two bits:


check all bytes <- binary(min_length: 16, max_length: 32),
          bits_to_truncate = bytes |> bit_size |> rem(32),
          <<_::size(bits_to_truncate), data::bits>> = bytes do
  mnemonic = Bip39.Mnemonic.encode(data)
  assert Bip39.Mnemonic.decode(mnemonic) == {:ok, data}
end

Now our test passes, as expected!

Tightening Up our Encoding Process

While our Bip39.Mnemonic.encode/1 functions works when passed the correct data, it’s probably not a good idea to assume that the developer knows what constitutes “good data”.

Instead, let’s refactor Bip39.Mnemonic.encode/1 to enforce the length requirements outlined in the BIP-39 specification.

Let’s update the function head to assert that data is a binary, assert that its length falls between one hundred twenty eight and two hundred fifty six bits, and assert that its length in bits is a multiple of thirty two:


def encode(data)
    when is_binary(data) and bit_size(data) >= 128 and bit_size(data) <= 256 and
           rem(bit_size(data), 32) == 0 do
  {:ok,
   data
   |> attach_checksum
   |> map_onto_wordlist}
end

If all of these requirements hold, we’ll return the encoded data wrapped in an :ok tuple. Otherwise, we need to return an :error tuple. We can do this with a second encode/1 function head:


def encode(_), do: {:error, :invalid_data}

Wrapping our Bip39.Mnemonic.encode/1 result in an :ok tuple breaks our test. We’ll need to fix that:


check all bytes <- binary(min_length: 16, max_length: 32),
          bits_to_truncate = bytes |> bit_size |> rem(32),
          <<_::size(bits_to_truncate), data::bits>> = bytes do
  {:ok, mnemonic} = Bip39.Mnemonic.encode(data)
  assert Bip39.Mnemonic.decode(mnemonic) == {:ok, data}
end

We should also add property tests to ensure that invalid binaries can’t be encoded by mistake:

First we’ll test that short binaries are rejected:


property "rejects short binaries" do
  check all bits <- integer(1..8),
            <<_::size(bits), data::bits>> <- binary(max_length: 16) do
    assert Bip39.Mnemonic.encode(data) == {:error, :invalid_data}
  end
end

Next, we’ll test that long binaries are rejected:


property "rejects long binaries" do
  check all bits <- integer(1..8),
            bytes <- binary(min_length: 32),
            data = <<bytes::binary, 0::size(bits)>> do
    assert Bip39.Mnemonic.encode(data) == {:error, :invalid_data}
  end
end

And finally, we’ll test that all “misaligned” binaries, or binaries who’s lengths don’t align to thirty two bits, are rejected:


property "rejects misaligned binaries" do
  check all data <- bitstring(min_length: 129, max_length: 256),
            data |> bit_size |> rem(32) != 0 do
    assert Bip39.Mnemonic.encode(data) == {:error, :invalid_data}
  end
end

Perfect. Now I’m fully confident in our BIP-39 encode/decode solution.

Final Thoughts

While this seemingly simple task threw me down a rabbit hole I definitely didn’t expect, I’m grateful for the experience. This showed me in a very hands-on way just how powerful property-based testing can be. Without randomly generated test cases, I don’t think I would have recognized the issues with my encode function.

If you’d like to see the BIP-39 encoder/decoder’s source in its entirity, be sure to check out the entire Bip39 project on Github.

I’d like to thank Pierre Martin for bringing up the topic of reversing our BIP-39 algorithm. After talking with me on the Elixir Slack group, he filed a Github issue with his solution to the problem. I highly recommend you check out his approach for a more fleshed out solution.

Visualizing the Oplog with Splunk

Written by Pete Corey on Apr 30, 2018.

I recently found myself investigating a mysterious occurrence in a production Meteor application. Seemingly randomly, without any obvious connection to user activity or periodic job activity, our Meteor server would spike to one hundred precent CPU consumption and stay pegged there until it was restarted.

After investigating nearly every hunch I could come up with, I was left with very few courses of action. My final theory was that a massive influx of MongoDB operations were flooding into our database. Any concerned observers listening within our Meteor application would be overwhelmed trying to catch up with the changes and consume all available CPU cycles on the server.

In order to test this theory, I wanted to plot the MongoDB Oplog as a time series chart and compare it against the timeline of known CPU spikes, looking for any correlations.

I had many options for how to approach this problem, but I decided to use Splunk to visualize and explore the Oplog data. I’m very happy with how Splunk performed, and I can see myself using it again.


I was interested in all Oplog events that happened in the twenty-four hour period surrounding a known CPU spike at 22:55 UTC on April 23rd, 2018. I fired up Studio 3T and ran the following query against the oplog.rs collection of my MongoDB database:


db['oplog.rs'].find({
  $and: [
    {ts: {$gte: new Timestamp(1524480600,1)}},
    {ts: {$lte: new Timestamp(1524567000,1)}}
  ]
});

The above query returned over seven hundred fifty thousand results, which I was able to export into a JSON file using Studio 3T (Studio 3T is the only MongoDB client I’ve found that supports saving an entire set of query results to file).


Once those seven hundred fifty thousand Oplog events were exported to disk, I was able to upload them directly into a Splunk index. Splunk gracefully parsed the JSON data and flattened each object into a neatly searchable collection.

With the data available in Splunk, I was free to start exploring.

My first step was to plot a time chart of all of the Oplog events. Given the large amount of data I was working with, I decided to bin my events into five minute bands:

index="oplog"
| timechart span=5m count

An overview of our data.

Interestingly, an obvious dip in Oplog events occurred around the time of the observed CPU spike. This is the exact opposite of what I expected to see given my working hypothesis.

Zooming in on the dip.

Investigating further, I decided to plot a time series for every type of Oplog event, based on the op field:

index="oplog"
| timechart span=1m count by op

To improve clarity, I also focused on a narrower time range, reduced my bin size, and switched to a log scale Y axis.

Everything working as intended.

This new chart shows that insert (i) and update (u) operations completely stop during the dip, but no-op (n) operations continue as usual. This seemed to indicate that the database was healthy, but the Meteor application stopped making insert and update requests.

This makes sense. If our server was eating up available CPU cycles, it probably wouldn’t find the time to query the database.


After visualizing the Oplog events around several of these CPU spikes, it became painfully obvious that my working hypothesis was not correct. There wasn’t any influx of database operations prior to a spike, and any dips in database activity were easily attributable to server restarts.

So now we’re back to square one. Was all of this for nothing?

Absolutely not!

When you’re investigating a problem, proving that something is not the cause of the problem can be incredibly valuable. By repeatedly narrowing down the possible set of culprits, we simplify the problem in our minds and make the real cause that much easier to find.


After spending more time digging into this issue, I’m convinced that it’s related to “fork bombs” crippling the server, and discussed in this issue filed against the Meteor project.

That said, this exploration proved to be incredibly valuable. By proving to myself that obverserver overload was not the cause of the spikes, I was able to rule out a huge swatch of potential fixes.

I was also able to spend some time trying out a fantastic new tool. I’m sure I’ll find myself using Splunk again in the future.

Connecting an Elixir Node to the Bitcoin Network

Written by Pete Corey on Apr 23, 2018.

I’ve been writing about implementing the nuts and bolts of the Bitcoin protocol in Elixir, and while I’ve developed a rough understanding of how Bitcoin works at a fundamental level, Bitcoin’s peer-to-peer network, the thing that makes block propagation possible, has always felt like a mysterious black box.

The Bitcoin white paper waves its hands over the implementation of the network, and even Andreas Antonopoulos’ Mastering Bitcoin skims over the implementation details of building a network-ready Bitcoin node.

While its true that the workings of this network are completely tangential to the workings of Bitcoin itself, I still find it extremely interesting. Since I first started diving into the world of Bitcoin development, I’ve wanted to build a simple node that connects to the network.

The Elixir programming language gives us some fantastic tools to implement a server in the peer-to-peer network. Let’s see how far they can take us!

Constructing our GenServer

Since this is our first foray into Bitcoin’s peer-to-peer network, we’ll stick to simple goals. Let’s lay down a list of things we’d like to accomplish:

  1. Create and maintain a connection to a single Bitcoin node.
  2. Fetch and count the list of that node’s peers.

These goals help inform us about the architecture of our final solution. Since we’re only attempting to connect to a single Bitcoin node, we should be able to model our node nicely with a single GenServer.

We’ll start by creating a new module, BitcoinNetwork.Node, to house our new GenServer:


defmodule BitcoinNetwork.Node do
  use GenServer
end

Next, we’ll tell our Node which Bitcoin peer node we want to connect to by passing in a tuple of the target node’s ip and port:


def start_link({ip, port}) do
  GenServer.start_link(__MODULE__, %{ip: ip, port: port})
end

Lastly, we’ll pass ip and port along to our init/1 function within a map. For now, we’ll simply log our intentions to connect to the specified node:


def init(state = %{ip: ip, port: port}) do
  [:bright, "Connecting to #{BitcoinNetwork.IP.to_string(ip)}:#{port}."]
  |> log()
end

The log/1 helper function adds some extra debug information to our log output, and uses IO.ANSI.format/2 to inject color into our logs:


defp log(message) do
  [:bright, :black, "[#{inspect(self())}] ", :reset, message]
  |> IO.ANSI.format()
  |> IO.puts()
end

Supervising our GenServer

Now that we’ve sketched out its basic shape, we can spawn an instance of our BitcoinNetwork.Node GenServer when our application starts up. We’ll modify the start/2 callback in our application’s entry point module, BitcoinNetwork.Application, to create a :one_for_one supervisor that supervises a single instance of our BitcoinNetwork.Node GenServer:


def start(_type, _args) do
  Supervisor.start_link(
    [
      {BitcoinNetwork.Node,
       {Application.get_env(:bitcoin_network, :ip),
        Application.get_env(:bitcoin_network, :port)}}
    ],
    strategy: :one_for_one
  )
end

We’re pulling the ip and port of the Bitcoin node we’re trying to connect to from our application’s configuration. We should set those values in our config.exs file:


config :bitcoin_network, ip: <<
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF, 0xFF, 
    127, 0, 0, 1
  >>
config :bitcoin_network, port: 18333

For now, we’ll assume that we’re trying to connect to a testnet Bitcoin node running on localhost:18333. This port and address can be changed to point to any node on the Bitcoin network.

Working with IPs

You might have noticed that in our initial logging statement made a call out to the BitcoinNetwork.IP.to_string/1 function, and our application configuration specified the IP address of the node we’re connecting to as a binary.

The Bitcoin network supports both IPv4 and IPv6 addresses, and expects these addresses to be represented in their binary form when sent across the network, so for convenience we’ll represent all IP addresses in our application as sixteen-byte binaries. These binaries can be difficult to work with, so we’ll create a few helper functions in the BitcoinNetwork.IP module to make them easier to handle.

First, we’ll need to see the IP address of the peer node we’re connecting to, so we’ll need some way of transforming the IP address binary into a string. See BitcoinNetwork.IP.to_string/1:


def to_string(binary) do
  binary
  |> :binary.bin_to_list()
  |> Enum.chunk_every(4)
  |> Enum.map(&:binary.list_to_bin/1)
  |> Enum.map(&Base.encode16/1)
  |> Enum.join(":")
end

The Erlang tools we’ll use to manage our TCP connections will expect IP addresses in the form of eight-element tuples, rather than a binaries. BitcoinNetwork.IP.to_tuple/1 will handle that conversion for us:


def to_tuple(binary) do
  binary
  |> :binary.bin_to_list()
  |> Enum.chunk_every(2)
  |> Enum.map(&:binary.list_to_bin/1)
  |> Enum.map(&:binary.decode_unsigned/1)
  |> List.to_tuple()
end

Now we can freely shift between our base binary representation and the strings and tuple forms we’ll need to log addresses and connect to our nodes.

Hello, Bitcoin Network

If we were to spin up our application at this point, we’d see our log statement about how our BitcoinNetwork.Node GenServer is attempting to connect to our local Bitcoin node.

The first run of our application.

Unfortunately, nothing is actually happening. Let’s fix that.

Our first step in communicating with a Bitcoin node is connecting to it. The Bitcoin protocol plays out over TCP, so we’ll use Erlang’s :gen_tcp to manage our TCP connection to our peer node.

In our init/1 callback, let’s add a call to :gen_tcp.connect/3 that connect to our specified ip and port:


{:ok, socket} =
  :gen_tcp.connect(BitcoinNetwork.IP.to_tuple(ip), port, [:binary, active: true])

We’re specifying a :binary option on our connection, which means that we want any incoming packets to be delivered as binaries, rather than lists of bytes. We’re also specifying that our connection is active, which means that incoming messages will be sent to the current processes as messages which we’ll need to handle in handle_info/2 callbacks.

As an aside, pattern matching on an :ok tuple works fine for now, but a more robust solution would more gracefully handle connection failures. Failing to connect to a Bitcoin node is a fairly common occurance.

Once connected to our node, we’ll add the resulting socket to our process’ state and return from our init/1 callback:


{:ok, Map.put_new(state, :socket, socket)}

Now if we spin up our application, if all goes well, we’ll actually be connected to a Bitcoin node!

Building a Version Message

Opening a TCP connection to a Bitcoin node doesn’t get us far. Initially, the node we connect to won’t acknowledge our presence until we send it a “version” message. This messages acts as a handshake and introduces ourselves to the new node.

Let’s create a BitcoinNetwork.Protocol.Version struct to represent our version message:


defmodule BitcoinNetwork.Protocol.Version do
  defstruct version: nil,
            services: nil,
            timestamp: nil,
            recv_ip: nil,
            recv_port: nil,
            from_ip: nil,
            from_port: nil,
            nonce: nil,
            user_agent: nil,
            start_height: nil
end

Next, we’ll create a BitcoinNetwork.Protocol protocol that defines a serialize/1 function. We’ll use this protocol to establish a serialization interface that will be implemented by all of our network structure representations:


defprotocol BitcoinNetwork.Protocol do
  def serialize(_)
end

We’ll start by creating an implementation of that serialize/1 function for our Version struct:


defimpl BitcoinNetwork.Protocol, for: BitcoinNetwork.Protocol.Version do
  def serialize(version) do
    <<
      version.version::32-little,
      version.services::64-little,
      version.timestamp::64-little,
      Protocol.serialize(%VersionNetAddr{
        services: version.services,
        ip: version.recv_ip,
        port: version.recv_port
      })::binary,
      Protocol.serialize(%VersionNetAddr{
        services: version.services,
        ip: version.from_ip,
        port: version.from_port
      })::binary,
      version.nonce::64-little,
      Protocol.serialize(%VarStr{value: version.user_agent})::binary,
      version.start_height::32-little
    >>
  end
end

We’re using the little binary modifier to specify which fields should be encoded in little endian format, as specified by the Bitcoin protocol specifications.

We’re also including the serialized binaries of sub-types, such as the “version network addresses” (VersionNetAddr), and the “variable length string” (VarStr). Those structures are implemented similarly to our Version struct and corresponding serialize/1 implementation.

Composing our Message

Now that we’ve fleshed out our Version module and its corresponding serialize/1 implementation, we can generate a serialized version binary:


Protocol.serialize(%Version{
  version: 31900,
  services: 1,
  ...
})

However, this isn’t quite ready to send across the network to our peer node. First, we need to wrap the serialized binary with some additional metadata about the message being sent.

Let’s create a new struct called BitcoinNetwork.Protocol.Message to represent this metadata:


defmodule BitcoinNetwork.Protocol.Message do
  defstruct magic: nil, command: nil, size: nil, checksum: nil, payload: nil
end

Once again we’ll define an implementation of our serialize/1 function that transforms this struct into a properly encoded binary:


def serialize(%Message{command: command, payload: payload}) do
  <<
    Application.get_env(:bitcoin_network, :magic)::binary,
    String.pad_trailing(command, 12, <<0>>)::binary,
    byte_size(payload)::32-little,
    :binary.encode_unsigned(Message.checksum(payload))::binary,
    payload::binary
  >>
end

The magic field is a four-byte binary used to delineate packets send through the Bitcoin network. Since we’re connecting to a testnet Bitcoin node, we’ll need to use the byte sequence specific to the testnet:


config :bitcoin_network, magic: <<0x0B, 0x11, 0x09, 0x07>>

The Message.checksum/1 helper function double-hashes its input and returns the first four bytes of the resulting hash:


def checksum(payload) do
  <<checksum::32, _::binary>> =
    payload
    |> hash(:sha256)
    |> hash(:sha256)

  checksum
end

With the Message module’s implementation of serialize/1, we have everything we need to construct a network-ready version message:


Protocol.serialize(%Message{
  command: "version",
  payload: Protocol.serialize(%Version{
    version: 31900,
    services: 1,
    ...
  })
})

This is fine, but there’s some repetition here we can do without. Let’s write a serialize/2 function on the Message module that simplifies things a bit:


def serialize(command, payload) do
  Protocol.serialize(%Message{
    command: command,
    payload: Protocol.serialize(payload)
  })
end

This is where our Protocol shines. Because we’ve defined an implementation of serialize/1 for every type of structure we’d send across the network, we can serialize it by passing it to Protocol.serialize/1 without needing to know its type.

Let’s use our simpler interface to generate our network-ready, serialized version message:


Message.serialize("version", %Version{
  version: 31900,
  services: 1,
  ...
})

Beautiful.

Sending Version

Now that we’ve constructed the version message we want to send to our peer node, all that’s left to do is send it! In our init/1 callback in Node, we’ll construct our serialized message binary and send it to our peer node with a call to a send_message/1 helper function:


:ok =
  Message.serialize("version", %Version{
    version: 31900,
    services: 1,
    timestamp: :os.system_time(:seconds),
    recv_ip: ip,
    recv_port: port,
    from_ip: <<>>,
    from_port: 0,
    nonce: :binary.decode_unsigned(:crypto.strong_rand_bytes(8)),
    user_agent: "Elixir rules!",
    start_height: 1
  })
  |> send_message(socket)

The send_message/2 function simply prints the message being sent and uses :gen_tcp.send/2 to send it to our peer node through the provided socket:


defp send_message(message, socket) do
  print_message(message, [:bright, :yellow])
  :gen_tcp.send(socket, message)
end

Lastly, print_message/2 uses our previously implemented Hexdump module to print a hex dump of the message being sent:


defp print_message(data, colors) do
  output =
    data
    |> Hexdump.to_string()

  (colors ++ ("\n" <> output <> "\n"))
  |> IO.ANSI.format()
  |> IO.puts()

  data
end

Now if we start up our application, our Node GenServer will connect to our peer Bitcoin node and send our version message.

Inspecting our sent version message.

Processing Responses

If everything goes well, once our peer node receives our initial version message, it will send a message in response. In order to receive these responses from our peer node, we’ll have to implement a handle_info/2 callback on our Node GenServer that listens for :tcp messages and processes the incoming binary.

Unfortunately, processing incoming TCP messages is sightly more complicated that it seems. Because TCP is a streaming protocol, no guarantees are made that a single message will be delivered per packet, or that an entire message will be delivered in a single packet. Multiple messages may arrive smooshed into a single packet, a single message may be spread across multiple packets, or any combination of the two might occur.

To accommodate this, we’ll need to build up a buffer of every packet we receive from our peer node and parse full messages out of this running buffer as they’re completed.

We’ll use a field called rest in our GenServer’s state to represent our message buffer. We’ll initially set this to "" in our init/1 callback:


{:ok,
 state
 |> Map.put_new(:socket, socket)
 |> Map.put_new(:rest, "")}

Next, we’ll add a handle_info/2 callback that processes incoming packets sent by our peer Bitcoin node:


def handle_info({:tcp, _port, data}, state = %{rest: rest}) do
end

To process incoming messages, we first prepend rest to the data received from our peer node and pass this binary data into a function called chunk/2:


{messages, rest} = chunk(rest <> data)

Our chunk/2 helper function recursively parses fully-formed messages from our buffer and returns those messages along with the remaining, unparsed binary.


defp chunk(binary, messages \\ []) do
  case Message.parse(binary) do
    {:ok, message, rest} ->
      chunk(rest, messages ++ [message])

    nil ->
      {messages, binary}
  end
end

The Message.parse/1 function is essentially the opposite of the Message module’s implementation of serialize/1. Given a properly formed binary, parse/1 returns a fully populated Message struct, along with any unparsed trailing binary data.


def parse(binary) do
  with <<
         magic::32-little,
         command::binary-size(12),
         size::32-little,
         checksum::32-big,
         payload::binary-size(size),
         rest::binary
       >> <- binary do
    {:ok,
     %Message{
       magic: magic,
       command:
         command
         |> :binary.bin_to_list()
         |> Enum.reject(&(&1 == 0))
         |> :binary.list_to_bin(),
       size: size,
       checksum: checksum,
       payload: parse_payload(command, payload)
     }, rest}
  else
    _ -> nil
  end
end

If the message’s binary is incomplete, Message.parse/1 will return nil, and our chunk/1 function will return the accumulated list of messages.

Once we have a list of received Message structs, we’ll filter our any that have incorrect checksums using a Message.verify_checksum/1 helper, and asynchronously cast the rest to ourselves for further processing:


messages
|> Enum.filter(&Message.verify_checksum/1)
|> Enum.map(&GenServer.cast(self(), &1))

Finally, we’ll return from our handle_info/2 callback, updating rest with the unused trailing binary data we’ve accumulated:


{:noreply, %{state | rest: rest}}

Receiving Verack

Now that we’ve got a system in place to receive and parse messages sent by our peer node, we’re in a good position to handle the reply to our initial version message.

Whenever a Bitcoin node receives a version message, it’ll send a “verack” message in response. When our Node processes this response, it’ll parse the verack message into a Message struct with a command value of "verack", and asynchronously cast that Message to itself.

We’ll need to set up a handle_cast/2 callback to listen for it:


def handle_cast(%Message{command: "verack"}, state) do
  [:bright, "Got ", :green, "verack", :reset, :bright, "."]
  |> log()

  {:noreply, state}
end

When we receive our "verack" message, we’ll simply log that it was received.

If we spin up our application, we should see our node connect to its peer node, send a version message, and receive a verack response.

Our verack response.

Receiving Version

In the same way that we sent a version message to our peer node, our peer node will send a version message to our node. If we want to confirm our connection, we’ll have to send back a verack message as a reply.

Just like our version message, we can listen for a verack message by adding a new handle_cast/2 callback and pattern matching on commnand:


def handle_cast(%Message{command: "version"}, state) do
  :ok =
    Message.serialize("verack")
    |> send_message(state.socket)

  :ok =
    Message.serialize("getaddr")
    |> send_message(state.socket)

  {:noreply, state}
end

This time, rather than just logging that we received the message, we’ll two two things. First, we’ll construct and send a verack message. Next, we’ll send a followup “getaddr” message that instructs our peer node to send us all of the peers it’s connected to.

Neither the verack message nor the getaddr message contains any actual content, so our call to Message.serialize/1 will simply set the command on the message and send an empty payload.

Receiving Addresses

Finally, once our peer node receives our request for addresses it’ll start sending back collection of “network address” blocks wrapped in an “addr” message.

Once again, we’ll add a handle_cast/2 callback to listen for this message. This time, we’ll pull the binary payload our of the parsed message struct:


def handle_cast(%Message{command: "addr", payload: payload}, state) do
end

We know that the payload represents an addr message. We’ll need to write a parse/1 method in our BitcoinNetwork.Protocol.Addr module to transform this binary into an Addr struct:


def parse(binary) do
  with {:ok, %VarInt{value: count}, rest} <- VarInt.parse(binary) do
    {:ok,
     %Addr{
       count: count,
       addr_list:
         for <<binary::binary-size(30) <- rest>> do
           {:ok, net_addr, _rest} = NetAddr.parse(binary)
           net_addr
         end
     }}
  else
    _ -> nil
  end
end

We first parse the count “variable integer” out of the message, and then use a binary list comprehension to transform every thirty bytes of the remaining binary into network address structs. Once we’re finished, we return our final Addr struct, fully populated with count and addr_list.

Now that we can parse the addr message sent by our peer node, we can fulfill our second goal. Back in our handle_cast/2 callback we’ll print how many peer nodes we received from our current peer node:


{:ok, addr} = Addr.parse(payload)

[:bright, "Received ", :green, "#{length(addr.addr_list)}", :reset, :bright, " peers."]
|> log()

{:noreply, state}

If we fire up our application, both nodes will exchange version/verack handshakes, our node will ask for a list of peer addresses with a getaddr message, and our peer node will respond with up to one thousand known peers.

Receiving peer addresses.

Awesome!

Staying Alive

While we’ve accomplished our second goal, we still haven’t completely satisfied our first. In order for our node to stay connected to our peer node, we need to respond to “ping” messages to prove that we’re still alive and able to receive messages.

Let’s add one last handle_cast/2 callback to listen for pings:


  def handle_cast(%Message{command: "ping"}, state) do
    [:bright, "Got ", :green, "ping", :reset, :bright, ", sending ", :yellow, "pong", :reset, :bright, "."]
    |> log()

    :ok =
      Message.serialize("pong")
      |> send_message(state.socket)

    {:noreply, state}
  end

This should be familiar by now. When we receive a ping message from our peer, we log that we received the message and send an empty “pong” message as our response.

That’s all there is to it.

Our “Bitcoin node” should now stay connected to our peer node indefinitely, responding to occasional pings and receiving occasional collections of peer addresses.

Final Thoughts and Future Work

This was a behemoth of a post, but I hope the ideas and techniques are simple enough to follow along with. I tried to break things down by splitting related topics into prior articles, but there’s still quite a bit of ground to cover to get a node up and running.

I’ve got several follow-up articles planned for the near future, including extending our node to map the Bitcoin network by recursively connecting to every node it discovers, and more distant plans of building a fully operational Bitcoin node that verifies incoming blocks and potentially mines for new blocks.

If you haven’t guessed yet, I’m incredibly excited about this project. If you’re interested in exploring Bitcoin through Elixir, let me know!