Generating Test Fixtures with Wireshark

Written by Pete Corey on Jun 11, 2018.

My in-progress Elixir-based Bitcoin node is woefully lacking on the test front. This is especially problematic considering how finicky the Bitcoin protocol parsing and serialization process can be.

But how can we test this functionality without going through the mind-numbing process of manually constructing each packet under test and asserting that it parses and serializes as expected?

Thankfully, Wireshark’s support of the Bitcoin protocol turns this into a simple task. Let’s dig into how we can use Wireshark to generate binary fixtures for each of our Bitcoin packets under test, and explore how we can test against them using Elixir.

Generating Our Fixtures

Wireshark supports the Bitcoin protocol out of the box. That makes the process of generating test fixtures incredibly simple. To create a binary fixture for a given Bitcoin packet, we just need to follow these three steps:

Step one: Fire up Wireshark, start capturing on your network interface, and set bitcoin as your display filter:

Filtering for bitcoin packets.

Step two: Start bitcoind, and watch the packets roll in:

Bitcoin packets on the wire.

Step three: Notice that Wireshark teases out the Bitcoin-specific portion of every matching TCP packet it receives. Each packet can be exported by right-clicking on the “Bitcoin protocol” breakdown and choosing “Export Packet Bytes.”

High level packet information.

The bytes we’re exporting represent the entire packet, as it comes in over the wire.
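
To get a feel for what’s inside one of these exports, we can crack one open in an IEx session and tease apart the message header ourselves. Every Bitcoin message starts with a four-byte network magic value, a twelve-byte null-padded command, a four-byte little-endian payload length, and a four-byte checksum. Here’s a quick sketch, assuming we’ve saved the exported bytes to test/fixtures/version.bin:


{:ok, packet} = File.read("test/fixtures/version.bin")

# The 24-byte message header, followed by the payload itself:
<<magic::binary-size(4),
  command::binary-size(12),
  size::little-unsigned-integer-size(32),
  checksum::binary-size(4),
  payload::binary-size(size)>> = packet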

Parsing Our Fixtures

Now that we’ve saved a handful of packets we’d like to test against, we can start the process of incorporating them into our test suite.

Let’s assume that we’ve saved all of our exported packets into a test/fixtures folder within our project. Let’s also assume that we want to start by testing our “version” packet (the most interesting packet we’re able to parse, so far).

Let’s make a new VersionTest test module and lay down some boilerplate:


defmodule BitcoinNetwork.Protocol.VersionTest do
  use ExUnit.Case

  alias BitcoinNetwork.Protocol
  alias BitcoinNetwork.Protocol.{Message, Version}
end

Next, we’ll add our test:


test "parses a version payload" do
end

The first thing we’ll need to do is load the data from our exported version packet binary:


assert {:ok, packet} = File.read("test/fixtures/version.bin")

We use Elixir’s File.read/1 to read the contents of our version.bin file, and assert that we receive an :ok tuple containing the file’s binary contents, which we bind to packet.

Next, we’ll parse the binary, just like we do within our Node with a call to Message.parse/1:


assert {:ok, message, <<>>} = Message.parse(packet)

Once again, we assert that we’ll receive an :ok tuple with our resulting message. Because the data we exported from Wireshark relates specifically to our version packet, we expect the remaining, unparsed binary data to be empty (<<>>).

Now that we’ve parsed the message, we can compare the resulting Version struct found in message.parsed_payload with a pre-defined, expected version struct and assert that they’re equal:


assert message.parsed_payload == version

But where does version come from? How can we know the contents of our version.bin packet without manually parsing it ourselves, byte by byte?

Interpreting Our Fixtures

Once again, Wireshark comes to the rescue. In addition to letting us export our Bitcoin packets as raw binaries, Wireshark also lets us inspect the parsed contents of each of our Bitcoin packets.

If we go back to our version packet in our Wireshark capture file, we can open up the “Bitcoin protocol” section and see a complete breakdown of not only the high level message metadata, but also the specific information sent along in the version message:

A complete breakdown of our version packet.

We can use this information to construct our pre-defined version struct at the top of our test:


version = %Version{
  version: 70015,
  services: 13,
  timestamp: 1_528_146_756,
  recv_ip: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 160, 16, 233, 215>>,
  recv_port: 18333,
  recv_services: 9,
  from_ip: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>>,
  from_port: 0,
  from_services: 13,
  nonce: 15_116_783_876_185_394_608,
  user_agent: "/Satoshi:0.14.2/",
  start_height: 1_322_730
}
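
Putting those pieces together, our complete parsing test reads:


test "parses a version payload" do
  version = %Version{
    version: 70015,
    services: 13,
    timestamp: 1_528_146_756,
    recv_ip: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 160, 16, 233, 215>>,
    recv_port: 18333,
    recv_services: 9,
    from_ip: <<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>>,
    from_port: 0,
    from_services: 13,
    nonce: 15_116_783_876_185_394_608,
    user_agent: "/Satoshi:0.14.2/",
    start_height: 1_322_730
  }

  assert {:ok, packet} = File.read("test/fixtures/version.bin")
  assert {:ok, message, <<>>} = Message.parse(packet)
  assert message.parsed_payload == version
end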

And with that, we have a solid test of our version parsing functionality.

Testing Serialization

We can test the serialization of our version packet much like we tested the parsing functionality.

Let’s start off by adding a new test to our VersionTest module:


test "serializes a version struct" do
end

Once again, we’ll start off by using File.read/1 to load our binary fixture, and using Message.parse/1 to parse the resulting binary:


assert {:ok, packet} = File.read("test/fixtures/version.bin")
assert {:ok, message, <<>>} = Message.parse(packet)

Rather than comparing message.parsed_payload to a pre-defined Version struct, we’ll serialize it with a call to Protocol.serialize/1 and compare the newly serialized version against the message’s payload binary:


assert Protocol.serialize(message.parsed_payload) == message.payload

And that’s it!

If our version serialization code is working correctly, it should return a binary identical to the version portion of the packet exported from Wireshark.
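
For reference, here’s the serialization test in full:


test "serializes a version struct" do
  assert {:ok, packet} = File.read("test/fixtures/version.bin")
  assert {:ok, message, <<>>} = Message.parse(packet)
  assert Protocol.serialize(message.parsed_payload) == message.payload
end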

Final Thoughts

I’d like to give a huge shout out to Lucid Simple’s article on “Binary Fixtures with Wireshark”. It was a huge inspiration for me and a very well written article. I highly recommend you check it out if you’d like a more in-depth exploration of using Wireshark-generated binary fixtures.

For what it’s worth, this kind of testing has already resulted in a positive return on investment. Shortly after implementing these tests, I discovered that my version struct was being serialized incorrectly, which explained some strange behavior I’d been seeing from my node. Using the tests as a guide, I was able to quickly fix my implementation.

Three cheers for testing!

Be Careful Using With in Tests

Written by Pete Corey on Jun 4, 2018.

Last week I struck a chord in the Elixir community when I tweeted about a trap I fell into while writing a seemingly simple test using Elixir’s with special form. Based on the reaction to that tweet, I thought it’d be a good idea to explore where I went wrong and how I could have prevented it.

The Test

The test in question was fairly simple. Let’s imagine it looked something like this:


test "foo equals bar" do
  with {:ok, foo} <- do_foo(),
       {:ok, bar} <- do_bar() do
    assert foo == bar
  end
end

We’re using with to destructure the results of our do_foo/0 and do_bar/0 function calls. Next, we’re asserting that foo should equal bar.

If do_foo/0 or do_bar/0 return anything other than an :ok tuple, we’d expect our pattern match to fail, causing our test to fail. On running our test, we see that it passes. Our do_foo/0 and do_bar/0 functions must be working as expected!

The False Positive

Unfortunately, we’re operating under a faulty assumption. In reality, our do_foo/0 and do_bar/0 functions actually look like this:


def do_foo, do: {:ok, 1}
def do_bar, do: {:error, :asdf}

Our do_bar/0 is returning an :error tuple, not the :ok tuple our test is expecting, but our test is still passing. What’s going on here?

It’s easy to forget (at least for me, apparently) that when a with expression fails a pattern match, it doesn’t raise an error. Instead, it immediately returns the unmatched value. So in our test, our with expression returns the unmatched {:error, :asdf} tuple without ever executing its do block, skipping our assertion entirely.
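
We can see this behavior for ourselves in a quick IEx session:


iex> with {:ok, foo} <- {:error, :asdf}, do: foo
{:error, :asdf}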

Because our assertion is never given a chance to fail, our test passes!

The Fix

The fix for this broken test is simple once we recognize the problem: we want failed pattern matches to raise errors. One surefire way to accomplish that is to use plain pattern-matched assignments rather than a with expression.


test "foo equals bar" do
  {:ok, foo} = do_foo()
  {:ok, bar} = do_bar()
  assert foo == bar
end

Now, the :error tuple returned by our do_bar/0 function will fail to match with our :ok tuple, and the test will fail. Not only that, but we’ve also managed to simplify our test in the process of fixing it.

Success!

The Better Fix

After posting the above fix in response to my original tweet, Michał Muskała replied with a fantastic tip to improve the error messaging of the failing test.

Michał's pro tip.

Currently, our test failure looks like this:


** (MatchError) no match of right hand side value: {:error, :asdf}
code: {:ok, bar} = do_bar()

If we add assertions to our pattern matching assignments, we set ourselves up to receive better error messages:


test "foo still equals bar" do
  assert {:ok, foo} = do_foo()
  assert {:ok, bar} = do_bar()
  assert foo == bar
end

Now our failing test reads like this:


match (=) failed
code:  assert {:ok, bar} = do_bar()
right: {:error, :asdf}

While we’re still given all of the same information about the failure, it’s presented in a way that’s easier to read and internalize, leading to a quicker understanding of how and why our test is failing.

I’ll be sure to incorporate that tip into my tests from now on. Thanks Michał!

Modeling Formulas with Recursive Discriminators

Written by Pete Corey on May 28, 2018.

I recently ran into an issue while trying to represent a nested, discriminator-based schema using Mongoose in a Node.js client project. The goal was to represent a logical formula by creating a hierarchy of “reducers” (&&, ||, etc…) that would reduce a series of nested “checks” down into a single value.

Let’s make that a little more relatable with an example. Imagine that we’re trying to represent the following formula:


x == 100 || (x <= 10 && x >= 0)

If we wanted to store this in MongoDB, we’d have to represent that somehow as a JSON object. Let’s take a stab at that:


{
  type: "reducer",
  reducer: "||",
  checks: [
    {
      type: "check",
      field: "x",
      comparator: "==",
      value: 100
    },
    {
      type: "reducer",
      reducer: "&&",
      checks: [
        {
          type: "check",
          field: "x",
          comparator: "<=",
          value: 10
        },
        {
          type: "check",
          field: "x",
          comparator: ">=",
          value: 0
        }
      ]
    }
  ]
}

What a behemoth!

While the JSON representation is ridiculously more verbose than our mathematical representation, it gives us everything we need to recreate our formula, and lets us store that formula in our database. This is exactly what we want.
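
To make those semantics concrete, here’s a minimal sketch of how a formula structured like this could be reduced down to a single value. The evaluate function and its context argument are purely illustrative, not part of the actual project:


const evaluate = (node, context) => {
  if (node.type === "check") {
    // Compare the named context field against the check's value.
    const value = context[node.field];
    switch (node.comparator) {
      case "==": return value === node.value;
      case "<=": return value <= node.value;
      case ">=": return value >= node.value;
    }
  }
  // Recursively evaluate and reduce this reducer's nested checks.
  const results = node.checks.map(check => evaluate(check, context));
  return node.reducer === "&&"
    ? results.every(Boolean)
    : results.some(Boolean);
};

evaluate(formula, { x: 100 }); // => true, given the JSON object above as formula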


The trouble comes when we try to represent this schema with Mongoose.

We can break our entire JSON representation into two distinct “types”. We have a “check” type that has field, comparator, and value fields, and a “reducer” type that has a reducer field, and a checks field that contains a list of either “check” or “reducer” objects.

Historically, Mongoose had trouble with a field in a document adhering to either one schema or another. That all changed with the introduction of “discriminators”, and later, “embedded discriminators”. Embedded discriminators let us say that an element of an array adheres to one of multiple schemas defined with different discriminators.

Again, let’s make that more clear with an example. If we wanted to store our formula within a document, we’d start by defining the schema for that wrapping “base” document:


const baseSchema = new Schema({
  name: String,
  formula: checkSchema
});

The formula field will hold our formula. We can define the shell of our checkSchema like so (note that checkSchema needs to be defined before baseSchema, which references it):


const checkSchema = new Schema(
  {},
  {
    discriminatorKey: "type",
    _id: false
  }
);

Here we’re setting the discriminatorKey to "type", which means that Mongoose will look at the value of the "type" field to determine which schema the rest of the subdocument should adhere to.

Next, we have to define each type of our formula. Our "reducer" type has a reducer field and a checks field:


baseSchema.path("formula").discriminator("reducer", new Schema(
  {
    reducer: {
      type: String,
      enum: ['&&', '||']
    },
    checks: [checkSchema]
  },
  { _id: false }
));

Similarly, our "check" type has its own unique set of fields:


baseSchema.path("formula").discriminator("check", new Schema(
  {
    field: String,
    comparator: {
      type: String,
      enum: ['&&', '||']
    },
    value: Number
  },
  { _id: false }
));

Unfortunately, this only works for the first level of our formula. Trying to define a top-level "reducer" or "check" works great, but trying to put a "reducer" or a "check" within a "reducer" fails. Those nested objects are stripped from our final object.


The problem is that we’re defining our discriminators based off of a path originating from the baseSchema:


baseSchema.path("formula").discriminator(...);

Our nested "reducer" subdocuments don’t have any discriminators attached to their checks. To fix this, we’d need to create two new functions that recursively builds each layer of our discriminator stack.

We’ll start with a buildCheckSchema function that simply returns a new schema for our "check"-type subdocuments. This schema doesn’t have any children, so it doesn’t need to define any new discriminators:


const buildCheckSchema = () =>
  new Schema({
    field: String,
    comparator: {
      type: String,
      enum: ['==', '<=', '>=']
    },
    value: Number
  }, { _id: false });

Our buildReducerSchema function needs to be a little more sophisticated. First, it needs to create the "reducer"-type sub-schema. Next, it needs to attach "reducer" and "check" discriminators to the checks field of that new schema with a call to buildCheckSchema and a recursive call to buildReducerSchema:


const buildReducerSchema = () => {
  let reducerSchema = new Schema(
    {
      reducer: {
        type: String,
        enum: ['&&', '||']
      },
      checks: [checkSchema]
    },
    { _id: false }
  );
  reducerSchema.path('checks').discriminator('reducer', buildReducerSchema());
  reducerSchema.path('checks').discriminator('check', buildCheckSchema());
  return reducerSchema;
};

While this works in concept, it blows up in practice. The schemas passed into discriminator are constructed eagerly, so every call to buildReducerSchema immediately triggers another call to buildReducerSchema, creating an infinite recursive loop that blows the top off of our stack.


The solution I landed on for this problem is to limit the number of recursive calls we can make to buildReducerSchema to some maximum value. We can add this limit by passing an optional n argument to buildReducerSchema that defaults to 0. Every time we call buildReducerSchema from within buildReducerSchema, we’ll pass it an incremented value of n:


reducerSchema.path('checks').discriminator('reducer', buildReducerSchema(n + 1));

Next, we’ll use the value of n to enforce our maximum recursion limit:


const buildReducerSchema = (n = 0) => {
  if (n > 100) {
    return buildCheckSchema();
  }
  ...
};

If we reach one hundred recursions, we simply force the next layer to be a "check"-type schema, gracefully terminating the schema stack.
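
Assembled from the pieces above, the final buildReducerSchema looks like this:


const buildReducerSchema = (n = 0) => {
  // After one hundred levels of nesting, bottom out with a plain "check" schema.
  if (n > 100) {
    return buildCheckSchema();
  }
  let reducerSchema = new Schema(
    {
      reducer: {
        type: String,
        enum: ['&&', '||']
      },
      checks: [checkSchema]
    },
    { _id: false }
  );
  reducerSchema.path('checks').discriminator('reducer', buildReducerSchema(n + 1));
  reducerSchema.path('checks').discriminator('check', buildCheckSchema());
  return reducerSchema;
};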

To finish things off, we need to pass our baseSchema these recursively constructed discriminators (without an initial value of n):


baseSchema.path("checks").discriminator("reducer", buildReducerSchema());
baseSchema.path("checks").discriminator("check", buildCheckSchema());

And that’s it!

Against all odds, we managed to build a nested, discriminator-based schema that can fully represent any formula we throw at it, up to one hundred reducers deep. At the end of the day, I’m happy with that solution.