Written by Pete Corey on Mar 19, 2018.

I’ve never had much of a reason to worry about the “endianness” of my binary data when working on Elixir projects. For the most part, everything within an application will be internally consistent, and everything pulled in from external sources will be converted to the machine’s native ordering several layers of abstraction below where I tend to work.

That blissful ignorance came to an end when I found myself using Elixir to construct packets conforming to the Bitcoin peer-to-peer network protocol.

The Bitcoin Protocol

The Bitcoin protocol is a TCP-based protocol used by Bitcoin nodes to communicate over a peer-to-peer ad hoc network.

The real-world specifications of the protocol are defined to be “whatever the reference client does,” but this can be difficult to tease out from the code. Thankfully, the Bitcoin wiki maintains a fantastic technical description of the protocol.

The structures used throughout the protocol are a mishmash of endianness. As the wiki explains, “almost all integers are encoded in little endian,” but many other fields like checksums, strings, network addresses, and ports are expected to be big endian.

The net_addr structure is an excellent example of this endianness confusion. Both time and services are expected to be little endian encoded, but the IPv6/4 and port fields are expected to be big endian encoded.

How will we build this with Elixir?

First Attempt

My first attempt at constructing this net_addr binary structure was to create a net_addr function that accepts time, services, ip, and port arguments and returns a binary of the final structure in correct mixed-endian order.


def net_addr(time, services, ip, port) do
end

When manually constructing binaries, Elixir defaults to a big endian byte order. This means that I’d need to convert time and services into little endian byte order before adding them to the final binary.
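
As a quick sanity check of that default, here is a minimal sketch. The DefaultEndianSketch module and its encode32/1 helper are made-up names used only for illustration; the binary syntax itself is standard Elixir.

```elixir
defmodule DefaultEndianSketch do
  # Hypothetical helper: packs an unsigned integer into 4 bytes with no
  # endianness modifier, so Elixir's big endian default applies.
  def encode32(n), do: <<n::32>>
end

# The most significant byte comes first, so the 1 lands in the last byte.
IO.inspect(DefaultEndianSketch.encode32(1))
# => <<0, 0, 0, 1>>

# The bytes of 0x01020304 come out in reading order.
IO.inspect(DefaultEndianSketch.encode32(0x01020304))
# => <<1, 2, 3, 4>>
```

A little endian encoding of the same values would list those bytes in the opposite order.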

My first attempt at endian conversion was to create a reverse/1 helper function that would take a binary, transform it into a list of bytes using :binary.bin_to_list, reverse that list of bytes, transform it back into a binary using :binary.list_to_bin, and return the result:


def reverse(binary) do
  binary
  |> :binary.bin_to_list()
  |> Enum.reverse()
  |> :binary.list_to_bin()
end

Before I could pass time and services into reverse/1, I needed to transform them into binaries. Thankfully, this is easy with Elixir’s binary special form.

For example, we can convert time into a four byte (32 bit) big endian binary and then reverse it to create its corresponding little endian representation:


reverse(<<time::32>>)
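
To make the byte shuffling concrete, here is a small self-contained sketch of the list-based approach. The ReverseSketch module name and the sample value are made up for illustration; a real timestamp would go through the same steps.

```elixir
defmodule ReverseSketch do
  # List-based byte reversal, wrapped in a module so it can run on its own.
  def reverse(binary) do
    binary
    |> :binary.bin_to_list()
    |> Enum.reverse()
    |> :binary.list_to_bin()
  end
end

# Placeholder standing in for a 32 bit timestamp.
time = 0x01020304

# Big endian is the default layout.
IO.inspect(<<time::32>>)
# => <<1, 2, 3, 4>>

# Reversing the bytes gives the little endian layout.
IO.inspect(ReverseSketch.reverse(<<time::32>>))
# => <<4, 3, 2, 1>>
```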

Using our helper, we can create our final net_addr binary:


<<
  <<time::32>> |> reverse::binary,
  <<services::64>> |> reverse::binary,
  :binary.decode_unsigned(ip)::128,
  port::16
>>

This works, but there’s some room for improvement.

A Faster Second Attempt

After doing some research, I discovered this set of benchmarks for several different techniques of reversing a binary in Elixir (thanks Evadne Wu!).

I realized that I could significantly improve the performance of my packet construction process by replacing my slow list-based solution with a solution that leverages the optional Endianness argument of :binary.decode_unsigned/2 and :binary.encode_unsigned/2:


def reverse(binary) do
  binary
  |> :binary.decode_unsigned(:little)
  |> :binary.encode_unsigned(:big)
end

While this was an improvement, I still wasn’t happy with my solution. Using my reverse/1 function meant that I had to transform my numbers into a binary before reversing them and ultimately concatenating them into the final binary. This nested binary structure was awkward and confusing.
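
There is also a subtler catch with a plain decode/encode round trip: :binary.encode_unsigned/2 emits only as many bytes as the integer needs, so an input whose last byte is zero comes back shorter than it went in. The sketch below shows the effect, along with one possible way to keep the original size. RoundTripSketch and reverse_sized/1 are hypothetical names, not part of the original project.

```elixir
defmodule RoundTripSketch do
  # The decode/encode round trip on its own.
  def reverse(binary) do
    binary
    |> :binary.decode_unsigned(:little)
    |> :binary.encode_unsigned(:big)
  end

  # Assumed variant: re-encode the integer at the input's original bit
  # size so that zero bytes survive the trip.
  def reverse_sized(binary) do
    bits = bit_size(binary)
    <<:binary.decode_unsigned(binary, :little)::size(bits)>>
  end
end

# The trailing zero bytes decode to the integer 1, which encodes as one byte.
IO.inspect(RoundTripSketch.reverse(<<1, 0, 0, 0>>))
# => <<1>>

# Re-encoding at the original size keeps all four bytes.
IO.inspect(RoundTripSketch.reverse_sized(<<1, 0, 0, 0>>))
# => <<0, 0, 0, 1>>
```

For a fixed-width packet field, a dropped byte would shift everything after it, so the size-preserving form is the safer of the two.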

After I asked for guidance on Twitter, the ElixirLang account reached out with some sage advice: use the big and little modifiers.

Using Big and Little Modifiers

The big and little modifiers are binary special form modifiers, much like the bitstring and binary types. They can be used to specify the resulting endianness when coercing an integer, float, utf16 or utf32 value into a binary.
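
Here is a minimal sketch of the modifiers in isolation, covering an integer, a float, and a utf16 codepoint. The ModifierSketch module and its function names are invented for this illustration; the modifiers themselves are part of Elixir's binary syntax.

```elixir
defmodule ModifierSketch do
  # Hypothetical helpers showing the modifiers when building binaries.
  def int16_big(n), do: <<n::16-big>>
  def int16_little(n), do: <<n::16-little>>
  def float32_little(f), do: <<f::float-32-little>>
  def utf16_little(codepoint), do: <<codepoint::utf16-little>>

  # The same modifiers apply when pattern matching on a binary.
  def read_uint32_little(<<n::32-little>>), do: n
end

IO.inspect(ModifierSketch.int16_big(258))
# => <<1, 2>>
IO.inspect(ModifierSketch.int16_little(258))
# => <<2, 1>>
IO.inspect(ModifierSketch.float32_little(1.0))
# => <<0, 0, 128, 63>>
IO.inspect(ModifierSketch.utf16_little(?A))
# => <<65, 0>>
IO.inspect(ModifierSketch.read_uint32_little(<<1, 0, 0, 0>>))
# => 1
```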

For example, we can replace the calls that reverse the time and services binaries in our final binary concatenation by simply appending little to the size of each:


<<
  time::32-little,
  services::64-little,
  :binary.decode_unsigned(ip)::128,
  port::16
>>

Awesome! That’s much easier to understand.

While Elixir defaults to a big endian format for manually constructed binaries, it doesn’t hurt to be explicit. We know that our ip and port should be big endian encoded, so let’s mark them that way:


<<
  time::32-little,
  services::64-little,
  :binary.decode_unsigned(ip)::128-big,
  port::16-big
>>

Beautiful.
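
Putting the pieces together, here is a self-contained sketch of how the finished binary might be wrapped in a function and read back. The NetAddrSketch module, its function names, and the sample values are assumptions made for illustration and may differ from the real module. The sketch also assumes ip is already a 16 byte binary, with IPv4 addresses in IPv4-mapped IPv6 form.

```elixir
defmodule NetAddrSketch do
  # Hypothetical module: builds the 30 byte net_addr layout discussed above.
  # Assumes `ip` is a 16 byte binary.
  def encode(time, services, ip, port) when byte_size(ip) == 16 do
    <<
      time::32-little,
      services::64-little,
      :binary.decode_unsigned(ip)::128-big,
      port::16-big
    >>
  end

  # The modifiers also work in pattern matches, so decoding mirrors encoding.
  def decode(<<time::32-little, services::64-little, ip::binary-size(16), port::16-big>>) do
    {time, services, ip, port}
  end
end

# Placeholder values: 10.0.0.1 as an IPv4-mapped IPv6 address, port 8333.
ip = <<0::80, 0xFFFF::16, 10, 0, 0, 1>>
packet = NetAddrSketch.encode(1, 1, ip, 8333)

IO.inspect(byte_size(packet))
# => 30
IO.inspect(NetAddrSketch.decode(packet) == {1, 1, ip, 8333})
# => true
```

Decoding with the same field specifications is a cheap way to check that each field landed in the byte order the protocol expects.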

Final Thoughts

I’m continually amazed by the quantity, diversity, and quality of the tooling that ships out of the box with Elixir and Erlang. Even when it comes to something as niche as low-level binary manipulation, Elixir’s tools are top notch.

If you want to see complete examples of the endian conversion code shown in this article, check out the BitcoinNetwork.Protocol.NetAddr module in my new bitcoin_network project on GitHub.