I’ve never had much of a reason to worry about the “endianness” of my binary data when working on Elixir projects. For the most part, everything within an application will be internally consistent, and everything pulled in from external sources will be converted to the machine’s native ordering several layers of abstraction below where I tend to work.
That blissful ignorance came to an end when I found myself using Elixir to construct packets conforming to the Bitcoin peer-to-peer network protocol.
The Bitcoin Protocol
The Bitcoin protocol is a TCP-based protocol used by Bitcoin nodes to communicate over a peer-to-peer ad hoc network.
The real-world specifications of the protocol are defined to be “whatever the reference client does,” but this can be difficult to tease out from the code. Thankfully, the Bitcoin wiki maintains a fantastic technical description of the protocol.
The structures used throughout the protocol are a mishmash of endianness. As the wiki explains, “almost all integers are encoded in little endian,” but many other fields like checksums, strings, network addresses, and ports are expected to be big endian.
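To make the difference concrete, here is a small sketch that encodes the same integer in both byte orders using Erlang's `:binary.encode_unsigned/2`. The value 258 (0x0102) is arbitrary and chosen only for illustration.

```elixir
# 258 is 0x0102: most significant byte 0x01, least significant byte 0x02.
value = 258

# Big endian stores the most significant byte first.
big = :binary.encode_unsigned(value, :big)
# big == <<1, 2>>

# Little endian stores the least significant byte first.
little = :binary.encode_unsigned(value, :little)
# little == <<2, 1>>
```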
The net_addr structure is an excellent example of this endianness confusion. Both the time and services fields are expected to be little endian encoded, but the IPv6/4 and port fields are expected to be big endian encoded.
How will we build this with Elixir?
My first attempt at constructing this net_addr binary structure was to create a net_addr function that accepts time, services, ip, and port arguments and returns a binary of the final structure in the correct mixed-endian order.
def net_addr(time, services, ip, port) do
  # Build and return the mixed-endian net_addr binary here.
end
When manually constructing binaries, Elixir defaults to a big endian byte order. This means that I'd need to convert time and services into little endian byte order before adding them to the final binary.
My first attempt at endian conversion was to create a reverse/1 helper function that would take a binary, transform it into a list of bytes using :binary.bin_to_list, reverse that list of bytes, transform it back into a binary using :binary.list_to_bin, and return the result:
def reverse(binary) do
  binary
  |> :binary.bin_to_list()
  |> Enum.reverse()
  |> :binary.list_to_bin()
end
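A quick way to sanity-check the helper is to wrap it in a module and reverse a known four-byte binary. The module name `EndianList` is hypothetical and is used only so that the function can be called on its own.

```elixir
defmodule EndianList do
  # List-based byte reversal: binary -> list of bytes -> reversed list -> binary.
  def reverse(binary) do
    binary
    |> :binary.bin_to_list()
    |> Enum.reverse()
    |> :binary.list_to_bin()
  end
end

reversed = EndianList.reverse(<<1, 2, 3, 4>>)
# reversed == <<4, 3, 2, 1>>
```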
Before I could pass time and services into reverse/1, I needed to transform them into binaries first. Thankfully, this is easy with Elixir's binary special form.
For example, we can convert time into a four-byte (32-bit) big endian binary and then reverse it to create its corresponding little endian representation:
<<time::32>> |> reverse
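To see the bytes involved, the following sketch runs that conversion on a concrete value. The timestamp 0x01020304 and the module name `TimeExample` are hypothetical. The value was chosen so that every byte is distinct.

```elixir
defmodule TimeExample do
  # Same list-based reverse/1 helper as described above.
  def reverse(binary) do
    binary |> :binary.bin_to_list() |> Enum.reverse() |> :binary.list_to_bin()
  end
end

# Hypothetical timestamp with four distinct bytes.
time = 0x01020304

# Default (big endian) 32-bit encoding.
big = <<time::32>>
# big == <<1, 2, 3, 4>>

# Reversing the bytes yields the little endian representation.
little = TimeExample.reverse(big)
# little == <<4, 3, 2, 1>>
```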
Using our helper, we can create our final net_addr binary:
<<
  <<time::32>> |> reverse::binary,
  <<services::64>> |> reverse::binary,
  :binary.decode_unsigned(ip)::128,
  port::16
>>
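Putting these pieces together, here is a self-contained sketch of this first approach wrapped in a module. The module name `NetAddrV1`, the IPv4-mapped address for 127.0.0.1, and port 8333 are illustrative assumptions. The sketch also assumes that ip is already a 16-byte binary. It is not the code from the author's project.

```elixir
defmodule NetAddrV1 do
  # List-based byte reversal (the first attempt).
  def reverse(binary) do
    binary |> :binary.bin_to_list() |> Enum.reverse() |> :binary.list_to_bin()
  end

  # time: 32-bit little endian, services: 64-bit little endian,
  # ip: 16 bytes big endian, port: 16-bit big endian.
  def net_addr(time, services, ip, port) do
    <<
      <<time::32>> |> reverse()::binary,
      <<services::64>> |> reverse()::binary,
      :binary.decode_unsigned(ip)::128,
      port::16
    >>
  end
end

# Hypothetical inputs: 127.0.0.1 as an IPv4-mapped IPv6 address.
ip = <<0::80, 0xFFFF::16, 127, 0, 0, 1>>
packet = NetAddrV1.net_addr(1, 1, ip, 8333)
# byte_size(packet) == 30 (4 + 8 + 16 + 2)
```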
This works, but there’s some room for improvement.
A Faster Second Attempt
After doing some research, I discovered this set of benchmarks for several different techniques of reversing a binary in Elixir (thanks Evadne Wu!).
I realized that I could significantly improve the performance of my packet construction process by replacing my slow list-based solution with one that leverages the optional Endianness argument of :binary.decode_unsigned/2 and :binary.encode_unsigned/2:
def reverse(binary) do
  binary
  |> :binary.decode_unsigned(:little)
  |> :binary.encode_unsigned(:big)
end
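One caveat is worth checking with this version. `:binary.encode_unsigned/2` returns the shortest binary that can hold the integer, so any zero bytes that end up at the most significant end are dropped. For example, a 32-bit time of 256 is `<<0, 0, 1, 0>>`, and the decode/encode reverse/1 turns it into the three-byte `<<1, 0, 0>>` rather than `<<0, 1, 0, 0>>`. Below is a minimal sketch of a width-preserving variant. It assumes that we simply re-encode the decoded integer at the input's original bit size. The module name `EndianPadded` is hypothetical.

```elixir
defmodule EndianPadded do
  # Decode/encode reversal as described above. encode_unsigned/2 drops
  # leading zero bytes, so the output can be shorter than the input.
  def reverse_unpadded(binary) do
    binary
    |> :binary.decode_unsigned(:little)
    |> :binary.encode_unsigned(:big)
  end

  # Width-preserving variant: read the input as a little endian integer,
  # then write it back (big endian by default) at the input's bit size.
  def reverse(binary) do
    size = bit_size(binary)
    value = :binary.decode_unsigned(binary, :little)
    <<value::size(size)>>
  end
end

short = EndianPadded.reverse_unpadded(<<0, 0, 1, 0>>)
# short == <<1, 0, 0>>

padded = EndianPadded.reverse(<<0, 0, 1, 0>>)
# padded == <<0, 1, 0, 0>>
```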
While this was an improvement, I still wasn't happy with my solution. Using my reverse/1 function meant that I had to transform my numbers into binaries before reversing them and ultimately concatenating them into the final binary. This nested binary structure was awkward and confusing.
After asking for guidance on Twitter, the ElixirLang account reached out with some sage advice, pointing me toward the big and little modifiers of the binary special form.
Using Big and Little Modifiers
The big and little modifiers are endianness modifiers for the binary special form. They are used alongside type specifiers such as integer, float, and binary, and they specify the resulting endianness when coercing an integer, float, utf16, or utf32 value into a binary.
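The sketch below applies these modifiers to each kind of value they affect. Each line is a pattern match that raises a `MatchError` if the bytes differ from what is shown, so the script checks itself. The specific values (258, 1.0, and the character a) are arbitrary.

```elixir
# Integer: 258 is 0x0102.
<<1, 2>> = <<258::16-big>>
<<2, 1>> = <<258::16-little>>

# 32-bit float: 1.0 is 0x3F800000 in IEEE 754 single precision.
<<63, 128, 0, 0>> = <<1.0::float-32-big>>
<<0, 0, 128, 63>> = <<1.0::float-32-little>>

# UTF-16 and UTF-32 code units for ?a (U+0061).
<<0, 97>> = <<?a::utf16-big>>
<<97, 0>> = <<?a::utf16-little>>
<<97, 0, 0, 0>> = <<?a::utf32-little>>
```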
For example, we can replace our calls that reverse the time and services binaries in the final binary concatenation by simply appending little to the size of each:
<<
  time::32-little,
  services::64-little,
  :binary.decode_unsigned(ip)::128,
  port::16
>>
Awesome! That’s much easier to understand.
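As a quick check that the modifier really is a drop-in replacement for the list-based helper, this sketch compares the two on a handful of sample 32-bit values, including some with zero bytes. The values and the module name `ModifierCheck` are arbitrary.

```elixir
defmodule ModifierCheck do
  # List-based reversal from the first attempt.
  def reverse(binary) do
    binary |> :binary.bin_to_list() |> Enum.reverse() |> :binary.list_to_bin()
  end
end

# Each comparison must be true, or the match raises a MatchError.
for time <- [0, 1, 256, 0x01020304, 0xFFFFFFFF] do
  true = ModifierCheck.reverse(<<time::32>>) == <<time::32-little>>
end
```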
While Elixir defaults to a big endian format for manually constructed binaries, it doesn't hurt to be explicit. We know that our ip and port should be big endian encoded, so let's mark them that way:
<<
  time::32-little,
  services::64-little,
  :binary.decode_unsigned(ip)::128-big,
  port::16-big
>>
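For completeness, here is a self-contained sketch of the final construction in a module, together with a hypothetical `parse/1` showing that the same modifiers also work when pattern matching on a received binary. The module name `NetAddrSketch`, `parse/1`, and the sample inputs are illustrative assumptions, not the code from the bitcoin_network project.

```elixir
defmodule NetAddrSketch do
  # time and services little endian; ip and port big endian.
  def net_addr(time, services, ip, port) do
    <<
      time::32-little,
      services::64-little,
      :binary.decode_unsigned(ip)::128-big,
      port::16-big
    >>
  end

  # Hypothetical inverse: the same modifiers decode the fields.
  def parse(<<time::32-little, services::64-little, ip::binary-size(16), port::16-big>>) do
    {time, services, ip, port}
  end
end

# Hypothetical inputs: 127.0.0.1 as an IPv4-mapped IPv6 address, port 8333.
ip = <<0::80, 0xFFFF::16, 127, 0, 0, 1>>
packet = NetAddrSketch.net_addr(1, 1, ip, 8333)

# Round trip: parsing the packet recovers the original fields.
{1, 1, ^ip, 8333} = NetAddrSketch.parse(packet)
```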
I’m continually amazed by the quantity, diversity, and quality of the tooling that ships out of the box with Elixir and Erlang. Even when it comes to something as niche as low-level binary manipulation, Elixir’s tools are top notch.
If you want to see complete examples of the endian conversion code shown in this article, check out the BitcoinNetwork.Protocol.NetAddr module in my new bitcoin_network project on GitHub.