Hex Dumping with Elixir

Written by Pete Corey on Apr 9, 2018.

I've recently written about my explorations into the Bitcoin peer-to-peer network from within an Elixir application. These explorations have caused me to get up-close and personal with binary data as I serialize and parse packets at the byte level.

Being a visual person, I wanted some way of inspecting the binaries I was constructing and sending. I decided that a hex dump would be the best way to visualize these packets.

What followed was an odyssey of finding and implementing a safe and fast process for printing hex dumps of arbitrary, untrusted binary data from within an Elixir application.

Calling Out to the System

My first instinct for implementing a hex dump method within my application was to not implement it at all! It made more sense to leverage the existing hexdump command line utility living on my system.

Specifically, I wanted to render my packets with the hexdump -C command. The -C flag includes an ASCII rendering of the bytes being dumped:

00000000  f9 be b4 d9 76 65 72 73  69 6f 6e 00 00 00 00 00  |....version.....|
00000010  55 00 00 00 9c 7c 00 00  01 00 00 00 00 00 00 00  |U....|..........|
00000020  e6 15 10 4d 00 00 00 00  01 00 00 00 00           |...M.........|
0000002d

Unfortunately, calling hexdump from within an Elixir application proved to be more challenging than I first expected.

When trying to call system commands, I reflexively reach for Elixir's System.cmd/3 function. Unfortunately, hexdump reads its input bytes either from stdin or from a file. Because System.cmd/3 only lets you pass arguments to your system command, not data through stdin, we can't use it to build our hex dumps.

Another option would be to write our packets to a temporary file and use System.cmd/3 to instruct hexdump to load and dump the bytes in that file. Relying on temporary files seems like a poor choice for a logging utility that could be called hundreds or thousands of times per minute.

While System.cmd/3 won't work, maybe we can use an Elixir Port directly. Unfortunately, while we can pipe our binary data to hexdump through a port, there's no way to close just hexdump's stdin to send the EOF (^D) it's waiting for. Without signaling the end of our byte stream, closing the port will leave us without any data to show for our work.

Here Be Dragons

Our third option for solving this problem is to dig deeper into our tool belt and pull out the big guns. Both System.cmd/3 and Port have limitations in this context, but Erlang’s :os.cmd/1 gives us exactly what we need:


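# WARNING: `data` is spliced directly into a shell command. Never use this
# with untrusted input (see below).
output =
  ('echo "' ++ :binary.bin_to_list(data) ++ '" | hexdump -C')
  |> :os.cmd()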

We can use :os.cmd/1 to evaluate any shell command, including compositions of commands strung together with pipes and redirections.

In this case, our command uses echo to pipe our binary into hexdump -C. The :os.cmd/1 function expects our shell command to be in the form of a character list, so we use Erlang’s :binary.bin_to_list/1 to inject our binary data directly into our echo argument.

However, this solution has major security issues.

Depending on the source of our data binary, we're potentially giving outside sources free rein to run any shell command on our machine. Considering that this hex dump is intended to log packets received from external sources on the Bitcoin peer-to-peer network, this is a catastrophically bad idea.
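
To make that risk concrete, here's a small sketch that builds the same command from a made-up hostile payload and prints it rather than running it. The payload is purely illustrative: its embedded double quotes close the echo argument early, so a shell would treat id as a command of its own.

```elixir
# Hypothetical hostile payload; its quotes break out of the echo argument.
payload = "\"; id; echo \""

# Build the same command as above, but never hand it to :os.cmd/1.
command = ~c'echo "' ++ :binary.bin_to_list(payload) ++ ~c'" | hexdump -C'

# Prints: echo ""; id; echo "" | hexdump -C
IO.puts(command)
```

A shell would see three commands there, and the middle one is entirely under the sender's control.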

We need to find another solution.

Going the Safe Route

Ultimately, I decided that the safest and fastest solution to my problem was to simply build my own hex dump utility in pure Elixir.

The general idea behind the hexdump tool is simple. For every line, display the current byte count in hex, followed by two chunks of eight bytes rendered in hex, followed by all sixteen bytes rendered together as ASCII characters.

This is a great chance to flex our Elixir muscles. Let’s implement this in a to_string/1 function within a new Hexdump module:


defmodule Hexdump do
  def to_string(data) when is_binary(data) do
    # TODO: Implement `to_string`...
  end
  def to_string(data), do: Kernel.inspect(data)
end

We only want to run our hex dump algorithm on binary data, so we’ll guard our first function head with an is_binary guard. If data isn’t binary, we’ll simply return the result of Kernel.inspect/2.
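
One subtlety of that guard: is_binary/1 is only true when a bitstring's size is a multiple of eight bits, so something like <<1::3>> falls through to the inspect clause too. Here's a quick check using a hypothetical HexdumpGuardDemo module whose two clauses mirror the ones above:

```elixir
defmodule HexdumpGuardDemo do
  # Byte-aligned binaries take the first clause; everything else is inspected.
  def route(data) when is_binary(data), do: :hexdump
  def route(data), do: {:inspect, Kernel.inspect(data)}
end

IO.inspect(HexdumpGuardDemo.route(<<0xF9, 0xBE>>))
# → :hexdump
IO.inspect(HexdumpGuardDemo.route(<<1::3>>))
IO.inspect(HexdumpGuardDemo.route(:not_a_binary))
# → {:inspect, ":not_a_binary"}
```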

In order to work more easily with our data binary, let’s convert it into a list of bytes and chunk it into our lines of sixteen bytes:


data
|> :binary.bin_to_list()
|> Enum.chunk_every(16)
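
As a quick sanity check of the chunking step, here's what happens to a hypothetical 20-byte packet: one full line of sixteen bytes and a trailing partial line of four.

```elixir
# A hypothetical 20-byte packet: the byte 0xAB repeated twenty times.
data = :binary.copy(<<0xAB>>, 20)

lines =
  data
  |> :binary.bin_to_list()
  |> Enum.chunk_every(16)

IO.inspect(Enum.map(lines, &length/1))
# → [16, 4]
```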

We want to divide each of our lines of sixteen bytes into two groups of eight (or fewer) bytes. If we don’t have enough bytes to create our second group, we’ll append an empty list to fill its place:


|> Enum.map(&Enum.chunk_every(&1, 8))
|> Enum.map(fn
  [a] -> [a, []]
  [a, b] -> [a, b]
end)
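
Here's the grouping step run on three hypothetical lines of 16, 12, and 5 bytes. Since Enum.chunk_every/2 never produces an empty chunk, every line splits into exactly one or two groups, which is why those two function clauses are enough:

```elixir
# Hypothetical lines of 16, 12, and 5 bytes.
lines = [Enum.to_list(1..16), Enum.to_list(1..12), Enum.to_list(1..5)]

grouped =
  lines
  |> Enum.map(&Enum.chunk_every(&1, 8))
  |> Enum.map(fn
    [a] -> [a, []]
    [a, b] -> [a, b]
  end)

# Group sizes per line: [[8, 8], [8, 4], [5, 0]]
IO.inspect(Enum.map(grouped, fn parts -> Enum.map(parts, &length/1) end))
IO.inspect(grouped, charlists: :as_lists)
```

The charlists: :as_lists option just keeps IO.inspect/2 from rendering small integer lists as charlists.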

Now we're left with a list of lines, each holding two groups of up to eight bytes.

We’ll use the index of the outer list to calculate and render how many bytes we’ve dumped so far. We’ll need to attach that to our list with Enum.with_index/2:


|> Enum.with_index()
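
Enum.with_index/2 pairs each element with its zero-based position, and multiplying that position by sixteen gives the offset of each line's first byte. Placeholder strings stand in for real lines here:

```elixir
indexed = Enum.with_index(["line a", "line b", "line c"])

IO.inspect(indexed)
# → [{"line a", 0}, {"line b", 1}, {"line c", 2}]

# The offset of each line's first byte.
offsets = Enum.map(indexed, fn {_line, index} -> index * 16 end)
IO.inspect(offsets)
# → [0, 16, 32]
```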

Finally, we’ll map our lines over a function that transforms each line tuple into a string, and we’ll join the resulting list of strings with newlines:


|> Enum.map(&line_to_string/1)
|> Enum.join("\n")

Our line_to_string/1 function is a helper that accepts a tuple containing a line's two byte groups and its index, and returns the string representation of that line:


def line_to_string({parts, index}) do
end

The first job of line_to_string/1 is to build the current byte count in hex. The byte count needs to be padded to at least eight characters:


count =
  index
  |> Kernel.*(16)
  |> :binary.encode_unsigned()
  |> Base.encode16(case: :lower)
  |> String.pad_leading(8, "0")
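
To see the padding at work, here's the count pipeline wrapped in an anonymous function and run on a few indexes. For comparison, alt_count is a hypothetical alternative of my own built on Integer.to_string/2; it's not part of the module.

```elixir
# The count pipeline from above: line index -> zero-padded hex offset.
count = fn index ->
  index
  |> Kernel.*(16)
  |> :binary.encode_unsigned()
  |> Base.encode16(case: :lower)
  |> String.pad_leading(8, "0")
end

# Hypothetical alternative, shown only for comparison.
alt_count = fn index ->
  (index * 16)
  |> Integer.to_string(16)
  |> String.downcase()
  |> String.pad_leading(8, "0")
end

for index <- [0, 1, 2, 16, 4096] do
  IO.puts("#{index}: #{count.(index)} #{alt_count.(index)}")
end
```

The two agree because Base.encode16/2 always emits an even number of hex digits, and the eight-character padding absorbs any extra leading zero. They can only diverge once offsets need more than eight hex digits (4 GiB and beyond), which a packet logger is unlikely to hit.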

Next, we map over each eight-byte part of our line. We render each byte in hex by converting it into a binary using Erlang's :binary.encode_unsigned/1 and encoding it in base sixteen with Elixir's Base.encode16/2. Then we join the hex pairs in each part with spaces and pad the result to twenty-three characters (eight two-digit pairs plus the seven spaces between them) using String.pad_trailing/3:


bytes =
  parts
  |> Enum.map(fn bytes ->
    bytes
    |> Enum.map(fn byte ->
      byte
      |> :binary.encode_unsigned()
      |> Base.encode16(case: :lower)
    end)
    |> Enum.join(" ")
    |> String.pad_trailing(23, " ")
  end)
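
Here's the byte-rendering step pulled out into a standalone function and run on a full group and a short one. Padding short groups to twenty-three characters is what keeps the ASCII column aligned on a partial final line:

```elixir
# The byte-rendering step from above as a standalone function.
render_part = fn bytes ->
  bytes
  |> Enum.map(fn byte ->
    byte
    |> :binary.encode_unsigned()
    |> Base.encode16(case: :lower)
  end)
  |> Enum.join(" ")
  |> String.pad_trailing(23, " ")
end

full = render_part.([0xF9, 0xBE, 0xB4, 0xD9, 0x76, 0x65, 0x72, 0x73])
short = render_part.([0x55, 0x00, 0x0A])

IO.inspect(full)
# → "f9 be b4 d9 76 65 72 73"
IO.inspect(short)
# → "55 00 0a" followed by 15 spaces of padding
```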

The ASCII component of each line is rendered in a similar way. Because we don’t want a divider in the middle of our ASCII rendering, we’ll flatten our eight byte parts and map each byte over a function that converts it into a printable string. If the byte falls between 0x20 and 0x7E, we convert it into a string. Otherwise, we return ".".


ascii =
  parts
  |> List.flatten()
  |> Enum.map(fn byte ->
    case byte <= 0x7E && byte >= 0x20 do
      true -> <<byte>>
      false -> "."
    end
  end)
  |> Enum.join("")
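
And here's the ASCII step on its own, run against the first twelve bytes of the example version packet:

```elixir
# The printable-ASCII mapping from above as a standalone function.
to_printable = fn byte ->
  case byte <= 0x7E && byte >= 0x20 do
    true -> <<byte>>
    false -> "."
  end
end

ascii =
  [0xF9, 0xBE, 0xB4, 0xD9, 0x76, 0x65, 0x72, 0x73, 0x69, 0x6F, 0x6E, 0x00]
  |> Enum.map(to_printable)
  |> Enum.join("")

IO.puts(ascii)
# → ....version.
```

Because <<byte>> is only built for bytes in 0x20..0x7E, every character in the ASCII column is a single byte, so the column always has exactly one character per dumped byte.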

Now we can flatten our byte count, our two rendered hex groups, and our ASCII representation into a single list and join them together with two spaces separating each component:


[count, bytes, ascii]
|> List.flatten()
|> Enum.join("  ")
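
For reference, here's one way the snippets above fit together into a single runnable module. It's my assembly of the pieces shown in this post (with line_to_string/1 made private), a sketch rather than a copy of the full source on Github:

```elixir
defmodule Hexdump do
  # Renders a binary as hexdump-style lines: offset, two hex groups, ASCII.
  def to_string(data) when is_binary(data) do
    data
    |> :binary.bin_to_list()
    |> Enum.chunk_every(16)
    # Split each line into two groups of up to eight bytes.
    |> Enum.map(&Enum.chunk_every(&1, 8))
    |> Enum.map(fn
      [a] -> [a, []]
      [a, b] -> [a, b]
    end)
    |> Enum.with_index()
    |> Enum.map(&line_to_string/1)
    |> Enum.join("\n")
  end

  # Anything that isn't a byte-aligned binary is simply inspected.
  def to_string(data), do: Kernel.inspect(data)

  defp line_to_string({parts, index}) do
    # Offset of the line's first byte, as at least eight lowercase hex digits.
    count =
      index
      |> Kernel.*(16)
      |> :binary.encode_unsigned()
      |> Base.encode16(case: :lower)
      |> String.pad_leading(8, "0")

    # Each group: two hex digits per byte, space separated, padded to 23 chars.
    bytes =
      parts
      |> Enum.map(fn bytes ->
        bytes
        |> Enum.map(fn byte ->
          byte
          |> :binary.encode_unsigned()
          |> Base.encode16(case: :lower)
        end)
        |> Enum.join(" ")
        |> String.pad_trailing(23, " ")
      end)

    # Printable ASCII (0x20..0x7E) as-is, everything else as ".".
    ascii =
      parts
      |> List.flatten()
      |> Enum.map(fn byte ->
        case byte <= 0x7E && byte >= 0x20 do
          true -> <<byte>>
          false -> "."
        end
      end)
      |> Enum.join("")

    [count, bytes, ascii]
    |> List.flatten()
    |> Enum.join("  ")
  end
end
```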

And that’s it!

We can use our new Hexdump module to safely and quickly create a hex dump string of any binary packet encountered by our application:


<<
  0xF9, 0xBE, 0xB4, 0xD9, 0x76, 0x65, 0x72, 0x73,
  0x69, 0x6F, 0x6E, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x55, 0x00, 0x00, 0x00, 0x9C, 0x7C, 0x00, 0x00
>>
|> Hexdump.to_string()
|> IO.puts()

00000000  f9 be b4 d9 76 65 72 73  69 6f 6e 00 00 00 00 00  ....version.....
00000010  55 00 00 00 9c 7c 00 00                           U....|..

Final Thoughts

What an adventure. The moral of this story is that using the tools and resources available to you is fantastic in the right situations.

That said, it’s important to always be aware of the downsides and potential costs of using existing solutions. Sometimes, rolling your own solution is the right choice. In my case, using the command line version of hexdump would have been horrendously insecure and most likely less performant than implementing my own solution.

If you’re interested in the full source of the Hexdump module we created in this article, check it out on Github.

Shutting Down and Open Sourcing Inject Detect

Written by Pete Corey on Apr 2, 2018.

It’s with a heavy heart that I’m announcing that my security-focused SaaS application, Inject Detect, is shutting down.

The goal of Inject Detect was to fight against NoSQL Injection vulnerabilities in Meteor applications. I still believe that this is a worthy cause, but I don’t think the approach taken by Inject Detect was the right one.

I talked with many customers, and their primary concern with Inject Detect was the idea of sending their applications' queries to a third-party service. No amount of explaining that only the structure of these queries, not the queries themselves, was being transmitted could assuage their worries.

It makes me happy to think that my customers’ focus on security dissuaded them from using an application focused on security, and it’s obvious, in hindsight, that this would be an issue.


Inject Detect was the largest Elixir-based project I'd ever worked on at the time it was released, and it was my first real foray into the world of Event Sourced systems. I invested nearly six months of my free time and time between client engagements working on Inject Detect, and I don't want that work to go to waste.

With that in mind, I’ve decided to open source the Inject Detect project on Github. While you’re digging through the code, be sure to check out the InjectDetect.CommandHandler module and the InjectDetect.State module. These two modules are the heart of the system and the driving force behind my implementation of Event Sourcing.

Truth be told, I’m still in love with the concept of Event Sourcing, and I believe that it’s the future of web development. I plan on spending more time in the future diving into that topic.


While I’m shutting down Inject Detect, I’m not giving up the war against NoSQL Injection. Instead, I’m doubling down and focusing my efforts on my newest project, Secure Meteor.

Secure Meteor is an upcoming guide designed to help you secure your Meteor application by teaching you the ins and outs of Meteor security.

If you’re a Meteor application owner, a Meteor developer, or are just interested in Meteor security and NoSQL Injection, I highly recommend you head over to www.securemeteor.com and grab a copy of my Meteor security checklist.

RIP Inject Detect. Long live Secure Meteor!

Building Mixed Endian Binaries with Elixir

Written by Pete Corey on Mar 19, 2018.

I’ve never had much of a reason to worry about the “endianness” of my binary data when working on Elixir projects. For the most part, everything within an application will be internally consistent, and everything pulled in from external sources will be converted to the machine’s native ordering several layers of abstraction below where I tend to work.

That blissful ignorance came to an end when I found myself using Elixir to construct packets conforming to the Bitcoin peer-to-peer network protocol.

The Bitcoin Protocol

The Bitcoin protocol is a TCP-based protocol used by Bitcoin nodes to communicate over a peer-to-peer ad hoc network.

The real-world specifications of the protocol are defined to be “whatever the reference client does,” but this can be difficult to tease out from the code. Thankfully, the Bitcoin wiki maintains a fantastic technical description of the protocol.

The structures used throughout the protocol are a mishmash of endianness. As the wiki explains, “almost all integers are encoded in little endian,” but many other fields like checksums, strings, network addresses, and ports are expected to be big endian.

The net_addr structure is an excellent example of this endianness confusion. Both time and services are expected to be little endian encoded, but the IPv6/4 and port fields are expected to be big endian encoded.

How will we build this with Elixir?

First Attempt

My first attempt at constructing this net_addr binary structure was to create a net_addr function that accepts time, services, ip, and port arguments and returns a binary of the final structure in correct mixed-endian order.


def net_addr(time, services, ip, port) do
end

When manually constructing binaries, Elixir defaults to a big endian byte order. This means that I’d need to convert time and services into little endian byte order before adding them to the final binary.
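
A quick way to confirm that default: with no modifier, an integer segment is written most significant byte first.

```elixir
# No endianness modifier: the most significant byte comes first.
IO.inspect(<<1::32>>)
# → <<0, 0, 0, 1>>

IO.inspect(<<0x0102::16>>)
# → <<1, 2>>
```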

My first attempt at endian conversion was to create a reverse/1 helper function that would take a binary, transform it into a list of bytes using :binary.bin_to_list, reverse that list of bytes, transform it back into a binary using :binary.list_to_bin, and return the result:


def reverse(binary) do
  binary
  |> :binary.bin_to_list
  |> Enum.reverse
  |> :binary.list_to_bin
end

Before I could pass time and services into reverse/1, I needed to transform them into binaries first. Thankfully, this is easy with Elixir’s binary special form.

For example, we can convert time into a four byte (32 bit) big endian binary and then reverse it to create its corresponding little endian representation:


reverse(<<time::32>>)
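
Here's reverse/1 applied to a hypothetical timestamp, 0x5ACB1234, chosen only because its four bytes are easy to tell apart. Wrapping the helper in a module (ReverseDemo, a made-up name) keeps the snippet runnable as a script:

```elixir
defmodule ReverseDemo do
  # The list-based reverse/1 helper from above.
  def reverse(binary) do
    binary
    |> :binary.bin_to_list()
    |> Enum.reverse()
    |> :binary.list_to_bin()
  end
end

# Hypothetical timestamp; its four bytes are 0x5A, 0xCB, 0x12, 0x34.
time = 0x5ACB1234

IO.inspect(<<time::32>>)
# → <<90, 203, 18, 52>> (big endian)
IO.inspect(ReverseDemo.reverse(<<time::32>>))
# → <<52, 18, 203, 90>> (little endian)
```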

Using our helper, we can create our final net_addr binary:


<<
  <<time::32>> |> reverse::binary,
  <<services::64>> |> reverse::binary,
  :binary.decode_unsigned(ip)::128,
  port::16
>>
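
Putting the first attempt together, here's a runnable sketch. NetAddrV1 is a hypothetical module name, reverse/1 is called with ordinary function syntax (equivalent to the pipe form above), and the IP is the IPv4-mapped IPv6 form of 127.0.0.1 paired with 8333, Bitcoin's default mainnet port:

```elixir
defmodule NetAddrV1 do
  # First attempt: encode big endian, then reverse the little endian fields.
  def net_addr(time, services, ip, port) do
    <<
      reverse(<<time::32>>)::binary,
      reverse(<<services::64>>)::binary,
      :binary.decode_unsigned(ip)::128,
      port::16
    >>
  end

  # List-based byte reversal: big endian in, little endian out.
  defp reverse(binary) do
    binary
    |> :binary.bin_to_list()
    |> Enum.reverse()
    |> :binary.list_to_bin()
  end
end

# IPv4-mapped IPv6 address for 127.0.0.1, as a 16-byte binary.
ip = <<0::80, 0xFFFF::16, 127, 0, 0, 1>>
addr = NetAddrV1.net_addr(0x5ACB1234, 1, ip, 8333)

IO.inspect(byte_size(addr))
# → 30
```

Four bytes of time, eight of services, sixteen of IP, and two of port add up to the 30 bytes of a net_addr.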

This works, but there’s some room for improvement.

A Faster Second Attempt

After doing some research, I discovered this set of benchmarks for several different techniques of reversing a binary in Elixir (thanks Evadne Wu!).

I realized that I could significantly improve the performance of my packet construction process by replacing my slow list-based solution with a solution that leverages the optional Endianness argument of :binary.decode_unsigned/2 and :binary.encode_unsigned/2:


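def reverse(binary) do
  reversed =
    binary
    |> :binary.decode_unsigned(:little)
    |> :binary.encode_unsigned(:big)

  # encode_unsigned/2 drops leading zero bytes; pad back to the original width.
  padding = byte_size(binary) - byte_size(reversed)
  <<0::size(padding)-unit(8), reversed::binary>>
end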
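
One caveat applies to any bare decode_unsigned/encode_unsigned round trip: :binary.encode_unsigned/2 returns the shortest binary that holds the integer, so zero bytes that end up at the front of the reversed result are dropped. A quick check, using a hypothetical FastReverse module that holds just the two-step pipeline:

```elixir
defmodule FastReverse do
  # Two-step reverse with no width preservation.
  def reverse(binary) do
    binary
    |> :binary.decode_unsigned(:little)
    |> :binary.encode_unsigned(:big)
  end
end

IO.inspect(FastReverse.reverse(<<0, 0, 0, 1>>))
# → <<1, 0, 0, 0>> (four bytes, as expected)

IO.inspect(FastReverse.reverse(<<0, 0, 1, 0>>))
# → <<1, 0, 0>> (three bytes: one zero byte was lost)

IO.inspect(FastReverse.reverse(<<0::64>>))
# → <<0>> (a zeroed 64-bit field collapses to one byte)
```

A zeroed services field, for example, would shrink from eight bytes to one and shift every field after it, so whenever a fixed width matters the result has to be padded back to its original size.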

While this was an improvement, I still wasn’t happy with my solution. Using my reverse/1 function meant that I had to transform my numbers into a binary before reversing them and ultimately concatenating them into the final binary. This nested binary structure was awkward and confusing.

After I asked for guidance on Twitter, the ElixirLang account reached out with some sage advice: use the big and little binary modifiers.

Using Big and Little Modifiers

The big and little modifiers are binary special form modifiers, much like the bitstring and binary types. They can be used to specify the resulting endianness when coercing an integer, float, utf16 or utf32 value into a binary.
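
Here's how the modifiers play out for each of those types. The float and UTF-16 values are just convenient examples: 1.0 is 0x3F800000 as an IEEE 754 single, and ?A is the code point U+0041.

```elixir
# Integers: the same value in both byte orders.
IO.inspect(<<1::32-big>>)
# → <<0, 0, 0, 1>>
IO.inspect(<<1::32-little>>)
# → <<1, 0, 0, 0>>

# Floats: 1.0 as a 32-bit IEEE 754 value (0x3F800000).
IO.inspect(<<1.0::float-big-size(32)>>)
# → <<63, 128, 0, 0>>
IO.inspect(<<1.0::float-little-size(32)>>)
# → <<0, 0, 128, 63>>

# UTF-16: the code point for "A" (U+0041).
IO.inspect(<<?A::utf16-big>>)
# → <<0, 65>>
IO.inspect(<<?A::utf16-little>>)
# → <<65, 0>>
```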

For example, we can replace our calls reversing the time and services binaries in our final binary concatenation by simply appending little to the size of each:


<<
  time::32-little,
  services::64-little,
  :binary.decode_unsigned(ip)::128,
  port::16
>>
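
To convince ourselves the modifier version really is a drop-in replacement, here's a sketch that builds the same net_addr both ways and compares the bytes. NetAddrCompare is a hypothetical module, and the inputs are arbitrary:

```elixir
defmodule NetAddrCompare do
  # List-based reverse from the first attempt.
  defp reverse(binary) do
    binary |> :binary.bin_to_list() |> Enum.reverse() |> :binary.list_to_bin()
  end

  # First attempt: encode big endian, then reverse the bytes.
  def with_reverse(time, services, ip, port) do
    <<reverse(<<time::32>>)::binary, reverse(<<services::64>>)::binary,
      :binary.decode_unsigned(ip)::128, port::16>>
  end

  # Modifier version: the little modifier picks the byte order directly.
  def with_modifiers(time, services, ip, port) do
    <<time::32-little, services::64-little, :binary.decode_unsigned(ip)::128, port::16>>
  end
end

ip = <<0::80, 0xFFFF::16, 127, 0, 0, 1>>

IO.inspect(
  NetAddrCompare.with_reverse(0x5ACB1234, 1, ip, 8333) ==
    NetAddrCompare.with_modifiers(0x5ACB1234, 1, ip, 8333)
)
# → true
```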

Awesome! That’s much easier to understand.

While Elixir defaults to a big endian format for manually constructed binaries, it doesn’t hurt to be explicit. We know that our ip and port should be big endian encoded, so let’s mark them that way:


<<
  time::32-little,
  services::64-little,
  :binary.decode_unsigned(ip)::128-big,
  port::16-big
>>
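
The same modifiers also work in binary patterns, so parsing a net_addr can mirror building one. Here's a sketch of that round trip; NetAddrSketch and its encode/decode functions are hypothetical names, not the API of the bitcoin_network project. The IP is matched as a raw 16-byte binary, since binary segments carry no endianness:

```elixir
defmodule NetAddrSketch do
  # Builds the 30-byte net_addr structure with explicit byte orders.
  def encode(time, services, ip, port) do
    <<
      time::32-little,
      services::64-little,
      :binary.decode_unsigned(ip)::128-big,
      port::16-big
    >>
  end

  # The same modifiers in a pattern: decoding mirrors encoding.
  def decode(<<
        time::32-little,
        services::64-little,
        ip::binary-size(16),
        port::16-big
      >>) do
    {time, services, ip, port}
  end
end

ip = <<0::80, 0xFFFF::16, 127, 0, 0, 1>>
encoded = NetAddrSketch.encode(0x5ACB1234, 1, ip, 8333)

IO.inspect(NetAddrSketch.decode(encoded))
```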

Beautiful.

Final Thoughts

I’m continually amazed by the quantity, diversity, and quality of the tooling that ships out of the box with Elixir and Erlang. Even when it comes to something as niche as low-level binary manipulation, Elixir’s tools are top notch.

If you want to see complete examples of the endian conversion code shown in this article, check out the BitcoinNetwork.Protocol.NetAddr module in my new bitcoin_network project on Github.