NoSQL injection is a class of application vulnerability where a malicious user can inject control structures into a query against a NoSQL database. MongoDB is the usual victim in these types of attacks, for reasons we’ll discuss towards the end of the article.

Coming most recently from a Meteor background, I’m no stranger to NoSQL injection. It’s one of the most prevalent vulnerabilities I find during Meteor security assessments.

Interestingly, as I’ve been diving headfirst into Elixir and the Phoenix framework, I’ve been seeing the NoSQL injection monster rearing its ugly head there, too.

Let’s take a look at what NoSQL injection actually is, what it looks like in an Elixir/Phoenix application, and how it can be prevented.

What is NoSQL Injection

NoSQL injection is an interesting vulnerability that’s especially prevalent in systems built with MongoDB. NoSQL injection can occur when a user’s unvalidated, unsanitized input is inserted directly into a MongoDB query object.

To make things more real, let’s demonstrate NoSQL injection with a (slightly contrived) example.

Imagine you have a Phoenix channel that removes shopping cart “item” documents from a MongoDB collection whenever it receives an "empty_cart" event:

def handle_in("empty_cart", cart_id, socket) do
  MongoPool
  |> Mongo.delete_many("items", %{"cart_id" => cart_id})
  {:noreply, socket}
end

This code is making the assumption that the "empty_cart" channel event will always be invoked with cart_id as a string. However, it’s important to realize that cart_id can be any JSON-serializable type.

What would happen if a malicious user passed in {"$gte": ""} as a cart_id? Our resulting MongoDB query would look like this:

Mongo.delete_many("items", %{"cart_id" => %{"$gte" => ""}})

This query would remove every item document in the database.
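
To see why the operator map matches everything, here is a minimal, self-contained sketch. It does not talk to MongoDB; the InjectionDemo module and its matches?/2 and would_delete/2 functions are hypothetical helpers that imitate only equality and "$gte" matching, which is enough to show the effect:

```elixir
defmodule InjectionDemo do
  # Toy stand-in for MongoDB's matching rules. Only equality and "$gte" are
  # modelled; this is an illustration, not the driver's real behavior.
  def matches?(doc, filter) do
    Enum.all?(filter, fn {field, condition} ->
      match_condition?(Map.get(doc, field), condition)
    end)
  end

  # An operator map such as %{"$gte" => ""} is treated as a comparison.
  defp match_condition?(value, %{"$gte" => bound})
       when is_binary(value) and is_binary(bound),
       do: value >= bound

  # Anything else is treated as a plain equality check.
  defp match_condition?(value, expected), do: value == expected

  # Returns the documents that a delete with this cart_id would remove.
  def would_delete(docs, cart_id) do
    Enum.filter(docs, &matches?(&1, %{"cart_id" => cart_id}))
  end
end

items = [
  %{"cart_id" => "cart-1", "name" => "socks"},
  %{"cart_id" => "cart-2", "name" => "shoes"}
]

# A well-behaved client sends a string and only touches its own cart.
IO.inspect(length(InjectionDemo.would_delete(items, "cart-1")))
# prints 1

# A malicious client sends an operator map and matches every cart.
IO.inspect(length(InjectionDemo.would_delete(items, %{"$gte" => ""})))
# prints 2
```

Every string compares greater than or equal to the empty string, so the injected filter selects every document that has a string cart_id.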

Similar types of attacks can be used to fetch large amounts of unauthorized data from find and findOne queries.

Even more dangerously (and more contrived), let’s imagine we have a channel event handler that lets users search through their cart items for items matching a user-provided key/value pair:

def handle_in("find_items", %{"key" => key, "value" => value}, socket) do
  items = MongoPool
  |> Mongo.find("items", %{
       key => value,
       "user_id" => socket.assigns[:user]._id
     })
  |> Enum.to_list
  {:reply, {:ok, items}, socket}
end

This seems relatively safe. We’re assuming that the user will pass in values like "foo"/"bar" for "key" and "value", respectively.

However, what would happen if a malicious user passed in "$where" and "d = new Date; do {c = new Date;} while (c - d < 10000);" as a "key"/"value" pair?

The resulting MongoDB query would look like this:

Mongo.find("items", %{
  "$where" => "d = new Date; do {c = new Date;} while (c - d < 10000);",
  "user_id" => socket.assigns[:user]._id
})

By exploiting the $where operator in this way, the malicious user could peg the CPU of the server running the MongoDB instance at 100% for ten seconds per document in the collection, preventing any other queries from executing during that time.

This malicious JavaScript loop could easily be modified to run indefinitely, requiring you to either kill the query manually or restart your database process.

How to Prevent It

Preventing this flavor of NoSQL injection is fairly straightforward. You simply need to make assertions about the types of your user-provided data.

If you’re expecting cart_id to be a string, make sure it’s a string before working with it.

In Elixir, this kind of type checking can be neatly accomplished with guard clauses alongside pattern matching. We can patch up our first example with a simple guard that checks the type of cart_id:

def handle_in("empty_cart", cart_id, socket) when is_binary(cart_id) do
  MongoPool
  |> Mongo.delete_many("items", %{"cart_id" => cart_id})
  {:noreply, socket}
end

The when is_binary(cart_id) guard expression asserts that cart_id is a binary (i.e., a string) before this clause of the handle_in function can match.

If a malicious user passed in %{"$gte" => ""} for a cart_id, this clause of our "empty_cart" handler would not match and the query would never run, preventing the possibility of NoSQL injection.
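
When no clause matches, Elixir raises a FunctionClauseError, which in a channel would crash the channel process. If you would rather reply with an error, one option is a catch-all clause. The following is a rough sketch that runs without Phoenix or MongoDB: CartChannelSketch and its delete_items/1 stub are hypothetical names, and the socket is just an opaque value here.

```elixir
defmodule CartChannelSketch do
  # Stub standing in for the real Mongo.delete_many/3 call (hypothetical helper).
  defp delete_items(filter), do: {:deleted_with, filter}

  # Only string cart ids reach the database call.
  def handle_in("empty_cart", cart_id, socket) when is_binary(cart_id) do
    delete_items(%{"cart_id" => cart_id})
    {:noreply, socket}
  end

  # Everything else (maps, lists, numbers, nil) is rejected explicitly.
  def handle_in("empty_cart", _invalid, socket) do
    {:reply, {:error, %{reason: "invalid cart_id"}}, socket}
  end
end

# A string id is accepted.
IO.inspect(CartChannelSketch.handle_in("empty_cart", "cart-1", :socket))

# An operator map falls through to the catch-all clause.
IO.inspect(CartChannelSketch.handle_in("empty_cart", %{"$gte" => ""}, :socket))
```

Whether to crash or reply with an error is a design choice; either way, the operator map never reaches the query.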

Our "find_items" example is also susceptible to query objects being passed in as value, and would benefit from guard clauses.

However, the fundamental flaw with this example is that user input is being directly used to construct a root level MongoDB query.

A better version of our "find_items" channel event handler might look something like this:

def build_query("name", value), do: %{ "name" => value }
def build_query("category", value), do: %{ "category" => value }

def handle_in("find_items",
              %{"key" => key,
                "value" => value},
              socket) when is_binary(key) and is_binary(value) do
  query = build_query(key, value)
  |> Map.put("user_id", socket.assigns[:user]._id)
  items = MongoPool
  |> Mongo.find("items", query)
  |> Enum.to_list
  {:reply, {:ok, items}, socket}
end

By mapping between the provided key value and a list of known MongoDB query objects, we know that nothing can be injected into the root of our query.
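
Note that calling build_query with any other key raises a FunctionClauseError, because no clause matches. As a variation, the sketch below returns tagged tuples instead, so the caller can decide how to respond. ItemQuerySketch and scoped_query/3 are hypothetical names, and the user id is passed in directly rather than read from a socket:

```elixir
defmodule ItemQuerySketch do
  # Known, safe query shapes. Values must be strings so operator maps
  # cannot slip in as the value.
  def build_query("name", value) when is_binary(value), do: {:ok, %{"name" => value}}
  def build_query("category", value) when is_binary(value), do: {:ok, %{"category" => value}}

  # Any other key, including "$where", is refused rather than raising.
  def build_query(_key, _value), do: :error

  # Scope the query to one user; user_id comes from the server, not the client.
  def scoped_query(key, value, user_id) do
    with {:ok, query} <- build_query(key, value) do
      {:ok, Map.put(query, "user_id", user_id)}
    end
  end
end

IO.inspect(ItemQuerySketch.scoped_query("name", "socks", "user-1"))
IO.inspect(ItemQuerySketch.scoped_query("$where", "while (true) {}", "user-1"))
# the second call prints :error
```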

Alternatively, we can continue to use the raw value of key to construct our query, but we can add a key in ["name", "category"] guard clause to our handle_in function to assert that the user is only searching over the "name" or "category" fields:

def handle_in("find_items",
              %{"key" => key,
                "value" => value},
              socket) when key in ["name", "category"] and is_binary(value)
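
Filled out, that guarded clause might look something like the sketch below. It is written to run without Phoenix or MongoDB, so FindItemsSketch and the run_find/1 stub are hypothetical, and the socket is assumed to be any map that carries the current user under assigns:

```elixir
defmodule FindItemsSketch do
  # Stub standing in for MongoPool |> Mongo.find(...) |> Enum.to_list (hypothetical).
  # It simply echoes the filter back so the result can be inspected.
  defp run_find(filter), do: [filter]

  def handle_in("find_items", %{"key" => key, "value" => value}, socket)
      when key in ["name", "category"] and is_binary(value) do
    filter = %{key => value, "user_id" => socket.assigns[:user]._id}
    {:reply, {:ok, run_find(filter)}, socket}
  end

  # Unknown keys, operator maps, and malformed payloads all land here.
  def handle_in("find_items", _payload, socket) do
    {:reply, {:error, %{reason: "invalid search"}}, socket}
  end
end

socket = %{assigns: %{user: %{_id: "user-1"}}}

# Allowed field with a string value: the filter is built and scoped to the user.
IO.inspect(FindItemsSketch.handle_in("find_items", %{"key" => "name", "value" => "socks"}, socket))

# "$where" is not in the list of allowed keys, so the catch-all clause replies.
IO.inspect(FindItemsSketch.handle_in("find_items", %{"key" => "$where", "value" => "1"}, socket))
```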

By preventing malicious users from controlling the root level of our MongoDB query, we can prevent several types of nasty NoSQL injection vulnerabilities within our application.

That being said, the best way to prevent these kinds of injection attacks is to use a query builder, like Ecto.

Unfortunately, as we discussed last week, the Mongo.Ecto adapter is currently in a state of flux and does not play nicely with Ecto 1.1 or Ecto 2.0.

Picking on MongoDB

This type of NoSQL injection mostly applies to applications using MongoDB. This is because MongoDB has made the “interesting” design decision to intermix query control structures and query data in a single query object.

If a malicious user can inject data into this object, they can potentially inject query control structures as well. This is the fundamental idea behind NoSQL injection.

Looking at other NoSQL databases, it becomes apparent that MongoDB is alone in making this design decision.

Redis, for example, is a much simpler solution overall. Redis doesn’t mix data and control structures. The query type is specified up-front, almost always by the application, and unescapable data follows.

As another example, CouchDB lets developers build custom queries through “views”, but these views are written in advance and stored on the server. They can’t be modified at runtime, let alone modified by a malicious user.

There are already a host of compelling reasons not to use MongoDB. I would add MongoDB’s decision to intermix data and control structures to this ever-growing list.

Final Thoughts

While MongoDB does have its shortcomings, it’s important to realize that it’s still being used extensively in the Real World™. In fact, MongoDB is the most popular NoSQL database, standing head and shoulders above its competition in usage statistics.

For this reason, it’s incredibly important to understand MongoDB-flavored NoSQL injection and how to prevent it in your applications.

For more information on NoSQL injection, check out the “NoSQL Injection in Modern Web Applications” presentation I gave at last year’s Crater Conference, and be sure to grab a copy of my “Five Minute Introduction to NoSQL Injection”.