Advanced MongoDB Query Batching with DataLoader and Sift

Written by Pete Corey on Aug 21, 2017.

Last week we dove into the gloriously efficient world of batching GraphQL queries with DataLoader.

In all of the examples we explored, the queries being batched were incredibly simple. For the most part, our queries consisted of _id lookups followed by a post-processing step that matched each result to each _id being queried for.

This simplicity is great when working with examples, but the real world is a much more complicated, nuanced place.

Let’s dive into how we can work with more complicated MongoDB queries, and use sift.js to map those queries back to individual results in a batched set of query results.

A More Complicated Example

Instead of simply querying for patients or beds by _id, let’s set up a more complicated example.

Imagine that we’re trying to find if a patient has been in a particular set of beds in the past week. Our query might look something like this:


return Beds.find({
    patientId: patient._id,
    bedCode: { $in: bedCodes },
    createdAt: { $gte: moment().subtract(7, "days").toDate() }
});

If this query were used as a resolver within our GraphQL patient type, we would definitely need to batch this query to avoid N + 1 inefficiencies:


{
    patients {
        name
        recentlyInBed(bedCodes: [123, 234, 345]) {
            bedCode
            createdAt
        }
    }
}

Just like last time, our new query would be executed once for every patient returned by our patients resolver.

Batching with DataLoader

Using DataLoader, we can write a loader function that will batch these queries for us. We’d just need to pass in our patient._id, bedCodes, and our createdAt date:


return loaders.recentlyInBedLoader({
    patientId: patient._id,
    bedCodes,
    createdAt: { $gte: moment().subtract(7, "days").toDate() }
});

Now let’s implement the recentlyInBedLoader function:


export const recentlyInBedLoader = new DataLoader(queries => {
    return Beds.find({ $or: queries }).then(beds => {
        // TODO: ???
    });
});

Because we passed our entire MongoDB query object into our data loader, we can execute all of our batched queries simultaneously by grouping them under a single $or query operator.
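
To illustrate, here's what that grouping amounts to. The query objects below are hypothetical stand-ins for what a few patient resolvers might pass into the loader:

```javascript
// Three hypothetical query objects, one per batched loader call.
const queries = [
    { patientId: 1, bedCode: { $in: [123, 234] } },
    { patientId: 2, bedCode: { $in: [123] } },
    { patientId: 3, bedCode: { $in: [345] } }
];

// Grouping them under a single $or yields one query object that matches
// any bed satisfying at least one of the original queries.
const combined = { $or: queries };
```

One round trip to MongoDB with `combined` now stands in for three separate `Beds.find` calls.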

But wait, how do we map the results of our batched query back to the individual queries we passed into our loader?

We could try to manually recreate the logic of our query:


return queries.map(query => {
    return beds.filter(bed => {
        return query.patientId == bed.patientId &&
            _.includes(query.bedCodes, bed.bedCode) &&
            query.createdAt.$gte <= bed.createdAt;
    });
});

This works, but it seems difficult to manage, maintain, and test. Especially for more complex MongoDB queries. There has to be a better way!

Sift.js to the Rescue

Sift.js is a library that lets you filter in-memory arrays using MongoDB query objects. This is exactly what we need! We can rewrite our loader function using sift:


export const recentlyInBedLoader = new DataLoader(queries => {
    return Beds.find({ $or: queries }).then(beds => {
        return queries.map(query => sift(query, beds));
    });
});

Perfect!

Now we can write loaders with arbitrarily complex queries and easily map the batched results back to the individual queries sent into the loader.
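
To make that mapping concrete, here's a dependency-free sketch of the idea. The matches function below is a toy stand-in for sift that covers only the operators our query happens to use ($in, $gte, and direct equality), and plain numbers stand in for createdAt dates:

```javascript
// A toy stand-in for sift, handling only $in, $gte, and direct equality.
const matches = (query, doc) =>
    Object.entries(query).every(([field, condition]) => {
        if (condition && typeof condition === "object") {
            if ("$in" in condition) return condition.$in.includes(doc[field]);
            if ("$gte" in condition) return doc[field] >= condition.$gte;
        }
        return doc[field] === condition;
    });

// Toy documents standing in for the results of our batched $or query.
const beds = [
    { patientId: 1, bedCode: 123, createdAt: 5 },
    { patientId: 1, bedCode: 999, createdAt: 5 },
    { patientId: 2, bedCode: 123, createdAt: 1 }
];

const queries = [
    { patientId: 1, bedCode: { $in: [123, 234] }, createdAt: { $gte: 3 } },
    { patientId: 2, bedCode: { $in: [123] }, createdAt: { $gte: 0 } }
];

// Each query maps back to exactly the beds that satisfy it.
const results = queries.map(query => beds.filter(bed => matches(query, bed)));
```

The real sift supports the full range of MongoDB query operators, so it handles this mapping for arbitrary queries without us writing any per-query logic.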

Sift can actually be combined with DataLoader to write completely generic loader functions that can be used to batch and match any queries of any structure against a collection, but that’s a post for another day.

Batching GraphQL Queries with DataLoader

Written by Pete Corey on Aug 14, 2017.

One of the biggest drawbacks of an out-of-the-box GraphQL solution is its tendency to make ridiculous numbers of N+1 queries. For example, consider the following GraphQL query:


{
    patients {
        name
        bed {
            code
        }
    }
}

We’re trying to grab all of the patients in our system, and for each patient, we also want their associated bed.

While that seems simple enough, the resulting database queries are anything but. Using the most obvious resolvers, our GraphQL server would ultimately make N+1 queries, where N represents the number of patients in our system.


const resolvers = {
    Query: {
        patients: (_root, _args, _context) =>  Patients.find({})
    },
    Patient: {
        bed: ({ bedId }, _args, _context) => Beds.findOne({ _id: bedId })
    }
};

Our application first queries for all patients (Patients.find), and then makes a Beds.findOne query for each patient it finds. Thus, we’ve made N (one bed query per patient) + 1 (the initial patients query) queries.

This is unfortunate.
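
To see the count for ourselves, here's a dependency-free sketch: in-memory arrays stand in for the Patients and Beds collections, and a counter ticks up once per simulated query:

```javascript
// In-memory stand-ins for our collections, plus a query counter.
let queryCount = 0;

const patients = [
    { _id: 1, name: "Ada", bedId: 10 },
    { _id: 2, name: "Grace", bedId: 11 },
    { _id: 3, name: "Joan", bedId: 12 }
];
const beds = [{ _id: 10 }, { _id: 11 }, { _id: 12 }];

const findPatients = () => { queryCount += 1; return patients; };
const findBedById = _id => { queryCount += 1; return beds.find(bed => bed._id === _id); };

// Resolve patients, then one bed lookup per patient — the same shape
// as our naive resolvers.
const result = findPatients().map(patient => ({
    ...patient,
    bed: findBedById(patient.bedId)
}));
// queryCount is now 4: N (3 patients) + 1
```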

We could easily write a traditional REST endpoint that fetches and returns this data to the client using exactly two queries and some post-query transformations:


return Patients.find({}).then(patients => {
    return Beds.find({ _id: { $in: _.map(patients, 'bedId') } }).then(beds => {
        let bedsById = _.keyBy(beds, '_id');
        return patients.map(patient => {
            return _.extend({}, patient, {
                bed: bedsById[patient.bedId]
            });
        });
    });
});

Despite its elegance, the inefficiency of the GraphQL solution makes it a no-go for many real-world applications.

Thankfully, there’s a solution! 🎉

Facebook’s dataloader package is the solution to our GraphQL inefficiency problems.

DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.

There are many fantastic resources for learning about DataLoader, and even on using DataLoader in an Apollo-based project. For that reason, we’ll skip some of the philosophical questions of how and why DataLoader works and dive right into wiring it into our Apollo server application.

All we need to get DataLoader working in our application is to create our “batch”, or “loader” functions and drop them into our GraphQL context for every GraphQL request received by our server:


import loaders from "./loaders";
...
server.use('/graphql', function(req, res) {
    return graphqlExpress({
        schema,
        context: { loaders }
    })(req, res);
});

Continuing on with our current patient and bed example, we’ll only need a single loader to batch and cache our repeated queries against the Beds collection.

Let’s call it bedLoader and add it to our loaders.js file:


export const bedLoader = new DataLoader(bedIds => {
    // TODO: Implement bedLoader
});

Now that bedLoader is being injected into our GraphQL context, we can replace our resolvers’ calls to Beds.findOne with calls to bedLoader.load:


const resolvers = {
    Patient: {
        bed: ({ bedId }, _args, { loaders }) => loaders.bedLoader.load(bedId)
    }
};

DataLoader will magically aggregate all of the bedId values that are passed into our call to bedLoader.load, and pass them into our bedLoader DataLoader callback.
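
A drastically simplified sketch can demystify the "magic." This is not DataLoader's actual implementation — the real library queues keys per tick of the event loop and returns promises — but it shows the core idea: load collects keys, and a single dispatch hands the whole batch to our callback at once:

```javascript
// A toy, synchronous sketch of DataLoader's batching idea (hypothetical
// makeLoader helper — real DataLoader batches asynchronously, per tick).
const makeLoader = batchFn => {
    const keys = [];
    return {
        load: key => keys.push(key),
        dispatch: () => batchFn(keys)
    };
};

let queryCount = 0;
const bedLoader = makeLoader(bedIds => {
    queryCount += 1; // one "query" per batch, regardless of key count
    return bedIds.map(bedId => ({ _id: bedId }));
});

// Three loads, but only one batched "query" when we dispatch.
[1, 2, 3].forEach(bedId => bedLoader.load(bedId));
const beds = bedLoader.dispatch();
```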

Our job is to write our loader function so that it executes a single query to fetch all of the required beds, and then returns them in order. That is, if bedIds is [1, 2, 3], we need to return bed 1 first, bed 2 second, and bed 3 third. If we can’t find a bed, we need to return undefined in its place:


export const bedLoader = new DataLoader(bedIds => {
    return Beds.find({ _id: { $in: bedIds } }).then(beds => {
        const bedsById = _.keyBy(beds, "_id");
        return bedIds.map(bedId => bedsById[bedId]);
    });
});
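
As a quick sanity check on that ordering contract, here's the same indexing trick with plain objects — Object.fromEntries stands in for _.keyBy, and bed 2 is deliberately missing:

```javascript
// Toy documents standing in for the results of Beds.find — no bed 2.
const beds = [{ _id: 3, code: "C" }, { _id: 1, code: "A" }];
const bedIds = [1, 2, 3];

// Index the results by _id, like _.keyBy(beds, "_id").
const bedsById = Object.fromEntries(beds.map(bed => [bed._id, bed]));

// Mapping the requested ids over the index returns results in bedIds
// order, with undefined standing in for the missing bed.
const results = bedIds.map(bedId => bedsById[bedId]);
```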

That’s it!

Now our system will make a single query to grab all of the patients in our system. For every patient we find, our bed resolver will fire and pass that patient’s bedId into our bedLoader DataLoader. Our bedLoader DataLoader will gather all of our bedId values, make a single query against the Beds collection, and return the appropriate bed to the appropriate bed resolver.

Thanks to DataLoader we can have the elegance of a GraphQL approach, combined with the efficiency and customizability of the manual approach.

What if Elixir were Homoiconic?

Written by Pete Corey on Aug 7, 2017.

Because of its fantastically powerful macro system, Elixir is sometimes mistakenly referred to as a homoiconic programming language.

That being said, let’s put on our day-dreaming hats and think about what Elixir would look like if it were homoiconic.

What is Homoiconicity?

Before we start throwing around the word “homoiconic” and exploring how it applies to Elixir, let’s take the time to talk about what it means.

Boiled down to its essence, “homoiconic” when referring to programming languages means that “code is data”. That is, the code used to express a program is written using the data structures of that language.

The archetypal homoiconic family of programming languages is the Lisp family. The Lisp family includes languages like Common Lisp, Scheme, Clojure, and so on.

In most Lisps, list data structures are represented by values within sets of parentheses, separated by spaces:


(1 2 3)

Similarly, programs are represented by keywords and values within sets of parentheses, separated by spaces. Here’s an example of a function that calculates the Fibonacci sequence written in Scheme:


(define (fib n)
    (cond
      ((= n 0) 0)
      ((= n 1) 1)
      (else
        (+ (fib (- n 1))
           (fib (- n 2))))))

If we view this code through a homoiconic lens, we can see that it’s really just a set of nested lists. At its highest level, we’re looking at a list of three elements. The first element is the keyword define, while the second and third elements are themselves lists.

This code is data, and this data is code.

Going deeper down the rabbit hole, we could write code (read: data) that takes code (read: data) as an argument and outputs new code (read: data). This type of function would be referred to as a macro.

Not only does homoiconicity give us powerful metaprogramming tools, but it’s also sublimely beautiful.

Is Elixir Homoiconic?

The Elixir programming language is not homoiconic. Elixir programs aren’t written using data structures from the language itself. That being said, Elixir does have an incredibly powerful macro system that gives us many of the benefits of a truly homoiconic language.

Macros operate on Elixir’s abstract syntax tree (AST), which is basically a data structure that represents the structure of a given piece of Elixir code.

To visualize that idea, here’s a simple piece of Elixir code followed by its AST equivalent:


if (foo) do
  bar
end

{:if, [context: Elixir, import: Kernel],
 [{:foo, [], Elixir}, [do: {:bar, [], Elixir}]]}

Much of Elixir’s syntax is actually constructed with macros that operate directly on these ASTs. In fact, if itself is a macro and is replaced at compile-time with a case statement!

We can generate an AST for any piece of Elixir code using quote:


ast = quote do
  if (foo) do
    bar
  end
end

We can then use Macro.to_string to convert our AST back into printable code:


ast
|> Macro.to_string
|> IO.puts

This would result in our original if statement being printed to the console.

If Elixir Were Homoiconic…

If Elixir were homoiconic, we would essentially be writing these abstract syntax trees by hand, bypassing the lexing and parsing phase of Elixir compilation.

Let’s quickly break down Elixir’s AST structure so we can better understand what we would be writing.

Elixir ASTs, unlike Lisp programs, which are composed of nested lists, are composed of nested tuples. Each tuple contains three parts: the name of the function being called, any necessary metadata related to the function call, and any arguments being passed into that function.


{:if, [context: Elixir, import: Kernel],
 [{:foo, [], Elixir}, [do: {:bar, [], Elixir}]]}

Using our previous example of an if statement, we can see that the first tuple is calling the :if function with two arguments: {:foo, [], Elixir}, and [do: {:bar, [], Elixir}].

This type of representation of an Elixir program is very similar to a Lisp, because a Lisp is essentially a textual representation of a program’s AST!


Using this newfound way of writing Elixir code, let’s write a basic GenServer module:


{:defmodule, [],
 [{:__aliases__, [], [:Stack]},
  [do: {:__block__, [],
    [{:use, [],
      [{:__aliases__, [], [:GenServer]}]},
     {:def, [],
      [{:handle_call, [],
        [:pop, {:_from, [], Elixir},
         [{:|, [],
           [{:h, [], Elixir},
            {:t, [], Elixir}]}]]},
       [do: {:{}, [],
         [:reply, {:h, [], Elixir},
          {:t, [], Elixir}]}]]},
     {:def, [],
      [{:handle_cast, [],
        [{:push, {:item, [], Elixir}}, {:state, [], Elixir}]},
       [do: {:noreply,
         [{:|, [], [{:item, [], Elixir}, {:state, [], Elixir}]}]}]]}]}]]}

Beautiful, isn’t it? No, I guess not.

In case you can’t grok what’s going on in the above code, it’s simply the basic implementation of a stack using GenServer as described by the Elixir documentation:


defmodule Stack do
  use GenServer

  def handle_call(:pop, _from, [h | t]) do
    {:reply, h, t}
  end

  def handle_cast({:push, item}, state) do
    {:noreply, [item | state]}
  end
end

It turns out that vanilla Elixir syntax is much easier to understand than our homoiconic representation.

Final Thoughts

If this has shown us anything, it’s that homoiconicity is something special.

It takes considerable upfront design work on the behalf of a language designer to create a homoiconic language that’s pleasant to use.

That being said, Elixir’s built-in macro system lets us take advantage of many of the benefits of a truly homoiconic language, while still giving us a syntax that is easy to use and understand.