Inject Detect is Launching Soon

Written by Pete Corey on Aug 28, 2017.

It’s been a long and winding road, but development and testing on the initial release of my latest project, Inject Detect, is done.

Finished!

I first announced Inject Detect back in March of this year. At that time, I estimated that the project would be finished and released mid-year.

Three to four months of development time seemed ample for the project I intended to build. Unfortunately, as software projects often do (especially personal projects), the scope crept and the timeline stretched.

Take a look at the project timeline below:

As you can see, development took longer than expected. Work on the project largely took place in the evenings and in free blocks of time scheduled around my existing client work.

Not only that, but Inject Detect underwent three large refactors during its first iteration. Check out this GitHub graph of insertions and deletions for a visualization of my inner turmoil:

In hindsight, I should have prioritized shipping over perfecting my engineering vision.

That being said, I’m happy I took the time to explore alternative implementations of my idea. Along the way I learned a great deal about event sourcing, building distributed systems, Elixir, Phoenix, GraphQL, and React.

The development of Inject Detect is finished, but it’s not yet available to the public.

If you’re interested in getting your hands on Inject Detect as soon as possible, be sure to sign up for the Inject Detect newsletter, where I’ll first announce the release in the coming weeks!

Advanced MongoDB Query Batching with DataLoader and Sift

Written by Pete Corey on Aug 21, 2017.

Last week we dove into the gloriously efficient world of batching GraphQL queries with DataLoader.

In all of the examples we explored, the queries being batched were incredibly simple. For the most part, our queries consisted of _id lookups followed by a post-processing step that matched each result to each _id being queried for.

This simplicity is great when working with examples, but the real world is a much more complicated, nuanced place.

Let’s dive into how we can work with more complicated MongoDB queries, and use sift.js to map those queries back to individual results in a batched set of query results.

A More Complicated Example

Instead of simply querying for patients or beds by _id, let’s set up a more complicated example.

Imagine that we’re trying to determine whether a patient has been in a particular set of beds within the past week. Our query might look something like this:


return Beds.find({
    patientId: patient._id,
    bedCode: { $in: bedCodes },
    createdAt: { $gte: moment().subtract(7, "days").toDate() }
});

If this query were used as a resolver within our GraphQL patient type, we would definitely need to batch this query to avoid N + 1 inefficiencies:


{
    patients {
        name
        recentlyInBed(bedCodes: [123, 234, 345]) {
            bedCode
            createdAt
        }
    }
}

Just like last time, our new query would be executed once for every patient returned by our patients resolver.

Batching with DataLoader

Using DataLoader, we can write a loader function that will batch these queries for us. We’d just need to pass in our patient._id, bedCodes, and our createdAt date:


return loaders.recentlyInBedLoader({
    patientId: patient._id,
    bedCode: { $in: bedCodes },
    createdAt: { $gte: moment().subtract(7, "days").toDate() }
});

Now let’s implement the recentlyInBedLoader function:


export const recentlyInBedLoader = new DataLoader(queries => {
    return Beds.find({ $or: queries }).then(beds => {
        // TODO: ???
    });
});

Because we passed our entire MongoDB query object into our data loader, we can execute all of our batched queries simultaneously by grouping them under a single $or query operator.
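For instance, with two batched queries in flight, the combined query document would look something like this (a sketch with hypothetical patient ids and bed codes):

```javascript
// Two individual loader queries, exactly as they were passed to .load()...
const queries = [
    { patientId: "p1", bedCode: { $in: [123, 234] } },
    { patientId: "p2", bedCode: { $in: [345] } }
];

// ...grouped under a single $or. MongoDB returns the union of all beds
// matching any one of the original queries, in a single round trip.
const batchedQuery = { $or: queries };
```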

But wait, how do we map the results of our batched query back to the individual queries we passed into our loader?

We could try to manually recreate the logic of our query:


return queries.map(query => {
    return beds.filter(bed => {
        return query.patientId == bed.patientId &&
            _.includes(query.bedCode.$in, bed.bedCode) &&
            query.createdAt.$gte <= bed.createdAt;
    });
});

This works, but it seems difficult to manage, maintain, and test, especially for more complex MongoDB queries. There has to be a better way!

Sift.js to the Rescue

Sift.js is a library that lets you filter in-memory arrays using MongoDB query objects. This is exactly what we need! We can rewrite our loader function using sift:


export const recentlyInBedLoader = new DataLoader(queries => {
    return Beds.find({ $or: queries }).then(beds => {
        return queries.map(query => sift(query, beds));
    });
});

Perfect!

Now we can write loaders with arbitrarily complex queries and easily map the batched results back to the individual queries sent into the loader.

Sift can actually be combined with DataLoader to write completely generic loader functions that can be used to batch and match any queries of any structure against a collection, but that’s a post for another day.
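As a rough sketch of that idea (with makeBatchFn, find, and match as hypothetical stand-ins for a DataLoader batch function, a collection query, and sift-style matching, respectively), such a generic loader might boil down to:

```javascript
// A hypothetical, generic batch function: `find` runs one combined query
// against some collection, and `match` decides whether a single document
// satisfies a single query (the role sift plays above).
const makeBatchFn = (find, match) => queries =>
    find({ $or: queries }).then(results =>
        queries.map(query => results.filter(doc => match(query, doc)))
    );
```

Wrapping makeBatchFn(find, match) in new DataLoader(...) would then give a loader that batches arbitrary queries against any collection.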

Batching GraphQL Queries with DataLoader

Written by Pete Corey on Aug 14, 2017.

One of the biggest drawbacks of an out-of-the-box GraphQL solution is its tendency to make ridiculous numbers of N+1 queries. For example, consider the following GraphQL query:


{
    patients {
        name
        bed {
            code
        }
    }
}

We’re trying to grab all of the patients in our system, and for each patient, we also want their associated bed.

While that seems simple enough, the resulting database queries are anything but. Using the most obvious resolvers, our GraphQL server would ultimately make N+1 queries, where N represents the number of patients in our system.


const resolvers = {
    Query: {
        patients: (_root, _args, _context) => Patients.find({})
    },
    Patient: {
        bed: ({ bedId }, _args, _context) => Beds.findOne({ _id: bedId })
    }
};

Our application first queries for all patients (Patients.find), and then makes a Beds.findOne query for each patient it finds. Thus, we’ve made N (one bed query per patient) + 1 (the initial patients query) queries.

This is unfortunate.

We could easily write a traditional REST endpoint that fetches and returns this data to the client using exactly two queries and some post-query transformations:


return Patients.find({}).then(patients => {
    return Beds.find({ _id: { $in: _.map(patients, 'bedId') } }).then(beds => {
        let bedsById = _.keyBy(beds, '_id');
        return patients.map(patient => {
            return _.extend({}, patient, {
                bed: bedsById[patient.bedId]
            });
        });
    });
});

Despite its elegance, the inefficiency of the GraphQL solution makes it a no-go for many real-world applications.

Thankfully, there’s a solution! 🎉

Facebook’s dataloader package is the solution to our GraphQL inefficiency problems.

DataLoader is a generic utility to be used as part of your application’s data fetching layer to provide a consistent API over various backends and reduce requests to those backends via batching and caching.

There are many fantastic resources for learning about DataLoader, and even on using DataLoader in an Apollo-based project. For that reason, we’ll skip some of the philosophical questions of how and why DataLoader works and dive right into wiring it into our Apollo server application.

All we need to get DataLoader working in our application is to create our “batch”, or “loader” functions and drop them into our GraphQL context for every GraphQL request received by our server:


import loaders from "./loaders";
...
server.use('/graphql', function(req, res) {
    return graphqlExpress({
        schema,
        context: { loaders }
    })(req, res);
});

Continuing on with our current patient and bed example, we’ll only need a single loader to batch and cache our repeated queries against the Beds collection.

Let’s call it bedLoader and add it to our loaders.js file:


export const bedLoader = new DataLoader(bedIds => {
    // TODO: Implement bedLoader
});

Now that bedLoader is being injected into our GraphQL context, we can replace our resolvers’ calls to Beds.findOne with calls to bedLoader.load:


const resolvers = {
    Patient: {
        bed: ({ bedId }, _args, { loaders }) => loaders.bedLoader.load(bedId)
    }
};

DataLoader will magically aggregate all of the bedId values that are passed into our call to bedLoader.load, and pass them into our bedLoader DataLoader callback.
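To get a feel for what’s happening under the hood, here’s a toy illustration of that aggregation (a simplified sketch, not DataLoader’s actual implementation): load calls made during one tick of the event loop are queued, and the batch function fires once with every queued key:

```javascript
// A toy sketch of DataLoader-style batching (not the real library).
// Keys loaded during the same tick are queued; the batch function then
// runs once with all of them, and each caller receives its own value.
const makeToyLoader = batchFn => {
    let queue = [];
    return key =>
        new Promise(resolve => {
            if (queue.length === 0) {
                // First load this tick: schedule the batch to fire once
                // the current tick's loads have all been queued.
                process.nextTick(() => {
                    const batch = queue;
                    queue = [];
                    batchFn(batch.map(item => item.key)).then(values =>
                        batch.forEach((item, i) => item.resolve(values[i]))
                    );
                });
            }
            queue.push({ key, resolve });
        });
};
```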

Our job is to write our loader function so that it executes a single query to fetch all of the required beds, and then returns them in order. That is, if bedIds is [1, 2, 3], we need to return bed 1 first, bed 2 second, and bed 3 third. If we can’t find a bed, we need to return undefined in its place:


export const bedLoader = new DataLoader(bedIds => {
    return Beds.find({ _id: { $in: bedIds } }).then(beds => {
        const bedsById = _.keyBy(beds, "_id");
        return bedIds.map(bedId => bedsById[bedId]);
    });
});
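The ordering contract is easy to check in isolation. A plain-JavaScript stand-in for the loader’s post-processing step (with _.keyBy replaced by a hand-rolled reduce) behaves like this:

```javascript
// Re-order a batch of query results to match the requested ids,
// substituting undefined for any id with no matching document.
const orderByIds = (bedIds, beds) => {
    const bedsById = beds.reduce((byId, bed) => {
        byId[bed._id] = bed;
        return byId;
    }, {});
    return bedIds.map(bedId => bedsById[bedId]);
};
```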

That’s it!

Now our server will make a single query to grab all of the patients in our system. For every patient we find, our bed resolver will fire and pass that patient’s bedId into our bedLoader DataLoader. Our bedLoader will gather all of those bedId values, make a single query against the Beds collection, and return the appropriate bed to the appropriate bed resolver.

Thanks to DataLoader we can have the elegance of a GraphQL approach, combined with the efficiency and customizability of the manual approach.