Recently, while reviewing a system I built for a client, I realized how strongly the “decision” to use MongoDB affected the architecture and structure of the system.

Using a better tool for the job would have significantly simplified the architecture of the solution and resulted in a more robust and reliable final product.

The System

The general idea of the system is that there is a set of “entities”. Each entity has a set of actions that can be performed on it. For example, let’s imagine that we have a Child entity with actions like put_to_bed, which we’ll look at in a moment.

Each action has two main parts: a validate function and a do function.

Whenever you execute an action on an entity, the validate function will be called first. If this function fails in any way, we’ll return the failure to the caller and stop executing the action.

If validate doesn’t catch any problems, we’ll move on to the do function, which executes the meat of the action.

As an example, our put_to_bed action might look something like this:


put_to_bed: {
    validate({bedId}) {
        check(bedId, String);
    },
    do({ bedId }) {
        // Mark the child as in bed...
        Child.update(this._id, {
            $set: { in_bed: true }
        });
        // ...and tell the bed who its new occupant is.
        Bed.findOne(bedId).do("set_occupant", { childId: this._id });
    }
}

This seems all well and good. We check that bedId, the ID of the Bed we’re putting the child into, is a String. When we execute the action, we update the current Child document and set in_bed to true. Next, we find the bed we’re putting the child into and set its occupant to the child’s ID.
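
For context, you can picture the action machinery itself as a thin wrapper that runs validate before do. Here’s a rough sketch of that wrapper (runAction and the actions map are assumptions on my part, not the system’s real code):


// Hypothetical dispatcher standing in for the entity.do(name, params)
// calls used above - the `actions` map is an assumed structure.
function runAction(entity, name, params) {
    const action = entity.actions[name];

    // Run validate first; any thrown Meteor.Error aborts the action here
    // and is returned to the caller.
    action.validate.call(entity, params);

    // Only if validate passes do we execute the meat of the action.
    return action.do.call(entity, params);
}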

Broken State

But what happens if the bed already has an occupant?

Calling Bed.findOne(bedId).do(...) triggers the set_occupant action on the bed. Its validate function will be called first, which might look something like this:


set_occupant: {
    validate({childId}) {
        check(childId, String);
        // Refuse to assign an occupant if the bed already has one.
        if (this.occupied) {
            throw new Meteor.Error("occupied");
        }
    },
    ...
}

The set_occupant action on the Bed will fail.

This leaves our system in a broken state. The child claims that it’s in bed (in_bed: true), but the bed is occupied by someone else.

Two Phase Commit Problems

The MongoDB documentation explains that this kind of multi-document transaction-style commit can be accomplished using two phase commits. The idea is that we keep track of our set of database changes as we carry out actions, and undo them if things go wrong.

The example two phase commit described in the documentation updates two documents within the same collection. Unfortunately our problem is a little more complex.

The example holds the IDs of the documents being updated in the transaction’s source and destination fields. Our transactions will update an arbitrary number of documents across any number of collections.
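
For reference, the transaction document in that tutorial looks roughly like this (field names borrowed from the documentation’s account-transfer example):


{
    _id: 1,
    source: "A",              // _id of the account being debited
    destination: "B",         // _id of the account being credited
    value: 100,
    state: "initial",         // initial -> pending -> applied -> done
    lastModified: new Date()
}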

Instead of a single source and destination pair, we would need to maintain a list of affected documents, storing both the documents’ _id and collection:


{
    ...
    documents: [
        {
            collection: "children",
            _id: ...
        },
        {
            collection: "beds",
            _id: ...
        }
    ]
}

If something goes wrong in a two phase commit, any updates that have already been carried out need to be rolled back.

In the example scenario described in the MongoDB documentation, rollbacks are easy. All updates are simple increments ($inc: { balance: value }), and can be undone by decrementing by the same value ($inc: { balance: -value }).
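
Roughly, the documentation’s rollback step for the source account looks like this (t is the pending transaction document):


// Undo a debit that was already applied to the source account:
// re-increment the balance and drop the transaction from its pending list.
db.accounts.update(
    { _id: t.source, pendingTransactions: t._id },
    { $inc: { balance: t.value }, $pull: { pendingTransactions: t._id } }
);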

But again, our scenario is more complicated.

Our actions are free to modify their respective documents in any way. This means that we have no natural way of undoing these modifications without either storing additional data or writing additional code.

One potential solution would be to store the original, pre-modification document along with the transaction’s _id in the pendingTransactions list:


{
    in_bed: true,
    ...
    pendingTransactions: [
        {
            _id: ...,
            document: {
                in_bed: false,
                ...,
                pendingTransactions: []
            }
        }
    ]
}

In the case of a rollback, we could replace the entire document with this pre-modification document. The downside of this approach is that it drastically increases the size of our entity documents.
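
A rollback under this scheme might look something like this (just a sketch; rollbackAction and txnId are hypothetical, and a replacement-style update like this may need to be expressed as a $set depending on your setup):


// Hypothetical rollback: restore the snapshot stored alongside the
// failed transaction's _id in pendingTransactions.
function rollbackAction(childId, txnId) {
    const child = Child.findOne(childId);
    const pending = child.pendingTransactions.find(t => t._id === txnId);

    if (pending) {
        // Replace the current document wholesale with the pre-modification copy.
        Child.update(childId, pending.document);
    }
}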

Another approach would be to create a new undo function to go along with each of our actions’ do functions. The undo function would simply undo any operations done by the do function.
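
For the put_to_bed action we saw earlier, that extra code might look something like this (a sketch only; the undo signature and the clear_occupant companion action are my own inventions):


put_to_bed: {
    ...
    // Hypothetical undo: reverse everything do() changed.
    undo({ bedId }) {
        // Put the Child document back the way it was...
        Child.update(this._id, {
            $set: { in_bed: false }
        });
        // ...and reverse the change on the Bed (clear_occupant is an
        // assumed companion action, not part of the original code).
        Bed.findOne(bedId).do("clear_occupant", { childId: this._id });
    }
}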

This approach is very similar to the migration model used by Active Record and other migration frameworks. The obvious downside of this approach is that it adds a huge amount of extra code to our application.

As my good friend Bret Lowrey says, “Code is like a war - the best code is one never written.”

My Kingdom For Transactions

It’s amazing how much architectural effort needs to be put into creating a functional but awkward solution to this problem.

Interestingly, this kind of problem isn’t unique to this specific application. Most web applications perform some kind of transactional update against multiple documents across one or more collections.

Many developers just ignore the possibility of mid-transaction failures. If it happens, it happens. We’ll just clean up the database on an ad hoc basis.

And why not? When your alternatives are either doubling the size of your codebase or doubling the size of your database, a little manual labor starts to sound more appealing.

For this particular application, we decided that it would make more sense to invest in heavier upfront validation (via robust validate functions and simulations), rather than implementing a proper two phase commit system.
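
In practice, that means pushing cross-document checks up into validate so that actions fail before any writes happen. Something along these lines (a sketch, not the actual client code):


put_to_bed: {
    validate({ bedId }) {
        check(bedId, String);

        // Look up the bed before touching anything, and fail fast if it's
        // missing or already occupied.
        const bed = Bed.findOne(bedId);
        if (!bed || bed.occupied) {
            throw new Meteor.Error("occupied");
        }
    },
    ...
}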

However, this entire mess could have been completely avoided if we had gone with a database that supported proper transactions.

My kingdom for transactions…