As part of an ongoing quest to speed up an application I’m working on, I found myself tasked with writing a fairly complicated MongoDB aggregation pipeline. I found no existing documentation on how to accomplish the task at hand, so I figured I should pay it forward and document my solution for future generations.

Widgets and Icons

Imagine we have two MongoDB collections. Our first collection holds information about widgets in our system:


db.getCollection('widgets').insert({
    _id: 1,
    name: 'Name',
    info: [
        {
            iconId: 2,
            text: 'Text'
        }
    ]
});

Every widget has a name and a list of one or more info objects. Each info object has a text field and an associated icon referenced by an iconId.

Our icons collection holds some basic information about each icon:


db.getCollection('icons').insert({
    _id: 2,
    name: 'Icon',
    uri: 'https://...'
});

The goal is to write an aggregation that returns our widgets with the associated icon documents attached to each corresponding info object:


{
    _id: 1,
    name: 'Name',
    info: [
        {
            iconId: 2,
            text: 'Text',
            icon: {
                _id: 2,
                name: 'Icon',
                uri: 'https://...'
            }
        }
    ]
}

Working Through the Pipeline

The aggregation that accomplishes this goal operates in six stages. Let’s work through each stage one by one. We’ll start by $unwinding our info array:


db.getCollection('widgets').aggregate([
    { $unwind: '$info' }
]);

This creates a new document for every widget/info pair:


{
    _id: 1,
    name: 'Name',
    info: {
        iconId: 2,
        text: 'Text',
    }
}

Next, we’ll $lookup the icon associated with the given iconId:


db.getCollection('widgets').aggregate([
    ...
    {
        $lookup: {
            from: 'icons',
            localField: 'info.iconId',
            foreignField: '_id',
            as: 'info.icon'
        }
    }
]);

Our resulting document will now have a list of icons in the info.icon field:


{
    _id: 1,
    name: 'Name',
    info: {
        iconId: 2,
        text: 'Text',
        icon: [
            {
                _id: 2,
                name: 'Icon',
                uri: 'https://...'
            }
        ]
    }
}

This is a step in the right direction, but we know that the info to icons relationship will always be a one to one relationship. We’ll always receive exactly one icon as a result of our $lookup.

Armed with this knowledge, we know we can $unwind on info.icon and safely turn our info.icon array into an object:


db.getCollection('widgets').aggregate([
    ...
    { $unwind: '$info.icon' }
]);

{
    _id: 1,
    name: 'Name',
    info: {
        iconId: 2,
        text: 'Text',
        icon: {
            _id: 2,
            name: 'Icon',
            uri: 'https://...'
        }
    }
}

But now we need to roll our info back up into an array. We can accomplish this by $grouping our widgets together based on their _id. However, we need to be careful to preserve the original document to avoid clobbering the entire widget:


db.getCollection('widgets').aggregate([
    ...
    {
        $group: {
            _id: '$_id',
            root: { $mergeObjects: '$$ROOT' },
            info: { $push: '$info' }
        }
    }
]);

Our resulting document contains our info array and the original, pre-$group widget document in the root field:


{
    root: {
        _id: 1,
        name: 'Name',
        info: {
            iconId: 2,
            text: 'Text',
            icon: {
                _id: 2,
                name: 'Icon',
                uri: 'https://...'
            }
        }
    },
    info: [
        {
            iconId: 2,
            text: 'Text',
            icon: {
                _id: 2,
                name: 'Icon',
                uri: 'https://...'
            }
        }
    ]
}

The next step in our pipeline is to replace our root document with the root object merged with the actual root document. This will override the info object in root with our newly grouped together info array:


db.getCollection('widgets').aggregate([
    ...
    {
        $replaceRoot: {
            newRoot: {
                $mergeObjects: ['$root', '$$ROOT']
            }
        }
    }
]);

We’re getting close to our goal:


{
    _id: 1,
    name: 'Name',
    info: [
        {
            iconId: 2,
            text: 'Text',
            icon: {
                _id: 2,
                name: 'Icon',
                uri: 'https://...'
            }
        }
    ],
    root: {
        _id: 1,
        name: 'Name',
        info: {
            iconId: 2,
            text: 'Text',
            icon: {
                _id: 2,
                name: 'Icon',
                uri: 'https://...'
            }
        }
    }
}

An unfortunate side effect of this merger is that our resulting document still has a root object filled with superfluous data. As a final piece of housecleaning, let’s remove that field:


db.getCollection('widgets').aggregate([
    ...
    {
        $project: {
            root: 0
        }
    }
]);

And with that we’re left with our original goal:


{
    _id: 1,
    name: 'Name',
    info: [
        {
            iconId: 2,
            text: 'Text',
            icon: {
                _id: 2,
                name: 'Icon',
                uri: 'https://...'
            }
        }
    ]
}

Success!

All Together

For posterity, here’s the entire aggregation pipeline in its entirety:


db.getCollection('widgets').aggregate([
    { $unwind: '$info' },
    {
        $lookup: {
            from: 'icons',
            localField: 'info.iconId',
            foreignField: '_id',
            as: 'info.icon'
        }
    },
    { $unwind: '$info.icon' },
    {
        $group: {
            _id: '$_id',
            root: { $mergeObjects: '$$ROOT' },
            info: { $push: '$info' }
        }
    },
    {
        $replaceRoot: {
            newRoot: {
                $mergeObjects: ['$root', '$$ROOT']
            }
        }
    },
    {
        $project: {
            root: 0
        }
    }
]);

I’ll be the first to say that I’m not a MongoDB expert, and I’m even less knowledgeable about building aggregation pipelines. There may be other, better ways of accomplishing this same task. If you know of a better, more efficient pipeline that gives the same results, please let me know!