Building Check-Checker as a Meteor Plugin

Written by Pete Corey on Nov 23, 2015.

I recently decided to switch my east5th:check-checker package to use the new Build Plugin API.

Before switching to the new linter API, I was using ESLint’s CLIEngine to power check-checker. This resulted in a few bugs due to assumptions CLIEngine made about its environment. I decided to ditch CLIEngine and have Meteor’s new linter API do the heavy lifting of delivering the files that need to be checked.

Buried deep within the Meteor wiki, there’s a fantastic guide for working with the new Build Plugin API. This wiki entry and the jshint package were my guiding lights for this refactor.


The first step to turning a package into a linter is to modify your package.js. Linters, minifiers, and compilers are all considered “plugins” within the Meteor ecosystem, and need to be registered as such. This is done through a call to Package.registerBuildPlugin:

Package.registerBuildPlugin({
  name: "check-checker",
  sources: [
    "lib/rules/checks.js",
    "lib/check-checker.js"
  ],
  npmDependencies: {
    eslint: "0.24.1"
  }
});

Package.onUse(function(api) {
  api.use("isobuild:linter-plugin@1.0.0");
});

In our package code, we register our linter with a call to Plugin.registerLinter. We pass in the types of files we want to operate on, the architectures we want to look for these files in, and a function that returns an instance of our linter.

By specifying an architecture of "os", our linter will only rerun when changes are made to server code. Client source files will be ignored.

Plugin.registerLinter({
  extensions: ["js"],
  archMatching: "os"
}, function() {
  return new CheckChecker();
});

This last argument is the most important. You’ll notice that we’re returning a new instance of a CheckChecker function. Later on, we add a function to CheckChecker.prototype called processFilesForPackage.

This function is called directly by the linter for each set of files that match the criteria we specified above. The goal of our linter is to iterate over each of these files, looking for missing calls to check. When we find a problem, we report it through a call to the error function, which is attached automatically to each file instance we’re given.

function CheckChecker() {
  eslint.linter.defineRule('checks', checks);
};

CheckChecker.prototype.processFilesForPackage = function(files, options) {
  ...
  files.forEach(function(file) {
    var source = file.getContentsAsString();
    var path = file.getPathInPackage();
    var results = eslint.linter.verify(source, config, path);
    results.forEach(function(result) {
      file.error({
        message: result.message,
        line: result.line,
        column: result.column
      });
    });
  });
};

The rest of the processFilesForPackage function is ESLint specific and fairly uninteresting. We’re setting up a configuration object and verifying that the given file complies with all of the rules we’ve created.

If you dig through the check-checker source, you’ll notice that I’m using getSourceHash to accomplish some basic in-memory caching. The goal here is to prevent ESLint from running on files it’s already verified. It’s recommended that you do some kind of caching to keep build times as fast as possible.
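A minimal sketch of this style of caching might look something like the following. The `lintFile` callback here is a hypothetical stand-in for the ESLint verification step, and the cache is assumed to live as long as the linter instance:

```javascript
// Sketch: skip files whose source hash hasn't changed since the
// last build. `lintFile` is a hypothetical stand-in for the real
// ESLint verification step in check-checker.
function CachedLinter(lintFile) {
  this.lintFile = lintFile;
  this.cache = {}; // maps file path -> last seen source hash
}

CachedLinter.prototype.processFilesForPackage = function(files) {
  var self = this;
  files.forEach(function(file) {
    var path = file.getPathInPackage();
    var hash = file.getSourceHash();
    if (self.cache[path] === hash) {
      return; // unchanged since the last build; skip re-linting
    }
    self.cache[path] = hash;
    self.lintFile(file);
  });
};
```

Because the cache is keyed on the file’s path and its source hash, a file is only re-linted when its contents actually change between builds.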


Creating linters using Meteor’s new Build Plugin API is a fairly straightforward and painless process. I highly recommend taking a look at the Build Plugin API wiki entry and the jshint package as an example implementation.

If you want another example of a linter using the Build Plugin API, check out east5th:check-checker!

Sorting By Ownership With MongoDB

Written by Pete Corey on Nov 16, 2015.

I sometimes find myself coming up against constraints or limitations imposed upon my software, either by the tools I’m using or by a limited understanding of how to use those tools. In these situations, we’re always given two options:

  1. Bend or reshape your solution to fit the constraint
  2. Maintain your design and overcome the limitation

A perfect example of this would be something as seemingly simple as sorting a collection of documents by ownership using MongoDB.

Let’s say we have a huge collection of documents in our database. An example document would look something like this:

{
  ownerId: "XuwWcLue9zom8DqEA",
  name: "Foo",
  ...
}

Each document is owned by a particular user (denoted by the ownerId field). On the front-end, we want to populate a table with these documents. The current user’s documents should appear first, secondarily sorted by the document’s name field, and all other documents should follow, sorted by their name.

Sorting by Ownership is Hard

There are a couple of things going on here that make this a difficult problem. First things first, “ownership” is a computed value. You can’t determine if a document belongs to a user until you receive some input from the user; specifically, their ID.

Unfortunately, while there are tools that let us attach computed values to our documents, we can’t search or sort on those fields at a database level. This also means that we can’t paginate our data off of those calculated fields.
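To make this concrete, here’s a small plain-JavaScript simulation. The in-memory array stands in for the collection, and the sort-then-limit stands in for a database query. Because the database can only sort on stored fields, a document the current user owns can fall off the page entirely, no matter what we compute afterwards:

```javascript
// Simulation: pagination happens in the "database" before any
// client-side computed field can influence the ordering.
var userId = "XuwWcLue9zom8DqEA";
var docs = [
  {ownerId: "someoneElse", name: "Apple"},
  {ownerId: userId,        name: "Zebra"}
];

// The "database" can only sort on stored fields, e.g. name:
var page = docs
  .slice() // pretend this sort + limit runs server-side
  .sort(function(a, b) { return a.name < b.name ? -1 : 1; })
  .slice(0, 1); // limit: 1

// page[0] is the "Apple" document; the user's own "Zebra"
// document was cut by the limit before we could reorder anything.
```

No amount of client-side computation can recover the “Zebra” document here; the ownership-first ordering has to happen before the limit is applied, inside the database.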

The second issue is the size of our imaginary collection. If our collection were smaller, we could just pull everything into memory and (painfully) sort the documents ourselves:

Collection.find({}).fetch().sort(function(a, b) {
  if (a.ownerId === Meteor.userId()) {
    if (b.ownerId === Meteor.userId()) {
      return a.name < b.name ? -1 :
             a.name == b.name ? 0 : 1;
    }
    else {
      return -1;
    }
  }
  else if (b.ownerId === Meteor.userId()) {
    return 1;
  }
  else {
    return a.name < b.name ? -1 :
           a.name == b.name ? 0 : 1;
  }
});

Unfortunately, we have a very large number of documents, so pulling them all down into memory at once is infeasible. This means that we need to sort and paginate our data in the database. See issue #1.

This leaves us with two options as application developers:

  1. Change our application design to better fall in line with the restrictions MongoDB imposes upon us. For example, we could show two separate tables - one of documents we own sorted by name, and another of documents we don’t own sorted by name.
  2. Fight back!

Let’s choose option #2.

Encoding Ownership In The Document

The fundamental problem that we’re facing here is that everything we want to sort on needs to live on the document we’re sorting. This means that if we want to sort on ownership, ownership for each user needs to be encoded into each document. This can be a little mind-bending to consider.

At first, you may be thinking that ownership is already encoded through the ownerId field. Unfortunately, ownerId only tells us the owner’s ID, not whether the current user’s ID matches that ID. We need to somehow store that calculation on the document to be able to use it in an actionable way.

One way to do this is to create a field on the document when it’s created. The name of this field is the owner’s ID, and within it we store a simple object that holds an ownership flag:

{
  ...
  "XuwWcLue9zom8DqEA": {
    "owner": 1
  }
}

This object can be inserted into each document automatically using a variety of hooking or data management techniques. Here’s how you would implement it if you were using matb33:collection-hooks:

Documents.before.insert(function(userId, doc) {
  doc[userId] = {
    owner: 1
  };
  return doc;
});

This seems a little unconventional, but it opens up the path to our goal: sorting by ownership. Check out how we would construct our sorting query:


var sort = [
  [this.userId + ".owner", -1],
  ["name", 1]
];

Documents.find({
  ...
}, {
  sort: sort,
  ...
});

Using this query, all documents we own will be returned first, sorted by their name, followed by all documents we don’t own, sorted by their name. Victory!

Don’t Pollute the Document

There is a downside to the above approach.

By encoding the ownership calculation into the document itself, we’re polluting the document. This new nested object has no real purpose, other than to get around a technical limitation, and in many ways is just a duplication of the information held by ownerId.

A better solution would give us this same functionality without polluting the document. Thankfully, we can leverage the power of MongoDB aggregations to accomplish just that.

Our aggregation will operate in two steps. The first step will be to calculate the ownership flag and add it to each document we’re sorting. The second step is to sort our documents, first by this ownership flag and next by the document’s name.

We’ll use the $cond operator to calculate a new owned flag on each document by comparing the value of ownerId to the current user’s ID (which is passed into our aggregation). This calculated value is set on each returned document during the projection stage of our aggregation pipeline. Check it out:

Documents.aggregate([
  {
    $project: {
      owned: {$cond: [{$eq: ["$ownerId", this.userId]}, 1, 0]},
      name: "$name",
      ...
    }
  },
  {
    $sort: {
      owned: -1,
      name: 1
    }
  }
]);

We’re using Mongo’s aggregation framework within our Meteor application using the meteorhacks:aggregation package. Be sure to check out Josh Owen’s great article about using meteorhacks:aggregation to power your publications.

By building the owned field on the fly in our aggregation, we get all of the benefits of encoding our ownership information into the document, with none of the downsides of permanently polluting the document with this information.

Don’t Let the Tool Use You

Every tool we use comes with a certain set of limitations and constraints. Sometimes these constraints exist for very good reasons, and trying to work around them can lead to serious performance issues or security vulnerabilities. Other times, these constraints are just limitations of the technologies we’re using, or limitations in our understanding.

Originally, we thought MongoDB was the problem. By exploring alternative solutions and building a deeper understanding of the tool, we realized that we could use MongoDB to solve the problem!

When you’re facing limitations imposed by your tools, don’t immediately concede. Always try to understand why the limitation exists, and how you can (or can’t) overcome it.

Why I Can't Wait For ES6 Proxies

Written by Pete Corey on Nov 9, 2015.

Full ES6 support is just around the corner. In fact, nearly all of ES6 is available to us through compilers like Babel that transpile ES6 syntax into ES5 code. Unfortunately, one of the ES6 features I’m most excited about can’t be implemented in ES5. What feature is that? Proxies, of course!

Proxies make some incredibly exciting things possible. Imagine a Meteor method like the one below:

Meteor.methods({
  foo: function(bar) {
    return Bars.remove(bar._id);
  }
});

As I’ve talked about in the past, this method exposes our application to a serious security vulnerability. A user can pass in an arbitrary MongoDB query object in the _id field of bar like this:

Meteor.call("foo", {_id: {$gte: ""}});

This would delete all of the documents from our Bars collection. Uh oh! Imagine if we could automatically detect and prevent that from happening, and instead throw an exception that tells the client:

Meteor.Error: Tried to access unsafe field: _id

Our _id field would be accessible only after we check it:

Meteor.methods({
  foo: function(bar) {
    check(bar, {
      _id: String
    });
    return Bars.remove(bar._id);
  }
});

Any attempts to access a field on a user-provided object will throw an exception unless it’s been explicitly checked for safety. If this were possible, it could be used to prevent entire categories of security vulnerabilities!

With proxies, we can make this happen.

What is a Proxy?

An ES6 Proxy is basically a middleman between an object and the code trying to access that object. When we wrap an object with a proxy, we can oversee (and interfere with) every action taken on that object.

Proxies do this overseeing through “traps”. A trap is just a callback that’s called whenever a certain action is taken on the proxy object. For example, a get trap is triggered any time a piece of code tries to get the value of a field on the proxy. Likewise, a set trap is triggered any time you try to set the value of a field.
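As a minimal, standalone illustration of a get trap (unrelated to check for the moment), here’s a proxy that counts every property read on its target:

```javascript
// A proxy whose get trap counts every property read on the target.
var reads = 0;
var target = {name: "foo"};
var proxy = new Proxy(target, {
  get: function(obj, field) {
    reads += 1; // fires on every property access through the proxy
    return obj[field];
  }
});

proxy.name; // the get trap fires
proxy.name; // and fires again; reads is now 2
```

Notice that the trap sees both the underlying target object and the name of the field being accessed, which is exactly the information we need to enforce checks.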

In the above example, our proxy sees that we’re trying to access _id on the bar object, but because it knows that check hasn’t been called on that field yet, it throws an exception. If we had checked the field, the proxy would have let _id’s value pass through.

A rough sketch of this kind of proxy would look something like this:

CheckProxy = {
  get: function(target, field) {
    if (!target ||
        !target.__checked ||
        !target.__checked[field]) {
      throw new Error("Tried to access unsafe field: " + field);
    }
    return target[field];
  }
};

But how does the proxy know when a field has been checked? We have to explicitly tell the proxy that each field has been checked after we’ve determined that it’s safe to use. One way to do this is through a custom set trap:

CheckProxy = {
  ...
  set: function(target, field, value) {
    if (field == "__checked") {
      if (!target.__checked) {
        target.__checked = {};
      }
      target.__checked[value] = true;
    }
    else {
      target[field] = value;
    }
    return true;
  }
};

If we wanted to use our proxy as-is, there would be a good amount of manual work involved. We’d have to instantiate a new proxy object for each one of our object arguments, and then explicitly notify the proxy after each check:

Meteor.methods({
  foo: function(bar) {
    bar = new Proxy(bar, CheckProxy);
    check(bar, {
      _id: String
    });
    bar.__checked = "_id";
    return Bars.remove(bar._id);
  }
});

This is too much work! It wouldn’t take long to lose diligence and fall back to not checking arguments at all.

Thankfully, we can hide all of this manual work through the magic of monkey patching.

The first thing we’ll do is patch our check method to tell our proxy whenever we check a field on an object:

_check = check;
check = function(object, fields) {
  if (object instanceof Object) {
    Object.keys(fields).forEach(function(field) {
      object.__checked = field;
    });
  }
  _check.apply(this, arguments);
};

Next, we just have to patch Meteor.methods to automatically wrap each Object argument in a proxy:

_methods = Meteor.methods;
Meteor.methods = function(methods) {
  _.each(methods, function(method, name, obj) {
    obj[name] = function() {
      _.each(arguments, function(value, key, obj) {
        if (value instanceof Object) {
          obj[key] = new Proxy(value, CheckProxy);
        }
        else {
          obj[key] = value;
        }
      });
      method.apply(this, arguments);
    };
  });
  _methods.apply(this, arguments);
};

Whew, this is getting dense!

Thankfully, that’s all the patching we have to do. Now, we can revert back to our original method and still reap all of the benefits of automatic check enforcement for all object fields throughout all of our Meteor methods.
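For a self-contained picture of the end result, here’s the whole flow in plain JavaScript, with the explicit proxying and checking steps written out by hand (in the patched version, both would happen automatically behind the scenes):

```javascript
// End-to-end sketch: a proxied argument only exposes fields that
// have been explicitly marked as checked.
var CheckProxy = {
  get: function(target, field) {
    if (!target.__checked || !target.__checked[field]) {
      throw new Error("Tried to access unsafe field: " + field);
    }
    return target[field];
  },
  set: function(target, field, value) {
    if (field == "__checked") {
      if (!target.__checked) {
        target.__checked = {};
      }
      target.__checked[value] = true;
    }
    else {
      target[field] = value;
    }
    return true;
  }
};

var bar = new Proxy({_id: "123"}, CheckProxy);

var threw = false;
try {
  bar._id; // not checked yet; the get trap throws
} catch (e) {
  threw = true; // "Tried to access unsafe field: _id"
}

bar.__checked = "_id"; // mark _id as checked via the set trap
var id = bar._id;      // now allowed; id is "123"
```

With the monkey patches in place, the `new Proxy(...)` wrapping and the `__checked` bookkeeping are handled for us, and a plain call to check unlocks the field.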

Shortcomings

ES6 Proxies are currently only supported in Firefox, which means that what I described above isn’t possible just yet. Until proxy support comes to V8, Node.js, and finally Meteor, all we can do is wait and dream.

The implementation I described here is fairly unsophisticated. It only works when accessing fields within the first layer of an object. It also pollutes the provided object with a __checked field, which may wreak inadvertent havoc. In future versions of this idea, both of these issues could easily be solved.

I hope this post has given you a taste of the awesome power of proxies. Fire up your Firefox console and start experimenting!