Property Testing a Permutation Generator

Written by Pete Corey on Nov 19, 2018.

Last time we spent some time writing a function to generate permutations of length k of a given list. Our final solution was fairly concise, but there are quite a few places where we could have made a mistake in our implementation.

Our first instinct might be to check our work using unit tests. Unfortunately using unit tests to check the correctness of a permutation generator leaves something to be desired. The length of our resulting set of permutations grows rapidly when k and list increase in size, making it feasible to only manually calculate and test the smallest possible permutations.

Thankfully there’s another way. We can use property testing to test the underlying properties of our solution and the permutations we’re generating. This will give us quite a bit more variation on the inputs we test and might uncover some hidden bugs!

Our Permutation Generator

The permutation generator that we’ll be testing looks like this:


defmodule Permutation do
  def generate(list, k \\ nil, repetitions \\ false)
  def generate([], _k, _repetitions), do: [[]]
  def generate(_list, 0, _repetitions), do: [[]]

  def generate(list, k, repetitions) do
    for head <- list,
        tail <- generate(next_list(list, head, repetitions), next_k(k)),
        do: [head | tail]
  end

  defp next_k(k) when is_number(k),
    do: k - 1

  defp next_k(k),
    do: k

  defp next_list(list, _head, true),
    do: list

  defp next_list(list, head, false),
    do: list -- [head]
end

You’ll notice that it’s a little different than the generator we built last time. This generator supports the creation of permutations both with and without repetitions of elements from list, and lets you optionally pass in the length of the final permutations, k.

We can use our Permutation.generate/3 function like so:


iex(1)> Permutation.generate([1, 2, 3])
[[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]

We can also specify a value for k:


iex(2)> Permutation.generate([1, 2, 3], 2)
[[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]

And we can tell our generator to allow repetitions in the final permutations:


iex(3)> Permutation.generate([1, 2, 3], 2, true)
[[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3], [3, 1], [3, 2], [3, 3]]

All looks good so far, but dragons hide in the depths. Let’s dig deeper.

List of Lists

Permutations have some fairly well-known and easily testable properties. For example, we know that the results of our calls to Permutation.generate/3 should always have the structure of a list of lists.

Using Elixir’s StreamData package, we can easily model and check that this property holds for our Permuatation.generate/3 function across a wide range of inputs. Let’s start by creating a new property test to verify this for us:


property "returns a list of lists" do
end

We start by telling StreamData that we want it to generate lists of, at most, @max_length (which we’ll define at 5 for now) integers.


property "returns a list of lists" do
+  check all list <- list_of(integer(), max_length: @max_length) do
+  end
end

Next, we call our Permutation.generate/3 function to create the permutations of the list that was just generated for us:


  property "returns a list of lists" do
    check all list <- list_of(integer(), max_length: @max_length) do
+      permutations = Permutation.generate(list)
    end
  end

Finally we’ll make assertions about the structure of our resulting permutations. We want to assert that the result of our call to Permutation.generate/3 is a list, but also that every element in that result is a list as well:


property "returns a list of lists" do
  check all list <- list_of(integer(), max_length: @max_length) do
    permutations = Permutation.generate(list)
+    assert permutations |> is_list
+    Enum.map(permutations, &assert(&1 |> is_list))
  end
end

And that’s all there is to it. Running out test suite, we’ll see that our first property test passed with flying colors (well, mostly green).

.

Finished in 0.06 seconds
1 property, 0 failures

Correct Number of Permutations

Now that we know that the structure of our resulting list of permutations is correct, the next obvious property that we can test is that the number of permutations returned by our Permutation.generate/3 function is what we’d expect.

Permutations are a well-defined mathematical concept, and so a nice equation exists to determine how many k-length permutations exist for a list of n elements:


P(n, k) = n! / (n - k)!

Let’s write a quick factorial function to help calculate this value:


defp factorial(n) when n <= 0,
  do: 1

defp factorial(n),
  do: n * factorial(n - 1)

Let’s also rewrite our P(n, k) calculation as an Elixir helper function:


  defp pnk(list, k),
    do: div(factorial(length(list)), factorial(length(list) - k))

Great!

Now we’re set up to test that our Permutation.generate/3 function is giving us the correct number of permutations for a given list and value of k.


property "returns the correct number of permutations" do
end

This time we’ll generate our list, along with a value for k that ranges from 0 to the length of list:


property "returns the correct number of permutations" do
+  check all list <- list_of(integer(), max_length: @max_length),
+            k <- integer(0..length(list)) do
  end
end

Once we have values for list and k, we can generate our set of permutations and make an assertion about its length:


property "returns the correct number of permutations" do
  check all list <- list_of(integer(), max_length: @max_length),
            k <- integer(0..length(list)) do
+    assert pnk(list, k) ==
+            list
+            |> Permutation.generate(k)
+            |> length
  end
end

Once again, our tests pass.

..

Finished in 0.06 seconds
2 properties, 0 failures

Only Include Elements From the List

Another neatly testable property of the permutations we’re generating is that they should only contain values from the list being permutated. Once again, we’ll start by defining the property we’ll be testing and generate our values for list and k:


property "permutations only include elements from list" do
  check all list <- list_of(integer(), max_length: @max_length),
            k <- integer(0..length(list)) do
  end
end

Next, we’ll want to generate our set of permutations for list and k, and reject any permutations from that set that include values not found in list:


property "permutations only include elements from list" do
  check all list <- list_of(integer(), max_length: @max_length),
            k <- integer(0..length(list)) do
+    assert [] ==
+              list
+             |> Permutation.generate(k)
+             |> Enum.reject(fn permutation ->
+                [] ==
+                  permutation
+                 |> Enum.reject(&Enum.member?(list, &1))
+              end)
  end
end

We’re asserting that the resulting list of permutations should be an empty list ([]). There should be no permutations left that contain elements not found in list!

And, as expected, our suite still passes.

...

Finished in 0.08 seconds
3 properties, 0 failures

Use Each Item Once

Our current implementation of Permutation.generate/3 allows for duplicate items to be passed in through list. When we generate each possible permutation, it’s important that each of these duplicate items, and more generally any item in the list, only be used once.

That is, if list is [1, 2, 2], our set of possible permutations should look like this:


[[1, 2, 2], [1, 2, 2], [2, 1, 2], [2, 2, 1], [2, 1, 2], [2, 2, 1]]

Note that 2 is used twice in each permutation, but never more than twice.

At first, it seems like we might need a new test to verify this property of our permutation generator. It’s conceivable that we could group each set of equal elements, count them, and verify that the resulting permutation have the correct count of each element group. But that sounds complicated, and an added test just introduces more code into our codebase that we need to maintain.

It turns out that we can tweak our previous property test to verify this new property.

Instead of identifying duplicates, counting them, and verifying the correct counts in the final set of permutations, let’s take a simpler approach. Let’s ensure that each element in list is unique by using Elixir’s Enum.with_index/1 function to bundle the element with its index value.

For example, our previous [1, 2, 2] value for list would be transformed into:


[{1, 0}, {2, 1}, {2, 2}]

Now both of our 2 elements are unique. The first is {2, 1}, and the second is {2, 2}. Using this technique, we can recycle our “permutations only include elements from list” with a few slight tweaks:


  property "permutations only include elements from list" do
    check all list <- list_of(integer(), max_length: @max_length),
              k <- integer(0..length(list)) do
      assert [] ==
               list
+              |> Enum.with_index()
               |> Permutation.generate(k)
               |> Enum.reject(fn permutation ->
                 [] ==
                   permutation
-                  |> Enum.reject(&Enum.member?(list, &1))
+                  |> Enum.reject(&Enum.member?(list |> Enum.with_index(), &1))
               end)
    end
  end

And once again, our suite passes:


...

Finished in 0.08 seconds
3 properties, 0 failures

Final Thoughts

The current version of our property tests currently only test values of list composed entirely of integers. This doesn’t necessarily need to be the case. The list being permutated can contain literally anything, in theory. Expanding our property tests to support a wider range of types might be a great opportunity to try out writing a custom generator function!

Our current test suite also defines @max_length as 5. While testing, I noticed that values of @max_length up to 8 were very performant and finished in under a second on my machine. Running the suite with a @max_length of 9 took several seconds to complete, and using a value of 10 took nearly two minutes to come back green. I’m not sure if these performance problems can easily be improved, but I’m happy about how obvious they became through property testing.

You’ll also note that none of this testing covers generating permutations that allow infinite repetitions of elements from list. The properties for these sets of permutations are completely different, so I’m going to leave this as an exercise for the truly motivated reader.

I’m enjoying my experiences with property testing so far. Hopefully you find these write-ups useful as well. If so, let me know on Twitter!

Permutations With and Without Repetition in Elixir

Written by Pete Corey on Nov 12, 2018.

I’ve been hacking away at my ongoing Chord project, and I ran into a situation where I needed to generate all possible permutations of length k for a given list of elements, where repetitions of elements from our list are allowed. I figured this would be an excellent opportunity to flex our Elixir muscles and dive into a few possible solutions.

Base Cases

Let’s get the ball rolling by defining a Permutation module to hold our solution and a with_repetitions/2 function that accepts a list of elements, and a value for k:


defmodule Permutation do
  def with_repetitions(list, k)
end

We’ll start by defining a few base cases for our with_repetitions/2 function. First, if list is empty, we’ll want to return a list who’s first element is an empty list:


def with_repetitions([], _k), do: [[]]

We’ll do the same if k is 0:


def with_repetitions(_list, 0), do: [[]]

Note that we’re returning [[]] because the only possible permutations of an empty list, and the only possible zero-length permutation of a list is []. The list of all possible permutations, in that case, is [[]].

Building Our Permutator

Now we come to the interesting case where both list and k have workable values:


def with_repetitions(list, k) do
  # ...
end

We’ll start by mapping over every element in our list. We’ll use each of these elements as the head of a new k-length permutation we’re building.


list
|> Enum.map(fn head ->
  # ...
end)

For each value of head, we want to calculate all n - 1 length sub-permutations of our list, and concatenate each of these sub-permutations to our head:


list
|> with_repetitions(k - 1)
|> Enum.map(fn tail ->
  [head] ++ tail
end)

At this point, the result of our with_repetitions/2 function is a list of a list of permutations. For every head, we’re returning a list of all k-length permutations starting with that head. What we really want is a singly nested list of all permutations we’ve found.

To reduce our list of lists of permutations, we need to append each list of permutations for every value of head together:


|> Enum.reduce(&(&2 ++ &1))

This will order our permutations in lexical order (assuming our initial list was sorted). If we wanted our final set of permutations in reverse order, we could switch the order of our concatenation:


|> Enum.reduce(&(&2 ++ &1))

Or we could just use Kernel.++/2 to accomplish the same thing:


|> Enum.reduce(&Kernel.++/2)

Simplifying with Special Forms

This pattern of nested mapping and consolidating our result into a flat list of results is a fairly common pattern that we run into when writing functional code. It’s so common, in fact, that Elixir includes a special form specifically designed to make this kind of computation easier to handle.

Behold the list comprehension:


def with_repetitions(list, k) do
  for head <- list, tail <- with_repetitions(list, k - 1), do: [head | tail]
end

This single line is functionally equivalent to our original with_repetitions/2 function. The for tells us that we’re starting a list comprehension. For every value of head and every value of tail, we build a new permutation ([head | tail]). The result of our list comprehension is simply a list of all of these permutations.

Without Repetition

We’ve been trying to find permutations where repetitions of elements from list are allowed. How would we solve a similar problem where repetitions aren’t allowed?

It turns out that the solution is fairly straight-forward. When we generate our set of sub-permutations, or tail values, we can simply pass in list without the current value of head:


def with_repetitions(list, k) do
  for head <- list, tail <- with_repetitions(list -- [head], k - 1), 
    do: [head | tail]
end

That’s it! Each level of our permutation removes the current value of head preventing it from being re-used in future sub-permutations. It’s fantastically convenient that previously computed values in our list comprehension, like head, can be used in the subsequent values we iterate over.

Bending Jest to Our Will: Caching Modules Across Tests

Written by Pete Corey on Nov 5, 2018.

My test suite has grown to be unreliable. At first, a single red test raised its head. Not believing my eyes, I re-ran the suite, and as expected, everything came back green. As time went on and as more tests were offered up to the suite, random failures became more of a recurring problem. Eventually, the problem became so severe that the suite consistently failed, rather than consistently passed.

Something had to be done.

After nearly twenty four hours of banging my head against the wall and following various loose ends until they inevitably unraveled, I finally stumbled upon the cause of my problems.

Making Too Many Database Connections

Jest, the testing platform used by the project in question, insists on running tests in isolation. The idea is that tests run in isolation can also be run in parallel, which is the default behavior of Jest’s test runner. Due to decisions made far in the immutable past, our team decided to scrap parallel executions of tests and run each test sequentially with the --runInBand command line option.

The Jest documentation explains that running tests in band executes all of the tests sequentially within the same process, rather than spinning up a new process for every test:

Run all tests serially in the current process, rather than creating a worker pool of child processes that run tests.

However, when I ran the test suite I noticed that every test file was trigging a log statement that indicated that it just established a new database connection.


Connected to mongodb://localhost:27017/test

This explains a lot. If each test is spinning up its own database connection, it’s conceivable that our database simply can’t handle the amount of connections we’re forcing on it. In that case, it would inevitably time out and fail on seemingly random tests.

But if all of our tests are sharing a single process, why aren’t they sharing a single database connection?

Jest Ignores the Require Cache

In turns out that this project instantiates its database connection in a dependent, shared sub-module. The code that handles the instantiation looks something like this:


let connection = mongoose.createConnection(MONGO_URL, ...);

connection.on('open', () => console.log(`Connected to ${MONGO_URL}`));

module.exports.connection = connection;

Normally, due to how Node.js’ require and require.cache work, the first time this shared module is required anywhere in our project, this code would be executed and our database connection would be established. Any subsequent requires of our module would return the previously cached value of module.exports. The module’s code would not re-run, and additional database connections would not be opened.

Unfortunately, Jest doesn’t honor require.cache. This means that every test file blows away any previously cached modules, and any require calls that test file makes will re-evaluate the required module’s source. In our case, this re-evaulation creates a new database connection, which is the root of our problem.

Mocking a Module with the Real Thing

The Github issue I linked above hints at a solution to our problem.

If you’d like to set a module, you can add it to setupFiles in a way that does jest.mock('module-name', () => { return moduleContents }).

Christoph is suggesting that we add a file to our setupFiles Jest configuration, which we’ll call test/setup.js, and mock our shared module in that setup file:


const mockSharedModule = require('shared-module');
jest.mock('shared-module', () => mockSharedModule);

Unfortunately, this doesn’t solve our problem. The test/setup.js script runs before each test (emphasis is my own):

The path to a module that runs some code to configure or set up the testing framework before each test.

We need to find a way to require our shared module once, before all tests run.

Thankfully, we can do this by creating a custom Jest environment, and instructing Jest to use our new environment with the testEnvironment configuration option. We can require our shared module within our new environment, and mock any subsequent imports of our module to return a reference to the instance we just instantiated.

Unfortunately, we can’t set up that mock within our environment. We need to do that within our test setup file.

This means we need some way of passing the contents of our shared module from our custom environment into our test/setup.js. Strangely, the only way I’ve found to accomplish this is through the use of globals:


const NodeEnvironment = require('jest-environment-node');
const sharedModule = require('shared-module');

class CustomEnvironment extends NodeEnvironment {
    constructor(config) {
        super(config);
    }

    async setup() {
        await super.setup();
        this.global.__SHARED_MODULE__ = sharedModule;
    }

    async teardown() {
        await super.teardown();
    }

    runScript(script) {
        return super.runScript(script);
    }
}

module.exports = CustomEnvironment;

Most of this custom environment class is boilerplate required by Jest. The interesting pieces are where we require our shared module, and most importantly, when we assign its contents to the __SHARED_MODULE__ global variable:


this.global.__SHARED_MODULE__ = sharedModule;

Note that __SHARED_MODULE__ isn arbitrary name I chose to avoid collisions with other variables defined in the global namespace. There’s no magic going on in the naming.

Now, we can go back to test/setup.js and create a mock of our shared module that returns the contents of the global __SHARED_MODULE__:


jest.mock('shared-module', () => global.__SHARED_MODULE__);

And that’s all there is to it.

Our custom environment requires and evaluates our shared module once, instantiating a single database connection. The reference to the shared module’s contents is passed into our test setup script through a global variable. Our setup script mocks any future requires of our shared module to return the provided reference, rather than re-evaluating the module, creating a new database connection, and returning the new reference.

Whew.

In Hindsight

After much blood, swear, and tears, our test suite is once again consistently passing. Rejoice!

While this solution works, it highlights a fundamental problem. We’re not using Jest correctly. We came into this project with a preconceived notion of how testing and, by extension, our test framework should work. When the we learned more about our tools and realized that they didn’t work how we expected, we didn’t retrace our steps and reassess our strategy. Instead, we applied quite a bit of time and pressure to force our tools to behave as we expected.

While having the knowhow and ability to do this is great, actually doing it in your project isn’t recommended. Instead, see if you can use Jest in “The Jest Way”. If you cannot, maybe you shouldn’t be using Jest.

In hindsight, I should go have a talk with my team…