Generating Sequences with Elixir Streams

Written by Pete Corey on Dec 11, 2017.

It’s almost Christmas, which means the 2017 edition of Advent of Code is officially upon us! While studying other people’s solutions to the first few problems, Saša Jurić’s use of streams stood out to me.

The resource/3 and the closely related iterate/2 and unfold/2 functions in the Stream module were completely unknown to me and seemed worthy of some deeper study.

After spending time researching and banging out a few examples, I realized that these functions are extremely useful and underrated tools for generating complex enumerable sequences in Elixir.

Let’s dive into a few examples to find out why!

Simple Sequences

The resource/3, iterate/2, and unfold/2 functions in the Stream module are all designed to emit new values based on previous information. While they all accomplish similar tasks, each has its own nuances.

Stream.iterate/2 is best suited for generating simple sequences. We’ll define a “simple sequence” as a sequence whose next value can be entirely determined by its previous value.

For example, we can implement a simple incrementing sequence:


Stream.iterate(0, &(&1 + 1))

Taking the first 5 values from this stream (|> Enum.take(5)) gives us [0, 1, 2, 3, 4].
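
Spelled out as a full pipeline, that looks like this:


Stream.iterate(0, &(&1 + 1))
|> Enum.take(5)
# [0, 1, 2, 3, 4]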

We can ramp the complexity up a notch and generate a Mandelbrot sequence for a given value of c:


defmodule Mandelbrot do
  def sequence({cr, ci}) do
    Stream.iterate({0, 0}, fn {r, i} ->
      # zₙ₊₁ = zₙ² + c
      {r * r - i * i + cr, 2 * r * i + ci}
    end)
  end
end

The next value of z is entirely determined by the previous value of z and some constant, c.
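
For example, c = {-1, 0} lies on a period-two orbit, so taking a few values from its sequence bounces between the same two points:


Mandelbrot.sequence({-1, 0})
|> Enum.take(4)
# [{0, 0}, {-1, 0}, {0, 0}, {-1, 0}]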

Sequences with Accumulators

There are times when the next value in a sequence can’t be generated with the previous value alone. Sometimes we need to use other information we’ve built up along the way to generate the next value in our sequence.

A perfect example of this type of sequence is the Fibonacci sequence. To generate the nth value in the Fibonacci sequence, we need to know the previous value, at n-1, and also the value before that, at n-2.

This is where Stream.unfold/2 shines. The unfold/2 function lets us build up an accumulator as we generate each value in our sequence.

Here’s a stream that generates the Fibonacci sequence:


Stream.unfold({0, 1}, fn {a, b} -> {a, {b, a + b}} end)

Stream.unfold/2 takes the initial value of our accumulator and a next_fun function. Our next_fun function returns a tuple whose first element is the value emitted by our stream, and whose second element is the accumulator that will be passed into the next call to next_fun.

Taking the first 10 elements of this stream gives us a familiar sequence of numbers:


[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
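
It’s also worth knowing that returning nil from our next_fun halts the stream, which makes unfold/2 a natural fit for finite sequences as well. A quick sketch:


Stream.unfold(5, fn
  0 -> nil
  n -> {n, n - 1}
end)
|> Enum.to_list()
# [5, 4, 3, 2, 1]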

For a more complex example of the power of Stream.unfold/2, check out how I implemented a stream that generates an integer spiral as part of an Advent of Code challenge.

Multiple Values per Accumulator

Sometimes we’re faced with a true doozy of a sequence, and even an accumulator isn’t enough to accurately describe the next value. For example, what if we need to emit multiple sequential values for each invocation of our next_fun?

Stream.resource/3 to the rescue!

The Stream.resource/3 function was originally intended for consuming external resources, hence its name and its start_fun and after_fun functions, but it also works beautifully for building sufficiently complex sequences.
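
In the spirit of that original intent, the canonical shape of resource/3 streams lines out of a file, opening it in start_fun and closing it in after_fun (the "data.txt" filename here is just a placeholder):


Stream.resource(
  fn -> File.open!("data.txt") end,
  fn file ->
    case IO.read(file, :line) do
      :eof -> {:halt, file}
      line -> {[line], file}
    end
  end,
  fn file -> File.close(file) end
)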

A Wolfram-style cellular automaton, such as Rule 30, is a perfect example of this type of complex sequence.

To generate our Rule 30 sequence, we’ll start with an initial list of values (usually just [1]). Every time we call our next_fun, we want to calculate the entire next layer in one go (which would be [1, 1, 1]). Unfortunately, the size of these layers continues to grow (the next would be [1, 1, 0, 0, 1], etc…).

How would we write a stream that outputs each cell of each layer in order, and tells us the cell’s value and depth?


defmodule Rule30 do
  def stream(values) do
    Stream.resource(fn -> {values, 0} end, &next_fun/1, fn _ -> :ok end)
  end

  defp next_fun({values, layer}) do
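    # Pad both ends with zeros so the edge cells see empty neighbors,
    # then apply Rule 30 to each three-cell window of the padded layer.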
    next = ([0, 0] ++ values ++ [0, 0])
    |> Enum.chunk_every(3, 1, :discard)
    |> Enum.map(&rule/1)
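    # Emit each current cell tagged with its layer, and carry the next
    # layer forward as the new accumulator.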
    {values |> Enum.map(&{layer, &1}), {next, layer + 1}}
  end

  defp rule([1, 1, 1]), do: 0
  defp rule([1, 1, 0]), do: 0
  defp rule([1, 0, 1]), do: 0
  defp rule([1, 0, 0]), do: 1
  defp rule([0, 1, 1]), do: 1
  defp rule([0, 1, 0]), do: 1
  defp rule([0, 0, 1]), do: 1
  defp rule([0, 0, 0]), do: 0
end

Stream.resource/3 makes short work of this potentially difficult problem. We start with our initial set of values, and a depth of 0. Each call to our next_fun tacks on a few buffering 0s to each side of our values list, and applies the rule function to each chunk of 3 values. Once we’ve finished calculating our next layer, we emit each value grouped with the current depth along with our updated accumulator tuple.
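
To see that windowing step in isolation, here’s how Enum.chunk_every/4 slides a three-cell window across the padded first layer:


([0, 0] ++ [1] ++ [0, 0])
|> Enum.chunk_every(3, 1, :discard)
# [[0, 0, 1], [0, 1, 0], [1, 0, 0]]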

We can take the first 9 values from this stream and notice that they match up perfectly with what we would expect from Rule 30:


Rule30.stream([1])
|> Enum.take(9)
|> IO.inspect
# [                {0, 1},
#          {1, 1}, {1, 1}, {1, 1}, 
#  {2, 1}, {2, 1}, {2, 0}, {2, 0}, {2, 1}]

Final Thoughts

While we’ve mostly talked about mathematical sequences here, these techniques and ideas apply to any type of enumerable data that needs to be generated.

In closing, here are a few rules to remember when working with Stream.iterate/2, Stream.unfold/2, and Stream.resource/3:

  • Use Stream.iterate/2 when you want to emit one value per call to your next_fun function, and that value can be entirely generated from the previously emitted value.
  • Use Stream.unfold/2 when you want to emit one value per call to your next_fun function, but need additional, accumulated data to generate that value.
  • Use Stream.resource/3 when you want to emit multiple values per call to your next_fun function.

I highly encourage you to check out Elixir streams if you haven’t. The Stream module is full of ridiculously useful tools to help streamline your data pipeline.

Fleshing out URLs with Elixir

Written by Pete Corey on Dec 11, 2017.

My most recent side project, Affiliate Crawler, is a web crawler designed to find links in your written content that can be turned into affiliate links.

If you’re curious, be sure to read more about the how, what, and why behind Affiliate Crawler.

Affiliate Crawler works by being given a starting URL that points to your content. It crawls that page, looking for external links that can potentially be monetized, and internal links to crawl further.

Unfortunately, the Elixir-based crawler that powers Affiliate Crawler only works correctly when given fully qualified URLs, such as http://www.east5th.co/. Partial URLs, like www.east5th.co result in parsing errors and a general “failure to crawl”.

This is unfortunate because most people deal exclusively in partially formed URLs. You tell people to go to google.com, not http://www.google.com/. This seems to be especially true when prompted for URLs online.

What we need is a way to infer fully fleshed out URLs from user-provided partial URLs, just like we naturally would in conversation.

Thankfully, this is ridiculously easy thanks to Elixir’s URI module, pattern matching, and the awesome power of recursion!

The Parts of a Parsed URL

Elixir’s URI.parse/1 function accepts a uri string as an argument and returns a URI struct that holds the component pieces that make up that uri.

For example, parsing "http://www.east5th.co/" returns the following data:


%URI{authority: "www.east5th.co", fragment: nil, host: "www.east5th.co",
 path: "/", port: 80, query: nil, scheme: "http", userinfo: nil}

URI.parse/1 only works on fully fleshed out URLs. Attempting to parse a partially specified uri string, like "east5th.co", returns a mostly empty struct:


%URI{authority: nil, fragment: nil, host: nil, path: "east5th.co", port: nil,
 query: nil, scheme: nil, userinfo: nil}

The major pieces of missing information needed to properly parse this uri are the path and the scheme. Given those two pieces of information, everything else can be inferred by URI.parse/1.

Thankfully, we can come up with some fairly reasonable defaults for both path and scheme. If no path is provided, as in a uri like "http://www.east5th.co", we could assume a default of "/". If no scheme is specified, as in "www.east5th.co/", we could assume a default of "http".
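
A quick check in iex confirms which piece is missing in each case:


URI.parse("http://www.east5th.co").path
# nil

URI.parse("www.east5th.co/").scheme
# nil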

But how do we know when and where to add these default values to our uri string?

Always prepending "http://" and appending "/" leads to obvious problems in most cases. Using a regex-based check sounds fragile and error-prone.

There has to be some other way.

Pattern Matching and Recursion

It turns out that pattern matching makes it incredibly easy to know whether we need to provide a default path or scheme. After our first pass through URI.parse/1, we can look for nil values in either path or scheme:


case URI.parse(uri) do
  %URI{scheme: nil} ->
    # Needs default scheme
  %URI{path: nil} ->
    # Needs default path
  %URI{} ->
    # Successfully parsed
end

But pattern matching alone doesn’t give us our solution…

What if uri needs both a scheme and a path, as in the case of "www.east5th.co"? And wouldn’t we still need to re-parse the newly fleshed out uri to populate the inferred fields in the URI struct, like port and authority?

Thankfully, wrapping our URI.parse/1 call in a function and calling that function recursively elegantly solves both problems:


defp parse(uri) do
  case URI.parse(uri) do
    %URI{scheme: nil} ->
      parse("http://#{url}")
    %URI{path: nil} ->
      parse("#{url}/")
    parsed ->
      parsed
  end
end

We define a function called parse and within that function we parse the provided uri string with a call to URI.parse/1.

If URI.parse/1 returns a URI struct without a scheme, we recursively call parse with "http://" prepended to the current uri.

Similarly, if URI.parse/1 returns a URI struct without a path, we recursively call parse with "/" appended to the current uri string.

Otherwise, we return the parsed URI struct generated from our fully-qualified URL.
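
Tracing a call with a bare host shows both defaults being applied, one recursive step at a time:


parse("www.east5th.co")
# → parse("http://www.east5th.co")  (scheme was nil)
# → parse("http://www.east5th.co/") (path was nil)
# → %URI{scheme: "http", host: "www.east5th.co", path: "/", port: 80, ...}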


Depending on your personal preference, we could even write our parse/1 function with our pattern matching spread across multiple function heads:


def parse(uri) when is_binary(uri), do: parse(URI.parse(uri))
def parse(uri = %URI{scheme: nil}), do: parse("http://#{to_string(uri)}")
def parse(uri = %URI{path: nil}),   do: parse("#{to_string(uri)}/")
def parse(uri),                     do: uri

Pattern matching and recursion, regardless of your preferred style, can lead to some truly beautiful code.

Final Thoughts

I’m continually amazed by the tools offered by the Elixir programming language straight out of the box. Pattern matching, a recursion-first philosophy, and even the incredibly useful URI module are examples of techniques and features that make your day-to-day development life easier when working with this language.

If you want to see how I used these tools to build fully-qualified URLs from partial user input, check out the Affiliate Crawler source code on GitHub, and see it in action at Affiliate Crawler.

Crawling for Cash with Affiliate Crawler

Written by Pete Corey on Nov 20, 2017.

Several weeks ago, I released a monster of an article titled Learning to Crawl - Building a Bare Bones Web Crawler with Elixir. I mentioned early on in that post that I was working on a small side-project that involved web crawling.

After a weekend of furious coding, the side project I mysteriously alluded to is now ready to be released to the world!

Without further ado, check out Affiliate Crawler!

Affiliate Crawler is a tool designed to help bloggers and content creators monetize their writing through affiliate and referral links. You give the tool a starting URL and it crawls through your website (following internal links), looking for links to external products and services that can be monetized through affiliate or referral programs (Amazon’s Affiliate Program, etc…).

The project largely came out of my search for inoffensive ways of monetizing the hundreds of articles I’ve produced over the years (without making me feel sleazy). Monetization through affiliate and referral linking seems like the best solution.

After spending hours researching affiliate programs and manually grepping through my posts for monetizable links, I realized that there might be value in automating that process. And with that blast of inspiration, Affiliate Crawler was born!

Affiliate Crawler run against this blog.

As I mentioned, the first version of Affiliate Crawler only took a weekend to build. That short turnaround time was intentional. After spending nearly six months on my last project, Inject Detect, I’ve learned the value of validating ideas before pouring your soul into their development.

That said, I’m starting small with a limited feature set and a small collection of affiliate and referral programs. My goal with this release is to see if there’s any potential here. If you’re curious about the nuts and bolts behind the project, I’ve open sourced the entire thing.

Are you a blogger or a writer? Have you explored affiliate programs as a source of potential revenue? Do you think this type of tool is valuable? Let me know! I’d love to hear from you.