Thursday, June 17, 2010

Divide and Accumulate with Scala Actors

In Scala we can easily do things in parallel with actors; actors are easily started up, they are cheap, and they can easily send objects to each other
Here I'll look at how to use actors to parallelize in situations where we need to do some sort of calculation and aggregation over a collection of objects. E.g. given a collection of RSS or ATOM feeds pull down the list recents items from each one and build the aggregated list of items from all the feeds ordered by publication time. Doing that sequentially is sort of trivial, and involves waiting for a request to each feed one at a time. It would by nice to fire all those requests in parallel, then wait for them to return thier results and build the aggregated list as the results arrive. The following diagram shows how to do that with actors:

Let the program start an actor per feed. Let each actor go do a request for current items to the feed it is in charge of, and then send that list as an immutable object to the accumulating actor. The accumulating actor will maintain the aggregated list of feed items. When a new list of items is received from one of the other actors it merges that list into its aggregated list and at the end it has the full aggregated list.
Going a step further; why not have the accumulating actor send an immutable copy of the aggregated list to e.g. the view every tiume the aggregated list has been updated. That us allows us to show the user the partial results as they arrive.

Notice that the above did not involve any explicit starting of threads, no synchronization of threads, and no locking. Those things are handled behind the scenes by the actors and their message queues. Furthermore there is only mutable state in one place; inside the accumulating actor. Everything sent from one actor to another is immutable. So we've parallelized safely and easily.

The problem we're solving here is - admittedly - fairly easy to parallelize, but nonetheless I think the solution I described is nice because it is easy to understand and to code.