Sunday, 23 October 2011

The costs of concurrency and the actor model

I was interested to understand more about how actors are implemented in Erlang. In an old but still relevant article, an argument is raised:
These days every third grader knows that processes are expensive. Ok, so we're not using real processes, we're using threads, but if you constantly create, destroy, and switch between thousands of threads you'll bring any system to a crawl. Threads are expensive for a number of reasons. They take a long time and a lot of memory to create because the operating system needs to set up many things (the stack, internal data structures, etc.) for them to work. They take a long time to destroy because all the resources they consumed need to be freed. And they take a long time to switch between because unloading registers, storing them, loading them back, and flipping the stacks is complicated business.
So how expensive it really is to create a Thread in java these days?
Peter Lawrey, writer of the excellent Vanilla Java blog, tell us that roughly, creating a Thread takes
about 70μ (microseconds)
, which is fairly expensive, but not prohibitive. The size of the stack can be configured by command line argument to the JVM, with a default of 512kb. According to some Mac OS X 10.5 docs, a thread takes about 90μ, which is kind of validates his results.

As it was pointed out in the StackOverflow discussion
Creating threads is expensive if you're planning on firing 2000 threads per second, every second of your runtime. The JVM is not designed to handle that. If you'll have a couple of stable workers that won't be fired and killed over and over, relax.
This is slightly different in Erlang, where everything is a Process, so the model leads you to that situation.

Essentially, Erlang uses a kind of lightweight processes, akin to green threads (VM scheduled pseudo-threads as opposed to real threads). Since they only consist of a mailbox implemented as a queue, it takes about 300 words to build one, meaning we can create thousands, even millions of them, so the only constrain is memory.

I read a great post on Erlang lightweight-process actor concurrency and it's inherent problems in terms of message passing costs, i.e., having to serialize the entire message between processes, even within the same node. The lack of zero-copy messaging between actors, even within the same node, can really hurt performance. It also talks a great deal on how the BEAM VM and HiPE JIT compiler aren't anywhere near as good as the industry leader JVM (or his favoured Azul Systems "pauseless" VM).

On that basis, after years of hardcore work on Erlang, he decants for Erjang (using Java's Killim actors) or Clojure, possibly
due to been more functional and lisp syntax. I wonder why Scala wasn't mentioned? Akka
offers the same principles of Erlang actors, but implemented in a much more robust way and running on top of the JVM.

Scala actors, particularly Akka actors, use configurableexecutors that efficiently reuse threads (and in Akka 2.0, this will be declaratively configured). Thus, i can see Akka
as supersiding Erlang, an the heir of the throne in actors concurrency.

No comments:

Post a Comment