Down the RabbitMQ hole

I tried to get this article started 3 times, failing every time. The reason, I think, is I wanted to be generically helpful. Then I realized that maybe I didn’t solve enough world problems for such a greater aim.
Let’s do something better. Let’s talk about the real world.

I was working on a big project of mine and everything was kinda working well. I had various agents with different peculiarities and technologies in need of talking to each other and receiving commands from other sources.
To have them talking, my logical, first approach was “let’s have them receive and send web service calls” and this is exactly what I started doing, but then my requirements (and common sense) hit me like a shovel on the nape.

  • Those web services will need to be rational, properly standardized according to a stable plan. If I don’t, it’s going to be a mess in a few weeks
  • Every message sent and received is extremely important. All the deliver/status/redeliver needs to be handled
  • Every process is important. I need to make sure that even once the message is delivered, it is processed completely
  • These agents cannot simply process every message the system throws at them. I need to make the system able to queue the messages for processing, according to the system capabilities
  • Multiple agents of the same type need to be deployed, and the system should be able to load balance them
  • Some messages should be issued once, but delivered to a number of consumers that could be interested in it
  • Some messages are just triggers, some others are actual synchronous RPC invocations
  • As part of the agents will end up in alien machines, it’d be great if they could initiate the connection to a trusted rack, rather than being open to connections
  • Some of the agents that need message delivered do not use the same technology used by the rest of the platform

This pile of things opened my eyes on an astonishing revelation:

Most of my worries relate to the code I write for handling messages between software components

As I learned very quickly, my needs are definitely nothing special, so if something worries me it probably worries at least one million developers out there. I swallowed my ego and the usual do-it-yourself attitude and started looking.

And the solution came out by looking into the number of software / frameworks I often use, most of which are made by or somehow related to Spring.
The answer is RabbitMQ, by Pivotal.

But before we get into the details…

Message brokers

Let’s make clear RabbitMQ is not a mysterious rare bird, but a message broker, a software that does -at a high level- two things:

  • Accept messages from agents written in any supported technology and deliver messages to agents written in any supported technology
  • Queue, organize, balance, manage delivery, redelivery and failure to delivery of messages

It is pretty obvious, at least considering what we’ve said so far, that this software would drastically replace a lot of code you’re normally used to write yourself, and as in every time you start considering to rely on someone else’s code, the first thing that comes up into your mind is “Oh shit, let’s hope it’s not some Sunday morning project”, which often

leads to: “If, long term, I don’t like it anymore, or the project fails, how am I going to replace it?”. And this is probably one of the most interesting parts of the story: AMQP.
I hate acronyms, seriously, but this one is sweeter than most: Advanced Message Queuing Protocol.
Which is, in fewer words than needed, a standard describing how a message broker middleware software should behave both from the behavioral perspective (how queuing should happen) and the protocol perspective (how the data should travel on the wire).

Every message broker that adheres to AMQP, can substantially be replaced with another one with a reasonable amount of adjustments. I’m not saying you shouldn’t put great care when evaluating a software that is going to take such a relevant role in your project, but at least you’re not venturing into the unknown.

Features

Now that we know what message brokers are, let’s see how they can help us in our requirements.

  • Point-to-point accept/delivery: the message is accepted in a queue and delivered to a consumer that is enabled to receive messages from that specific queue. There can be one consumer or many, but the message is delivered to just one. This is a pretty common use when you have multiple consumers of the very same kind to distribute the load among multiple nodes.
  • Publish-subscribe: the message is accepted in a queue and delivered to any consumer that subscribed to that queue. This strategy is a win when a software component triggers an event that can be interesting to a number of consumers. A good example is a system that receives documents and once the document is in, the event has to be notified both via email and SMS. The Email and SMS agents will subscribe to the same queue and receive the very same message. Moreover, using the routing functionality, you can have some agents being interested in messages flagged in a certain way, others in messages flagged in another way, or all of them.
  • Auto/Explicit acknowledge: the message broker considers a delivery succeeded when it receives an acknowledge response. Some messages are just notifications of events you want an agent to take action for. Some others are vital messages you want make sure they get fully processed. In the first scenario, your attention should focus on the certainty that the message successfully reached the agent and no network failures broke the message in transit. This is done via the auto-acknowledge routine that automatically sends the response once the message is in the agent.
    In the second scenario, you want the agent to reply the message broker once it succeeded in doing what it had to do. This moves a part of failure detection outside the agent code itself. Traditionally, what you would do is writing all the nice code that takes failure in consideration and if anything goes wrong “do something”. But a failure can also be unpredictable either in the place where it happens, or related to a system failure that is beyond code control (faulty memory, unstable state, out of memory process etc.).
    Moving part of the problem somewhere else could be a great opportunity to avoid even thinking on how to react to the worst failures. Explicit acknowledge won’t allow the message broker to consider the delivery done until an explicit acknowledge message has been returned by the agent. You can decide where this response is sent depending on your code, but a common strategy is triggering this event once the processing of the message has actually finished. If the processing fails for some reason and you are able the catch that event, then you will be sending a not-acknowledge, so that the message brokers will know the delivery failed. If it’s impossible for you to catch the failure, then the message broker will wait a certain amount of time and eventually decide the delivery failed.
  • Redeliver: in relation to what we just said, the message broker can be instructed to redeliver a message if a previous delivery was not properly acknowledged. Of course if a failure was caused by an unrepairable failure of an agent, then redelivery to the same agent won’t do any good, but in a scenario where agents are replicated, then there’s a good chance for the redelivery to succeed.
  • RPC: by using temporary queues and exchanges, it is possible to use the message broker as dispatcher for RPC calls. This is not exactly what message brokers have been designed for, but it’s impossible not to notice how this can be easily achieved. The invoker sends a message notifying what’s the name of the temporary queue it will wait a reply from. The remote agent will execute the call and respond in that queue that will eventually be deleted.

All the requirements are met.

Nature of the message

Messages can be anything, the message broker will not investigate what they’re made of. As long as bytes go through the wire, it’s perfectly fine.
Depending on the type of system you’re designing, though, the nature of the messages is something you want to study ahead of time. If you’re in a very homogeneous environment, such as a Java only context, there’s no reason why you shouldn’t binary-serialize your objects to simplify the code both from the producer and the consumer.
On the contrary, if your system is heterogeneous or might be in a realistic future, then you might consider working with universal formats, such as JSON.

RabbitMQ

RabbitMQAmong a number of very valuable message brokers, RabbitMQ definitely made it through my software selection.
The remarkable reputation that make it a relevant component of various softwares, the highly technological company backing its development, and the vast user base, are a good ID card of the software, but there are other things which really impressed me.
Its technology, entirely based on Erlang, is a clear statement of purpose: high availability by design.
It’s simplicity of deployment both for development and production environments, massively convention-over-configuration, makes you start quick and gives your ops a chance to learn new stuff when needed.

Even just these facts convinced me to take the risk to give it a try, not just as an experiment, but all the way to production, and let me say this: it definitely won the proof of time.

After a very quick deployment through the repository of the Linux distro, the application was up and running, and substantially that was it, the agents were gracefully able to connect and communicate with the message broker.

RabbitMQ also has a plug-in system, and it didn’t take long for me to realize how much I wanted to install the RabbitMQ Management plugin.
This plug-in adds a small web server that allows you to see exactly what’s going on in the message broker. Message deliveries, acknowledges, queues, messages awaiting delivery and confirmation, but connections and channels. Everything is right there, and especially during the exploration phase, this is extremely important because you will quickly find yourself with messages stuck or deadly redelivers. Learning to avoid getting stuck is pretty straight forward, but still this plugin gives you a great help from the software selection to the actual production environment.

The plug-in system has a number of interesting items you might be interested in looking into. One of them, that I think it’s pretty cool is an implementation of STOMP, a text streaming protocol, popular and widely supported. Applied to RabbitMQ, it allows us to expose most queues functionalities to external clients over the Internet with a finer control and security than you would have by exposing the AMQP sockets themselves.

Of course, as you would expect from such a system, RabbitMQ has clustering capabilities. Now, contrary to many scalable software components you might have used in the past, RabbitMQ has a number of options which work well in specific scenarios, so don’t rush it and read the documentation carefully.
I didn’t go through all the possible variations because I really didn’t need that much. In my case, basic clustering in a LAN allowed me to do everything, and it’s definitely a piece of cake to set up. I’m pretty sure the federation plugin (required when you need to cluster over a WAN) will require a bit more work, but I do believe these functionalities follow the same RabbitMQ principles.

Last, but not least in any possible way: reliability.
It’s all eye candy when you’re developing, but production is all another story. You can be rational and clearly evaluate how respectable the software is and how Erlang is beautiful in this kind of product. But at the end of the day, you need to see it with your very own eyes to start sleeping well the night.
For this reason, my words won’t help you much, but I’ll give it a try. RabbitMQ is absolutely exceptional in reliability and stability.
I had the luck to talk to two software engineers in different companies using RabbitMQ for large volume of critical messages and they all agreed with my impressions:
“It’s incredible how you drop it in and you just forget about it” Francesco said. I trust Francesco, and you should trust him as well.

Conclusions

I’m in that phase everybody doing this job periodically go through: how could I live without technology-name-goes-here.

How could I live without RabbitMQ? I probably can’t answer this question right now because I really consider it critical for my work.
Rationally speaking, it’s not what it does, but how it does it and that sense of safety and stability that eventually had me sleeping well again.  If you’re dealing with enterprise software with asynchronous tasks going in an out multiple agents, I do believe you should consider a message broker, and RabbitMQ is definitely my advice.

First steps to the Actor model with Scala and Akka

This article assumes you’ve already read: “Simplifying efficiency – The actor model”

Scala programming language has been designed with the actor model in mind and the Akka toolkit brought it to the stratosphere.

Akka can do its job both for Java and Scala, but given the amazing integration with Scala and the fact I ideally like Scala more than Java (even though I’m not that expert in it), I thought it would have been more enjoyable and intriguing to go for it.
Before we get started, mind that Scala has its own actor model implementation, but starting with Scala 2.10 it has been deprecated in favor of Akka.

This article is not meant to be a guide to Akka, but a way to go through the basics of the main features and let you evaluate what it can do for you.

Hello world

We start real quick and create our hello world actor!

class HelloWorldActor extends Actor{

  def receive = {
    case "greetings" =>
      println("hello world!");
  }
}

Let’s look into it, shall we?
After declaring the class as a descendant of Actor, we implement a “receive” closure which verifies whether the message is equal to “greetings”. If it is, then it prints “hello world”. The “Actor” suffix in the class name is just a convention of mine and is not required.

This is how we make it work:

object Main extends App {
   val actorSystem : ActorSystem = ActorSystem()
   val hwa = actorSystem.actorOf(Props[HelloWorldActor])
   hwa ! "greetings"
   Thread.sleep(1000)
   actorSystem.shutdown()
}
  • Line 2 : initialization of the actor system
  • Line 3 : this is an interesting part. We ask the actor system to instantiate an actor and obtain an ActorRef. You won’t have access to the instance itself, but to a reference that will only allow you to send messages to it, which might look a bit obnoxious at first, but you can clearly understand how this is so very necessary: if you can mess it up, you probably will. Enforcing the messaging system as the sole system of communication is for your own safety. Props is a configuration wrapper, you can read about it in the documentation. This initialization does not only create a reference for you, but also a system wide reference to it.
  • Line 4 : finally, we can send a message to the actor by using this glorious syntax
  • Line 5 : cheap and dirty. As the message is asynchronous, there’s a consistent possibility that the main thread could end before the message is handled
  • Line 6 : system shutdown

The synchronous asynchrony

As I stated in my introductory article, an actor instance is not a thread but a worker and performs one operation at a time. To demonstrate this fact, let’s modify our program like this.

//main
object Main extends App {

	 val actorSystem : ActorSystem = ActorSystem()
	 val hwa = actorSystem.actorOf(Props[HelloWorldActor])
	 for ( i <- 0 until 10)
           hwa ! "greetings "+i
	 Thread.sleep(1000)
	 actorSystem.shutdown()
}


//HelloWorldActor
class HelloWorldActor extends Actor{ def receive = {
  case message : String =>
      if(message.startsWith("greetings")){
	      Thread.sleep(new Random().nextInt(100))
	      val reply : String = message.replace("greetings","hello world")
	      println(reply);
      }
  }
}

On line 17, a random pause will introduce delays to make the whole thing more real.
Even though the messages are actually dispatched asynchronously, the worker is one and will perform one operation at a time pulling tasks from a FIFO queue where the messages are in the receive order. So all you have achieved so far is a worker that is asynchronous from the calling thread.

hello world 0
hello world 1
hello world 2
hello world 3
hello world 4
hello world 5
hello world 6
hello world 7
hello world 8
hello world 9

Therefore, as you might have guessed…

object Main extends App {

	 val actorSystem : ActorSystem = ActorSystem()
	 val hwa1 = actorSystem.actorOf(Props[HelloWorldActor])
	 val hwa2 = actorSystem.actorOf(Props[HelloWorldActor])
	 for ( i <- 0 until 10) {
		 if(i%2==0)
			 hwa1 ! "greetings "+i
			else
			  hwa2 ! "greetings "+i
	 }

	 Thread.sleep(1000)
	 actorSystem.shutdown()

}

Will produce sort of an unpredictable order, since two workers will be able to perform a task at the same time.

hello world 1
hello world 0
hello world 3
hello world 2
hello world 4
hello world 5
hello world 6
hello world 8
hello world 7
hello world 9

But actors are also very good at talking to each other (it actually is what they do all the time) and we want to see it in action. Welcome the HiThereActor

class HiThereActor extends Actor{
  def receive = {
    case "hello world" =>
    	println ("hi there")
  }
}

Our main:

object Main extends App {

	 val actorSystem : ActorSystem = ActorSystem()
	 val hwa1 = actorSystem.actorOf(Props[HelloWorldActor])
	 val hwa2 = actorSystem.actorOf(Props[HelloWorldActor])

	 val hta = actorSystem.actorOf(Props[HiThereActor], "hithere")

	 for ( i <- 0 until 10) {
		 if(i%2==0)
			 hwa1 ! "greetings "+i
			else
			  hwa2 ! "greetings "+i
	 }

	 Thread.sleep(1000)
	 actorSystem.shutdown()

}

We instantiate the HiThereActor. Now, you might be asking yourself: how do I have the HelloWorldActor talk to the HiThereActor? Do I need to store its reference somewhere? You can, and in a simple application it would be considered completely legit, but let’s invest some time talking about an interesting alternative.
When I instantiated the HiThereActor I also passed a string Akka uses as a “name”. That name can be used as a fragment of a path that allows me to retrieve that reference:

class HelloWorldActor extends Actor{
  def receive = {
    case message : String =>
      if(message.startsWith("greetings")){
	      Thread.sleep(new Random().nextInt(100))
	      val reply : String = message.replace("greetings","hello world")
	      println(reply);
	      val other = context.actorSelection(ActorPath.fromString("akka://default/user/hithere"))
	      other ! "hello world";
      }
  }

When it’s time to talk to the other actor, I provide that “path” that allows me to retrieve the reference to that specific instance. Eventually, I can send that message and make it all work beautifully.

hello world 1
hi there
hello world 0
hi there
hello world 2
hi there
hello world 3
hi there
hello world 4
hi there
hello world 6
hi there
hello world 8
hi there
hello world 5
hi there
hello world 7
hi there
hello world 9
hi there

Thanks to my “outstanding” charting skills, I can show you what is actually happening.

actor model example #1

Some considerations:

  • You can tell when a message is sent, but you can’t tell when it’s going to be handled
  • If an actor is busy performing a task, the message will stay in the mailbox until it’s done
  • The more the actors of a certain type, the more messages are going to be handled in parallel, but this number is perfectly finite and every actor is clearly identifiable

The fact that an actor is consistently deployed and can perform one operation at a time, gives us access to its stateful capabilities, so I wouldn’t find anything weird if you wanted to store the state of your actor and write a functionality to provide it to another actor:

object HelloWorldActor{

  case class Status
}

class HelloWorldActor extends Actor{

  var counter : Int = 0
  var lastGreeting : String = null

  def receive = {
    case message : String =>
      if(message.startsWith("greetings")){
    	  counter+=1;
    	  lastGreeting = message;
	      Thread.sleep(new Random().nextInt(100))
	      val reply : String = message.replace("greetings","hello world")
	      println(reply);
	      val other = context.actorSelection(ActorPath.fromString("akka://default/user/hithere"))
	      other ! "hello world";
      }
    case HelloWorldActor.Status =>
      val status = Map(("counter",counter), ("lastGreeting",lastGreeting))
      sender() ! status
  }
}

In the “object” section we’re declaring a class that can be used as a message. It is a handy way to identify messages designed to trigger a specific event.
Right before the receive closure, I declared two local variables and in the “message” case, I updated them. Since this method is going to run once at a time by design, there’s no race condition at all.
In line 22, we therefore created a new case (representing a request for a status update) and in line 23, we craft a Map to be used as a message. Finally, in line 24, I send the message to the sender, that is the actor who sent the Status “request” to this actor.
What’s left to implement is a checker actor that asks HelloWorldActors about their status, not included in this article, but you already have all the data to build one yourself.

Routers

Hopefully, the simple hello world triggered some thinking in your brain about how you could apply this approach to a number of situations you handled differently.
On the other hand, it is also obvious this is not the end of the story, otherwise it wouldn’t be that great deal at all.
What we’re going to talk about is something you already puked on while reading the previous example: balancing messages between actors of the same type.
Of course there’s a neat solution to this problem, but not all implementations are that smooth.
Fortunately, we’re working with Akka, which provides us a number of solutions, based on how complex the scenario is.
Here’s an easy one

object Main extends App {
	 val actorSystem : ActorSystem = ActorSystem()
	 val router = actorSystem.actorOf(Props[HelloWorldActor].withRouter(RoundRobinRouter(nrOfInstances = 2)))
	 val hta = actorSystem.actorOf(Props[HiThereActor],"hithere")
	 for ( i <- 0 until 10)
		router ! "greetings "+i
	 Thread.sleep(1000)
	 actorSystem.shutdown()
}

In line 3, we state we want to create a router that will take care of 2 HelloWorldActor instances, and it will distribute the messages in round robin.
In the loop, we simply send our message to the router and obtain the expected behavior, without really knowing anything about the HelloWorldActor instances. Yet the actor instances are constantly and consistently 2, stateful, performing tasks linearly.
Needless to say RoundRobinRouter is just one of the options. Other implementations are already in the box, such as SmallestMailboxRouter, and of course you can create yours, based on your needs. After all, as you may have guessed, routers are (special) actors as well.

Failure and recovery

What happens to an actor when it fails? In the previous article, we mentioned how letting an actor fail rather than blindly try-catching it is a good thing. But of course, you can’t simply ignore failure, so how are you going to handle it, really?
Let’s get back to the example “Routers” example, and edit the HelloWorldActor as follows:

class HelloWorldActor extends Actor{

  def receive = {
    case message : String =>
      if(message.startsWith("greetings")){
    	  val i = 1/0
	      Thread.sleep(new Random().nextInt(100))
	      val reply : String = message.replace("greetings","hello world")
	      println(reply);
      }
  }
}

If you’re not one of those believing math is a New World Order plot against free thinkers, you can clearly tell this is going to blow up, but what you are going to get is probably different from what you expect.
In your console, a number of: “java.lang.ArithmeticException: / by zero” will show up. How many? It depends, not one, nor 10. In my case, I had 5.
Ok, let’s throw in some more mystery, by changing the “main” loop like this:

	 for ( i <- 0 until 10) {
	   router ! "greetings "+i
	   Thread.sleep(100)
	 }

And you just got 10. This is the Holy Grail of the WTF.
Long story short:

  • Actors are supervised by supervisors which decide what to do when an actor fails
  • Every actor is responsible for supervising its child actors, so this is where the supervision strategy is declared
  • By default, actors have a supervisor that will reinitialize and restart a failing child actor
  • When an actor blows up and gets reinitialized, its mailbox is lost
  • If you send messages to an actor very fast, they will stack up in the mailbox as the actor is executing the first task, so when the actor fails the mailbox will not be empty
  • By slowing down the message rate, the actor will fail before the messages start stacking up

Now, this example might look a bit stupid (because it fails every time) but tells us a lot about how failure recovery is a key topic. Why the actors are restarting by default and not simply resuming operations? The reason is pretty simple: as actors are stateful, you can’t really tell a priori if the failure is caused by the state of the actor itself, so it’s safer to restart it from scratch. In a previous example, we used the actor state to store some status information, but what if the state was a persistent connection to a database?
This is not necessarily true for you, as there are a lot of things you might want to do in case of failure, maybe based on the type of failure.
Back to our example, we might be interested in having the router (parent of the HelloWorldActor) to adopt a different strategy, such as “Resume”, which means “never mind, keep going”.

So here’s how to declare this:

object Main extends App {
     
     val actorSystem : ActorSystem = ActorSystem()
     val supervisorStrategy = OneForOneStrategy(){
      case _:ArithmeticException  =>
        SupervisorStrategy.Resume;
      case _:Exception =>
        SupervisorStrategy.Stop;
    }
     val router = actorSystem.actorOf(Props[HelloWorldActor].withRouter(RoundRobinRouter(nrOfInstances = 2,supervisorStrategy = supervisorStrategy)))
     val hta = actorSystem.actorOf(Props[HiThereActor], "hithere")
    
     for ( i <- 0 until 10)
         router ! "greetings "+i
    
     Thread.sleep(5000)
     actorSystem.shutdown()
    
}

As we’re not implementing the router itself, we’re passing the supervisor strategy as a parameter. In this case, we’re basically saying:

  • This strategy has to be applied to each child actor individually. In opposition to the OneForOneStrategy, the AllForOneStrategy takes action on all child actors.
  • In case the error is an ArithmeticException, don’t worry and resume operations, which means “we’re pretty sure the error is not related to the actor state”.
  • In case the error is any other exception, restart the actor.

Now, since our example throws ArithmeticExceptions, we now expect the actors not to be restarted and keep trying for all the messages it has in the queue.

Of course in this example we have the router supervise its child actors, but when you implement an actor yourself that has child actors, you can simply override the default supervisor inside the actor by writing:

override val supervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
    case _: ArithmeticException      => Resume
    case _: NullPointerException     => Restart
    case _: IllegalArgumentException => Stop
    case _: Exception                => Escalate
  }

Now this snippet shows 2 more features I do believe you can study by yourself. Just a hint so you know where to start:

  • the actor can be restarted 10 times in a minute before it’s considered legally dead
  • If the failure is a generic exception, escalate the supervision of the event to the parent actor, higher in the chain of responsibility (and maybe it’ll know what to do…)

Remoting

The awesomeness parade isn’t over yet.
In Akka, actors are not necessarily meant to be in the very same process, not even in the same server! So rather than simply instantiating and using your actors, you can transform the process in a server, awaiting messages for its actors.
And guess what, it’s pretty easy to do. First thing you need to do, is downloading and employing Akka Remote library which adds the feature we need.
To accomplish this, we are going to use another interesting feature of Akka: a mighty configuration file where you can configure the platform behavior and preconfigure certain components. All the magic happens in the application.conf file that has to be in the program classpath.
Now, there’s a big number of ways to work with remoting. You can remotely deploy actors, balance, distribute etc. Honestly, I’m pretty sure you can start discovering it yourself once you have a basic working example, and that’s exactly what we’re going to do.

  • Two programs, CX and SX
  • CX contains the HiThereActor
  • SX contains the HelloWorldActor
  • CX sends a message to the HelloWorldActor in SX
  • HelloWorldActor sends a message to the HiThereActor in CX

SX

//Main
object Main extends App {
	val actorSystem : ActorSystem = ActorSystem()
	actorSystem.actorOf(Props[HelloWorldActor],"hello")
}

//HelloWorldActor
class HelloWorldActor extends Actor{

  def receive = {
    case message : String =>
      if(message.startsWith("greetings")){
	      Thread.sleep(new Random().nextInt(100))
	      val reply : String = message.replace("greetings","hello world")
	      println(reply);
	      val other = context.actorSelection(ActorPath.fromString("akka.tcp://default@127.0.0.1:2553/user/hi"))
	      other ! "hello world";
      }
  }
}

//application.conf
akka {
  actor.provider = "akka.remote.RemoteActorRefProvider"
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2552
    }
  }
}

CX

//Main
object Main extends App{
  val actorSystem : ActorSystem = ActorSystem()
  val hello = actorSystem.actorSelection("akka.tcp://default@127.0.0.1:2552/user/hello")
  val hithere = actorSystem.actorOf(Props[HiThereActor], name="hi")
  hello ! "greetings 1"
}

//HiThereActor
class HiThereActor extends Actor{
 def receive = {
  case "hello world" => println ("hi there")
 }
}

//application.conf
akka {
 actor.provider = "akka.remote.RemoteActorRefProvider"
 remote {
  enabled-transports = ["akka.remote.netty.tcp"]
  netty.tcp {
   hostname = "127.0.0.1"
   port = 2553
  }
 }
} 

When SX starts, a HelloWorldActor is initialized with “hello” as name. The configuration says the server is going to listen on port 2552. The process is not going to stop once the App code is executed, instead it’s going to stay there and join the conversation.
When CX starts, a HiThereActor is initialized with “hi” as name. Just like SX, the server is going to stay on and listen on port 2553. The App code eventually retrieves a reference to the actor known as “hello”, explicitly pointing out its network location. Once the reference is there, it sends the “greeting 1” message.
The message is accepted by the SX server which will print “hello world 1”, obtain a reference to the actor known as “hi” in the CX server and send a message back.
Needless to say that the CX console will eventually print “hi there”.

In this simple example, you can see the easiest (and least flexible) way to invoke a remote actor, which is, by the way, the 10% of what this thing is capable of. Of course explicitly referencing actor names and IP addresses right in the code is not advisable, but it gives you the idea.
Routing messages over a pool of distributed actors in a large scale cluster, deploying actors remotely, managing failure and recovery are just a glimpse of the other topics you will need to work on to fully master this beast.
Yet there really is nothing difficult in unleashing the other capabilities. Want to load balance between multiple actors of the same type in different servers?

...
actor{
  	provider = "akka.remote.RemoteActorRefProvider"
  	deployment {
		/hwGroup {
			router = round-robin-group
			routees.paths = [ "akka.tcp://default@serverone:2552/user/hello1","akka.tcp://default@serverone:2552/user/hello2", "akka.tcp://default@servertwo:2552/user/hello1" ]
		}
	}
  }
...

Conclusion

As the picture on how the actor model can help you becomes clearer, you should become more aware of what to look for in a good implementation.
Whether you’re a Java guru or a Scala “activist”, Akka represents a great opportunity of writing tidy code. It’s elegant, consistent and productive at the same time, three good qualities that demonstrate a sharp vision and a solid project.
Since there still so much to talk about and the topic is very vivid in my head, I might get back at it and talk about some specific scenarios.

In the meantime, take care.

Simplifying efficiency – The actor model

When you work for the same company for some time, you might end up reviewing code you wrote years ago. The feeling is always shocking. Imprudent, brave, naive, it’s like watching the first season of “The Simpsons” again.

Among the embarrassing things you find in your “old” code, over-engineering is possibly one of the most dramatic, because it might compromise future years of work. It draws your attention from the actual task to a endless number of technicalities, unexplainable maintenance and constant improvements not really improving anything. It’s drowning in poo, literally.

When project complexity and will to reinvent the wheel love each other very much, over-engineering is born. Every technique, theory or library that can stop your team to bring chaos in your project is more than welcome.

The problem

First off, I’m not going to talk about this topic scientifically. I’m no scientist, I’m a software artisan, so all you’re going to read is about how certain solutions solved specific problems for me.
If your plan is to comment something like “hey that is partly wrong if you apply to a functional context!” or “this is far from being the original concept formulated in 1973!” keep in mind that: a) you’re not helping anyone by doing so b) you can pretty much go fuck yourself.

Our objective is standardizing a number of tasks that are extensive part of our everyday life in server side programming. If not properly approached, these tasks can be the source of an exponential growth of stacked, generational, over-engineered code.

  • Modularization. Dividing the project in smaller semi-autonomous tasks and boxing them accordingly is vital for simplicity, robustness and testability. If you don’t, then you’re in a big problem, but are you evaluating the most rational criteria?
  • Interfacing. Deciding how these boxes will interact with each other is something you feel the urge to do quick, when you start assembling your components. If modularization is done properly, then your component has few entry points, but this decision is more critical than it seems. Refactoring and extending those interfaces over and over becomes a big problem as they start getting used in multiple locations.
  • Parallelism. Let’s face it, this is something you can rarely avoid in modern server side programming. And even though it might look like a work you can delay, remember that parallelization requires you to code in a certain way. If you take it as an “add on” you will need to re-engineer your code to make it happen. Moreover, going parallel is not just thread spawning; controlling the flow, the number of parallel events and their status cannot be ignored.
  • Failure recovery. Assuming you’re doing all the above pretty well, there still is one thing that can drive you nuts: how to recover from a failure. Now, try..catch might sound as an answer as long as the failure is limited to a “boom” is a stateless piece of code, but what happens if you need to rollback a complex status, or the failure is related to the current status of an object? You will need to revert data changes and reinitialize your code. And how will you notice that something bad happened? Are you going to wait for a terrifying customer call?

Of course these four items are just a glimpse of the great challenges in software development, but they represent very well how a software development model can avoid the mess we often introduce without even knowing.

A solution

The actor model is a concurrent programming strategy where the fundamental computational unit is -indeed- the actor.

An actor encapsulates the necessary code to perform a task, runs as an independent worker from the calling thread, and is triggered by a message.
Actors are often stateful and won’t work on a second task ’till the completion of a previous operation. If you trigger it as it’s still working on something, the subsequent call will be queued in what many implementations call “mailbox”. Therefore, if you want to be able to simultaneously run two tasks of the same type, you will need to instantiate two.
Actors are not threads but use threads to achieve asynchrony, so their flow is also bound to the availability of threads.

The communication between actors happens with messages. In all the implementations I’ve seen, a message can be anything, an object, a string, a number. You don’t need to declare an actual interface, but rather set up data items working like letters. It is the actor duty to verify if the letter it just received is OK or not. Of course, to make everything work, it is expected you to sent messages containing at least what an actor needs to work.

Each actor should be good at solving one problem, or performing one task, but this does not necessarily mean it will take the responsibility of a whole process. It is a common -if not recommended- pattern to have actors mailing other actors as part of their tasks, therefore building chains of asynchronous events.

Failure management is also another interesting fact of actor programming. As every actor lives its very own life, every health issue an agent can encounter does not directly influence the health of the whole system. Of course a failing actor is a bad thing, but unless that actor is the keystone of your project, its failures shouldn’t be able to kill the whole application.
An interesting fact about actors and failures is it’s a common practice to let actors fail and not recover autonomously, in favor of a strategy where a supervisor handles what needs to be done to have the actor back in full working order. In most implementations, a failure deactivates the actor which will not be able to handle further messages.
This is a shocking perspective as you’re climbing the actor model learning curve, but you can clearly see it makes a lot of sense: if an actor’s the ability to perform is related to its state, then if it fails there’s a tangible possibility the state is corrupt, hence you don’t really want to simply “try-catch” the main task code, but manage the reinitialization of the actor!

At this point you can pretty much notice how the actor model looks like a representation of a real world factory. It’s kinda funny and disturbing, I would say.
Actors:

  • are specialized workers
  • are finite
  • are supervised by other actors
  • perform one task at a time
  • receive an heterogeneous file containing the details of the information to perform they need to interpret
  • have a mutable knowledge of their condition
  • need an available workbench to perform their job (threads)
  • may be asked to inform other actors of events concerning their job, or passing other actors a half-processed artifact
  • if hurt, they stop working and inform the supervisor

Look pretty intriguing already. Thinking at a software like this, you can clearly see how easier it is to control the behavior of your “factory”.

What else? Oh well. A factory can have offsite workers, and so does our model.

Scaling up!

I kept this topic as the last theory class mostly because not all frameworks implement this, and not all do it in the same way, so I’m going to keep the description pretty abstract.

You now have a full picture of how the model works and how fine the grain of your control is. You have also seen how the adoption of a good actor framework can keep you from over-engineering simple things that are already optimal in the model.

Now let’s add something that is -again- a source of horrendous over-engineering pain and insomnia: scalability.
As we previously said, actors talk to each other and each instance is responsible to perform the task it’s been instructed to do, potentially interacting with other actors in the process of doing their job. So far, though, we’ve always considered this activity as an abstraction inside one running process. Or maybe not?
Maybe not. Many actor model libraries also implement the ability to create a cluster of agents deploying actors. These agents won’t be the same process and won’t necessarily reside in the same server. Since the communication between actors is performed via messages, actors don’t really need to know each other, so an actor can definitely talk to another actor whose nature is completely unknown.
This allows you to achieve three great things:

  • A micro-agent system. Smaller, specialized software is easier to maintain, faster to deploy, debug, and potentially outsource. Grouping actors by analogy in micro-agents can be a winning strategy as they can be deployed in different machines and therefore provide a better performance. Micro-agent philosophy is not an effect of actor programming and the model can be achieved in a number of ways, but the actor model definitely pushed me in that direction.
  • Redundancy. Micro-agents deployments don’t need to be unique and don’t need to reside on the same server. An imaginary “Mailer” agent, implementing the FailureMailActor, SuccessMailActor and DailyMailActor doesn’t need to be unique, and if one goes down for whatever reason, the other will still perform.
  • No matter if you embrace the micro-agent philosophy or not, even exact same copies of the whole software, deployed in multiple servers and implementing all the possible actors can collaborate with each other and allow you to share the load between multiple nodes.

Conclusions

As always, if you’re looking for the model that will save us all, you probably have bigger problems you talk about with a psychologist. There’s no panacea in this world, you should be aware of it by now. What is certain is the actor model solves the problems we’ve been talking about with swag, makes the flow control clearer, allows a better use of the resources and forces the developer to be tidier.

It definitely worked for me all the way and has become foundational for all the software I’ve been writing.

In the next article, we’ll deep dive in some examples using my favorite implementation! Until then, I strongly suggest you start looking around and see what the gods of software have created for you to start working with actors.