Down the RabbitMQ hole

I tried to get this article started 3 times, failing every time. The reason, I think, is that I wanted to be generically helpful. Then I realized that maybe I haven’t solved enough of the world’s problems for such a grand aim.
Let’s do something better. Let’s talk about the real world.

I was working on a big project of mine and everything was kinda working well. I had various agents with different peculiarities and technologies in need of talking to each other and receiving commands from other sources.
To get them talking, my first, logical approach was “let’s have them receive and send web service calls”, and this is exactly what I started doing. But then my requirements (and common sense) hit me like a shovel on the nape.

  • Those web services will need to be rational, properly standardized according to a stable plan. If I don’t, it’s going to be a mess in a few weeks
  • Every message sent and received is extremely important. Delivery, status tracking and redelivery all need to be handled
  • Every process is important. I need to make sure that even once the message is delivered, it is processed completely
  • These agents cannot simply process every message the system throws at them. I need to make the system able to queue the messages for processing, according to the system capabilities
  • Multiple agents of the same type need to be deployed, and the system should be able to load balance them
  • Some messages should be issued once, but delivered to a number of consumers that could be interested in it
  • Some messages are just triggers, some others are actual synchronous RPC invocations
  • As some of the agents will end up on foreign machines, it’d be great if they could initiate the connection to a trusted rack, rather than being open to incoming connections
  • Some of the agents that need messages delivered do not use the same technology as the rest of the platform

This pile of things opened my eyes to an astonishing revelation:

Most of my worries relate to the code I write for handling messages between software components

As I learned very quickly, my needs are definitely nothing special, so if something worries me it probably worries at least one million developers out there. I swallowed my ego and the usual do-it-yourself attitude and started looking.

The solution came from looking at the software and frameworks I often use, most of which are made by or somehow related to Spring.
The answer is RabbitMQ, by Pivotal.

But before we get into the details…

Message brokers

Let’s make it clear: RabbitMQ is not a mysterious rare bird, but a message broker, a piece of software that does, at a high level, two things:

  • Accept messages from agents written in any supported technology and deliver messages to agents written in any supported technology
  • Queue, organize, balance, and manage delivery, redelivery and delivery failures of messages

It is pretty obvious, at least considering what we’ve said so far, that this software would replace a lot of code you’re normally used to writing yourself. And, as always when you start considering relying on someone else’s code, the first thing that comes to mind is “Oh shit, let’s hope it’s not some Sunday morning project”, which often leads to: “If, long term, I don’t like it anymore, or the project fails, how am I going to replace it?”. And this is probably one of the most interesting parts of the story: AMQP.
I hate acronyms, seriously, but this one is sweeter than most: Advanced Message Queuing Protocol.
In fewer words than it deserves, it is a standard describing how message broker middleware should behave, both from the behavioral perspective (how queuing should happen) and the protocol perspective (how the data should travel on the wire).

Every message broker that adheres to AMQP can substantially be replaced with another one with a reasonable amount of adjustments. I’m not saying you shouldn’t take great care when evaluating software that is going to play such a relevant role in your project, but at least you’re not venturing into the unknown.

Features

Now that we know what message brokers are, let’s see how they can help with our requirements.

  • Point-to-point accept/delivery: the message is accepted in a queue and delivered to a consumer that is enabled to receive messages from that specific queue. There can be one consumer or many, but the message is delivered to just one. This is a pretty common use when you have multiple consumers of the very same kind to distribute the load among multiple nodes.
  • Publish-subscribe: the message is published once and delivered to every consumer that subscribed to it. In AMQP terms, the producer publishes to an exchange and each subscriber binds its own queue to that exchange, so every subscriber receives its own copy. This strategy is a win when a software component triggers an event that can be interesting to a number of consumers. A good example is a system that receives documents: once the document is in, the event has to be notified both via email and SMS. The Email and SMS agents each bind a queue to the same exchange and receive the very same message. Moreover, using the routing functionality, you can have some agents interested in messages flagged in a certain way, others in messages flagged in another way, and others in all of them.
  • Auto/Explicit acknowledge: the message broker considers a delivery successful when it receives an acknowledgement. Some messages are just notifications of events you want an agent to take action on. Others are vital messages that you want to make sure get fully processed. In the first scenario, your attention should focus on making sure the message reached the agent and no network failure broke it in transit. This is done via the auto-acknowledge routine, which automatically sends the response once the message is in the agent.
    In the second scenario, you want the agent to reply to the message broker once it has succeeded in doing what it had to do. This moves part of the failure detection outside the agent code itself. Traditionally, you would write all the nice code that takes failure into consideration and, if anything goes wrong, “does something”. But a failure can also be unpredictable, either in the place where it happens or because it comes from a system failure that is beyond the code’s control (faulty memory, unstable state, an out-of-memory process etc.).
    Moving part of the problem somewhere else can be a great opportunity to avoid even thinking about how to react to the worst failures. With explicit acknowledge, the message broker will not consider the delivery done until an explicit acknowledge message has been returned by the agent. You decide where in your code this response is sent, but a common strategy is to send it once the processing of the message has actually finished. If the processing fails for some reason and you are able to catch that event, you send a not-acknowledge, so that the message broker knows the delivery failed. If it’s impossible for you to catch the failure, then the message broker will wait a certain amount of time and eventually decide the delivery failed.
  • Redeliver: in relation to what we just said, the message broker can be instructed to redeliver a message if a previous delivery was not properly acknowledged. Of course, if the failure was caused by an unrecoverable fault in an agent, redelivery to the same agent won’t do any good, but in a scenario where agents are replicated there’s a good chance for the redelivery to succeed.
  • RPC: by using temporary queues and exchanges, it is possible to use the message broker as a dispatcher for RPC calls. This is not exactly what message brokers have been designed for, but it’s impossible not to notice how easily this can be achieved. The invoker sends a message stating the name of the temporary queue on which it will wait for a reply. The remote agent executes the call and responds on that queue, which will eventually be deleted.
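
The RPC item at the end of the list can be pictured with a tiny in-memory sketch. It does not talk to RabbitMQ at all: Python's standard `queue.Queue` stands in for broker queues, and every name in it (`rpc_requests`, `reply_queues`, `remote_agent`, `call`) is made up for illustration. The `reply_to` and `correlation_id` fields are assumptions too, chosen because they mirror message properties that AMQP clients commonly expose.

```python
import queue
import threading
import uuid

# Toy stand-ins for the broker: one request queue plus temporary reply queues.
rpc_requests = queue.Queue()
reply_queues = {}  # temporary queue name -> queue.Queue


def remote_agent():
    """Consume one request, run the call, answer on the queue named in reply_to."""
    request = rpc_requests.get()
    result = sum(request["args"])  # the "remote procedure" is just a sum here
    reply_queues[request["reply_to"]].put(
        {"correlation_id": request["correlation_id"], "result": result}
    )


def call(args, timeout=2.0):
    """Invoker side: create a temporary reply queue, send, wait, clean up."""
    reply_to = "tmp-" + uuid.uuid4().hex
    correlation_id = uuid.uuid4().hex
    reply_queues[reply_to] = queue.Queue()
    try:
        rpc_requests.put(
            {"reply_to": reply_to, "correlation_id": correlation_id, "args": args}
        )
        reply = reply_queues[reply_to].get(timeout=timeout)
        # Reject anything that is not the answer to this specific call.
        if reply["correlation_id"] != correlation_id:
            raise RuntimeError("unexpected reply")
        return reply["result"]
    finally:
        del reply_queues[reply_to]  # the temporary queue is deleted afterwards


worker = threading.Thread(target=remote_agent)
worker.start()
answer = call([1, 2, 3])
worker.join()
print(answer)
```

With a real broker the shape stays the same: the invoker blocks on its private reply queue, and the correlation identifier lets it match the reply to the request it sent.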

All the requirements are met.
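
To make the explicit acknowledge and redeliver items from the list more tangible, here is a toy simulation in plain Python. It is only a sketch under loud assumptions: `ToyQueue` is a hypothetical class, not a RabbitMQ API, an exception raised by the handler plays the role of a not-acknowledge, and the timeout case (an agent that dies without answering) is not modelled at all.

```python
from collections import deque
from itertools import cycle


class ToyQueue:
    """In-memory stand-in for one broker queue with explicit acknowledgements."""

    def __init__(self, consumers, max_deliveries=3):
        self.pending = deque()
        self.consumers = cycle(consumers)  # round-robin over replicated agents
        self.max_deliveries = max_deliveries
        self.log = []  # (message, agent name, "ack" or "nack")

    def publish(self, body):
        self.pending.append((body, 0))  # 0 = number of failed deliveries so far

    def run(self):
        while self.pending:
            body, failures = self.pending.popleft()
            name, handler = next(self.consumers)
            try:
                handler(body)
                acked = True  # the agent finished its work: acknowledge
            except Exception:
                acked = False  # the agent caught a failure: not-acknowledge
            self.log.append((body, name, "ack" if acked else "nack"))
            if not acked and failures + 1 < self.max_deliveries:
                self.pending.append((body, failures + 1))  # redeliver later


processed = []


def broken_agent(body):
    raise RuntimeError("disk full")  # simulated unrecoverable agent


def healthy_agent(body):
    processed.append(body)


q = ToyQueue([("agent-1", broken_agent), ("agent-2", healthy_agent)])
q.publish("invoice-1")
q.run()
print(q.log)
```

The single message is first handed to the broken agent, which rejects it, and is then redelivered to the healthy replica, which acknowledges it. The `max_deliveries` cap is an assumption of this sketch, added so that a message no agent can process does not loop forever.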

Nature of the message

Messages can be anything: the message broker will not inspect what they’re made of. As long as bytes go through the wire, it’s perfectly fine.
Depending on the type of system you’re designing, though, the nature of the messages is something you want to study ahead of time. If you’re in a very homogeneous environment, such as a Java-only context, there’s no reason why you shouldn’t binary-serialize your objects to simplify the code on both the producer and the consumer side.
On the contrary, if your system is heterogeneous, or might realistically become so in the future, then you might consider working with universal formats, such as JSON.
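
As a small illustration of the JSON option, the sketch below turns a message into bytes and back using only Python's standard library. The `encode` and `decode` helpers and the event fields are hypothetical, invented for this example; the only point is that what travels through the broker is an opaque byte string that any technology can parse.

```python
import json


def encode(event: dict) -> bytes:
    """Turn a message into bytes the broker can carry without inspecting them."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def decode(body: bytes) -> dict:
    """Rebuild the message on the consumer side, whatever language produced it."""
    return json.loads(body.decode("utf-8"))


# Hypothetical event: the field names are made up for this example.
event = {"type": "document.received", "document_id": 42, "notify": ["email", "sms"]}
body = encode(event)
restored = decode(body)
print(body)
```

A consumer written in another language only needs a JSON parser and an agreement on the field names, which is exactly the kind of thing worth studying ahead of time.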

RabbitMQ

Among a number of very valuable message brokers, RabbitMQ definitely made it through my software selection.
The remarkable reputation that makes it a relevant component of various software products, the highly technological company backing its development, and the vast user base are a good ID card for the software, but there are other things which really impressed me.
Its technology, entirely based on Erlang, is a clear statement of purpose: high availability by design.
Its simplicity of deployment for both development and production environments, massively convention-over-configuration, lets you start quickly and gives your ops a chance to learn new stuff when needed.

Even just these facts convinced me to take the risk of giving it a try, not just as an experiment, but all the way to production, and let me say this: it definitely stood the test of time.

After a very quick deployment through the repository of the Linux distro, the application was up and running, and that was substantially it: the agents could connect to and communicate with the message broker gracefully.

RabbitMQ also has a plug-in system, and it didn’t take long for me to realize how much I wanted to install the RabbitMQ Management plugin.
This plug-in adds a small web server that allows you to see exactly what’s going on in the message broker: message deliveries, acknowledgements, queues, messages awaiting delivery and confirmation, but also connections and channels. Everything is right there, and especially during the exploration phase this is extremely important, because you will quickly find yourself with stuck messages or endless redeliveries. Learning to avoid getting stuck is pretty straightforward, but still this plugin is a great help from the software selection all the way to the actual production environment.

The plug-in system has a number of items you might be interested in looking into. One of them, which I think is pretty cool, is an implementation of STOMP, a simple text-oriented messaging protocol, popular and widely supported. Applied to RabbitMQ, it allows us to expose most queue functionality to external clients over the Internet with finer control and security than you would have by exposing the AMQP sockets themselves.

Of course, as you would expect from such a system, RabbitMQ has clustering capabilities. Now, contrary to many scalable software components you might have used in the past, RabbitMQ has a number of options which work well in specific scenarios, so don’t rush it and read the documentation carefully.
I didn’t go through all the possible variations because I really didn’t need that much. In my case, basic clustering in a LAN allowed me to do everything, and it’s definitely a piece of cake to set up. I’m pretty sure the federation plugin (the option to look at when brokers need to be connected over a WAN) will require a bit more work, but I do believe these functionalities follow the same RabbitMQ principles.

Last, but not least in any possible way: reliability.
It’s all eye candy when you’re developing, but production is a whole other story. You can be rational and clearly evaluate how respectable the software is and how beautiful Erlang is in this kind of product. But at the end of the day, you need to see it with your very own eyes to start sleeping well at night.
For this reason, my words won’t help you much, but I’ll give it a try. RabbitMQ is absolutely exceptional in reliability and stability.
I had the luck to talk to two software engineers in different companies using RabbitMQ for large volumes of critical messages, and they both agreed with my impressions:
“It’s incredible how you drop it in and you just forget about it”, Francesco said. I trust Francesco, and you should trust him as well.

Conclusions

I’m in that phase everybody doing this job periodically goes through: how could I live without technology-name-goes-here?

How could I live without RabbitMQ? I probably can’t answer this question right now because I really consider it critical for my work.
Rationally speaking, it’s not what it does, but how it does it, and that sense of safety and stability that eventually had me sleeping well again. If you’re dealing with enterprise software with asynchronous tasks going in and out of multiple agents, I do believe you should consider a message broker, and RabbitMQ is definitely my advice.
