How APIs changed modern technology

To be clear, this article talks about web services (mostly RESTful), often called APIs or web APIs.
The subtle difference between the definitions of web service and API matters a lot if we want to understand how this revolution changed literally everything in our industry.

Wikipedia describes web services as:

A web service is a service offered by an electronic device to another electronic device, communicating with each other via the World Wide Web. In a Web service, Web technology such as HTTP, originally designed for human-to-machine communication, is utilized for machine-to-machine communication, more specifically for transferring machine readable file formats such as XML and JSON.

I think I couldn’t say it any better.
This is their definition of API:

[…] is a set of subroutine definitions, protocols, and tools for building application software. In general terms, it’s a set of clearly defined methods of communication between various software components. A good API makes it easier to develop a computer program by providing all the building blocks, which are then put together by the programmer. […]

This sounds a little more obscure. In short, it’s how a piece of software can interact with another piece of software, either internally or through some medium.

This clearly demonstrates how web services are a subset of APIs.

What’s really awesome about web services is that, in modern, standardized implementations, the protocols and formats are platform independent.
This specific aspect had a huge impact on tech because it allows heterogeneous systems to talk to each other.
Salesforce talking to SAP, talking to Jira, talking to email services, without each system needing to know the others’ specifics, other than being able to use a standardized, platform-independent API.

But this fact has other, more mundane implications. For example, you will probably build your back end in, say, Java, Node, Python, whatever, but your consumers are both an iPhone and an Android app. Obviously, having a common ground is what makes all this possible.

None of our modern mobility culture would exist without this kind of API. Or is it vice versa?

I’m not saying the idea of a standardized, common API language is genius; on the contrary, it’s plain logic. It was the need to do more that generated web APIs.

But there’s more to it.

Changing the way we write code

Whether you’re exposing a service for the masses or specialized software for businesses, web APIs are no longer optional.
We’ve seen big software packages struggle to provide them, sometimes through terrifying add-ons that produced terrible side effects. But by now, I think everybody has complied.

As a software engineer, you know that writing core services or business logic has to meet a certain quality standard and employ certain good practices. These are good practices in general, but they become strictly necessary (no shortcuts) when you’re exposing APIs to the outside world.

  • Dependencies. Every routine necessarily relies on other routines and libraries. Bad code (or “classic code”, to be gentle) expects dependencies to be passed somehow into its context by its invoker. While this is a bad practice in modern software anyway (easily solved by the dependency injection pattern), it becomes a major problem here, because the API producer ends up knowing more of the domain than it should. Keep it clean: the API producer knows what to call, the service knows what to do, and the right dependencies are injected, not passed.
  • Parallelism and concurrency. As a general rule, you should never ever be tempted to think “users will never step on each other’s feet”. Remember, users will use your software in ways you cannot imagine. This is even more true when it comes to APIs. While users will follow the path your team designed in the UI (and will try to break it), APIs are meant to expose internal procedures to be consumed by other machines. You have no idea what they’re going to do. This means you need to consider what would happen if a certain API is called one million times in a minute, used improperly, or invoked in a loop. What are you going to do? How will you manage the situation? And even more important, is there even the slightest chance of a race condition? Also, when performing long-running tasks, will the API hang there for minutes, or are we going to return a task ID and release the resources? When you come up with a good parallelization strategy, it will have a positive impact on every corner of the package.
  • Façading, blackboxing. Again, a practice that has always been good, but that is now mandatory. APIs can expose the very same content you generally offer via a UI. Or more. When there’s more to it, you really must create solid gates so that the API producer interacts only with the routines it should talk to, passing very plain, non-domain-specific objects and returning harmless content. A very simple way to do it is creating façades that are the only point of contact between the API producer and the core. Everything behind the façade has to be a black box. Done right, the gate will hide routines that must not be invoked directly and provide the needed coordination for input verification, authentication and security (a minimal sketch follows this list).
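
To make the façade/injection point concrete, here’s a minimal sketch in Java (all names are hypothetical, not taken from any real project): the API layer only ever touches the façade, the domain service is injected through the constructor, and nothing domain-specific leaks out of the gate.

// Hypothetical sketch: the API producer sees only this class and plain types.
public class WarehouseFacade {

    private final InventoryService inventory;   // injected dependency: the caller never builds or passes it

    public WarehouseFacade(InventoryService inventory) {
        this.inventory = inventory;
    }

    // The only entry point the API layer is allowed to call.
    public int reserve(String sku, int quantity) {
        if (quantity <= 0) {                     // input verification happens at the gate
            throw new IllegalArgumentException("quantity must be positive");
        }
        return inventory.reserve(sku, quantity); // the domain stays behind the façade, a black box
    }

    // From the API's point of view this is just an opaque contract.
    public interface InventoryService {
        int reserve(String sku, int quantity);
    }
}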

Changing the way we see a software package

As web APIs evolve, getting faster and more accurate, so does the software using web APIs as a communication protocol. This made web APIs a viable medium for internal software communication as well, so that many software packages are now collections of smaller standalone services collaborating with each other: microservices.

This is the opposite of the classic view of the big-ass package with everything in it, running as a single massive process on a huge server. Pros and cons, as with every radical architectural decision.
Certainly, big monolithic packages have the positive effect of keeping the whole domain in one place, which is a pleasing feeling, especially when you’re dealing with a big database schema controlled by an ORM. They will probably run slightly faster and won’t incur communication problems related to the network.
On the other hand, and it’s a big hand actually, splitting applications into microservices provides a high degree of modularity (and any Java developer knows what that means to us), which implies the ability to roll out code changes that do not impact the whole platform, in a high-availability flavor, allowing simplified parallel development trains and releases. Moreover, high availability and disaster recovery from crashing hardware become simpler, as the whole system no longer lives in one place and moving microservices from one host to another is generally faster.

And finally Docker. Of course containers are exceptionally useful even if you’re not doing microservices, but one of the major uses of Docker is indeed containerization of microservices.

Changing our interest in programming languages

I’m a Java veteran and, like all Java veterans, I look at other, newer programming languages with a little bit of suspicion. You know what you’re leaving, but you don’t know what you’re getting.
What is certain, though, is that more modern programming languages nowadays are built around concepts that make working with microservices a breeze.
There’s no doubt that platforms like Node are phenomenal when JSON APIs have to be produced or consumed, and it’s clearer than ever that languages with lightning-fast bootstrap (Go, for example) are the way to go, considering that microservice lifecycles can have very fast turnarounds.

But the real life-changing effect that APIs had on programming in general is that any software written in any web-API-capable language (probably 99.9% of them) can cooperate seamlessly with any other software written in any web-API-capable language.
While this sounds like a repetition of what we said before, the sentence takes on a completely different flavor when the topic is microservices.

Choosing a programming language (and platform) was a terrifying step in the past (given that you had a way smaller pool to choose from) because it involved many aspects like:

  • does it have all that’s needed to do the job? And how hard is it going to be?
  • how will it scale?
  • how modular is it?
  • how easy is it to debug?
  • how many experts can I find in my area and outside my area?

While this is still a very important step, in enterprise software you now have the possibility to design a greater plan that extends over time, where each programming language or platform does what it’s good at.
Erlang is good at parallel computing and orchestration? Fine. Python is good at data crunching? Great. Node is good at building APIs? Excellent. Java is the industry standard? Go for it.
If you’re designing a large system and you have a very well organized mind, you might not need to give up any of the benefits these languages offer you.

Changing the way we test

Now, if you live in the same timeline as mine, testing software is not optional. Unit testing, data-driven testing, integration testing, user-driven testing, you name it.
But there’s certainly a wet dream most developers have had since software testing existed: being able to evaluate tests as the software is running, on live data. Even though it’s always been perfectly doable (remember the “It’s just software” mantra), we can’t really say it’s ever been a real option.
But with web APIs and microservices this is becoming a real deal, because you can potentially intercept messages going in and out and test them. Of course the way you test is different, as you have no control over input values, but you can certainly introduce enough logic to verify how the APIs are behaving, adding detail where needed.

Changing the way we orchestrate

This last “change” is very close to my heart, and probably one of the most underestimated ones in the list.
Whether you are running microservices or a software beast packed with lots of different multi-vendor packages, the way each item communicates with the others can be orchestrated in different ways.
Yes, you can write code to have them talk, to decide whether this happens synchronously or not, how many simultaneous calls are allowed and what happens when calls have to wait in a queue, but is it worth it?

Three years ago I met RabbitMQ, and you can read about it in a previous article. Two years ago I had the pleasure to deal with MuleSoft’s Mule and Apache Camel.
Whether it’s a queue manager like RabbitMQ (its logical connection with web APIs is a bit of a stretch) or a fully featured enterprise service bus like Mule, there really is no reason for you to write code to orchestrate your web-API-driven software.
Operations such as authentication, thread and concurrency management, or queuing strategies follow very precise patterns that you don’t need to rewrite (and then carry the burden of maintaining) every time, considering there are battle-proven pieces of software out there that can do it for you.

If you can keep your code clean from details that are not relevant to what the software has to do, you better go for it.

Conclusions

It’s not rocket science. Actually, it’s less than brilliant. Wait, this is really boring.

But this boring, less-than-brilliant advancement gave us natural modularity, platform independence, language independence, focus on the software’s real aim, and operations teams that can sleep (almost) peacefully.

It’s not the invention, it’s the infinite effects it will have on the world.


Docker first impressions

So Docker is a buzzword. If you’ve been in the industry long enough, you know how dangerous buzzwords can be, not because there’s no actual value in what they promote, but because the risk of investing time in something that is either not ready or not suitable for what you do is extremely high, especially when everybody keeps telling you it’s impossible to live without it anymore.

The other problem with buzzwords is that they are rarely accurate. When you dare to ask “can I do this?”, the fanboys will reply something like “Are you deaf or stupid? Of course you can!”, but no one is really interested in explaining whether it’s a good idea or not, or whether it requires other tools. Not because they’re not in good faith, but because when buzzwords spread, you have a higher chance of meeting people who tried it than people who actually used it.

It’s because of this kind of conversation that my first encounter with Docker was disastrous.
The versions of what Docker was supposed to be were three:

  1. The best developers’ companion possible when you work with multiple pieces of software like databases and such
  2. The best imaginable production deployment solution that takes care of everything your application needs
  3. A new virtualization paradigm that allows you to finally stop thinking about servers

SPOILER ALERT: the only one to be totally true is (1).

Not so sad truths

So, as I said, 2 out of 3 expectations failed miserably. And that’s what tricked me into saying “I hate this crap” when I first approached the topic: wrong expectations.
Part of these expectations was the result of the fanboys advertising their new toy, as I said. But the other part was definitely my fault.
Remember the excitement when you first realized how easy it was to deploy a new server on Amazon AWS? Managing the networking, the volumes, etc.? That excitement was etched into my mind forever.
Now, what happened with Docker is that when I first heard bits of information about it, I started imagining what it would be like to have an “Amazon” where you don’t need to manage servers at all. A system where the virtual server environment is actually a commodity of the applications. A system that “knows about applications rather than servers” and takes care of availability, virtual networking, virtual volumes, and configuration.

Here’s the thing. I imagined Docker to be something it’s not, and the more I tried to find a way to make it work the way I expected it to behave, the more I felt miserable and believed it to be faulty.

What’s really sad about this story is that if I reread how Docker was described to me back then, I realize most of the problem was my own hopes. I read what I wanted to read, I understood what I wanted to understand.

What is Docker

This paragraph would have helped me back then. I’m not going to dot-list every feature because more than anything else, what you need to understand is the philosophy that lies within.

Docker is a virtualization platform where the virtualized OS is not actually running; it’s more of an environment that allows one application to run “as if” it were in a standalone context.
There’s no boot, no init, no drivers or virtual hardware. With that said, though, the OS is actually functional, meaning it mimics a real operating system, and all the commands, files and utilities are where they have to be. Ideally, though, the sole purpose of the OS is to make that one application run.
By doing so, the paradigm shift is that the OS environment becomes part of the application itself, so it will be “running” as long as the application runs, and stop when the application stops. The environment can be modified and updated as needed, but while running it has to be considered stateless, and any change you or the application make to the environment is meant to vanish when the container dies. That’s perfectly coherent with this paradigm, because an application will very rarely modify itself, and the OS environment is part of the application.
Wording: an application with its surrounding environment is called an image. A running application is a container.
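
A quick, hedged illustration of both the wording and the statelessness (using the stock ubuntu image; any image would do):

# each "docker run" starts a fresh container from the same image
docker run --rm ubuntu touch /tmp/hello   # creates a file inside that container...
docker run --rm ubuntu ls /tmp            # ...which is nowhere to be found in the next one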

And there’s literally nothing else to say about the intimate nature of Docker, but there definitely are a lot of questions to be answered on how it will help you, so read on.

Networking

Docker allows the container to communicate with the outer world by providing network virtualization so that each container can be provided with one or more subnets and IP addresses.

There are various network modes and we’re not going into the details now, so accept this as a hint to get you started.

The virtual network allows containers to talk to each other via the assigned IP addresses, so the application is clueless as to whether it’s running next to its buddies or a thousand miles away. Each container is perceived as a standalone machine. Moreover, Docker also allows communication between containers using a sort of host name that resolves naturally from the inside, like any other address.

In case it’s needed, a “bridge configuration” allows you to map a port of a container to a port of the host machine, exposing the service to what’s outside the virtual network.
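
As a rough sketch of both ideas (the image names below are placeholders, except for the official postgres image):

# a user-defined virtual network shared by two containers
docker network create appnet
docker run -d --name db  --net appnet postgres
docker run -d --name api --net appnet -p 8080:3000 myorg/my-api
# inside the "api" container the database is reachable simply as db:5432,
# while the host's port 8080 is bridged to the api container's port 3000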

Storage

Of course storage is still a thing, and you can map a directory in the host server to a directory in the container OS. There’s no need for it to be an actual “mounted volume” like you would need to do with a classic virtual server, because the server is not actually a server, so you can skip all the crap that an OS needs to do to achieve the same result.

This is a very relevant topic because, as we previously said, you should never rely on what the container stores in its environment, as it’s ephemeral; so if your application needs to store something, you will need to “mount” a volume.

Be careful. Setting up a volume has one major effect on your deployment strategy. Stateless containers are indeed stateless, so they can be moved around from one server to another without any side effect, but when a volume is mounted, you are obviously binding that container to a location where the volume is actually accessible.
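
A minimal example of the trade-off, with made-up paths and image name:

# /srv/api-data lives on the host, /var/lib/myapp is what the application sees
docker run -d --name api -v /srv/api-data:/var/lib/myapp myorg/my-api
# the data now survives container restarts, but the container is tied to hosts
# where /srv/api-data is available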

Build / Deploy

This is where things get tricky yet very familiar.
Images are not just big files with stuff in them. As the Docker wording itself suggests, they are more similar to Git repositories. Images are actually the history of the changes you made to them (the changes are called layers).
When you want to make changes to an image and get them ready to deploy, you apply the changes, literally commit them, and push them to the main “repository” (which resides in a registry).
The servers running the containers will just need to pull the changes and restart the containers.
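
Here’s roughly what that workflow looks like on the command line (registry and image names are made up):

# on your machine: bake the current state of a container into a new image layer
docker commit my-running-app myregistry.example.com/myapp:1.1
docker push myregistry.example.com/myapp:1.1

# on the server: pull the new layers and restart from the updated image
docker pull myregistry.example.com/myapp:1.1
docker stop myapp && docker rm myapp
docker run -d --name myapp myregistry.example.com/myapp:1.1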

This implies two major side effects in your life:

  1. You have a quick way to deploy applications with a familiar paradigm
  2. You are not just deploying the application, but also changes to the full OS environment that will be necessary to run it!

Which means that at least 90% of the problems the new shiny update of the software will cause, even the ones that are OS related, can be fixed at your desk, without even bothering the infrastructure team.

Rather than making changes to a running container and committing them, the best way to build an image is to use a “Dockerfile”, which is no more than a recipe for how the image has to be built. You can stuff it with instructions like “what is the base image to start with”, or “copy this file”, or “run this and that command”.
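
A tiny, hypothetical Dockerfile for a Node-based service gives the idea (file names and port are made up):

# start from an official base image
FROM node:6
WORKDIR /usr/src/app
# copy the application and install its dependencies
COPY . /usr/src/app
RUN npm install
EXPOSE 3000
CMD ["node", "server.js"]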

And these are the basics

Not really, but close. There’s a large number of other options you can use once you get familiar with the basics, like the ability to create a virtual subnet that spans multiple hosts, manage memory and CPU limits, use container host names rather than IPs, or describe how multiple containers are related and need to be started to run a complete microservice system (Docker Compose).

But as far as the very basics are concerned, this is it. And this is where your expectations come into play.
What about orchestration? What about configuration management? Guys, there’s none.

Specifically, configuration management is what killed my libido in my first steps with Docker.

Configuration anyone?

Since images need to contain software that can run in an agnostic environment, configuration shouldn’t live in them, unless it’s certainly not subject to change or works smoothly as a global default. No configuration in the image, no configuration in the container. And if you copy the configuration into a running container, well, it will be lost once the container restarts because, again, it’s stateless.

So what’s the trick? The not-so-obvious way to provide configuration to your container? Well, there’s no definitive answer. And this is the point I’ve been trying to communicate since the beginning of this article: Docker doesn’t care about your damn configuration, because it’s just a virtualization platform.

Every solution I can propose here has pros and cons; there’s no magic trick.

  • Store the configuration on a volume that is mounted when the container launches. It’s easy, it doesn’t require you to change your program and it’s compatible with distributed configuration managers, but now the service running in the container is bound to the server it runs on. If the container has to move to another server, adjustments will be required and won’t happen automatically.
  • Store the configuration keys in a startup script (Docker Compose can do that for you) that exposes them as environment variables within the container, which the application will need to pick up (see the sketch after this list). Easy to do, effective and portable, but complex configuration files will simply be a no-go.
  • Store all the configuration within a key/value store container that becomes an essential part of the infrastructure. Smart, effective, portable, but then this container becomes a single point of failure and its storage needs to live on a mounted volume. Plus, you will need to change your program to make it happen.
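
For the environment-variable option, a hedged sketch of what the Compose file could look like (service, image and variable names are invented):

version: "2"
services:
  api:
    image: myregistry.example.com/myapp:1.1
    environment:
      - DB_HOST=db
      - DB_PORT=5432
    ports:
      - "8080:3000"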

Choose your flavor. Is there a right way to do it? Nah, not really.

Wait, where’s my orchestration again?

News says that with Docker 1.12 you get built-in orchestration, and that’s awesome. That said, no orchestration has been available out of the box so far.
Orchestration tools have been created to simplify your life. Rancher is the one I got familiar with, and I definitely like it.
What you should expect from these tools is the ability to store launch scripts, create stacks, virtualize a network across multiple servers and provide, where possible, a solution for your storage.

So… Do I still need servers?

The quick answer is: yes. Servers, or virtual servers if you will, aren’t going anywhere.
There are PaaS services out there that give you the luxury of not thinking about servers, and I’m sure some of them are extremely good, but I must be honest and say I tried a few and it didn’t work well for me. I expect to change my mind in the future and will definitely tell you how that goes, but as of now, I still think the way to go is deploying your own (virtual or bare metal) servers.

What really changes is the way you work with them. I don’t have the experience yet to tell you what the downsides are, but I can definitely tell you some of the upsides.

  • It simplifies the work of infrastructure teams, as they don’t have to deal with the specific environment the application needs to run.
  • It drastically improves the efficiency of the QA phase, as the testing environment is way more similar to the production environment.
  • It lets you manage redundancy, disaster recovery, updates and availability with less stress.

Performance

This is not irrelevant. At the beginning of my experimentation I was kind of suspicious. Not because I don’t trust virtualization in general, but my mind couldn’t avoid bringing me back to the old VMware days, when there was a thin line between spinning up another VM and sinking your server. Now, even with everything that happened after that (say, Amazon), I still couldn’t stop thinking that running each microservice in its own virtualized container wasn’t going to do performance any good.

Boy, was I wrong. Unlike classic virtualization, my understanding is that the magic Docker does by not really running a full OS makes the main process nearly as fast as if it were running natively on the host platform, because technically it is. I don’t have actual empirical data, but that was my feeling, and it has been confirmed by my readings.
Certainly there’s a price to pay, and it is related to every item that is not strictly connected to the process itself: virtual networking, first of all.

So if there’s a performance price to pay, it’s related to how you “accessorize” your deployment.

Conclusion

Am I liking Docker so far? Hell yes. Yet I cannot provide any specific details on whether I trust it or not, because I honestly haven’t experienced the thrill of migrating to Docker in production. Yet.

Do I like the concept? I’m not a skeptical guy in general, but I don’t trust tech when I haven’t had the time to understand how it works, at least in principle. Now that I (kind of) understand how it works, I find the whole thing extremely interesting and very powerful.

Is it the way to go? This question carries a lot of responsibility, but I take the risk: yes, it is the way to go.
Docker achieves two very important goals, and it became so popular so quickly because of them, not because of general-purpose excitement.

Is there more to be desired from Docker? Mmm, I don’t really know. At the beginning I was stunned by the absence of features I thought were essential, but then I realized that Docker is meant to be simple and achieve specific goals, and everything else I needed could be achieved by accessorizing it.

Very last note

As I’m not a Docker expert, as the title implies, if I got anything wrong in this article, feel free to reply and correct my mistakes! Thanks.

PostgreSQL JSONB type – first time hands on!

Now when you think you knew something about databases…

From relational to NoSQL, from MongoDB to Cassandra to Elastic… there’s been quite a lot going on. Or better: you know you’re getting old when all these events are packed close together in your memory. But you’re OK with it and you understand that this new world has its own rules, the most important of which is:

Each one of these, is very good at some things, and very bad at other things

Then something happens and you don’t know what to think about it: your life companion PostgreSQL has a structured data column type.

It’s like when your wife says “I’m pregnant” and you’re not really sure whether you’re going to think “I’m gonna be a father”, laugh, cry, buy a ticket to Cuba, decide “it must be someone else’s”, summarize in your head the gazillion problems you’re gonna face, or hug your wife.

And if you think this comparison is excessively nerdy, you gotta spend some time with a developer’s wife/husband and hear it from her/him.

Anyway, to understand how, you’d better figure out why (I’m talking about databases now…), and to do that, I realized I had to play with it to form an opinion.

Getting started

Well, first off, make sure you have at least PostgreSQL 9.4 installed. On my Ubuntu Linux 14.04 LTS I had to upgrade it by adding another APT repository. Also keep in mind that when upgrading like this, you’re not actually upgrading but adding another instance on the side, and they will both run, somehow.

Next, launch your psql client. If you’re completely new to PostgreSQL (I hope not, because I’m definitely not going to help you with the basics), you might find it easier to su to the postgres user and continue from there. But again, if you’re completely new to it, I suggest you start here: http://www.postgresql.org/docs/9.4/static/tutorial-start.html

Once you’re in, create a test database and connect to it:

CREATE DATABASE testj;
\c testj

Table definition

PostgreSQL ain’t MongoDB. We’re still in a relational database, we still have tables, we still have schemas. This JSONB thing in PostgreSQL is a column type. To better understand how we can benefit from this data type, I’d like to use the very first example that came to my mind.
When creating an application with registered users, you know exactly what a user is about (email address, full name, country etc.), but depending on what your application is going to do, you might have a number (finite at first, then… who knows) of other properties related to the user that are going to be fluid, change over time, and might even be customized by the users themselves. Access privileges, for instance.
Until now, you had four options.

  1. Alter the table to add columns when needed. Meh…
  2. Create a “properties” table and join when needed. Better but still meh…
  3. If you’re already using a document database, create a document of properties for each user. Seen it, done it, but boring and capable of creating lots of garbage.
  4. Store serialized structured data in one column as text. Acceptable, but not fun, as there’s no way to query the database for particular properties and you will need to deserialize it every time…

This JSONB column seems to do just fine for this simple purpose.

CREATE TABLE myuser (
                     id         SERIAL PRIMARY KEY,
                     full_name  TEXT   NOT NULL,
                     email      TEXT   NOT NULL,
                     properties JSONB  NOT NULL);

As you can see, this is a usual create table, with a twist: the JSONB column.

Inserting

We know enough to do this, by now:

INSERT INTO myuser
           (full_name,email,properties)
           VALUES ('Charlie Sheen',
                   'charlie@thisisatest.com',
                   '{"privilege":5,"warehouse_access":true}');

What’s really interesting here is that the properties field is inserted as you would do with text, but the content is actually parsed and validated. Try to introduce a syntax error in the JSON and you will see…

Let’s query that to learn more:

SELECT * FROM myuser;

 id |   full_name   |          email          |                 properties                 
----+---------------+-------------------------+--------------------------------------------
  1 | Charlie Sheen | charlie@thisisatest.com | {"privilege": 5, "warehouse_access": true}

Pretty clean, but still not a big difference from a text column.

Querying

Now let’s see why this is better than storing a piece of text.
The very first thing that worries me when it comes to storing big stuff in a database is: if I need to retrieve a subset of fields from a big number of big rows, how much junk is going to go over the wire?
You know this is not a problem with SQL servers, as the SELECT clause allows you to choose precisely what you need.
In other systems, such as MongoDB, to do so you need to use a specific API (aggregation).
This is a very important topic because on heavy-duty systems it can cause you a lot of trouble. Besides the traffic on the wire, you need to realize your software will have to parse the data once it makes it into the system. I have a few terrible stories where 80% of the query processing time was taken by the software trying to represent the received data.

Say I want to select the privilege level of users (I’ve added another user in the meantime):

SELECT id,properties->'privilege' AS privilege FROM myuser;

 id | privilege 
----+----------
  1 | 5
  2 | 2

Say I want to know how many people have access to the warehouse:

SELECT COUNT(*), properties->'warehouse_access' AS warehouse_access
                  FROM myuser
                  GROUP BY properties->'warehouse_access';

 count | warehouse_access 
-------+------------------
     1 | true
     1 | false

Say I want to have the list of people having access to the warehouse:

SELECT * FROM myuser WHERE properties->>'warehouse_access'='true';

 id | full_name     | email                   | properties 
----+---------------+-------------------------+--------------------------------------------
 1  | Charlie Sheen | charlie@thisisatest.com | {"privilege": 5, "warehouse_access": true}

Or you can decide to match some document properties (in a Mongoish way) by:

SELECT * FROM myuser WHERE properties @> '{"warehouse_access":true,"privilege":5}';

The syntax looks a bit upsetting, but the thing is they decided to create multiple operators, which you can find out here.

Transactions and atomicity

This is probably one of the most intriguing things about this feature.
Transactions anyone? YES. As I previously said, PostgreSQL didn’t change its intimate nature, therefore transactions are part of its daily routine. And as the JSONB typed column is “just a column”, transactions apply as usual, in other words, across the database.

Atomicity is a whole different thing. In PostgreSQL 9.4 you basically can’t change part of your JSONB, so, back to our example, if you need to inhibit Charlie’s access to the warehouse, you will have to:

UPDATE myuser SET properties='{"privilege":5,"warehouse_access":false}' WHERE id=1;

To be fully honest, I see the logic underneath: as you can’t change “part of a cell”, JSONB is not an exception.
With that said, though, PostgreSQL 9.5 introduced the jsonb_set function, which does just that, but given the upgrade madness I previously mentioned, I have no intention of upgrading again for a while, so bear with me if I didn’t give it a try.
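
For the record, the 9.5 syntax should look something like this (I haven’t run it myself, since I’m still on 9.4):

UPDATE myuser
       SET properties = jsonb_set(properties, '{warehouse_access}', 'false')
       WHERE id=1;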

Indexing

Ah here we are. Noblesse oblige.
We have structured data, now what? If you’ve been playing with NoSQL software long enough, you know that indexing is something you will have to deal with sooner or later, and experience plays a very relevant role in this design phase.
In relational databases you are mostly like “yeah right, will do”, if your data is not meant to be huge.

Now, if you’re going to use PostgreSQL as usual, PLUS you’re including the JSONB thing to store extra stuff, fine, go ahead. But if you’re going to query that JSONB, then I strongly suggest you read this very well done speed test, which undoubtedly points out how querying structured data without an index is slower than MongoDB (while it looks faster in all other situations).

Indexing is… kinda weird.
Back to our example, I can use the following statement:

CREATE INDEX propsindex ON myuser USING GIN (properties);

This will get the WHOLE properties column indexed, so your queries will take advantage of it.

SELECT * FROM myuser WHERE properties @> '{"warehouse_access":true}';

If you’re asking yourself “is this… going to put the whole JSON in an index?”, the answer is yes, and that might not be what you want in a vast number of situations. Also, unless you’re going to do hardcore random searching, this is just wasted index space, and wasting index space means degrading performance.
Don’t despair though, here’s a way:

CREATE INDEX windex ON myuser USING GIN ((properties -> 'warehouse_access'));

This solution also offers many other advantages that are out of scope for this hands on article.

Conclusion

First off, I’m pretty new to this feature, so what I’m going to write here are just opinions and sparse thoughts.

Let me be straightforward: I think you can create wonderful disasters with JSONB.
Normal forms are far from being the panacea for every problem, and the multitude of NoSQL, document-based databases demonstrates it in many scenarios. But normal forms are synonymous with order and symmetry, something that makes us control freaks feel comfortable. Until we break our balls on endless queries with nested joins and ridiculous performance for no obvious reason.

Now, the following list of DON’Ts will seem a bit dramatic especially when I say you shouldn’t do things the system is actually capable of, but this pretty much relates to 2 software development rules of mine:

1) prepare for the worst: how many requests did you say? 100 per minute? How will it behave if they are 100 per second? How much HDD space? How about 10 times?

2) use the best tool for the job: if you can choose something that does great, don’t choose something else because you’re a lazy ass.

So here it is:

  • If your plan is to run aggregate operations, such as grouping or counting, on your structured data as a heavy-duty feature, please don’t: they won’t perform too well, and there are other tools that are very good at that.
  • If you plan to store unorganized random stuff in that column, don’t, if you’re going to search it aggressively. It’s bad database design, period.
  • If you plan to massively index full JSONB columns, don’t: your indexes will grow immensely and kill performance.
  • If you plan to use PostgreSQL as a document store, don’t: it’s not one.
  • If your project requirements don’t explicitly mandate PostgreSQL (or a database of your choice), and instead postpone and delegate the decision, don’t: you might find yourself with a database design that isn’t compatible with other systems.

But if:

  • your rows are going to include beautifully organized columns used as indexes, PLUS structured data to be used as a whole
  • … or your JSONB column is actually very well organized, with properly indexed fields
  • you’re going to use the structured data as a catalog feature (an array of tags, for example) or for any reasonable use that will enrich the capabilities of your table
  • you’re going to take advantage of the database-wide transactional peculiarity
  • your application is not going to constantly update it as a main operation

Then you have probably found the philosopher’s stone, and I think this feature will greatly improve your work and your database design.

And database design is the key. Having a column type like JSONB doesn’t mean you can postpone key decisions on how your database is going to work, and this applies to all databases. Even the freedom of MongoDB, for instance, is only apparent. Take no key decisions today and you’re going to regret it tomorrow.
Of course you don’t have a crystal ball, and you certainly won’t be able to tell how the database is going to evolve, but this doesn’t mean you’re allowed to go straight when you should turn left or right.

With that said, though, I see a structured data column as a powerful tool that extends my options in designing a good database, empowering me to let some features evolve without re-engineering the database or being forced into paths that are boring and frustrating. It introduces approaches that make my design stronger and more efficient.

Don’t worry, be Groovy (even if your job is boring)

It’s been 5 long years since I adopted Groovy in my daily work. A long-lasting love/hate relationship that entitles me to give advice to Java developers.

This topic came up when I was asked for advice on the “modernization of their software” by a large company with a core platform and a number of developers writing business logic. Having worked in such a context, one of the many things that came to my mind was:

simplify developers’ life when writing lots of boring business logic code

The word “Groovy” was flashing in my head and I couldn’t help it.
So, ideally, what I’m going to list here is a number of things serving this specific purpose.

What it is

Groovy is a programming language running on the Java Virtual Machine. Contrary to what some others did, Groovy’s syntax is an evolution of Java’s, meaning that most of the syntax and concepts you generally use in Java are still there and available. This also means that most of your “legacy” code could benefit from Groovy’s features without a rewrite.

The main goals of Groovy in this context are:

  • Simplify your code, automating some tedious, verbose work you would normally have to do in your program
  • Introduce its implementation of closures
  • Decorate objects (including default ones) with extended functionality using meta-classes
  • Introduce loose typecast

Personal opinion alert!
Let me say this: I think Groovy does not belong to the small family of elegant programming languages. I’m not talking about the features, but about the feeling it gives me when I read it. It’s not that I don’t like modern languages, but there are several ways to design one, and Groovy can sometimes be a bit too Javascriptish. But let’s face it, we’re talking here about the effectiveness of the language. The thing with Groovy is that it’s damn quick to use, and in contexts similar to what I previously mentioned (lots and lots of boring code) it’s very, very straightforward.

Hack #1 : beans!

Some developers think the concept of a “bean” is evil because it has been abused and turned into the root cause of a number of tragically bad behaviors. Fine, I kind of agree, but you can’t deny you need them.
Let’s start with the most boring thing about beans:

Getters and setters

class Banana {
   String color
   int quantity
}

Forget about getters and setters. Groovy creates them for you as long as the properties have the default scope. The naming rules for getters and setters are the usual Enterprise JavaBeans ones. When you need to access a property exposed like this, you can either invoke the getter/setter or the property itself: in both cases the getter (or setter) is what actually gets invoked.

Banana bananaObject = new Banana()
bananaObject.color = "red"
bananaObject.color
bananaObject.getColor()

And you can easily customize the behavior by overriding the default, as in:

class Banana {
   String color
   int quantity
   public String getColor(){
      return "C '+color
   }
}

And guess what, this also works with maps, as in

   HashMap<String,String> stuff = new HashMap<String,String>()
   stuff.put("id","511")
   println(stuff.id)

Constructors

We just annihilated the need for explicit getters and setters, but why write boring constructors when a bean is just meant to store the variables you’ve declared? Here’s what Groovy gives us to kill the pain: an implicit constructor that accepts a map (more precisely, a LinkedHashMap) as a parameter and automatically fills in the properties. As in:

Banana bananaObject = new Banana(color:"yellow", quantity: 5)

Hack #2: operators

How many times did you type == instead of .equals(..)? Yeah, I know the feeling. Even if you’ve been doing it right for 10 years, sometimes when your mind is elsewhere you can waste an entire day on a joke like this. Moreover, in a language where pointers are not explicit, it gets even harder to teach a newbie why it is the way it is.
Groovy overrides a number of operators (and adds some too!) to better suit a slimmer programming style.
I’m going to list some interesting ones here:

  • == now works as .equals(..), and this is true also with collections
  • Access operators ‘.’ and variants. Now that getters and setters are out of the picture, you’re going to “dot notate” a lot. What’s worse than chaining a number of dots and getting a null right in the middle? Here’s the question mark operator!
    stuff.request?.date
    

    If request is null, it’s going to return null straight away.
    This fact is also reflected in ternary operators, which acquire the “Elvis operator”, as in:

    stuff.request ?: new Request()
    

    If stuff.request exists, then return it, otherwise return a new request

  • The boolean comparison also gets an interesting new behavior that is common to most dynamic languages:
    if (stuff.request) {
     // do stuff
    }
    

    If stuff.request is null (or, if it’s a string, empty), then it’s false!

  • +, +=, - and -= work with lists as well, as in:
    ArrayList<String> list = new ArrayList<String>()
    list += "bananas"
    

Hack #3 : loose typecast

Loose typing, anyone? Now, I’m sure this is going to worry some Java purists, me included in many ways. Facing typecast problems at runtime can be a new experience for most of us; nonetheless, there are good reasons to do it when “it’s good”. You can use it or not, up to you.

def bananas = "123"

Love it or hate it, I’m not blaming any of you; after all, you’re not forced to use it if you don’t want to. One thing is for sure: if you’re not used to it, you’d better keep your eyes open.

As you can imagine, if you can declare a variable as “def” you can use that variable even if the compiler is not aware of what an object is made of.

void doStuff(def item){
   item.printMyBananas()
}

If printMyBananas() is part of item, great. If it’s not keep in mind it’s going to blow up at runtime.

Hack #4 : simplified collections

Collections are part of our daily routine. Lists and maps have various implementations of course, but you rarely fine tune what to use unless you really have to.

Groovy provides many ways to simplify your work with collections.

Arrays / Lists

  • As previously mentioned, Groovy overrides operators such as +, +=, -, -= to work with arrays.
    def list = new ArrayList<String>()
    list += "bananas"
    
  • Initialization is also a great great improvement, look at the following snippet and see how the left side of the assignment is actually deciding the type of the list.
    def arraylist = ["a","b"]
    // look how I'm calling simpleName and not getSimpleName() ...
    assert(arraylist.getClass().simpleName=="ArrayList")
    // default implementation is ArrayList
    
    LinkedList linkedlist = ["a","b","c"]
    assert(linkedlist.getClass().simpleName=="LinkedList")
    // becomes a LinkedList
    
    String[] array = ["a","b","c"]
    assert(array.getClass().simpleName=="String[]")
    // becomes an Array
    
    Set set = ["a","b","c"]
    assert(set.getClass().simpleName=="LinkedHashSet")
    // set is an interface, chosen impl. is LinkedHashSet
    
  • You often hope to get filtered and organized data from your dataset, but sometimes you find yourself searching, filtering and sorting your lists by hand. Groovy injects collection metaclasses with special closure capabilities that will ease your life a lot, especially if you’re dealing with collections of complex objects:
    public void flagUnderage(def people){
        people.each {
            if( it.age < 18 )
                it.underage = true
        }
    }
    public def getUnderage(def people){
        return people.findAll { it.age < 18 }
    }
    public def sortByAge(def people){
        return people.sort { it1,it2 -> it2.age-it1.age }
    }
    public def getAges(def people){
        return people.collect { it.age }.unique()
    }
    
  • Slicing a list is also a pain, right? So here’s the deal:
    def list = [1,2,3,4]
    def sub = list[0..2]
    assert(sub.equals([1,2,3]))
    

Maps

  • As previously mentioned, accessors for maps are a big deal in Groovy. Look at this snippet:
    HashMap<String,String> map = new HashMap<String,String>()
    map["username"] = "Banana"
    assert(map.username=="Banana")
    assert(map["username"]=="Banana")
    
  • And just as for lists, maps also have easier initialization:
    def defaultMap = ["a":1,"b":2,"c":3]
    assert(defaultMap.getClass().simpleName == "LinkedHashMap")
    
    TreeMap treemap = ["a":1,"b":2,"c":3]
    assert(treemap.getClass().simpleName == "TreeMap")
    
  • Groovy also introduces the same metaclass methods you get for lists, but keep in mind you’ll be dealing with key/value entries, as in:
    def map = ["a":1,"b":2,"c":3,"d":4]
    def greaterThanTwo = map.findAll { it.value>2 }
    assert(greaterThanTwo==["c":3,"d":4])
    //hint... remember the == operator?
    

Hack #5 : Closures

The each, findAll, sort strange bracket-wrapped things we wrote before are actually “closures”. If you’ve played a bit with programming languages like Javascript, you know very well what they are.

In Groovy closures are just a weird hybrid of a method object and an anonymous class.

Closures are interesting because they can be used as regular variables, passed as method params, stored in beans etc.

// we declare a closure accepting 2 parameters
def multiply = { a,b -> return a*b }

// and pass it to a method
assert ( execute(10,15,multiply) == 150 )
//*******//
public static int execute(int a,int b, def op){
    return op(a,b)
}

Hack #6 : Metaclasses

The content of this 6th hack will explain some of the weird things you’ve seen happening in the previous ones.
Every class in Groovy (whether it’s a Groovy class or a Java class) has a property named “metaClass”, a container of properties and methods that expands the object’s capabilities without really extending the class.
This technique kills your UML diagrams, so be ready for some drama with purists.
How many times have you thought “Why did this stupid library have to return this stupid object without a #£$%& method?”
In Java, at this point, all you could do was wrap that object in a decorating object, or create a static method to call where needed.
The meta-class comes into play right here, as you can…

class MyBean {
  int a
  int b
}
/****/
public static void main(def args){
 /*
  * We add the multiply closure to the MyBean meta-class.
  * This is going to be present for each MyBean
  * object instantiated in this program.
  */
  MyBean.metaClass.multiply = {
    return delegate.a * delegate.b
  }
  doStuff()
}

public static void doStuff(){
  MyBean bean = new MyBean(a: 10, b: 15)
  assert ( bean.multiply() == 150 )
}

Be very careful!  In the scenario I originally outlined, if all developers writing business logic felt free to edit the meta-class of common classes, clashes would be certain! My advice is the core of the platform should introduce commonly used closures to foundational classes, but that’s it!

Hack #7 : delegation

This is probably one of the most underexposed features of Groovy, but I find it very intriguing and useful, especially once you have mastered what’s shown in Hack #6.

Meta-classes are cool, but they’re far from being good when you need to extend a class’s functionality in an elegant, robust way, and if, again, you cannot extend the class, then you’re probably back to the old wrapper, right?
In many cases your wrapped object is fine as it is; all you wanted was to add some extra functionality, most of the time just to improve the language of your domain. In your wrapper, you will end up adding your own methods and proxying most of the methods your inner object exposes.
In Groovy there’s a quick way to do it: the @Delegate annotation.

class Banana {
   @Delegate
   DBObject databaseObject
   
   public Banana(DBObject dbo){
     databaseObject = dbo
   }

   Date loaded
   void peel() { /*...*/ }
   void throwAt(Gorilla g) { /*...*/ }
}

Now, if you look at this snippet, we have an object belonging to a class we cannot alter (DBObject), wrapped in a Banana class that includes a “loaded” property and the peel and throwAt methods.
When you instantiate a Banana object, it will expose the declared methods and properties, plus all the methods and properties of the object we delegate to (in this case databaseObject).

Hack #8 : JSON serialization

Serialization and deserialization are hell, we all know that.
The reason is pretty simple: if you serialize an object, you want to make sure the agent that is going to deserialize it will reconstruct the very same object, without exception. For this reason serial version numbers are used, and every update of the various software components is going to give you the shakes.

Even though I totally understand the reasons for this approach, there is also another, lighter way to look at the problem: simple EJB-style property matching. And Groovy provides just that, out of the box.

Let’s start with JSON serialization:

import groovy.json.JsonOutput

class MyBean {
    int a
    int b
    def ops = [:]

    public void go(){
        ops['mul'] = a*b
        ops['diff'] = a-b
    }
}
public static void main(def args){

    MyBean bean = new MyBean(a:15,b:22)
    bean.go()
    println ( JsonOutput.prettyPrint( JsonOutput.toJson(bean) ) )
} 

This is going to print:

{
    "b": 22,
    "ops": {
        "mul": 330,
        "diff": -7
    },
    "a": 15
}

Perfect. Save it, store it, send it. The data is there.
But what happens when you have to write some code to consume it? Here are two options:

JsonSlurper

Just slurp the JSON into a nested structure of maps and lists! After all, in our example, if your consumer is only interested in the stored data, then MyBean is superfluous.

MyBean bean = new MyBean(a:15,b:22)
bean.go()

String data = JsonOutput.prettyPrint( JsonOutput.toJson(bean))
def items = new JsonSlurper().parseText(data)
assert (items['ops']['mul']==330)

Loose deserialization

I would call this no more than a side effect, but it works, so why not. Remember when we said there’s a default constructor with a map as a parameter for all Groovy classes? And remember you could use that constructor to populate the bean’s properties?

Well why not…

MyBean bean2 = new MyBean(new JsonSlurper().parseText(data))
assert (bean2.ops.mul==330)

Hack #9: be concise!

3 things that will improve the compactness of your code.

x times!

10.times { /* do something 10 times */ }
10.times { i -> println(i) }

Multiple assignments

def (a,b,c) = [1,2]
assert (a==1)
assert (b==2)
assert (c==null)

Optional parenthesis and semicolon

println "banana"

Is perfectly legit.

Hack #10 : string templating

This is probably one of the most appreciated features in Groovy. I left it for last because I wanted to make sure you read all the others before getting here. Hope it worked.
So in Groovy you can use both double quotes and single quotes to wrap a String value. They are almost the same, but not quite.
Single quotes work just like the double quotes in Java: they’re inert, they simply state: this is a constant string.
Double quotes do almost the same thing, unless you use the “magic” character $. When you do, a String becomes a GString that works as a string template against the current scope.
Here’s an example that will explain it all:

int quantity = 55
String fruit = 'Banana'
String str1 = 'I want '+quantity+' units of '+fruit
String str2 = "I want $quantity units of $fruit"
assert str1==str2

This is awesome when you have the variables set exactly where you declare the string, but what if I really wanted to use it as a template that gets evaluated when I need it?

int quantity = 55
String fruit = 'Banana'
GString str2 = "I want ${-> quantity} units of ${-> fruit}"

quantity = 55
fruit = 'Apple'
String str1 = 'I want '+quantity+' units of '+fruit

assert str1==str2

Conclusion

You can run but you can’t hide: Java is awesome when you’re building the core of a platform, but when it comes to the daily routine, such as business logic or the view/controller part of a web application, it’s just too tedious.
Groovy saves you in a number of occasions, and the downsides (we’ll talk about them in a future article) are negligible most of the time.

It is true that you might need to train your team to deal with the level of uncertainty of a dynamic language, but I think it’s a price you’ll get back with interest, over time.

I left out a lot of interesting hacks, but I decided to go with a context-specific article. I will try to cover the rest later on.

Italian developers for dummies

“We’re having so many problems outsourcing to devs outside the Silicon Valley… how’s Italy?”

This is what the CTO of a Silicon Valley star told me once.
Outsourcing to India works like this: you either pay very little money for shitty services, or you pay a reasonable amount of money for decent services. The 90s idea that India can be both as cheap as junk food and as high quality as French cuisine has proven to be completely ridiculous.
Software development requires knowledge, mental strength, imagination and initiative. These are qualities you certainly won’t find in demotivated people.

The question “how’s Italy?” is more complex than one might think. There are a number of reasons a company should or should not outsource to Italian developers and teams; it’s all about understanding what you can accept and what you can’t.

I will talk only about the things I know, so since Italy is not a compact, uniform country, I will refer to the north, north/east area.

Price

I’m starting with this simply because it is one of the main reasons people outsource.
Italian developers are pricey if you compare them with other popular outsourcing locations, like India or eastern European countries. Unfortunately, the costs are only partly related to the cost of living or the average pay for this kind of career.
If you’re reading this from the United States, you might be shocked to find out that software development/engineering is generally not a very well paid job over here. It offers a number of opportunities, more than most careers, but the salaries are medium to low. With that said, though, the overall gross expense is higher than one would expect, and the problem is taxation.
An auto mechanic in Tennessee has a gross salary similar to an Italian software engineer’s, but the American’s net income is almost a quarter higher (very rough calculation).
So if you’re considering outsourcing from the US to Italy, you have to be ready to pay salaries that are not the cheapest, but keep in mind that with a reasonable salary you can definitely hire the best around.

Bureaucracy

Doh. This is probably the main reason I would suggest you outsource to an independent company rather than opening your own branch.
Bureaucracy in Italy is… overwhelming; there are cavils for every damn thing. Health, office safety, contracts, tax deadlines, overtime… everything requires lots of paperwork and time. You will need a good professional to assist you on these matters.
Open-ended employment contracts used to be a life sentence. Things have changed a bit lately, but firing someone can still be a big problem. On the other hand, contractors are not a big thing anymore, so finding short-term help can be another problem. However, part-time and temporary contracts are available.

Loyalty

This is one of the things I kept hearing over and over about the software Eldorado, San Francisco: employees are not loyal, they suddenly leave for a better bid, and they do this continuously. If this is a plague for you, in my honest opinion, you should consider having an Italian team. I’m not talking about an Italian company doing your job, but about having your very own branch.
The sense of responsibility, the need to share the company vision and to stay on a task until it is actually done, is humongous. Involve the team, keep morale high, keep people motivated and reasonably paid, and they won’t leave for money. In my years doing this job I’ve rarely seen someone leave for a better bid, but remember you need to pay attention to a number of other things (see work ethics) to make it happen.

Proficiencies

Simply put: the education system by itself is not going to systematically provide what a high-profile technology company requires, assuming it’s considering outsourcing valuable tasks (as opposed to go-to-India tasks). This is not, at any level, an accusation against professors, who really do try their best (well, most of them), but a systemic failure, and I believe people working in education would agree. It’s not just the level of the education itself: the evaluation of acquired proficiencies fails consistently, so the risk is finding people with the same degree but completely different abilities. I would consider the education level “average” and noticeably uneven across the country, but we’re not looking for average professionals, are we?
A junior with only an average interest in software development can be a risky hire, unless you want to invest time and resources in training. As always in life, it’s passion that makes a great professional, and my feeling is that in Italy this is even more relevant.
It’s perfectly fine to hire junior developers, but the interview is essential to understand who you’re talking to, and a good, experienced leader is mandatory.

A good point about Italian universities is their decent connection with the open source world; the use of open technologies is absolutely common. I’m not saying it’s a geek paradise, but geeks can definitely find an ecosystem compatible with their needs, and this greatly helps the spontaneous creation of real coder communities. Remember, Silicon Valley’s rock stars were geeks before it was cool. These communities are where one should be looking for talent, and trust me, among them are some of the most brilliant engineers-to-be you’ll ever meet.

I’ve generally found front-end web developers to be more up to date with modern technologies and patterns, while back-end developers look a bit duller.

Fact: since long before the startup craze, most Italian developers have been de facto devops, because the majority of Italian software companies are minuscule and simply cannot afford a developer who is “just a developer”.

To be completely honest, you can find anything from a graduate who can barely write his own resume to a total guru with a high school diploma and a specialization in computer science. Eyes open.

Skills

I decided to split skills and proficiencies for a reason: skills are more complex than the mere knowledge of things.
The best quality you will find in Italian software developers is creativity. Now, this close connection between Italians and creativity is both a stereotype and (trust me) a matter of fact.
In software/web development, creativity is the ability to craft alternative, innovative, more efficient or more elegant solutions to problems. In a context where the future becomes past in the blink of an eye, this quality turns a decent company into a visionary one.
“Oh, you Italians! (hearts) (hearts)” Stop that, we’re not talking about Italian shoes. Jesus.
Italian creativity applied to the software industry has two faces.

One is the ability to evolve naturally, without centralized propulsion. It is what I call compulsive improvement.
That propulsion has to be fed with controlled freedom if you want to make good use of it, but the advantages can be enormous.

The other is the traditional ability of Italians to “always land on their feet”, like cats, which translates into the ability to “get things working at all costs”.
While this is a fantastic ability when the project is messed up and your customer looks like Gordon Ramsay, I would be very careful not to abuse it. Landing on our feet does not mean landing gracefully like a freaking ballerina.

To conclude, learn to take advantage of Italian creativity, and your projects will evolve to spectacular, unexpected conclusions. Guaranteed.

If you ignore the commonplace about Italians living with their parents until the age of 35 (plot twist: it’s not a commonplace, it’s the truth for many people), you’ll soon discover Italians are pretty autonomous at work and like it better that way. In a country that praises small and micro companies every damn day, because they are the actual core of the economy, every person is taught to act like a company. Give an employee a project and she will most likely care for it like a puppy and feel responsible for it. As a side effect, she will want a say in where that project is going.

Another interesting thing I’ve noticed across a multitude of Italian software/web engineers is that we tend to work on software like “watchmakers” if we’re allowed to do so (i.e. no overworking, no anxiety). We like to work on minute details, sometimes for the sole sake of elegance. Balancing vision and craftsmanship can be hard, and the predominant quality is definitely the latter. Even though this is great, just make sure we don’t get bogged down in some pointless academic exercise!
Also, as opposed to watchmakers, we’re not as good at being woodsmen: we can chop down a forest if needed (see “always land on their feet”), but it’s going to come with a number of quality downsides.

As a last item, I’d like to point out with pleasure that resistance to learning new technologies and paradigms is generally very low, if not nonexistent. Training is a rare event in many Italian software companies, so it’s generally welcomed with enthusiasm.

Work ethics

We’re now talking specifically about the scenario where you hire people overseas for your own company. Given that all people are different and Italy has 60 million inhabitants, let me tell you what I know. It is a common misconception that Italians do not work hard; on the contrary, I can proudly state (still concerning the area I know well) that Italians work very hard. Of course, behind every stereotype there’s a bit of truth (see drawbacks). A sense of responsibility is another great quality in Italian software professionals.

From a general point of view, I think you should expect a distinguished work ethic from Italian employees, but there’s a price to pay: company ethics.
And company ethics works as a great motivator. In the IT field, motivation is already quite different from other jobs, because the best software engineers and developers chose this career out of the kind of passion that joins work hours and spare time in continuous self-improvement; to be honest, given what we said so far, this is the only type of hire you want to make.
To motivate the passionate Italian IT professional you must provide things that are often quite rare in Italian companies, which I summarize as:

  1. Be transparent. Share with the company all the information that can be shared, and don’t tell lies you won’t be able to support. Italians would consider it a lack of respect, and respect is still a thing. Most Italians are instinctively very respectful towards top management and owners, but if they feel management is not respectful enough towards employees, the problems they can generate are immense
  2. Keep anxiety levels down. Pressure is fine, but anxiety can be a strike out
  3. Plan careers. “Where do you see yourself in the company in 3/5 years?” is a question few people hear over here.
  4. Pay for value, even if it looks “unfair”. Even though there’s always been a tendency to keep pay “fair” (and the laws kind of endorse it), you should definitely pay for the value an employee brings to the company. It might be a shock and generate a bit of drama at first, because people are not used to it, but politicized salaries create much more damage over time. Use 1 to achieve 4
  5. Identify key roles. Horizontal companies, Swedish style, can also work, but authoritative leaders work way better

Provide these 5 things, and you will have addressed all the major issues most Italian software companies seem unable to solve.
Provide ethics and you’ll get ethics back with interest. Don’t, and you will pay dearly. Unfortunately, Italians take a lack of ethics or respect in a drastically personal way.

To conclude, you can get a superior kind of commitment and dedication from Italian engineers, but keep in mind it’s not to be taken for granted.

Drawbacks

Here are a number of things to be very well aware of.

  • If you’re used to a fluid time sheet for office hours, I would suggest you reconsider. From north to south, the concept of time… well, it’s relative. This doesn’t mean people will not work a consistent amount of time during the day, but be aware that if you’re loose on entrance and exit schedules, you will quickly find yourself with even looser punctuality.
  • The problem is not how hard Italians work, but how efficient we are. The answer is: not very, in certain conditions. It’s not uncommon to see us running here and there like headless chickens, trying to do ten things at the same time and achieving nothing. Draw a clear path, set up mid-term objectives, and you will get the best from your employees. Italian developers can be pretty chaotic when the goal is not visible.
  • If anxiety is unavoidable, get ready for drama. Italian drama is not something you are used to. We will rarely confront top management directly, but this is even worse, because the environment becomes poisonous pretty quickly, and for everyone.
  • We live in a country that overprotected employees for too long. Things have changed a lot now, but this caused a number of negative effects as well. For these reasons, Italians are extremely defensive about the rights that are left, even if some of them are remarkably nonsensical. I wouldn’t fight that war, ever.
  • Whining. Get used to it, it’s a daily ritual.

Conclusions

Being Italian myself makes me not the best judge of Italians, but working with international companies gave me a better point of view. I believe some of the things I pointed out in this article are true for many people, not necessarily Italians, but they definitely describe us well, or at least they describe the pool of people I’ve had the pleasure to work, hang out and discuss with.

It puzzles me why Italy hasn’t become a target for international hires in the IT field. The idea that bureaucracy is the only problem is disrespectful to foreign companies. To me, the keyword is uncertainty: in bureaucracy, costs, education, expectations. For this reason, getting started can be harsh and complicated, but once a good team is established and the path is clear enough, the delivered quality is always remarkable.

The optimization deathtrap

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%”
Donald Knuth

Down the RabbitMQ hole

I tried to get this article started three times, failing every time. The reason, I think, is that I wanted to be generically helpful. Then I realized that maybe I haven’t solved enough world problems for such a grand aim.
Let’s do something better. Let’s talk about the real world.

I was working on a big project of mine and everything was kinda working well. I had various agents, with different peculiarities and technologies, that needed to talk to each other and receive commands from other sources.
To get them talking, my logical first approach was “let’s have them send and receive web service calls”, and this is exactly what I started doing, but then my requirements (and common sense) hit me like a shovel to the back of the head.

  • Those web services will need to be rational and properly standardized according to a stable plan. If I don’t do that, it’s going to be a mess in a few weeks
  • Every message sent and received is extremely important. All the delivery/status/redelivery logic needs to be handled
  • Every process is important. I need to make sure that even once the message is delivered, it is processed completely
  • These agents cannot simply process every message the system throws at them. I need to make the system able to queue the messages for processing, according to the system capabilities
  • Multiple agents of the same type need to be deployed, and the system should be able to load balance them
  • Some messages should be issued once, but delivered to a number of consumers that could be interested in them
  • Some messages are just triggers, some others are actual synchronous RPC invocations
  • As some of the agents will end up on alien machines, it’d be great if they could initiate the connection towards a trusted rack, rather than being open to inbound connections
  • Some of the agents that need messages delivered do not use the same technology as the rest of the platform

This pile of things opened my eyes to an astonishing revelation:

Most of my worries relate to the code I write for handling messages between software components

As I learned very quickly, my needs are definitely nothing special, so if something worries me it probably worries at least one million developers out there. I swallowed my ego and the usual do-it-yourself attitude and started looking.

And the solution emerged by looking at the software and frameworks I often use, most of which are made by, or somehow related to, Spring.
The answer is RabbitMQ, by Pivotal.

But before we get into the details…

Message brokers

Let’s make it clear that RabbitMQ is not a mysterious rare bird, but a message broker: a piece of software that does, at a high level, two things:

  • Accept messages from agents written in any supported technology and deliver messages to agents written in any supported technology
  • Queue, organize, balance, and manage the delivery, redelivery and failed delivery of messages
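
To make this tangible, here is a minimal sketch of the producer side using the plain RabbitMQ Java client. The host, the queue name (“tasks”) and the message text are invented for illustration, so treat it as a rough shape rather than a recipe.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

import java.nio.charset.StandardCharsets;

public class TaskProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: the broker runs locally

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            // durable queue: it survives a broker restart
            channel.queueDeclare("tasks", true, false, false, null);
            byte[] body = "resize image 42".getBytes(StandardCharsets.UTF_8);
            // persistent message, published to the default exchange with the queue name as routing key
            channel.basicPublish("", "tasks", MessageProperties.PERSISTENT_TEXT_PLAIN, body);
        }
    }
}

That is roughly all the plumbing a producer needs: everything about queuing, balancing and redelivery lives in the broker.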

It is pretty obvious, at least considering what we’ve said so far, that this software would drastically replace a lot of code you’re normally used to writing yourself. And, as every time you start considering relying on someone else’s code, the first thing that comes to mind is “oh shit, let’s hope it’s not some Sunday morning project”, which often leads to: “if, long term, I don’t like it anymore, or the project fails, how am I going to replace it?”. And this is probably one of the most interesting parts of the story: AMQP.
I hate acronyms, seriously, but this one is sweeter than most: Advanced Message Queuing Protocol.
It is, in fewer words than needed, a standard describing how message broker middleware should behave, both from the behavioral perspective (how queuing should happen) and from the protocol perspective (how the data should travel on the wire).

Every message broker that adheres to AMQP can substantially be replaced with another one, with a reasonable amount of adjustment. I’m not saying you shouldn’t take great care when evaluating software that is going to play such a relevant role in your project, but at least you’re not venturing into the unknown.

Features

Now that we know what message brokers are, let’s see how they can help us with our requirements.

  • Point-to-point accept/delivery: the message is accepted in a queue and delivered to a consumer that is enabled to receive messages from that specific queue. There can be one consumer or many, but the message is delivered to just one. This is a pretty common use case when you have multiple consumers of the very same kind and need to distribute the load among multiple nodes.
  • Publish-subscribe: the message is accepted in a queue and delivered to every consumer that subscribed to it. This strategy is a win when a software component triggers an event that can be interesting to a number of consumers. A good example is a system that receives documents: once a document is in, the event has to be notified both via email and SMS, so the email and SMS agents subscribe to the same queue and receive the very same message (a sketch of this pattern follows below). Moreover, using the routing functionality, you can have some agents interested in messages flagged in a certain way, others in messages flagged another way, or in all of them.
  • Auto/Explicit acknowledge: the message broker considers a delivery succeeded when it receives an acknowledgement. Some messages are just notifications of events you want an agent to act on. Others are vital messages you want to make sure get fully processed. In the first scenario, your attention should focus on the certainty that the message successfully reached the agent and that no network failure broke it in transit. This is done via the auto-acknowledge routine, which automatically sends the response once the message is in the agent.
    In the second scenario, you want the agent to reply to the message broker once it has succeeded in doing what it had to do. This moves part of the failure detection outside the agent code itself. Traditionally, what you would do is write all the nice code that takes failure into consideration and, if anything goes wrong, “do something”. But a failure can also be unpredictable, either in the place where it happens or because it’s a system failure beyond the code’s control (faulty memory, unstable state, out-of-memory process, etc.).
    Moving part of the problem somewhere else is a great opportunity to avoid even thinking about how to react to the worst failures. Explicit acknowledge won’t allow the message broker to consider the delivery done until an explicit acknowledgement has been returned by the agent. You can decide at which point in your code this response is sent, but a common strategy is to trigger it once the processing of the message has actually finished. If the processing fails for some reason and you are able to catch that event, you send a negative acknowledgement, so the message broker knows the delivery failed. If it’s impossible for you to catch the failure, the message broker will wait a certain amount of time and eventually decide the delivery failed (see the worker sketch after this list).
  • Redeliver: in relation to what we just said, the message broker can be instructed to redeliver a message if a previous delivery was not properly acknowledged. Of course, if the failure was caused by an unrecoverable failure of an agent, redelivering to the same agent won’t do any good, but in a scenario where agents are replicated there’s a good chance for the redelivery to succeed.
  • RPC: by using temporary queues and exchanges, it is possible to use the message broker as a dispatcher for RPC calls. This is not exactly what message brokers have been designed for, but it’s impossible not to notice how easily this can be achieved. The invoker sends a message declaring the name of the temporary queue it will wait on for a reply. The remote agent executes the call and responds on that queue, which is eventually deleted.
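
To give an idea of what the point-to-point and explicit-acknowledge cases look like in practice, here is a hedged sketch of a worker consuming from the hypothetical “tasks” queue above, again with the RabbitMQ Java client; the process() method stands in for whatever your business logic is.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

import java.nio.charset.StandardCharsets;

public class TaskWorker {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        channel.queueDeclare("tasks", true, false, false, null);
        // deliver at most one unacknowledged message at a time to this worker
        channel.basicQos(1);

        DeliverCallback onDelivery = (consumerTag, delivery) -> {
            String message = new String(delivery.getBody(), StandardCharsets.UTF_8);
            try {
                process(message); // your business logic goes here
                // acknowledge only after the work is actually done
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            } catch (Exception e) {
                // negative acknowledge and ask the broker to redeliver, possibly to another replica
                channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true);
            }
        };

        // autoAck = false: we acknowledge explicitly above
        channel.basicConsume("tasks", false, onDelivery, consumerTag -> { });
    }

    private static void process(String message) {
        System.out.println("Working on: " + message);
    }
}

The acknowledgement is sent only after the work is done, so if the worker dies halfway through, the broker can redeliver the message to another instance of the same agent.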

All the requirements are met.
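
And here is the publish-subscribe case sketched the same way: a fanout exchange (the name “document.events” is made up) broadcasts every message to all the queues bound to it, so an email agent and an SMS agent each get their own copy.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

import java.nio.charset.StandardCharsets;

public class EmailNotifier {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // fanout exchanges ignore routing keys and copy the message to every bound queue
        channel.exchangeDeclare("document.events", "fanout");
        // an exclusive, auto-deleting queue private to this subscriber
        String queue = channel.queueDeclare().getQueue();
        channel.queueBind(queue, "document.events", "");

        DeliverCallback onDelivery = (consumerTag, delivery) ->
                System.out.println("Sending email for: " + new String(delivery.getBody(), StandardCharsets.UTF_8));

        // these are just notifications, so auto-acknowledge is good enough here
        channel.basicConsume(queue, true, onDelivery, consumerTag -> { });
    }
}

The SMS agent would be an identical consumer binding its own queue to the same exchange.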

Nature of the message

Messages can be anything; the message broker will not investigate what they’re made of. As long as bytes go through the wire, it’s perfectly fine.
Depending on the type of system you’re designing, though, the nature of the messages is something you want to study ahead of time. If you’re in a very homogeneous environment, such as a Java-only context, there’s no reason why you shouldn’t binary-serialize your objects to simplify the code on both the producer and the consumer side.
On the contrary, if your system is heterogeneous, or might realistically become so in the future, then you should consider working with universal formats such as JSON.
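
As an illustration, here is a minimal sketch of serializing an event to JSON before publishing, assuming Jackson as the JSON library; the DocumentReceived class is invented for the example, and the resulting byte array is what you would hand to basicPublish.

import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonMessages {

    // an invented event type: any consumer, in any language, can parse this as JSON
    public static class DocumentReceived {
        public String documentId;
        public String uploadedBy;
    }

    public static void main(String[] args) throws Exception {
        DocumentReceived event = new DocumentReceived();
        event.documentId = "doc-123";
        event.uploadedBy = "alice";

        ObjectMapper mapper = new ObjectMapper();
        byte[] body = mapper.writeValueAsBytes(event);                            // what the producer publishes
        DocumentReceived copy = mapper.readValue(body, DocumentReceived.class);   // what a Java consumer would do
        System.out.println(copy.documentId + " uploaded by " + copy.uploadedBy);
    }
}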

RabbitMQ

Among a number of very valuable message brokers, RabbitMQ definitely made it through my software selection.
The remarkable reputation that makes it a relevant component of various software packages, the highly technological company backing its development, and the vast user base are a good calling card, but there are other things that really impressed me.
Its technology, entirely based on Erlang, is a clear statement of purpose: high availability by design.
Its simplicity of deployment, both for development and production environments, and its heavily convention-over-configuration approach let you start quickly and give your ops a chance to learn new stuff when needed.

Even just these facts convinced me to take the risk and give it a try, not just as an experiment but all the way to production, and let me say this: it definitely stood the test of time.

After a very quick deployment through the Linux distro’s repository, the application was up and running, and that was substantially it: the agents were gracefully able to connect and communicate with the message broker.

RabbitMQ also has a plug-in system, and it didn’t take long for me to realize how much I wanted to install the RabbitMQ Management plugin.
This plug-in adds a small web server that lets you see exactly what’s going on in the message broker: message deliveries, acknowledgements, queues, messages awaiting delivery and confirmation, but also connections and channels. Everything is right there, and especially during the exploration phase this is extremely important, because you will quickly find yourself with stuck messages or endless redeliveries. Learning to avoid getting stuck is pretty straightforward, but this plugin still gives you great help all the way from software selection to the actual production environment.

The plug-in system has a number of interesting items worth looking into. One of them, which I think is pretty cool, is an implementation of STOMP, a popular and widely supported text streaming protocol. Applied to RabbitMQ, it allows us to expose most queue functionality to external clients over the Internet with finer control and better security than you would have by exposing the AMQP sockets themselves.

Of course, as you would expect from such a system, RabbitMQ has clustering capabilities. Now, contrary to many scalable software components you might have used in the past, RabbitMQ has a number of options which work well in specific scenarios, so don’t rush it and read the documentation carefully.
I didn’t go through all the possible variations because I really didn’t need that much. In my case, basic clustering in a LAN allowed me to do everything, and it’s definitely a piece of cake to set up. I’m pretty sure the federation plugin (required when you need to cluster over a WAN) will require a bit more work, but I do believe these functionalities follow the same RabbitMQ principles.

Last, but not least in any possible way: reliability.
It’s all eye candy when you’re developing, but production is a whole other story. You can be rational and clearly evaluate how respectable the software is and how well suited Erlang is to this kind of product, but at the end of the day you need to see it with your very own eyes before you start sleeping well at night.
For this reason, my words won’t help you much, but I’ll give it a try. RabbitMQ is absolutely exceptional in reliability and stability.
I had the luck to talk to two software engineers at different companies using RabbitMQ for large volumes of critical messages, and they both agreed with my impressions:
“It’s incredible how you drop it in and you just forget about it” Francesco said. I trust Francesco, and you should trust him as well.

Conclusions

I’m in that phase everybody doing this job periodically goes through: how could I live without technology-name-goes-here?

How could I live without RabbitMQ? I probably can’t answer this question right now because I really consider it critical for my work.
Rationally speaking, it’s not what it does, but how it does it, and that sense of safety and stability that eventually had me sleeping well again. If you’re dealing with enterprise software with asynchronous tasks going in and out of multiple agents, I do believe you should consider a message broker, and RabbitMQ is definitely my recommendation.