In our last post, we explained the concept of elasticity as the ability to move resources within a virtual infrastructure to meet business demands and (not always predictable) peaks of activity with the best possible responsiveness.
Now, it’s not only about the ability to ‘inflate’ the infrastructure by adding the necessary resources, but also to ‘deflate’ it by reallocating those resources elsewhere. Like a balloon: the more elastic the infrastructure, the easier it is to inflate and deflate.
What does that mean for your applications? Are they ready to go into the Cloud?
Elastic software means:
- Scalability: maintaining application performance as the number of users increases – and, usually, the volume of data with it.
- The ability to release resources when the load decreases.
There are two kinds of scalability:
- Horizontal scalability: distributing an application across several machines. For example, you add instances of your application server and use a load-balancer to spread the load.
- Vertical scalability: adding capacity (memory, CPU, disk) to the machine hosting your application.
Horizontal scalability is not the most elastic approach, since you may have to add instances of the architecture manually, and those instances can remain largely unused outside of load peaks. In other words, you inflate the balloon but do not know how to deflate it. This is not really dynamic.
And maintaining an infrastructure built on redundant instances of the application architecture leaves you with an infrastructure oversized for peaks of activity, running at no more than 10 or 20% of capacity the rest of the time – exactly the drawbacks of a physical infrastructure.
Vertical scalability is often much more elastic. You install your application on a virtual machine that typically uses about 50% of its resources, on a server hosting other virtual machines whose applications are less prone to load variation. When an unplanned peak of activity arrives, the virtual machine can borrow resources from its neighbors on the same server. This ability to borrow memory is known as “ballooning” (in VMware terms). Once the peak ends, the VM frees the borrowed resources and returns them to its neighbors. All of this happens dynamically, with as little saturation and contention as possible.
But vertical scalability quickly reaches its limits: you cannot increase CPU or I/O capacity indefinitely. You can inflate and deflate the balloon, but only within certain limits.
Any application stands to gain elasticity by moving to the Cloud.
For example, memory-hungry operations, such as creating an object, should not be performed inside a loop. Most loops process the data in an array: the more data the array contains, the more iterations the loop performs, and the more objects are created and memory is consumed. Add more users, and the application runs these operations more often and needs even more memory.
This is a typical example of an application with low scalability that you can fix ‘elastically’ by adding memory during peak activity.
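The loop problem above can be sketched in a few lines of Python. This is only an illustration – `RowFormatter` and the row data are invented names standing in for any object that is expensive to create:

```python
# Illustrative sketch: RowFormatter stands in for any costly-to-create object.

class RowFormatter:
    """Pretend this holds buffers, templates, connections... anything expensive."""
    def __init__(self):
        self.template = "id={id} name={name}"

    def format(self, row):
        return self.template.format(**row)

def format_rows_wasteful(rows):
    # One new RowFormatter per iteration: allocations grow with the array size,
    # and with the number of users running this treatment.
    return [RowFormatter().format(row) for row in rows]

def format_rows_frugal(rows):
    # Hoist the creation out of the loop: one object, however many rows.
    formatter = RowFormatter()
    return [formatter.format(row) for row in rows]

rows = [{"id": i, "name": "user%d" % i} for i in range(3)]
wasteful = format_rows_wasteful(rows)
frugal = format_rows_frugal(rows)
```

Both functions return the same result; only the allocation pattern differs – and that is exactly the kind of detail that decides how much memory a load peak costs.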
Another example: you load-balance two application servers across two virtual machines, so two copies of the same software sit in memory. If the hypervisor can detect these identical memory pages, it can set up a sharing mechanism so that a single memory space is used and accessed by both virtual machines, transparently – saving memory.
Some programming techniques help make an application more elastic. The first and most often cited is to make the application as stateless as possible. What does this mean?
Very simply, a stateful application stores the user’s session data and context in memory. The session stays on the same server for as long as the user is connected: it is not possible to transfer this session – the data stored in memory – to another server.
Imagine a commercial website with high activity between 9:00 and 17:00, corresponding to 10,000 users spread across 10 servers. If the application is stateful, all the data is stored in memory, and that memory is freed only when users leave the site by logging out. If 10% of users forget to log off, or if activity falls to 1,000 users, we keep 1,000 sessions in memory spread over the 10 servers. In other words, the application uses 10% of each of the 10 servers when a single server would be enough.
Now, if the application is completely stateless, sessions are not attached to a particular server. Even though 10% of sessions remain active, it becomes possible to release servers as activity decreases and keep one server busy at 100% (or, better, two servers at 50%).
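The contrast can be sketched in a minimal, in-process Python illustration, where `SHARED_STORE` stands in for a shared cache or database reachable by every server (all names here are invented for the example):

```python
# Sketch of stateful vs stateless session handling; names are illustrative.
import uuid

SHARED_STORE = {}  # stands in for a shared cache/database visible to all servers

class Server:
    def __init__(self, name):
        self.name = name
        self.local_sessions = {}  # stateful style: session lives in this server's RAM

    def handle_stateful(self, session_id):
        # Only the server holding the session in memory can answer.
        if session_id not in self.local_sessions:
            raise KeyError("session is pinned to another server")
        return self.local_sessions[session_id]

    def handle_stateless(self, session_id):
        # Any server can answer: the state is fetched from the shared store.
        return SHARED_STORE[session_id]

sid = str(uuid.uuid4())
a, b = Server("A"), Server("B")
a.local_sessions[sid] = {"user": "alice"}  # stateful: pinned to server A
SHARED_STORE[sid] = {"user": "alice"}      # stateless: visible everywhere
```

With the stateless style, server A can be shut down when activity drops and server B picks up the session transparently; with the stateful style, the session dies with its server.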
It is difficult to program completely stateless applications, since the data must be stored somewhere – and if not in memory, then in a database, at least temporarily. But a database access costs far more in performance than a memory access. We therefore look to split the application into different modules or sub-applications, some stateless, others stateful.
For example, the module that manages the user’s cart benefits from being programmed in stateful mode:
- This is the data most frequently accessed and modified during the session, so the best performance is required: nothing is worse than waiting 10 seconds every time you add an item to your cart – you will leave the site quickly and never come back.
- The cart has a limited lifetime: typically 30 to 60 minutes. If a user stays connected without updating their cart, the session is removed from memory at the end of this period, which ‘deflates’ memory. This solves the problem of the 10% of users who forget to log off.
Other data, such as the user’s identity, can be managed by a stateless module backed by a temporary table. An important note: this example also shows how important it is to think in terms of each module’s lifetime and to enforce it with a time-out.
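A toy version of such a time-out can be sketched in Python (the class and parameter names are invented; 1,800 seconds models the 30-minute cart lifetime):

```python
# Toy cart store with a time-out; names and parameters are illustrative.
import time

class CartStore:
    def __init__(self, ttl_seconds=1800):  # e.g. a 30-minute cart lifetime
        self.ttl = ttl_seconds
        self.carts = {}  # session_id -> (last_access_time, items)

    def add_item(self, session_id, item, now=None):
        now = time.time() if now is None else now
        _, items = self.carts.get(session_id, (now, []))
        self.carts[session_id] = (now, items + [item])

    def evict_expired(self, now=None):
        # 'Deflate' memory: drop carts idle longer than the TTL, including
        # those of the users who forgot to log off.
        now = time.time() if now is None else now
        self.carts = {sid: (t, items)
                      for sid, (t, items) in self.carts.items()
                      if now - t <= self.ttl}

store = CartStore(ttl_seconds=1800)
store.add_item("s1", "book", now=0)     # s1 then goes idle
store.add_item("s2", "pen", now=1700)
store.evict_expired(now=2000)           # s1 idle for 2000 s > 1800 s: evicted
```

A periodic sweep like `evict_expired` is what lets memory shrink back without waiting for an explicit logout.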
Asynchronism is another mechanism that promotes elasticity. The processing of your cart on a commercial site does not end with the purchase and the payment: that data is then used to prepare the delivery, make the accounting entries, trigger the bank transfer – not to mention CRM and other marketing tasks. None of these treatments is urgent; they can run when the servers are less busy, during the night for example. We therefore look to decouple the front-end – the retail site – by sending a message asynchronously to the back-end accounting and logistics applications, carrying your order’s data, so that it is processed during low activity using the idle resources released earlier.
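A minimal in-process sketch of this decoupling, using Python’s standard `queue` module in place of a real message broker (the function names and order format are invented for the example):

```python
# Front-end/back-end decoupling via a queue; an in-process stand-in for a broker.
import queue

order_queue = queue.Queue()  # front-end -> back-end channel

def checkout(order):
    # Front-end: confirm the purchase immediately, defer everything else
    # (delivery, accounting, CRM...) by publishing a message.
    order_queue.put({"order_id": order["order_id"], "items": order["items"]})
    return "order confirmed"

def process_backlog():
    # Back-end: drained later, when the servers are less busy (e.g. at night).
    processed = []
    while not order_queue.empty():
        msg = order_queue.get()
        processed.append(msg["order_id"])  # stand-in for the real back-office work
    return processed

confirmation = checkout({"order_id": 42, "items": ["book"]})
backlog = process_backlog()  # later, during the period of low activity
```

The front-end answers the user right away; the back-end consumes the backlog at its own pace, on whatever resources happen to be idle.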
The main obstacle to application elasticity is, in my opinion, the database – certainly the tier that is most difficult to scale horizontally. Current recommendations for managing persistence in a virtualized world again seem to promote a modularity that lets you distribute information across different databases.
Finally, I think that building elastic applications for the Cloud will bring profound changes to methodologies and the project lifecycle, as better integration between development, QA, and production teams seems inevitable – a kind of fusion between ITIL and Agile.
The main concern of development and QA teams is to deliver applications on time and of the highest quality – not resource consumption. Unless you have teams dedicated to building virtualized applications, accounting for resource consumption or elasticity is simply not on the agenda. Yet these two worlds will eventually have to meet, when service availability becomes the most critical factor in Service Level Agreements.
So I am convinced that every developer will have to add virtualization skills to their résumé, and that every service company will have to acquire these skills and methodologies to develop elastic applications.
The Cloud is coming out of the computer rooms. Get ready.