A question we see regularly: how can IT departments improve their productivity? In this era of economic crisis, increased competition and globalization, how can costs be reduced further, and where can new sources of optimization be found? In short, how to do « more with less »?
I’m sure many will think that I am talking about « how to improve the productivity of developers and projects ». But I think that, more and more, Production departments are the ones answering this question best, thanks to virtualization.
Previously, Production was seen as a constraint, because a good deal of bureaucracy, procedures and patience was needed to obtain any resource, such as a server or disk space.
And when a problem did occur in the production environment, it really had to be serious before someone would decide to investigate. I mean, if your development or QA server was slow, nobody would really look at it.
Now, when you need a machine, or more CPU or disk space, you can send an email and expect to see your request fulfilled ASAP, usually within 48 hours or even the next day. Production’s response to user needs has really improved. With virtualization, a new VM (Virtual Machine) is installed in 3 clicks, and additional resources are added to your current VM in even less time.
So much for the « more » part of the equation.
But there is a natural law: the easier it is to get a resource, the higher the level of waste. This law holds even with virtualization, in the form of a phenomenon known as ‘VM proliferation’, which is currently the main concern of Production departments and their main source of costs, along with the exponential growth of storage (on average 30% each year).
How to avoid budget inflation and do « more with less »? The answer lies in Capacity Management, whose main axes, as defined by ITIL, we present here.
Know what you have
Virtualization is a new, fast-growing market, and competition is fierce between the various players to gain market share. Production departments are well aware of this competition and use it as much as possible to acquire, at the best cost, the hardware resources (CPU, memory, disk arrays, etc.) and software (OS, virtualization, etc.) that their infrastructures need.
As a result, their environments are now divided into different technology silos. In the same department we will often find Solaris, IBM and x86 servers, operating systems such as AIX, Linux and Windows, and virtualization solutions such as VMware, Hyper-V or others, as the most attractive offer changes from one year to the next. Not to mention storage solutions (SAN, NAS, etc.), certainly the market segment where technological advances are the greatest and competition the fiercest. If this last subject interests you, you certainly know the blog of my friend Philippe Nicolas.
This multiplication of technologies also makes it possible to better meet the needs of Development departments. A critical Oracle database with a high volume of transactions will be more comfortable on a Unix server, whereas a non-critical mail server can go into a virtual machine under Windows. And there will probably be a high-performance but costly storage solution for the database and a slower but cheaper disk for the mail server.
Managing such a heterogeneous infrastructure requires a Capacity Management solution that is able to recognize these different technologies and provide standardized metrics: the value of a MIPS varies between hardware manufacturers and software vendors.
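To make the idea of standardized metrics concrete, here is a minimal Python sketch that expresses CPU capacity from different platforms in one common unit. The conversion factors, the platform names and the Server fields are all hypothetical, chosen only for illustration; a real solution would derive such factors from benchmarks or vendor data.

```python
from dataclasses import dataclass

# Hypothetical conversion factors: capacity units per core, relative to a
# reference x86 core. These numbers are illustrative, not measured values.
CAPACITY_UNITS_PER_CORE = {
    "x86": 1.0,
    "sparc": 0.8,
    "power": 1.5,
}


@dataclass
class Server:
    name: str
    platform: str        # key into CAPACITY_UNITS_PER_CORE
    cores: int
    cpu_used_pct: float  # average CPU utilization, 0-100


def normalized_capacity(server: Server) -> float:
    """Total CPU capacity of the server, in common capacity units."""
    return server.cores * CAPACITY_UNITS_PER_CORE[server.platform]


def normalized_usage(server: Server) -> float:
    """CPU actually consumed by the server, in the same common units."""
    return normalized_capacity(server) * server.cpu_used_pct / 100.0
```

With a common unit, the load of an 8-core Unix server and that of a 16-core x86 host can be added up or compared in the same report, which is what a heterogeneous environment needs.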
Know the status of what you have
Knowing the infrastructure is not enough; we must also know its health: which servers and virtual machines suffer from saturation, resource contention or incidents. Better yet, it is crucial to identify the risks to these resources before a problem occurs that brings down the database or any other application critical to the company. When a bank's payment system cannot meet its deadlines because a backup was not completed on time, it is best to know as soon as possible, before the phone starts ringing.
The first task of the Capacity Manager, on arriving at work every morning, is to check the condition of the infrastructure. Again, a Capacity Management solution should:
- Give an overview that immediately identifies whether a machine has, or will have, a problem (with alert thresholds).
- Allow drill-down from the data center to the cluster, the server, the virtual machine within the server, the resource on this VM, etc.
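The two requirements above can be sketched together in a few lines of Python: each resource gets a status from alert thresholds, and every level of the hierarchy inherits the worst status found below it, so the overview shows a problem that the drill-down then locates. The threshold values (75% and 90%) and the dictionary layout are assumptions made for this example, and the sketch only covers current usage, not predicted problems.

```python
WARNING_PCT = 75.0   # assumed warning threshold
CRITICAL_PCT = 90.0  # assumed critical threshold

SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def status(used_pct):
    """Map a utilization percentage to an alert status."""
    if used_pct >= CRITICAL_PCT:
        return "critical"
    if used_pct >= WARNING_PCT:
        return "warning"
    return "ok"


def worst(statuses):
    """Return the most severe status of a list ('ok' if the list is empty)."""
    return max(statuses, key=SEVERITY.__getitem__, default="ok")


def rollup(node):
    """Store in node['status'] the worst status at or below this node.

    A node is a dict with a 'name', an optional 'used_pct' measurement
    and an optional list of 'children' (cluster, server, VM, resource...).
    """
    statuses = [rollup(child) for child in node.get("children", [])]
    if "used_pct" in node:
        statuses.append(status(node["used_pct"]))
    node["status"] = worst(statuses)
    return node["status"]
```

Calling rollup on the data center node marks it critical as soon as one VM deep in the tree is critical; following the children whose status is not 'ok' is the drill-down.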
Answer user requests
As already said, virtualization raises the level of demand, and a request for resources from the business must be satisfied as soon as possible. It is not only about doing more, but also better and faster.
It is therefore necessary to identify unused resources that can be used for this purpose, for example:
- A virtual machine that is inactive or powered off, but still consumes disk space.
- A virtual machine that is dormant, unused for more than 20 weeks: for example, a QA environment made available for a project, where the team has completed its QA but ‘forgot’ to report it.
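As an illustration, the two cases above could be detected from an inventory with a check like the following Python sketch. The VM fields (powered_on, disk_gb, last_activity) are hypothetical names for data that a monitoring tool would have to supply; the 20-week limit is the one mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DORMANT_AFTER = timedelta(weeks=20)  # limit taken from the text above


@dataclass
class VM:
    name: str
    powered_on: bool
    disk_gb: float
    last_activity: Optional[datetime]  # None if no activity was ever recorded


def reclaim_candidates(vms, now):
    """Return (name, reason, disk_gb) for each VM whose resources look unused."""
    candidates = []
    for vm in vms:
        if not vm.powered_on:
            # Powered off, but its disks still occupy storage.
            candidates.append((vm.name, "powered off", vm.disk_gb))
        elif vm.last_activity is None or now - vm.last_activity > DORMANT_AFTER:
            # Running, but nobody has used it for more than 20 weeks.
            candidates.append((vm.name, "dormant", vm.disk_gb))
    return candidates
```

The result is only a list of candidates: someone still has to confirm with the project team before a VM is actually removed.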
Another challenge is to respond quickly to requests driven by an increase in activity, for example an application whose number of users rises sharply, an event that Production is rarely informed of. In this case, the Capacity Manager must be able to identify which cluster is available to host a virtual machine with a higher resource consumption profile, while still respecting the high availability (HA) thresholds, of course.
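The cluster selection can be sketched as a simple filter. In this Python example the HA threshold is modeled, as an assumption, by the maximum fraction of a cluster's capacity that may be allocated (75% by default), the rest being kept free to absorb a host failure. The Cluster fields and the units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    cpu_total: float     # CPU capacity, in any consistent unit
    cpu_used: float
    mem_total_gb: float
    mem_used_gb: float
    ha_max_fraction: float = 0.75  # assumed HA threshold: usable share of capacity


def fits(cluster, cpu, mem_gb):
    """True if the cluster can host the VM without crossing its HA threshold."""
    cpu_ok = cluster.cpu_used + cpu <= cluster.cpu_total * cluster.ha_max_fraction
    mem_ok = cluster.mem_used_gb + mem_gb <= cluster.mem_total_gb * cluster.ha_max_fraction
    return cpu_ok and mem_ok


def candidate_clusters(clusters, cpu, mem_gb):
    """Clusters able to host the VM, the one with the most CPU headroom first."""
    able = [c for c in clusters if fits(c, cpu, mem_gb)]
    return sorted(
        able,
        key=lambda c: c.cpu_total * c.ha_max_fraction - c.cpu_used,
        reverse=True,
    )
```

Sorting by remaining headroom is one possible policy among others; a site could just as well prefer the cheapest storage tier or the least loaded memory.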
Capacity Planning is the most advanced form of Capacity Management: you must be able not only to respond to incidents or user requests, but also to plan proactively for future resource consumption.
Generally, this means providing Production or IT management with a forecast of future needs in order to build the budget for the next year or the next period. The solution is to use the available historical data to identify trends and project how the demand for resources will evolve.
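One simple way to turn historical data into a forecast is a straight-line trend fitted by least squares, as in the Python sketch below. It assumes one measurement per period (for example, storage used at the end of each month) and linear growth; real demand may be seasonal or exponential, in which case another model would be needed. The function names are illustrative.

```python
def linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Needs at least two measurements.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, periods_ahead):
    """Projected value periods_ahead periods after the last measurement."""
    slope, intercept = linear_trend(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


def periods_until(values, capacity):
    """Periods left before the trend reaches capacity (None if not growing)."""
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    reach = (capacity - intercept) / slope
    return max(0.0, reach - (len(values) - 1))
```

For the budget, forecast gives the expected consumption at the end of the next period, while periods_until answers the complementary question of when the installed capacity will run out.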
But there are also more complex cases, such as a merger with another entity, when we have to manage 400 new users. How to evaluate the need for additional resources, beyond the organic growth usually seen?
Other, more frequent cases: a sales period on a merchant website increases the number of visitors and transactions. How to ensure that the site will withstand the load and avoid degraded response times, or even complete unavailability? Such unusual activity during atypical periods occurs in all sectors: school holidays in the transport sector, a weather event or disaster for an insurance company, or simply month-end or year-end for human resources or accounting departments.
It is then necessary to use the available information to run a simulation and provide management with data that is as objective as possible, so that it can take the appropriate decisions.
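A very simple what-if simulation for these scenarios can be written as follows in Python. It rests on a strong assumption, namely that resource consumption is proportional to the number of users; the parameter names and the figures in the usage note are hypothetical.

```python
def simulate_demand(current_usage, current_users,
                    extra_users=0, organic_growth=0.0, peak_factor=1.0):
    """Estimate resource demand under a what-if scenario.

    current_usage  -- consumption measured today (any unit)
    current_users  -- number of users behind that consumption
    extra_users    -- one-off increase, e.g. users arriving with a merger
    organic_growth -- usual growth over the period, as a fraction (0.10 = 10%)
    peak_factor    -- multiplier for an atypical period such as a sales week
    Assumes consumption is proportional to the number of users.
    """
    usage_per_user = current_usage / current_users
    projected_users = current_users * (1 + organic_growth) + extra_users
    return usage_per_user * projected_users * peak_factor
```

For instance, with 1200 users consuming 600 units today, simulate_demand(600.0, 1200, extra_users=400) estimates the demand after a merger that brings 400 new users, and adding a peak_factor models an end-of-month or sales peak on top of it.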
When you look at these different activities of Capacity Management according to ITIL, it is interesting to make a comparison with Application Quality management. This will be an opportunity for a future post to answer questions such as:
- How to know your application portfolio?
- What is the quality of this portfolio?
- How to answer user requests?
- How to plan the evolution of the portfolio?
How to do more and better with less, for your applications?