Back from summer vacation and back on this series of posts about using SonarQube with a Legacy C application, in this case the first version of Word published by Microsoft in 1990.
We posed the following scenario: Microsoft has just been sold, and its new owner asks you, as a quality consultant, to recommend a strategy for this software.
Do not think that this never happens: software vendors are bought every day, and R&D teams and software code are at the heart of these acquisitions.
Specifically, you have to answer three questions:
- What would it cost to transfer this application to a new team? This is the question that arises whenever one wants to outsource a Legacy application to a service company.
- What would it cost to refactor this application to improve its quality and reduce maintenance costs? Any Legacy application is generally characterized by a high technical debt. Refactoring would reduce the interest on this debt, and thus the cost of transferring knowledge to an outsourcer and/or the maintenance costs.
- What would be the cost of reengineering this application? Refactoring often means rethinking the design and architecture of the application. Why not take the opportunity to rewrite the code in a more recent, easier-to-maintain language?
Obviously, you must not only answer these three questions, but also propose an action plan for each of these strategies.
In the previous posts, we saw how to analyze this Legacy code with SonarQube, and the results in terms of metrics – size (LOC), complexity (CC), comments and duplications – as well as the various ‘Issues’: Blocker, Critical, Major and Minor.
We also verified the existence of ‘monster’ components: large and complex functions and programs with a high number of violations of programming best practices, particularly in terms of readability and understandability of the code.
Legacy code
Originally, the term described an application ‘inherited’ from a previous generation of programmers, developed in a language of an earlier generation and running on a platform (hardware, OS, …) that is also old or unsupported. It referred primarily to mainframe Cobol code.
Today, younger generations more often use the expression to refer to older applications (over 5 to 10 years old), and/or applications without a web interface, and/or applications written in non-object-oriented languages (C, Forms, PowerBuilder, Visual Basic, etc.).
In all cases, the term ‘Legacy’ evokes code that is difficult to understand, with a large technical debt and often sparse, outdated documentation.
Edit and Pray
According to Michael Feathers, author of the highly recommended book “Working Effectively with Legacy Code”, Legacy code is characterized by the absence of tests.
In chapter 2, ‘Working with Feedback’, he describes a very common process, especially in the R&D departments of software vendors: “Edit and Pray” (that it still works). Facing a change request, the developer will carefully plan his work, first make sure he fully understands the existing code and the possible impacts, then implement the changes, compile and verify that they work correctly, and finally perform some additional tests here and there in the potentially impacted areas to ensure that nothing is broken. He will then hand the change over to a team of testers, who will run a battery of regression tests and other more or less automated tests to verify the proper behavior of the application.
In the best case, these tests run overnight, and our developer learns the next day whether his code is valid – or at least validated. More often, all of the week's developments are tested at the end of the week, with some additional manual testing, and validation only comes the following week.
Another practice still common today among software vendors is to define:
- a release date of a new version of the software, for example on October 15;
- a development phase during which the changes are implemented, for example until September 20;
- a QA phase (from September 20 to October 15) during which all change requests are frozen, the new version goes into the hands of the QA team, and no new development is launched, as developers are dedicated solely to fixing the problems discovered by the QA team.
Cover and Modify
The other way to work, according to Michael Feathers, is to build a coverage of unit tests in order to protect the code from the consequences of any change. What are the benefits of good test coverage?
- You have a battery of tests that you run every time you make a change, successfully most of the time. Sometimes you make a mistake, such as inverting the logic of a condition, and one or more tests fail. You correct your mistake quickly: the feedback is instantaneous, with no need to wait until tomorrow, next week or the next QA phase to see whether the change introduced a bug or a regression.
- You must change a very large and complex method or function. This is an opportunity to refactor it and split it into several simpler methods. But you fear that such a change would break what worked before. “Don't fix it if it ain't broke”: that is unfortunately the best way to develop new Legacy code on top of the existing Legacy code. If you have tests, not only you but the entire team will have greater confidence in making changes. You can improve the existing code, avoid inflating the technical debt, and work with the pleasant feeling of developing quality code instead of adding new weight to the already heavy burden on your shoulders.
- You develop some new unit tests to cover your development and test it. Any future modification made by another member of your team will be easier, faster and safer.
Contrary to what one might think, developing unit tests does not mean doing twice the work or developing more slowly. The code for these tests is usually very simple, and the additional workload is actually small compared to the benefits. It is also an increasingly common practice, to the point that the absence of unit tests is considered one of the seven deadly sins of development.
In any case, writing code is not the biggest cost: a developer spends more time trying to understand the code that needs to be changed than developing the change itself. In fact, you cannot even estimate the time required to implement a change if you do not already know the existing code and what it does.
This is why poor readability and understandability of the code are among the most important sources of high maintenance costs: the time spent programming is negligible compared to the time needed to understand what the code does and how to implement a change.
Now, if you already have unit tests – usually very simple and very readable – you reduce this burden enormously.
But what about our case, where our Legacy application does not have unit tests? This is what we will see in the next post.
This post is also available in Spanish (Leer este artículo en castellano) and in French (Lire cet article en français).