Legacy application – Refactoring or reengineering? (VII)

We saw in the previous post how to use the SonarQube dashboard to estimate the effort of the characterization tests recommended by Michael Feathers in his book ‘Working Effectively with Legacy Code’.

We categorized the various components of our Legacy application (Microsoft Word 1.1a) into different groups: the simplest functions, with a Cyclomatic Complexity (CC) of less than 20 points; the complex and very complex functions, up to 200 points of CC; and finally 6 ‘monster’ components.

We built a formula based on the Cyclomatic Complexity and a readability factor, in order to evaluate the testing effort on each of these groups.

Evaluation of results

We got a total score of 234 days, broken down as follows:

Table 1 – Distribution of the test effort by category of components

93.5 days to cover 60% of the functions with a CC lower than 20 points, representing 40% of the total effort and 44% of the total Cyclomatic Complexity of the application.
92 days (39% of total time) for 503 functions between 20 and 100 points CC, representing 44% of the total CC.
25 days (11% of total time) for the 30 functions between 100 and 200 points of CC, representing 9% of the total CC.
23.5 days or 10% of the total effort for the 6 most complex functions, representing 3% of the CC of the entire application.
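As a quick sanity check, the breakdown above can be recomputed in a few lines of Python. This is only a sketch: the day counts are the ones listed above, while the category labels are shorthand introduced here for illustration.

```python
# Test effort in days per category, as listed in the breakdown above.
# The dictionary keys are illustrative labels, not names taken from SonarQube.
effort_days = {
    "simple (CC < 20)": 93.5,
    "complex (CC 20 to 100)": 92.0,
    "very complex (CC 100 to 200)": 25.0,
    "monster (6 functions)": 23.5,
}

total_days = sum(effort_days.values())

# Share of the total effort per category, rounded to the nearest percent.
effort_share = {
    name: round(100 * days / total_days)
    for name, days in effort_days.items()
}

print(f"Total: {total_days} days")
for name, share in effort_share.items():
    print(f"{name}: {effort_days[name]} days, {share}% of the effort")
```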

This effort allows us a test coverage of 65.5% of the functions, 2/3 of our application.

I spent a lot of time on the formula, so as to obtain a valuation rule that is logical, consistent, and understandable, based on the number of logical paths in a function, represented by the Cyclomatic Complexity, and on how easy or hard the function is to read. A developer would proceed like this if he were asked to estimate the number of days he would need to develop unit tests on code that he does not know.

I also wanted the rule to be accurate (yet simple), so that it would be possible to adjust the formula based on the size and the complexity of the components. I played around with different settings for the ‘Readability Factor%’ and ‘Characterization Test time’ in order to achieve a result that seems satisfactory for each category of components: 54 minutes for writing tests for a function between 12 and 20 points of CC, 120 minutes for a function between 40 and 50 points of CC and 200 to 300 lines of code, 240 minutes (half a day) for a function between 60 and 70 points of CC and between 500 and 700 lines of code, etc.

Once I had calculated the various tables with this formula and looked at the final results, my first reaction was that I found them quite realistic. 234 days for a test coverage of 2/3 of 3 936 functions distributed in 349 C programs: it looks correct to me.
I even think that it is not undervalued, considering that the times required to complete the testing of the simplest components (CC < 20) and of the moderately complex ones (20 < CC < 50) are, I think, estimated generously. For the heavier components, I find it hard to get a very clear idea of the accuracy of our assessment. But three days for a complex function with 200 points of CC and 750 lines of code does not seem inconsistent.

The distribution of the test effort also seems interesting:

Two very close figures of about 90 days for each of the first two categories, with around 19 000 points of CC in each: one spread over 3 400 simple components, the other over 500 complex components.
Two close figures again: 25 days for the 30 very complex components and 23.5 days for the 6 ‘monster’ components.

When I say that these results seem realistic, I do not mean they are absolutely accurate, reliable and foolproof, but that they can withstand objections.

Objections

I finished the previous post asking if this estimate seemed correct and the response was quite positive. I was told that it was ‘OK’, and that the ‘scalability’, that is to say the changes in scale between the different categories depending on the growing CC, seemed fair enough.
However, as a consultant, I have to prepare myself for any objection during the presentation of these results to a client. What could they be?

A stakeholder: “I do not understand: you say we need 4 minutes per point of complexity for the simplest components and 2 minutes per point of complexity for the more complex components. Shouldn't it be the opposite?”

In fact, writing a test requires developing a specific function for this test, in order to verify one or more values for each logical path and thus for each point of complexity. If I have only a single point of CC, I’ll spend less time writing a line of test and proportionately more time developing the function, compiling it, running it, verifying the result of the test and possibly modifying the function and testing it again. If I have multiple points of CC, I’ll spend a little more time trying different values and proportionally less time writing and executing the function.
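The argument above amounts to a fixed overhead per test function plus a smaller cost per logical path. The sketch below illustrates that shape only: the 30-minute overhead and the 2 minutes per path are hypothetical values chosen for the example, not the parameters of the formula from the previous post.

```python
def minutes_per_cc_point(cc, setup_minutes=30.0, minutes_per_path=2.0):
    """Average test time per point of CC under a fixed-overhead model.

    setup_minutes and minutes_per_path are hypothetical, illustrative values:
    a fixed cost to write, compile and run the test function, plus a cost
    for each logical path to exercise.
    """
    total_minutes = setup_minutes + minutes_per_path * cc
    return total_minutes / cc


for cc in (15, 50, 150):
    print(f"CC {cc}: {minutes_per_cc_point(cc):.1f} minutes per point")
```

With these made-up values, the average is 4 minutes per point at 15 points of CC and drifts towards 2 minutes per point as the CC grows, because the fixed overhead is spread over more paths.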

We do not really see any gap when moving from a simple function between 12 and 20 points of CC, with 54 minutes to complete the tests, to a moderately complex function between 20 and 30 points of CC, with 50 to 60 minutes of testing depending on the size of the function in lines of code.

A member of the project team: “Your formula is fine, but I think the Readability Factor is not progressive enough. For instance, the RF% suddenly goes from 4 to 10 for the extremely complex functions.”

I do agree. In order to have a second, higher hypothesis for our assessment, I raised this factor as follows:

For functions with less than 100 lines of code (LOC), I left the RF% at 1.
From 100 to 200 LOC, the RF% increases from 1.5 to 2.
From 200 to 300 LOC, the RF% increases from 2 to 3.
From 300 to 500 LOC, the RF% increases from 2.5 to 4.
From 500 to 700 LOC, the RF% increases from 4 to 6.
Beyond 700 LOC, I kept the current settings.

Here is the new table with these parameters:

Table 2 – Calculation of the test effort on very complex functions (high hypothesis)

We go from 117.2 days to 131.8 days, an increase of 14.6 days, which is an extra effort of about 12% of the initial workload. So the impact of this change is relatively moderate. Moreover, we find that this increase is largest on the bigger components (which makes sense), and these are not the majority. For example, we go from 4 hours to 5 hours for a component of 60 to 70 points of CC with over 500 LOC, but we have just one of these in our application …
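The size of that increase is easy to verify. A minimal sketch, using only the two totals quoted above (the variable names are mine):

```python
baseline_days = 117.2  # initial hypothesis for functions between 20 and 200 points of CC
high_days = 131.8      # high hypothesis, with the raised RF%

# Rounding to one decimal hides floating-point noise in the subtraction.
extra_days = round(high_days - baseline_days, 1)
extra_percent = round(100 * extra_days / baseline_days, 1)

print(f"Extra effort: {extra_days} days ({extra_percent}% of the initial workload)")
```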

The CIO: “How reliable are your results? To what extent can I use these numbers to decide whether to launch an outsourcing of this application?”

Let me clarify this by presenting an action plan.

Action plan

We have a total of 234 days for this project, or 248.8 days if the RF% parameter is modified as above, representing between 11.7 and 12.5 man-months on the basis of 20 days per month. We will base our action plan on these two hypotheses, the first one ‘average’, the second one ‘high’.

In fact, we do not have 20 days a month, because that would assume that nobody is taking holidays. We consider (in France) that a person receives five weeks of vacation and two weeks of public holidays. This gives us a total of:

52 weeks a year – 7 weeks of absence = 45 working weeks.
45 weeks x 5 days a week = 225 days per year.
225 working days / 12 months = 18.75 working days per month.
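The same calculation as a short Python sketch, so that the assumptions (five weeks of vacation, two weeks of public holidays, five-day weeks) can be changed for another country or contract. The constant names are illustrative.

```python
WEEKS_PER_YEAR = 52
WEEKS_OF_ABSENCE = 7  # 5 weeks of vacation + 2 weeks of public holidays
DAYS_PER_WEEK = 5

working_weeks = WEEKS_PER_YEAR - WEEKS_OF_ABSENCE
working_days_per_year = working_weeks * DAYS_PER_WEEK
working_days_per_month = working_days_per_year / 12

print(working_weeks, working_days_per_year, working_days_per_month)
```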

Our total workload is 234 days (hypothesis 1) or 248.8 days (hypothesis 2), which represents respectively 2.5 or 2.65 months of work for a team of 5 people at 18.75 days per month.
I think that 5 people is the right size for the team in charge of ensuring the future maintenance of this application: I doubt that the original team at Microsoft was any smaller. Based on this team, the knowledge transfer for our project would take less than 3 months.

Now, three months is the time systematically used for a ‘knowledge transfer’ phase in any response to a tender or an RFP. Less than 3 months is not credible: it would require more people to ensure this transfer than the number of people required in the subsequent phases of maintenance and development. This would mean that you do not complete the transfer phase correctly and thus greatly compromise the success of the outsourcing.
Beyond 3 months, you run the risk of stretching the schedule, and thus the cost of the project, beyond what your competitors will propose, and consequently of losing the contract. So everyone is going to propose a period of 3 months, and this duration is quite acceptable for a CIO.

We will therefore base our action plan on a schedule of three months, with a workload of 2.5 (or 2.65) months for a team of 5 people. This gives us a margin of 20% (3 – 2.5 = 0.5 = 20% of 2.5) or 13% (for a workload of 2.65 months).
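Here is a small sketch of how these durations and margins follow from the workload. It assumes 225 working days per year spread over 12 months, a team of 5 and a 3-month schedule, and it expresses the margin relative to the workload, as in the calculation above. The function names are illustrative.

```python
def calendar_months(workload_days, team_size=5, days_per_month=225 / 12):
    """Elapsed months needed by the whole team to absorb the workload."""
    return workload_days / (team_size * days_per_month)


def schedule_margin(months_needed, schedule_months=3.0):
    """Spare time in the schedule, as a fraction of the time needed."""
    return (schedule_months - months_needed) / months_needed


for label, workload in (("hypothesis 1", 234.0), ("hypothesis 2", 248.8)):
    months = calendar_months(workload)
    margin = schedule_margin(months)
    print(f"{label}: {months:.2f} months, margin {margin:.0%}")
```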

The following table summarizes these results:

Generally, project management is estimated at 20% of the total workload, but I think that in a knowledge transfer phase the need for project management will be lower. In any case, we should benefit from a bit of margin if we start with a schedule of three months.
So I would make the following proposals:

Start with the more complex functions

The 30 most complex functions require 25.1 days in our hypothesis 1 or 26.9 days in our hypothesis 2. The 6 ‘monster’ functions require 23.5 days. In total, I’ll round to 50 days of work, or 10 days each for a team of 5 people.

Our advice is to use the first two weeks of the schedule to write the tests and document the most complex functions, with regular progress meetings (at least 2 per week) to check whether this first part of the plan can indeed be completed in 2 weeks. If this is the case, we can be confident about the rest of our project.
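For reference, the rounding to 50 days can be checked against both hypotheses with a few lines of Python (variable names are mine; the figures are the ones quoted above):

```python
TEAM_SIZE = 5
monster_days = 23.5
very_complex_days = {"hypothesis 1": 25.1, "hypothesis 2": 26.9}

# Total for the 30 very complex functions plus the 6 'monster' functions.
totals = {
    label: round(days + monster_days, 1)
    for label, days in very_complex_days.items()
}

for label, total in totals.items():
    print(f"{label}: {total} days in total, {total / TEAM_SIZE:.1f} days per person")
```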

Share tasks between the two teams

Another recommendation for this first phase: divide the work of developing tests on these most complex components between the two teams, the current one, which already knows the code, and the new one, to which the knowledge is being transferred. We can work in two ways:

Joint development: a developer of the new team writes tests with the help of a developer of the current team.
Separate development and knowledge sharing: each developer writes his tests separately on the most complex functions (not the same ones at the same time, obviously). Both meet in the late afternoon to present their work, comment on it, explain it and so facilitate the transfer of knowledge to the developer of the new team.

The first option is probably the one that will optimize the quality of our test coverage, since we get the benefits of the discovery and understanding of the code by a developer of the new team, with the help of a developer who already knows this code. The disadvantage is that it requires additional resources from the current team, who are not directly productive, unlike in the second option.

I must say I have a preference for the second option, which seems to me to offer more benefits. The developer of the new team performs characterization tests that enable him to discover and understand the behavior of the application. In parallel, a developer of the current team writes unit tests on code he knows, before handing them over to the developer of the new team. It is important, however, to set aside time for both to share their work, so that the developer of the current team can verify and complete the work of the developer of the new team, or explain the tests he himself has developed. This shared time would also make it possible to document the tests. I think an hour a day working together would be fine at the beginning, but up to 2 hours per day will probably become necessary rather quickly.

In both cases, this division of the work is useful for:

Ensuring the knowledge transfer and the quality of the tests on the most complex functions.
Better monitoring of this project phase, probably the most difficult.
Securing the schedule and avoiding any mishaps at the beginning of the project, thanks to the current team's knowledge of the code.

Modulate resources efficiently

In both cases, we will need resources from the current team, but again, over a relatively short period of about two weeks. However, it is possible to adjust this, more easily with the second option, by using fewer developers of the current team. We can even mix the two options: for example, joint development for the 6 ‘monster’ functions and separate development for the 30 other highly complex functions.

I think this is the kind of recommendation that we can offer to the project committee, to see what the different participants think: joint or separate development, with more or fewer resources from each team. In fact, every variant is conceivable. For example: 2 developers of the current team and 3 developers of the new team, for a total of 5 people, share the work of writing tests on these most complex components. Meanwhile, 2 other developers of the new team start working on less complex components. This would allow us to:

Secure the schedule on the characterization of the most complex components, with two people who have the knowledge …
… while gaining a first insight into the writing of tests on the components between 20 and 200 points of CC, for a better visibility across the application and therefore the project.

Depending on the various choices, we will obviously have to calculate the additional workload on the current team and the impact on the schedule.

Modify the test coverage

I think that starting with the most complex components, and aiming for complete coverage of these components, up to 100% of their Cyclomatic Complexity, should ensure the transfer of knowledge about the most difficult, the heaviest and the most complex part of the application, and therefore of our project. Then we can move on to the 503 functions of relatively lower complexity.

But what happens if we fall behind? If we want to meet the deadline of the three-month schedule that we have set, we will have to reduce the workload and thus the test coverage.

We estimated this coverage at about 66% of the functions of the application: 100% of the components with more than 20 points of CC and 60% of the others. We said in our previous post that the test effort matched (more or less) a Pareto distribution, and that we needed about the same time to develop 70% or 80% of the tests as for the remaining 20% or 30%. This is relative, but a minor decrease in our test coverage should enable us to reduce the workload by a proportionately larger amount.

For example, if instead of a coverage of 100% of the components between 20 and 200 points of CC, we plan to lower this coverage to 80%, our workload changes as follows:

93.7 days versus 117.2 days in our hypothesis 1, a difference of 23.5 days, or 20% of the initial workload.
105.4 days versus 131.8 days in our hypothesis 2, a difference of 26.4 days, again a gain of 20%.

In fact, we can again adjust this in different ways. For example, a coverage of 80% of the components of less than 50 points of CC and 100% for those between 50 and 200 points of CC would give us the following results:

105 days instead of 117.2 days in our hypothesis 1, a gain of 12.2 days, about 10% of the initial workload.
118.7 days versus 131.8 days in our hypothesis 2, a gain of 13.1 days, again about 10%.
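The two coverage scenarios can be compared side by side. A minimal sketch, using the workloads quoted above; the scenario labels and the savings helper are names introduced here for illustration.

```python
# Workload in days for functions between 20 and 200 points of CC.
full_coverage_days = {"hypothesis 1": 117.2, "hypothesis 2": 131.8}

reduced_coverage_days = {
    "80% everywhere": {"hypothesis 1": 93.7, "hypothesis 2": 105.4},
    "80% below 50 CC, 100% above": {"hypothesis 1": 105.0, "hypothesis 2": 118.7},
}


def savings(reduced_days, full_days):
    """Return (days saved, percent of the full-coverage workload saved)."""
    saved = round(full_days - reduced_days, 1)
    percent = round(100 * saved / full_days)
    return saved, percent


for scenario, workloads in reduced_coverage_days.items():
    for hypothesis, reduced in workloads.items():
        saved, percent = savings(reduced, full_coverage_days[hypothesis])
        print(f"{scenario}, {hypothesis}: {saved} days saved ({percent}%)")
```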

So we can see that, in case of delay, it is possible to make up days at the cost of a minimal reduction of the test coverage, without sacrificing (too much) the quality of the knowledge transfer.

Synthesis

One of the missions assigned to us consisted in calculating the cost of a knowledge transfer of this application to another team and proposing a strategy for such a project.

We relied on Michael Feathers' concept of characterization tests, with the double advantage of obtaining test coverage for our Legacy application and of ensuring the knowledge transfer.

We built a formula based on the Cyclomatic Complexity and a factor of readability, trying to find the right balance between simplicity and accuracy. We examined the possible objections to the results obtained with this formula.

Finally, we built an action plan, with various proposals that can convince (or reassure) a CIO of the reliability of our analysis and the feasibility of the project.

In our next post, we will work on our second scenario: refactoring our Legacy application, with the help of the SQALE plugin for SonarQube.

Qualilogy