I received some material from Capers Jones, the well-known author and international lecturer (he needs no introduction, but just in case: http://www.namcook.com/aboutus.html). Capers Jones is vice president and CTO of Namcook Analytics LLC.
The material is a good synthesis of the current state of software quality, so I summarized the main points and took the opportunity to ask Capers some questions.
As of 2012, more than half of software projects larger than 10 000 function points (about 1 000 000 lines of code) are either cancelled or run late by more than a year. The following table shows the results of about 13 000 projects categorized into six size ranges, with the percentage of projects that were early, on time, late, or cancelled.
Table 1: Software Project Outcomes by Size of Project

| Size | Early | On Time | Late | Cancelled |
|---|---|---|---|---|
| 1 000 FP | 1.24% | 60.76% | 17.67% | 20.33% |
| 10 000 FP | 0.14% | 28.03% | 23.83% | 48.00% |
| 100 000 FP | 0.00% | 13.67% | 21.33% | 65.00% |
Q: Capers, how did you manage to get data about all these projects? Are they all from the U.S.?
I worked at IBM and ITT and had access to all of their internal data. I also had a team of a dozen consultants bringing in data from 25 to 75 projects per month from hundreds of companies. I’ve worked in 24 countries but about 90% of the data in the table is from the U.S.
The best countries for quality are Japan and India. Productivity is more complex because of huge variations in work hours. In the U.S. we have a nominal week of 40 hours; it is 44 hours in Japan; only 36 hours in Canada. There are also major differences in vacations and public holidays. This is too complicated for a quick answer.
Q: You say that 10 000 FP is about 1 000 000 lines of code. Is this a measure that someone can use to compare their projects with these data when they have no function point estimates? Or is there another measure, such as the size of the development team or man-years?
The ratio of source code statements to function points is called "backfiring" and was first developed in the 1970s by IBM. The ratio varies by language or combination of languages. COBOL is about 105.7 statements per function point; C is about 128 statements per function point; Java is about 53 statements per function point. Several companies sell lists of these ratios for about 1,000 programming languages.
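As a rough illustration of backfiring, here is a minimal Python sketch using the ratios quoted above; the ratio table and function name are my own, and real backfiring tables vary by dialect and counting rules:

```python
# Backfiring: estimating function points from source code size using
# language-specific ratios. Values are the approximate figures quoted
# above; commercial tables cover about 1,000 languages.
BACKFIRE_RATIOS = {  # logical source statements per function point
    "cobol": 105.7,
    "c": 128.0,
    "java": 53.0,
}

def estimate_function_points(statements: int, language: str) -> float:
    """Rough function point estimate from a statement count."""
    return statements / BACKFIRE_RATIOS[language.lower()]

# Example: a 1 000 000-statement Java system works out to roughly
# 19 000 function points at 53 statements per function point.
java_fp = estimate_function_points(1_000_000, "java")
```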
Volume of defects
When examining troubled software projects, one finds that the main reason for delay or termination is almost always an excessive volume of serious defects. Table 2 shows the average volumes of defects found on software projects, and the percentage of defects removed prior to delivery to customers, by defect origin.
Table 2: Defect Removal Efficiency By Origin of Defects Circa 2012
| Defect Origins | Defect Potentials | Removal Efficiency | Delivered Defects |

(Data expressed in terms of defects per function point)
Q: We can see from the previous table that coding defect potentials are higher than those of any other origin. Have you seen any evolution among the different technologies used on projects? A lot of people think that new technologies, being more complex, are prone to introduce more defects.
Code defects are primarily related to the level of the languages used. Every source of defects is impacted by complexity, and multi-tier applications such as client-server systems tend to have more requirements and design defects.
Q: We can also see that requirements and design defects together outnumber coding defects. Same question: have you seen any evolution there? With the emphasis on outsourcing, or on rushing the application out the window, some might think these defects happen more and more frequently.
There are effective tools, such as the FOG and FLESCH readability tools, text static analysis tools, and UML static analysis tools, that can reduce requirements defects. So can requirements modeling and pattern matching. Agile embedded users help too, but with limits: no single user can understand all of the features of a 10,000 function point application or speak for 10,000 users.
The software industry should be using modern 3D animated requirements and design methods – not just text and static diagrams. If you use suitable methods you will have high quality and short schedules at the same time.
From my expert witness work in lawsuits over failing projects, I can say that most fail because of poor change control and poor quality control. When too many bugs are present at the start of testing, the test cycle stretches out almost indefinitely.
Two general strategies are needed for successful quality control:
- Reducing defect volumes or “defect potentials”.
- Raising defect removal efficiency (DRE) – note that DRE is markedly lower for non-code defects.
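These two levers multiply: delivered defects per function point equal the defect potential times (1 − DRE). A minimal Python sketch of that relationship (the function name is mine; the sample figures are taken from numbers quoted elsewhere in this article):

```python
def delivered_defects(defect_potential: float, dre: float) -> float:
    """Defects per function point that reach customers."""
    return defect_potential * (1.0 - dre)

# A 5.0/FP potential at 85% DRE delivers about 0.75 defects per function
# point; cutting the potential to 3.0/FP and raising DRE to 95% delivers
# about 0.15 - a five-fold improvement from working both levers at once.
before = delivered_defects(5.0, 0.85)  # ~0.75
after = delivered_defects(3.0, 0.95)   # ~0.15
```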
Reducing defect volumes
Some of the methods that reduce defects include:
- Use of requirements modeling
- Use of requirements “static analysis” tools that find errors and ambiguities
- Use of formal requirements and design inspections
- Use of embedded users as is found in Agile development
- Use of test-based development where test cases precede design
- Use of formal prototypes to minimize downstream changes
- Formal review of all change requests
- Use of automated change control tools with cross-reference capabilities
When formal inspections are added to the cycle, defect potentials gradually drop from 5.0 defects per function point to below 3.0, while defect removal efficiency levels routinely top 95% and may hit 99%.
Automated static analysis is a fairly new technique that contributes to both defect prevention and defect removal.
One interesting aspect of controlling requirements is a reduction in unplanned changes or “requirements creep”. Leading projects where the requirements are carefully gathered and analyzed average only a fraction of 1% per month in unplanned changes. Joint application design (JAD), prototypes, and requirements inspections are all effective in reducing unplanned requirements creep.
Raising Defect Removal Efficiency Levels
Many forms of testing by developers are less than 35% efficient in finding bugs or defects although some top 50%. Testing that uses formal test case design methods and certified test personnel is usually at the high end of the spectrum and often tops 50%.
Formal design and code inspections are more than 65% efficient in finding bugs or defects and sometimes top 85%. Inspections also raise testing efficiency by providing more complete requirements and specifications to test personnel.
Static analysis is also high in efficiency against many kinds of coding defects. Therefore all leading projects in leading companies utilize synergistic combinations of formal inspections, static analysis, and formal testing.
This combination is the only known way of achieving cumulative defect removal levels higher than 98%, leads to the shortest overall development schedules, and lowers the probabilities of project failures.
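The arithmetic behind combining stages can be sketched as follows, under the simplifying assumption that each stage removes a fixed fraction of whatever defects survive the previous stage. The inspection and testing figures come from the ranges above; the static analysis figure is my own illustrative assumption:

```python
def cumulative_dre(stage_efficiencies):
    """Combined removal efficiency of a series of independent stages."""
    surviving = 1.0
    for e in stage_efficiencies:
        surviving *= (1.0 - e)  # fraction of defects escaping this stage
    return 1.0 - surviving

# Inspections (65%), static analysis (55%, assumed), formal testing (50%):
combo = cumulative_dre([0.65, 0.55, 0.50])  # ~0.92
# Reaching the 98%+ levels cited above takes higher stage efficiencies
# or more stages, e.g. several distinct test stages after inspections.
```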
Q: Among the above methods, some are automated and could be cheaper; others, such as formal design and code inspections, cannot be automated and are thus probably more expensive. Have you ever tried to define a ratio between the cost and the efficiency of these methods?
You are mistaken. Before IBM introduced inspections for a major database application, testing took three months of three-shift testing. After inspections, testing dropped to one month of one-shift testing. The inspections themselves took about six weeks. There was a net reduction in costs and schedules, and at the same time defect removal efficiency rose from below 85% to more than 95%.
Your question reflects a common problem – most companies don’t measure well enough to understand quality economics. I have data on the comparative costs of every combination of pre-test inspections, static analysis, and testing. The best results come from a synergistic combination of defect prevention, pre-test defect removal, and mathematically based testing by certified test personnel.
The worst results come from testing by developers or amateurs with no pre-test inspections or static analysis. Everybody should know this, but measurements are so poor that most managers don't have a clue.
Q: Would you say that the efficiency of these methods varies with the size of a project? Instinctively, one might think that the larger the project, the higher the risk of design and requirements defects, while code defects should stay (more or less) stable?
As applications grow in size, defect potentials get larger and defect removal efficiency gets lower. That is why pre-test inspections and static analysis grow more valuable as applications get bigger.
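This size effect can be sketched numerically with a rule of thumb Jones has published in his books (it is not stated in this interview, so treat it as an approximation): total defect potential is roughly the function point total raised to the 1.25 power, so the potential per function point grows with size.

```python
def defect_potential_per_fp(function_points: float) -> float:
    """Approximate defect potential per FP from the FP ** 1.25 rule of
    thumb; dividing by FP simplifies it to FP ** 0.25."""
    return function_points ** 1.25 / function_points

small = defect_potential_per_fp(1_000)    # ~5.6 defects per function point
large = defect_potential_per_fp(100_000)  # ~17.8 defects per function point
```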
Q: I would say that for large projects it is advisable to industrialize all these methods. But for smaller or mid-size projects, or for a team not yet using these methods, would you recommend some that you consider easier or more important to implement first? To put it another way, is there a best path to maturity?
This is too complicated a question for a short answer. Static analysis is cheap, effective, and easy to use for all sizes. Below 1000 function points Agile is OK; above 10,000 function points TSP is best. What companies need are not general answers but specific predictions that show the set of optimal methods for specific projects. My main work is building tools that can predict specific results for specific projects.
Q: Last question, to conclude, and probably one that most people are curious about: what is a typical day for you? Doing consulting for a customer? Working on a paper or a new book? Going fishing or playing golf?
Years ago, when I was working on my first book, I had to be at work by 8:30, so I started getting up early to write at 5 AM. After 15 books I can't sleep late. I usually write books or articles from about 4 AM to 10 AM, switch to software or business tasks from 10 AM until 4 PM, and then relax or play golf later in the day. I also work out at a gym three days a week.
Much of my consulting, and many of my speeches, are done remotely. On Nov 8 I'll be keynoting a conference in Stockholm, Sweden remotely. The topic is "Achieving Software Excellence." On Nov 13 I'm doing a webinar on "Software Risk and Value Analysis." On Dec 4 I'm doing a webcast for IBM on "Measuring Software Quality and Customer Satisfaction."
I do travel to major conferences such as keynoting the Japanese Symposium on Software Testing, the Malaysian Testing Conference, and several metrics conferences by the International Function Point Users Group.
Capers, many thanks for the material you sent me and for taking time to answer my questions.