Audit of a Legacy C application – Microsoft Word 1.1a (II)

C_Analysis_Word2In the previous post, we have examined the first results of the analysis of the source code of Word 1.1a (1990).

We counted 349 files, which is not huge, but with a rather high size: on average more than 470 LOC (Lines Of Code), and many of them well beyond 1 000 LOC. The complexity metrics are also high, and the rate of comments quite low, but it was probably normal at that time, more than 40 years ago.

The very low level of duplication maked me think that all the components needed to implement a feature can be found in the same file, which explains the size and complexity of many of them. It seems that the motto was : priority to efficiency, then readability and understandibility are less critical.

But what about the compliance with good programming practices? This is what we will look at in this second post.

Compliance with best programming practices

A glance to the ‘Treemap of Components’ shows a rather low observance of best programming practices.

C_Word_Trremap

The most numerous and more complex files are in the main directory (‘Opus’), and have between 20% and 50% off ‘Rules compliance’ in most cases.

C_Word_Trremap2

In total, more than 40,000 ‘Issues’, most of them ‘Major’, and a technical debt of more than 1,200 days.

C_World_Issues

Now let’s have a look at what rules are not respected, by level of criticality. Remember that I used the Quality Profile by default, out of the box.

Blockers

No violation of ‘Blocker’ rules, which is what we wish to see in any application. But what exactly are these rules so well respected?

C_Word_Blockers

The first is a best practice for handling exceptions in C++. So no surprise if it is fully respected here: there is not a single ‘throw’ statement in all the application.

The second rule concerns the use of trigraphs, which I’m not sure that they existed at that time. I am not an expert in C language, but I believe that trigraphs were introduced by the C89 standard, well after 1980. Anyway, at least we know that we do not have this kind of defect.

Criticals

We have 753 defects of Critical type, most of them concentrated on two rules.

C_World_Crticals

The first rule concerns the use of ‘goto’ to go directly to some other location in the program. The ‘goto’ became the symbol of the ‘spaghetti’ code whose computational logic is difficult to understand, due to the excessive number of references in all directions. It is not always possible to avoid the ‘gotos’, but it is desirable to have them in the same block of code, for readability reasons, and therefore a better reliability as well.

It is mostly jumps to nested portions of code that are dangerous. A ‘goto’ to a block of higher level is more readable, except when your program is over 1,000 lines and the jump will send you a few pages down in the program. Difficult to understand what does the code when you spend your time navigating between multiple pages. It becomes extremely difficult to realize changes without introducing a bug. The maintenance costs and of testing increase proportionally.

The second rule indicates the use of logical operators that presents a risk of side effect because it leads to change a value other than the return code, such as a global variable, with unpredictable consequences in the execution of the program .
Note that the 335 occurrences encountered in the code are not necessarily all real defects, but potential violations.

C_Word_Misra5_14-1

For example, in the above case, if we are sure that the ‘DocOpenStDof’ does not change any value, but performs an action such as opening a Word document for instance, then we can report that it is not a fault, and no longer consider this case in future analysis.

The third rule is the use of octal constants, which creates a risk of confusion with integer values. We do not have much here, and if we consider that some portions of MS-DOS code were reused to develop this version of Word, their presence is not very surprising.

Major

Violations of ‘Major’ rules are by far the most numerous. I will not detail all of them, but just briefly comment on the most notable.

C_Word_Majors1

The most infringed rule (more than 9,000 violations ) is about the absence of braces in If- Else statements. In fact, when looking at the code, we can see that, mainly, they are not used when the If or Else has one or two lines, such as a simple ‘return’ or ‘goto’. So this lack of braces is not really indicative of a lack of readability of the code, as the spirit of the rule is generally respected. However, the problem is that it creates dangerous exceptions, which can be also found in the code: from how many rows without braces a program is becoming less understandable ?

I would probably desactivate in the Quality Profile the second rule, as any other  naming conventions: that they are not respected does not mean that they do not exist. And I tend to keep them in the ‘Minor’ category.

Syntax of K&R (Kernighan & Richie) type are obsolete nowadays, and regarded as difficult to read, but were not in the early 80’s.

Most other Major type violations will also impact the readability and the understandibility of the code, as shown below.

C_Word_Majors2

I configured this widget in order to display ‘Major’ rules by number of defects. The vast majority are programming practices to avoid too long statements, like ‘Switch cases should not have too many lines’ or ‘Function/methods should not have too many lines’, too complex functions (‘Avoid too complex function’ with more than 20 points of Cyclomatic Complexity), naming conventions (‘Literal suffixes shall be upper case’), etc.

There actually are very few ‘Major’ defects that could affect the reliability and robustness of the application, relatively speaking of course.

Minor

This is also true for the ‘Minor’ rules, although these are fewer.

C_Analysis_Word_Minors

The analysis of the issues encountered in the code of this first version of Word confirms our initial evaluation: everything seems to indicate that priorities did concern the efficiency and reliability of the application, since such violations are less numerous. On the contrary, the violations about maintainability and evolvability of the code represent the vast majority of the issues observed, which explains the Technical Debt very high for a first version of application.

There certainly are some rules, such as those about Naming Conventions for example, that I could disable. I did not want to modify the default Quality Profile for the analysis of the code, but it would surely be necessary to adapt it to the characteristics of a C Legacy application.

Nevertheless, we’ll see in the next post that there may be some cases where it is better to proceed as this in order to start our evaluation.

This post is also available in Leer este articulo en castellano and Lire cet article en français.

Leave a Reply

Your email address will not be published. Required fields are marked *