On Programming Style - Visual Basic version

Notes on Pet Peeves, from a Guy with a Red Pen

Acknowledgments: To teachers and writers of good code for the lessons they have imparted and their examples of elegance and clarity; and to writers of bad code for their rich legacy of atrocities.

"When the choice is between code clarity and minor optimizations, clarity must, nearly always, win. A lack of clarity is the source of bugs, and it is no good having code that is fast and wrong. First the code must be right, then the code must perform; that is the priority that any sane programmer must obey." - George V. Neville-Neil, Kode Vicious column, Communications of the ACM 59 (6), 2016, p. 27

Introduction

This page is designed to advise on some of the do's and don'ts of good programming style. It is motivated by the conviction that getting a program to work correctly should not be a programmer's only goal in a software development project. Don't get me wrong: the program should work correctly. But there are very practical benefits to developing good programming style (for my students, one is keeping my blood pressure down while I grade your programs). A program written in good style is usually easier to understand, debug, and update than the same program would be if written in bad style. In professional software development, programs are constantly being modified to correct errors, expand functionality, meet changing specifications, improve efficiency, etc. For these changes to be made correctly and efficiently, it's important that the source code be easily read and understood.

Documentation

It has been said that programmers who do not document their code are indispensable, and unpromotable. They are indispensable because without proper documentation, it's likely that nobody else will understand their code. They are unpromotable for the same reason: you can't become a project leader if you don't yourself practice a fundamental of teamwork, namely, making your code understandable to your project teammates.

Proper documentation in a program should include the following elements:

  • Identification of the programmer or team of programmers. For a student, this serves the obvious purpose of enabling an instructor to assign a grade. For a professional, this enables a reader of the listing to find a good source of information about a program.

  • Identification of the "client" of the program. For a student, this is typically a course. For a professional, it may be a business client, another department within the organization employing the programmer, or a mass market. Particularly in the case of a more limited user group such as a department within an organization, the client can also be a useful source of information about the program.

  • The date (could be approximate, e.g., Fall, 2008) the program was written, and, when relevant, its version and revision history.

  • A brief statement about what the program does.

  • Explanation of all major sections of the program. For example, in a VB program, every class and every subprogram should be explained (an exception may be justified for a subprogram with a very simple listing whose name explains what the subprogram does).

  • When relevant, explanation of complex logic. For example, a calculation that uses a non-obvious mathematical technique should have that technique explained.

There are other issues of good documentation besides the use of comment/remark statements. For example, the programmer's choices of identifiers can do a lot to improve or harm understanding of code. Variables, subprogram names, etc., should be chosen to suggest what they represent. Compare, for example, the following snippets of code:
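A hypothetical pair in that spirit (the payroll scenario and every name in it are invented for illustration, not taken from the original fragments), with the first function standing in for the "left" fragment and the second for the "right" one:

```vb
Imports System

Module IdentifierDemo
    ' "Left" style: the names say nothing about what the values mean.
    Function f(a As Double, b As Double) As Double
        Dim c As Double = a * b
        Return c
    End Function

    ' "Right" style: same logic, but each name documents its purpose.
    Function grossPay(hoursWorked As Double, hourlyWage As Double) As Double
        Dim pay As Double = hoursWorked * hourlyWage
        Return pay
    End Function

    Sub Main()
        ' Both calls compute the same value; only readability differs.
        Console.WriteLine(f(40, 12.5))
        Console.WriteLine(grossPay(40, 12.5))
    End Sub
End Module
```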

Logically, these two code fragments are identical. However, the one on the right is written in superior style, because its identifiers have been chosen to document their respective purposes within the code.

See the related issue of Magic Numbers.

Identifiers

An identifier is the name you give to a variable, constant, class, or subprogram. Some widely accepted style guidelines in Visual Basic programming for identifiers include the following.

  • For constants, use only capital letters. When you follow this guideline, a reader is reminded that an identifier such as TAXRATE is the name of a constant.

  • For other (non-constant) named entities, use "camel capitalization" - capitalize the initial of each word except the first word in an identifier.
    Examples:
    unitPrice, gramsPerBag

  • Graphic controls should use a 3-letter prefix in their names that describes the type of control. Examples:
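Some illustrative names following the three guidelines above. This is only a sketch: the tax scenario is hypothetical, and the control prefixes mentioned in the comment (btn, txt, lbl) are commonly used choices that I am assuming here, not a list taken from this page.

```vb
Imports System

Module NamingDemo
    ' Constant: all capital letters.
    Const TAXRATE As Double = 0.07

    ' Non-constant names: camel capitalization.
    Function priceWithTax(unitPrice As Double) As Double
        Dim taxAmount As Double = unitPrice * TAXRATE
        Return unitPrice + taxAmount
    End Function

    Sub Main()
        Dim gramsPerBag As Integer = 500
        Console.WriteLine(priceWithTax(10.0))
        Console.WriteLine(gramsPerBag)
        ' Controls on a form would carry a type prefix, for example
        ' btnCalculate (a Button), txtUnitPrice (a TextBox),
        ' lblTotal (a Label).
    End Sub
End Module
```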


Violations of these guidelines will not prevent your program from running, but may make your code harder to understand.

Indentation

The "pretty printer" of the VB code editor will automatically handle most examples concerning this issue pretty well. However, there are instances in which the programmer's decision makes a difference. For example, greater indentation of a statement's 2nd, 3rd, etc., lines clarifies that these lines continue a statement started on a previous line; the pretty printer will not supply such indentation automatically. Consider the fragments below, in which the version on the right is better than the version on the left:
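A hypothetical pair of this kind (names and values invented for illustration), with the first function in the "left" style and the second in the "right" style. Both compile to the same computation; only the layout of the continuation lines differs.

```vb
Imports System

Module IndentationDemo
    Const SHIPPING As Double = 4.5   ' assumed flat shipping charge

    ' "Left" style: continuation lines are not indented.
    Function totalFlat(price As Double, qty As Integer, tax As Double) As Double
        Dim total As Double
        total = price * qty _
        + tax _
        + SHIPPING
        Return total
    End Function

    ' "Right" style: extra indentation shows the lines belong to
    ' the value being assigned.
    Function totalIndented(price As Double, qty As Integer, tax As Double) As Double
        Dim total As Double
        total = price * qty _
                + tax _
                + SHIPPING
        Return total
    End Function

    Sub Main()
        Console.WriteLine(totalFlat(2.0, 3, 0.5))
        Console.WriteLine(totalIndented(2.0, 3, 0.5))
    End Sub
End Module
```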

The style used on the right is superior because the extra indentation on the 2nd and 3rd lines of the statement makes clear that the expressions listed are part of the value being assigned.

The VB code editor's pretty printer will automatically indent the first line of a statement to reflect the statement's block level within the code. You should follow the same practice if you use some other editor (such as Notepad) for your code, since seeing a statement's block level within the surrounding code is an important key to understanding the code.

Length of a Subprogram

KISS - Keep it short, smarty! (You thought I would use "stupid"? Stupid people don't write software.)

In general, it's wise to keep every subprogram (Sub or Function) short. A reader typically tries to understand one subprogram at a time, so keeping subprograms short makes them more "digestible." A classical guideline: a maximum of 25 lines of code, which used to be the maximum viewable on a computer screen using a typical text editor. Today's screens often show more than 25 lines, and I won't send a student to the guillotine for a 26th line, but if you're in excess of 30, there's probably a natural way to abbreviate your subprogram, perhaps by extracting one or more blocks of its code as (a) separate subprogram(s).

Logical Structure of a Subprogram

A subprogram that performs a small number of simple actions can have a short listing consisting of simple action statements. But what about a subprogram that is responsible for a complex segment of the program's actions?

I (following my teachers) encourage using a "top-down" approach. A subprogram that directs complex action should have a listing that outlines the totality of the actions managed by the subprogram. When we combine this outlook with the advice given elsewhere in this document (see Length of a Subprogram) to keep the listing of each subprogram short, we can deduce that it's a good idea to extract lower-level details of a major subtask managed by a subprogram so that such a subtask has its own subprogram that can be called upon by the current subprogram. This approach also facilitates the view of a subprogram as doing "one job," even if that "one job" involves managing several other "jobs."

For example: Here's a situation that often arises in student programs. Sub Blah calls upon a Sub Menu that both presents a menu to the user and acts upon the user's choice. As a result, a reader of Sub Blah doesn't see how the program acts on the user's choice. A better structure: have Sub Blah call upon a Sub Menu that returns (preferably via a reference parameter, not via a Return statement, as the latter method likely would represent a side effect) the user's choice to Sub Blah, and have Sub Blah contain the code (which could be a call to a separate subprogram with, say, a Select Case structure to select the appropriate action) that shows how the program responds to the user's choice.
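A minimal sketch of the better structure described above. The menu entries, the describeAction helper, and the TextReader parameter (used so the sketch can be run without a keyboard) are my own assumptions; only the names Blah and Menu come from the text.

```vb
Imports System
Imports System.IO

Module MenuDemo
    ' Presents the menu and reports the user's choice through a
    ' reference parameter; it does not act on the choice.
    Sub Menu(reader As TextReader, ByRef choice As Integer)
        Console.WriteLine("1) Add   2) Delete   3) Quit")
        Dim entry As String = reader.ReadLine()
        If Not Integer.TryParse(entry, choice) Then
            choice = 0   ' 0 marks an unrecognized entry in this sketch
        End If
    End Sub

    ' Shows how the program responds to each choice.
    Function describeAction(choice As Integer) As String
        Dim description As String
        Select Case choice
            Case 1
                description = "adding"
            Case 2
                description = "deleting"
            Case 3
                description = "quitting"
            Case Else
                description = "unrecognized choice"
        End Select
        Return description
    End Function

    ' The caller outlines the whole task: get the choice, then act on it.
    Sub Blah(reader As TextReader)
        Dim choice As Integer
        Menu(reader, choice)
        Console.WriteLine(describeAction(choice))
    End Sub

    Sub Main()
        Blah(Console.In)
    End Sub
End Module
```

A reader of Blah now sees both steps, obtaining the choice and responding to it, without opening Menu.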

Loop Style

Most programming languages have a variety of loop structures. Often, more than one loop structure can be used to code a given piece of logic. However, there is generally a preferred loop structure for a given situation.

First of all, although most programming languages have a goto statement, it should be avoided when possible (and in modern programming languages, including Visual Basic, it's almost always possible). The use of a label (a target for a goto) greatly increases the difficulty of understanding the flow of control in a program, because when a label is present, control can reach the labeled statement from many more places in the listing than otherwise. See the classic article [Dijkstra] for more on this point. In particular, then, loops should not be controlled by goto statements.

A reader of this document probably knows that a modern language may have a loop structure that tests a condition before performing the loop body, thus allowing the possibility that the loop body is not performed at all (in VB, a While loop), and a loop structure that tests a condition only after performing the loop body, thus requiring the loop body to be performed at least once (in VB, the Do ... Loop While loop; also, the Do ... Loop Until loop). Thus, the most prominent criterion for choosing between a While loop and a Do ... Loop While loop is whether or not to allow the possibility of reaching the loop and performing the loop body zero times.
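To make the zero-times distinction concrete, here is a small sketch with hypothetical function names. Each function counts how many times its loop body runs.

```vb
Imports System

Module LoopTestDemo
    ' Pre-test loop: the body may run zero times.
    Function countWithWhile(start As Integer, limit As Integer) As Integer
        Dim passes As Integer = 0
        Dim i As Integer = start
        While i < limit
            passes += 1
            i += 1
        End While
        Return passes
    End Function

    ' Post-test loop: the body always runs at least once.
    Function countWithDoLoopWhile(start As Integer, limit As Integer) As Integer
        Dim passes As Integer = 0
        Dim i As Integer = start
        Do
            passes += 1
            i += 1
        Loop While i < limit
        Return passes
    End Function

    Sub Main()
        ' start is already past limit, so the two forms differ.
        Console.WriteLine(countWithWhile(5, 3))         ' prints 0
        Console.WriteLine(countWithDoLoopWhile(5, 3))   ' prints 1
    End Sub
End Module
```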

What about a For loop? Should you use a For loop where you might use a Do ... Loop While loop or a While loop? Often, this is possible. In general, the use of a For loop is interpreted to mean that the author has some knowledge of the number of times the loop body is to be performed. For example, if a loop is to have one performance for each count of a control variable from some minimum value up to some maximum value (or from some maximum value down to some minimum value), a For loop is the preferred structure; these extreme values need not be constants in the program. The increment or step value of the control variable need not be 1 (or -1); for example, it is appropriate to use a For loop to print the multiples of 5 and their squares:
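A sketch of such a loop. The upper bound MAXMULTIPLE and the output layout are assumptions made for this example.

```vb
Imports System
Imports System.Collections.Generic

Module MultiplesDemo
    Const MAXMULTIPLE As Integer = 25   ' assumed upper bound for this sketch

    ' Builds one row of output per multiple of 5 up to MAXMULTIPLE.
    Function multiplesAndSquares() As List(Of String)
        Dim rows As New List(Of String)
        For n As Integer = 5 To MAXMULTIPLE Step 5
            rows.Add(n.ToString() & " " & (n * n).ToString())
        Next
        Return rows
    End Function

    Sub Main()
        For Each row As String In multiplesAndSquares()
            Console.WriteLine(row)
        Next
    End Sub
End Module
```

The For header states everything about the repetition (start, end, and step), which is exactly the knowledge a reader expects a For loop to convey.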

By contrast, even when possible to use a For loop, it's preferred to use a Do or While when there is no knowledge of the number of performances of the loop body before exit from the loop. For example, to count each name stored in an array until reaching a "name" that's a null string, it is better to traverse the list with a While loop, since in general we don't know how many non-null entries will precede the first null entry. Thus, in the following, the version on the right is preferable.
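A hypothetical pair of this kind (the function names and the flag variable are my own), with the first function in the "left" style and the second in the "right" style:

```vb
Imports System

Module NameCountDemo
    ' "Left" style: a For loop visits every index, even after the
    ' terminating null string has been seen.
    Function countNamesWithFor(names() As String) As Integer
        Dim count As Integer = 0
        Dim endFound As Boolean = False
        For i As Integer = 0 To names.Length - 1
            If Not endFound Then
                If names(i) = "" Then
                    endFound = True
                Else
                    count += 1
                End If
            End If
        Next
        Return count
    End Function

    ' "Right" style: a While loop stops at the first null string
    ' (or at the end of the array, whichever comes first).
    Function countNamesWithWhile(names() As String) As Integer
        Dim count As Integer = 0
        While count < names.Length AndAlso names(count) <> ""
            count += 1
        End While
        Return count
    End Function

    Sub Main()
        Dim names() As String = {"Ann", "Bob", "", "Cy"}
        Console.WriteLine(countNamesWithFor(names))     ' prints 2
        Console.WriteLine(countNamesWithWhile(names))   ' prints 2
    End Sub
End Module
```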

The version on the right is preferable, for the following advantages:

  • By its use of a For loop, the version on the left misleads the reader (and perhaps even the programmer) concerning the number of entries of the list that must be examined, since the version on the left considers every index value, and the version on the right only considers index values until a null entry is found (or the maximum index value is reached).

  • Once a null entry is found, the version on the right stops considering further index values, while the version on the left needlessly continues. The version on the right will therefore often execute more efficiently.

Magic Numbers

Magic numbers are (numeric) constant values that are unexplained. The use of magic numbers is easily avoided by assigning an appropriate symbolic identifier (in VB, typically via a const declaration) to the value in question. This practice has several advantages, as illustrated in the following examples.

  • Imagine a program that requires time calculations. Such a program might well use the value 60 for multiple purposes, including the number of seconds per minute, the number of minutes per hour, and the number of paper clips in a small box (has nothing to do with time, which is part of the point). Compare the following code fragments, in which, to emphasize our points, we will combine the issue of magic numbers with the issue of choosing identifiers mnemonically:

    Logically, these code fragments are identical. However, the one on the right has the advantage that each use of the constant value 60 is explained by its Const identifier. Indeed, the right fragment clarifies, and the left fragment does not, that the three uses of the value 60 in the assignments to (nonconstant) variables are unrelated to each other. (The right fragment also has the advantage that its other identifiers are mnemonically chosen.)

  • Compare the following code fragments:

These fragments have identical logic. However, the left version does not clarify nearly as well as the right version that the tax rate is 7%.

Further, imagine a program that has 20 such computations. Now suppose the county legislature changes the sales tax rate. In the version of the program written in the style of the left fragment above, 20 changes must be made, and because there are so many, there is a danger that not every place in the program requiring a change is found and corrected; but in the version of the program written in the style of the right fragment, only the value used in the Const declaration need be changed.
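The two comparisons above might look something like the following sketch. All names are hypothetical; the one-letter functions play the role of the "left" fragments and the functions built on Const declarations play the role of the "right" ones.

```vb
Imports System

Module MagicNumberDemo
    ' Magic-number style: nothing explains the 60s or the 1.07.
    Function a(m As Integer) As Integer
        Return 60 * m
    End Function

    Function b(h As Integer) As Integer
        Return 60 * h
    End Function

    Function c(n As Integer) As Integer
        Return 60 * n
    End Function

    Function d(p As Double) As Double
        Return p * 1.07
    End Function

    ' Named-constant style: same logic, every value explained.
    Const SECONDSPERMINUTE As Integer = 60
    Const MINUTESPERHOUR As Integer = 60
    Const CLIPSPERBOX As Integer = 60
    Const TAXRATE As Double = 0.07

    Function secondsIn(minutes As Integer) As Integer
        Return SECONDSPERMINUTE * minutes
    End Function

    Function minutesIn(hours As Integer) As Integer
        Return MINUTESPERHOUR * hours
    End Function

    Function clipsIn(boxes As Integer) As Integer
        Return CLIPSPERBOX * boxes
    End Function

    Function priceWithTax(price As Double) As Double
        Return price * (1 + TAXRATE)
    End Function

    Sub Main()
        Console.WriteLine(secondsIn(2))
        Console.WriteLine(minutesIn(3))
        Console.WriteLine(clipsIn(4))
        Console.WriteLine(priceWithTax(100.0))
    End Sub
End Module
```

If the tax rate changes, only the TAXRATE declaration needs editing, however many computations use it.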

Recursion - when to use it, when not to use it

Recursion is when a subprogram calls upon itself, either directly or circularly. Circular recursion takes place when there is a list of subprograms s_0, s_1, ..., s_{n-1} such that for each index i, s_i calls upon s_{(i+1) mod n}, i.e.,

s_0 calls s_1, and s_1 calls s_2, and ..., and s_{n-2} calls s_{n-1}, and s_{n-1} calls s_0.

When used properly, recursion is a powerful programming technique. Often, an algorithm expressed recursively requires much less code than would the same algorithm expressed nonrecursively.

Typical good uses of recursion are in "Divide and Conquer" algorithms. These are algorithms for which a large problem is divided up into smaller problems of the same type; each of the smaller problems is solved; and the partial solutions are "stitched together" to obtain a solution to the original, large problem. (In the following, you need to know that merging two sorted lists means combining the lists into a single sorted list.) For example, the Merge Sort algorithm for sorting data may be expressed as follows:

Merge Sort - given an unsorted list L of data, sort the data, as follows:

If the list L has at least 2 items (and therefore has the possibility of data out of order), then

  1. Divide the list L into two smaller lists L1 and L2 of approximately equal size (thus, each of L1 and L2 has about 1/2 of the data of the original list L).

  2. Recursively, apply the Merge Sort algorithm to L1, so that at the end of this step, L1 is sorted.

  3. Recursively, apply the Merge Sort algorithm to L2, so that at the end of this step, L2 is sorted.

  4. Merge the sorted lists L1 and L2 to obtain the sorted list L.
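The four steps above can be sketched in VB as follows. This is one possible rendering, assuming integer data held in a List(Of Integer); the names list1 and list2 correspond to L1 and L2, and the merge helper is my own.

```vb
Imports System
Imports System.Collections.Generic

Module MergeSortDemo
    ' Step 4: merges two sorted lists into a single sorted list.
    Function merge(list1 As List(Of Integer), _
                   list2 As List(Of Integer)) As List(Of Integer)
        Dim merged As New List(Of Integer)
        Dim i As Integer = 0
        Dim j As Integer = 0
        While i < list1.Count AndAlso j < list2.Count
            If list1(i) <= list2(j) Then
                merged.Add(list1(i))
                i += 1
            Else
                merged.Add(list2(j))
                j += 1
            End If
        End While
        ' At most one of the two lists still has items; append them.
        While i < list1.Count
            merged.Add(list1(i))
            i += 1
        End While
        While j < list2.Count
            merged.Add(list2(j))
            j += 1
        End While
        Return merged
    End Function

    Function mergeSort(items As List(Of Integer)) As List(Of Integer)
        Dim sorted As List(Of Integer) = items
        If items.Count >= 2 Then
            Dim middle As Integer = items.Count \ 2
            ' Step 1: divide into two lists of about equal size.
            Dim list1 As List(Of Integer) = items.GetRange(0, middle)
            Dim list2 As List(Of Integer) = _
                items.GetRange(middle, items.Count - middle)
            ' Steps 2 and 3: sort each half recursively.
            list1 = mergeSort(list1)
            list2 = mergeSort(list2)
            ' Step 4: merge the sorted halves.
            sorted = merge(list1, list2)
        End If
        Return sorted
    End Function

    Sub Main()
        Dim data As New List(Of Integer) From {5, 2, 9, 1, 5, 6}
        Console.WriteLine(String.Join(" ", mergeSort(data)))   ' prints 1 2 5 5 6 9
    End Sub
End Module
```

A list with fewer than 2 items is returned unchanged, which is what ends the recursion.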

Notice that recursion is a form of looping. For example, in the discussion above, when we apply the algorithm recursively to L1, if the list L1 has at least 2 items, then it is divided into smaller lists, say, L11 and L12; if L11 has at least 2 items, it is subdivided into, say, L111 and L112; etc.

Because direct recursion (a subprogram calling itself directly) is a form of looping, it can often be avoided when its only motivation is looping. Languages like C++, BASIC, COBOL, and Visual Basic provide other loop forms for programmers to use that will often be clearer than recursion, so that, unless there is a motivation other than looping (such as divide-and-conquer), it may be preferable to avoid direct recursion. (There are programming languages such as LISP and PROLOG in which recursion is the primary form of looping - in such languages, it may be impossible or undesirable to avoid direct recursion even when looping is the only motivation.)
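As a small illustration (my own example, not one from the original page), here is a sum of 1 through n written first with direct recursion used purely as a loop, and then with an ordinary For loop:

```vb
Imports System

Module SumDemo
    ' Direct recursion used only as a loop.
    Function sumRecursive(n As Integer) As Integer
        Dim total As Integer = 0
        If n > 0 Then
            total = n + sumRecursive(n - 1)
        End If
        Return total
    End Function

    ' The same computation with an explicit For loop.
    Function sumWithLoop(n As Integer) As Integer
        Dim total As Integer = 0
        For i As Integer = 1 To n
            total += i
        Next
        Return total
    End Function

    Sub Main()
        Console.WriteLine(sumRecursive(10))   ' prints 55
        Console.WriteLine(sumWithLoop(10))    ' prints 55
    End Sub
End Module
```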

Circular recursion is rarely necessary, and often is difficult to understand. Unfortunately, many beginners use circular recursion where a more conventional form of looping would be much easier to understand. Consider the following forms, in which the example on the right is preferable:
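A hypothetical pair of this kind follows. The names director and againQuery come from the discussion; the "Circular" suffix, the jobsDone counter standing in for the real work, and the TextReader parameter are assumptions made so the sketch can run on its own. The first two subprograms show the "left" style, the last two the "right" style.

```vb
Imports System
Imports System.IO

Module CircularDemo
    ' "Left" style: director and againQuery call each other.
    Sub directorCircular(reader As TextReader, ByRef jobsDone As Integer)
        jobsDone += 1                       ' stands in for the real work
        againQueryCircular(reader, jobsDone)
    End Sub

    Sub againQueryCircular(reader As TextReader, ByRef jobsDone As Integer)
        Dim answer As String = reader.ReadLine()
        If (answer IsNot Nothing) AndAlso (answer.Trim().ToUpper() = "Y") Then
            directorCircular(reader, jobsDone)   ' the hidden "loop"
        End If
    End Sub

    ' "Right" style: the loop is explicit in director.
    Sub director(reader As TextReader, ByRef jobsDone As Integer)
        Dim again As Boolean
        Do
            jobsDone += 1                   ' stands in for the real work
            againQuery(reader, again)
        Loop While again
    End Sub

    ' Only asks the question; it does not decide what happens next.
    Sub againQuery(reader As TextReader, ByRef again As Boolean)
        Dim answer As String = reader.ReadLine()
        again = (answer IsNot Nothing) AndAlso (answer.Trim().ToUpper() = "Y")
    End Sub

    Sub Main()
        Dim jobsDone As Integer = 0
        Console.WriteLine("Enter Y after each job to repeat.")
        director(Console.In, jobsDone)
        Console.WriteLine(jobsDone)
    End Sub
End Module
```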

Logically, these code fragments are identical. However, the version on the right is easier to understand, because its loop structure is explicit in the director subprogram. By contrast, the version on the left obscures its looping - you must read director and againQuery together in order to understand that the actions of director repeat. It's easier to read one subprogram at a time than two (or more) subprograms together. Therefore, circular recursion should be avoided unless there is some powerful reason other than looping (which, for most programmers, will be a rare event) for using it.

Case Selection / Selection of Actions

Many programming languages have a statement that is designed to control the selection of actions among a known list of possibilities. In VB, this is the Select Case structure. An alternative to the use of Select Case is the use of a series of If statements, or an If with a series of ElseIf clauses.

When we are selecting among more than two mutually exclusive possibilities, the Select Case statement is preferred over a series of If statements or a series of ElseIf clauses. Select Case makes clear the mutually exclusive nature of the list of cases among which we're choosing, while a series of If statements does not. Consider the following pseudo-code examples:
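In place of pseudo-code, here is a runnable VB sketch of such a pair. The function names and the list that records which actions ran are my own devices; action1 is assumed to change choice to 4. The first function is the "left" version, the second the "right" version.

```vb
Imports System
Imports System.Collections.Generic

Module SelectionDemo
    ' "Left" style: a series of independent If statements.
    Function runWithIfs(choice As Integer) As List(Of String)
        Dim performed As New List(Of String)
        If choice = 1 Then
            performed.Add("action1")
            choice = 4          ' action1 happens to change choice
        End If
        If choice = 2 Then performed.Add("action2")
        If choice = 3 Then performed.Add("action3")
        If choice = 4 Then performed.Add("action4")
        Return performed
    End Function

    ' "Right" style: Select Case picks exactly one branch.
    Function runWithSelect(choice As Integer) As List(Of String)
        Dim performed As New List(Of String)
        Select Case choice
            Case 1
                performed.Add("action1")
                choice = 4      ' same change, but no other case runs
            Case 2
                performed.Add("action2")
            Case 3
                performed.Add("action3")
            Case 4
                performed.Add("action4")
        End Select
        Return performed
    End Function

    Sub Main()
        Console.WriteLine(String.Join(",", runWithIfs(1)))      ' prints action1,action4
        Console.WriteLine(String.Join(",", runWithSelect(1)))   ' prints action1
    End Sub
End Module
```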

Are these seemingly equivalent versions even truly equivalent? Not necessarily. For example, in the version on the left, suppose control reaches the first if statement with choice equal to 1. Suppose action1 causes the value of choice to be changed to 4. Then the version on the left executes both action1 and action4. By contrast, if the version on the right has control reach the Select with choice equal to 1, action1 is executed, but action4 is not despite the fact that action1 changes the value of choice to 4.

Side Effects

A side effect introduces the potential for a nasty surprise in the behavior of a program. See this essay for more information on this topic.

Single Entry Point, Single Exit Point Style

A piece of code is likely to be more easily understood if it has only one entry point (at the top of its listing) and only one exit point at (or near, e.g., a return statement just before the end of a function) the bottom of its listing. In VB, this philosophy has the following implications:

  • Don't use goto statements. If the target of a goto is not at the start of its block, then its block has two entry points; if the goto statement is not at the end of its block, then its block has two exit points. The goto statement can also have other undesirable effects on the understanding of the listing; for example, see the Loop Style section.

  • Don't use Exit Sub, Exit Function, Exit Do, Exit For, etc., statements, since doing so provides a second exit point from the block of code being exited.
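As an illustration of the second point, here is a sketch of a search that needs neither Exit For nor an early Return. The function name and the use of -1 for "not found" are assumptions for this example.

```vb
Imports System

Module SingleExitDemo
    ' Finds the index of target in values, or -1 if it is absent.
    ' The loop condition carries the whole exit logic, so the loop
    ' has one way in and one way out.
    Function indexOf(values() As Integer, target As Integer) As Integer
        Dim position As Integer = -1
        Dim i As Integer = 0
        While i < values.Length AndAlso position = -1
            If values(i) = target Then
                position = i
            End If
            i += 1
        End While
        Return position
    End Function

    Sub Main()
        Dim values() As Integer = {4, 8, 15, 16}
        Console.WriteLine(indexOf(values, 15))   ' prints 2
        Console.WriteLine(indexOf(values, 7))    ' prints -1
    End Sub
End Module
```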

Some argue for exceptions to the rules stated above in order to simplify the flow of control amidst complex code. For example, some argue that if main calls on blah, which calls on ..., which calls on gallumph, and an unusual error is detected by the code of the latter subprogram that makes further processing within the program pointless, it is easier for the programmer to use an Exit statement to end the run of the program than to provide appropriate If statements, one in each of the chain of subprograms alluded to above, to provide "normal" exit from the main Sub's end-of-listing or Return statement. Since student exercises rarely get complex enough to make a strong argument along these lines, I prefer my students to follow the guidelines above.

Wraparound

Lines of code that "wrap around" the screen tend to print with apparent line breaks in undesirable locations. Programmers should learn to use their continuation symbols and Enter keys during program editing. It should be clear that anyone reading your code will prefer the version on the right to the version on the left, below:
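A hypothetical pair of this kind (the message text and names are invented), with the first function in the "left" style and the second in the "right" style, using VB's continuation symbol (a space followed by an underscore):

```vb
Imports System

Module WraparoundDemo
    ' "Left" style: one long line, broken wherever the screen decides.
    Function messageLong(customer As String, total As Double) As String
        Return "Dear " & customer & ", your order has shipped. The amount charged to your account is " & total.ToString("F2") & "."
    End Function

    ' "Right" style: the programmer chooses the breaks.
    Function messageWrapped(customer As String, total As Double) As String
        Return "Dear " & customer & ", your order has shipped. " _
            & "The amount charged to your account is " _
            & total.ToString("F2") & "."
    End Function

    Sub Main()
        Console.WriteLine(messageLong("Pat", 12.5))
        Console.WriteLine(messageWrapped("Pat", 12.5))
    End Sub
End Module
```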

Reference

[Dijkstra] E.W. Dijkstra, "Go To statement considered harmful," Communications of the Association for Computing Machinery 11, 3 (Mar., 1968), pp. 147-148;

Online at http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD215.PDF

Dijkstra's letter was reprinted with some paraphrasing as

"(A Look Back at) Go To Statement Considered Harmful," Communications of the Association for Computing Machinery 51, 1 (Jan., 2008), pp. 7-9.

Online at http://mags.acm.org/communications/200801/