Chapter 7. Making Control Flow Easy to Read

image with no caption

If code had no conditionals, loops, or any other control flow statements, it would be very easy to read. These jumps and branches are the hard stuff, where code can get confusing quickly. This chapter is about making the control flow in your code easy to read.

Key Idea

Make all your conditionals, loops, and other changes to control flow as “natural” as possible—written in a way that doesn’t make the reader stop and reread your code.

The Order of Arguments in Conditionals

Which of these two pieces of code is more readable:

if (length >= 10)

or

if (10 <= length)

To most programmers, the first is much more readable. But what about the next two:

while (bytes_received < bytes_expected)

or

while (bytes_expected > bytes_received)

Again, the first version is more readable. But why? What’s the general rule? How do you decide whether it’s better to write a < b or b > a?

Here’s a guideline we’ve found useful:

Left-hand sideRight-hand side

The expression “being interrogated,” whose value is more in flux.

The expression being compared against, whose value is more constant.

This guideline matches English usage—it’s pretty natural to say, “if you make at least $100K/year” or “if you are at least 18 years old.” It’s unnatural to say, “if 18 years is less than or equal to your age.”

This explains why while (bytes_received < bytes_expected) is more readable. bytes_received is the value that we’re checking up on, and it’s increasing as the loop executes. bytes_expected is the more “stable” value being compared against.

The Order of if/else Blocks

image with no caption

When writing an if/else statement, you usually have the freedom to swap the order of the blocks. For instance, you can either write it like:

if (a == b) {
    // Case One ...
} else {
    // Case Two ...
}

or as:

if (a != b) {
    // Case Two ...
} else {
    // Case One ...
}

You may not have given much thought about this before, but in some cases there are good reasons to prefer one order over the other:

  • Prefer dealing with the positive case first instead of the negative—e.g., if (debug) instead of if (!debug).

  • Prefer dealing with the simpler case first to get it out of the way. This approach might also allow both the if and the else to be visible on the screen at the same time, which is nice.

  • Prefer dealing with the more interesting or conspicuous case first.

Sometimes these preferences conflict, and you have to make a judgment call. But in many cases, there is a clear winner.

For example, suppose you have a web server that’s building a response based on whether the URL contains the query parameter expand_all:

if (!url.HasQueryParameter("expand_all")) {
    response.Render(items);
    ...
} else {
    for (int i = 0; i < items.size(); i++) {
        items[i].Expand();
    }
    ...
}

When the reader glances at the first line, her brain immediately thinks about the expand_all case. It’s like when someone says, “Don’t think of a pink elephant.” You can’t help but think about it—the “don’t” is drowned out by the more unusual “pink elephant.”

Here, expand_all is our pink elephant. Because it’s the more interesting case (and it’s the positive case, too), let’s deal with it first:

if (url.HasQueryParameter("expand_all")) {
    for (int i = 0; i < items.size(); i++) {
        items[i].Expand();
    }
    ...
} else {
    response.Render(items);
    ...
}

On the other hand, here’s a situation where the negative case is the simpler and more interesting/dangerous one, so we deal with it first:

if not file:
    # Log the error ...
else:
    # ...

Again, depending on the details, this may be a judgment call.

To summarize, our advice is simply to pay attention to these factors and watch out for cases where your if/else is in an awkward order.

The ?: Conditional Expression (a.k.a. “Ternary Operator”)

In C-like languages, you can write a conditional expression as cond ? a : b, which is essentially a compact way to write if (cond) { a } else { b }.

Its effect on readability is controversial. Proponents think it’s a nice way to write something in one line that would otherwise require multiple lines. Opponents argue that it can be confusing to read and difficult to step through in a debugger.

Here’s a case where the ternary operator is readable and compact:

time_str += (hour >= 12) ? "pm" : "am";

Avoiding the ternary operator, you might write:

if (hour >= 12) {
    time_str += "pm";
} else {
    time_str += "am";
}

which is a bit drawn out and redundant. In this case, a conditional expression seems reasonable.

However, these expressions can quickly become difficult to read:

return exponent >= 0 ? mantissa * (1 << exponent) : mantissa / (1 << -exponent);

Here, the ternary operator is no longer just choosing between two simple values. The motivation for writing code like this is usually to “squeeze everything on one line.”

Key Idea

Instead of minimizing the number of lines, a better metric is to minimize the time needed for someone to understand it.

Spelling out the logic with an if/else statement makes the code more natural:

if (exponent >= 0) {
    return mantissa * (1 << exponent);
} else {
    return mantissa / (1 << -exponent);
}

Advice

By default, use an if/else. The ternary ?: should be used only for the simplest cases.

Avoid do/while Loops

image with no caption

Many respected programming languages, as well as Perl, have a do { expression } while (condition) loop. The expression is executed at least once. Here’s an example:

// Search through the list, starting at 'node', for the given 'name'.
// Don't consider more than 'max_length' nodes.
public boolean ListHasNode(Node node, String name, int max_length) {
    do {
        if (node.name().equals(name))
            return true;
        node = node.next();
    } while (node != null && --max_length > 0);

    return false;
}

What’s weird about a do/while loop is that a block of code may or may not be reexecuted based on a condition underneath it. Typically, logical conditions are above the code they guard—this is the way it works with if, while, and for statements. Because you typically read code from top to bottom, this makes do/while a bit unnatural. Many readers end up reading the code twice.

while loops are easier to read because you know the condition for all iterations before you read the block of code inside. But it would be silly to duplicate code just to remove a do/while:

// Imitating a do/while — DON'T DO THIS!
body

while (condition) {
    body (again)
}

Fortunately, we’ve found that in practice most do/while loops could have been written as while loops to begin with:

public boolean ListHasNode(Node node, String name, int max_length) {
    while (node != null && max_length-- > 0) {
        if (node.name().equals(name)) return true;
        node = node.next();
    }
    return false;
}

This version also has the benefit that it still works if max_length is 0 or if node is null.

Another reason to avoid do/while is that the continue statement can be confusing inside it. For instance, what does this code do?

do {
    continue;
} while (false);

Does it loop forever or just once? Most programmers have to stop and think about it. (It should loop just once.)

Overall, Bjarne Stroustrup, the creator of C++, says it best (in The C++ Programming Language):

In my experience, the do-statement is a source of errors and confusion. … I prefer the condition “up front where I can see it.” Consequently, I tend to avoid do-statements.

Returning Early from a Function

Some coders believe that functions should never have multiple return statements. This is nonsense. Returning early from a function is perfectly fine—and often desirable. For example:

public boolean Contains(String str, String substr) {
    if (str == null || substr == null) return false;
    if (substr.equals("")) return true;

    ...
}

Implementing this function without these “guard clauses” would be very unnatural.

One of the motivations for wanting a single exit point is so that all the cleanup code at the bottom of the function is guaranteed to be called. But modern languages offer more sophisticated ways to achieve this guarantee:

LanguageStructured idiom for cleanup code
C++destructors
Java, Pythontry finally
Pythonwith
C#using

In pure C, there is no mechanism to trigger specific code when a function exits. So if there’s a large function with a lot of cleanup code, returning early may be difficult to do correctly. In this case, other options include refactoring the function or even judicious use of goto cleanup;.

The Infamous goto

In languages other than C, there is little need for goto because there are so many better ways to get the job done. gotos are also notorious for getting out of hand quickly and making code difficult to follow.

But you can still see goto used in various C projects—most notably the Linux kernel. Before you dismiss all use of goto as blasphemy, it’s useful to dissect why some uses of goto are better than others.

The simplest, most innocent use of goto is with a single exit at the bottom of a function:

    if (p == NULL) goto exit;

    ...

exit:
    fclose(file1);
    fclose(file2);
    ...

    return;

If this were the only form of goto allowed, goto wouldn’t be much of a problem.

The problems can come when there are multiple goto targets, especially when their paths cross. In particular, gotos that go upward can make for real spaghetti code, and they can surely be replaced with structured loops. Most of the time, goto should be avoided.

Minimize Nesting

Deeply nested code is hard to understand. Each level of nesting pushes an extra condition onto the reader’s “mental stack.” When the reader sees a closing brace (}) it can be hard to “pop” the stack and remember what condition is underneath.

Here is a relatively simple example of this—see if you notice yourself looking back up to double-check which block conditions you’re in:

if (user_result == SUCCESS) {
    if (permission_result != SUCCESS) {
        reply.WriteErrors("error reading permissions");
        reply.Done();
        return;
    }
    reply.WriteErrors("");
} else {
    reply.WriteErrors(user_result);

}
reply.Done();

When you see that first closing brace, you have to think to yourself, Oh, permission_result != SUCCESS has just ended, so now permission_result == SUCCESS, and this is still inside the block where user_result == SUCCESS.

Overall, you have to keep the values of user_result and permission_result in your head at all times. And as each if { } block closes, you have to toggle the corresponding value in your mind.

This particular code is even worse because it keeps alternating between the SUCCESS and non-SUCCESS situations.

How Nesting Accumulates

Before we try to fix the previous example code, let’s talk about how it ended up the way it did. Originally, the code was simple:

if (user_result == SUCCESS) {
    reply.WriteErrors("");
} else {
    reply.WriteErrors(user_result);
}
reply.Done();

This code is perfectly understandable—it figures out what error string to write, and then it’s done with the reply.

But then the programmer added a second operation:

if (user_result == SUCCESS) {
    if (permission_result != SUCCESS) {
       reply.WriteErrors("error reading permissions");
       reply.Done();
       return;
    }
    reply.WriteErrors("");

...

This change makes sense—the programmer had a new chunk of code to insert, and she found the easiest place to insert it. This new code was fresh and mentally “bolded” in her mind. And the “diff” of this change is very clean—it looks like a simple change.

But when someone else comes across the code later, all that context is gone. This is the way it was for you when you first read the code at the beginning of this section—you had to take it in all at once.

Key Idea

Look at your code from a fresh perspective when you’re making changes. Step back and look at it as a whole.

Removing Nesting by Returning Early

Okay, so let’s improve the code. Nesting like this can be removed by handling the “failure cases” as soon as possible and returning early from the function:

if (user_result != SUCCESS) {
    reply.WriteErrors(user_result);
    reply.Done();
    return;
}

if (permission_result != SUCCESS) {
    reply.WriteErrors(permission_result);
    reply.Done();
    return;
}

reply.WriteErrors("");
reply.Done();

This code only has one level of nesting, instead of two. But more importantly, the reader never has to “pop” anything from his mental stack—every if block ends in a return.

Removing Nesting Inside Loops

The technique of returning early isn’t always applicable. For example, here’s a case of code nested in a loop:

for (int i = 0; i < results.size(); i++) {>
    if (results[i] != NULL) {
        non_null_count++;

        if (results[i]->name != "") {
            cout << "Considering candidate..." << endl;

            ...
        }
    }
}

Inside a loop, the analogous technique to returning early is to continue:

for (int i = 0; i < results.size(); i++) {
    if (results[i] == NULL) continue;
    non_null_count++;

    if (results[i]->name == "") continue;
    cout << "Considering candidate..." << endl;

    ...
}

In the same way that an if (...) return; acts as a guard clause for a function, these if (...) continue; statements act as guard clauses for the loop.

In general, the continue statement can be confusing, because it bounces the reader around, like a goto inside the loop. But in this case, each iteration of the loop is independent (the loop is a “for each”), so the reader can easily see that continue just means “skip over this item.”

Can You Follow the Flow of Execution?

image with no caption

This chapter has been about low-level control flow: how to make loops, conditionals, and other jumps easy to read. But you should also think about the “flow” of your program at a high level. Ideally, it would be easy to follow the entire execution path of your program—you’d start at main() and mentally step through the code, as one function calls another, until the program exits.

In practice, however, programming languages and libraries have constructs that let code execute “behind the scenes” or make it difficult to follow. Here are some examples:

Programming constructHow high-level program flow gets obscured
threadingIt’s unclear what code is executed when.
signal/interrupt handlersCertain code might be executed at any time.
exceptionsExecution can bubble up through multiple function calls.
function pointers & anonymous functionsIt’s hard to know exactly what code is going to run because that isn’t known at compile time.
virtual methodsobject.virtualMethod() might invoke code of an unknown subclass.

Some of these constructs are very useful, and they can even make your code more readable and less redundant. But as programmers, sometimes we get carried away and use them excessively without realizing how difficult it will be for readers to understand the code later. Also, these constructs make bugs much harder to track down.

The key is to not let too large a percentage of your code use these constructs. If you abuse these features, it can make tracing through your code like a game of Three-Card Monte (as in the cartoon).

Summary

There are a number of things you can do to make your code’s control flow easier to read.

When writing a comparison (while (bytes_expected > bytes_received)), it’s better to put the changing value on the left and the more stable value on the right (while (bytes_received < bytes_expected)).

You can also reorder the blocks of an if/else statement. Generally, try to handle the positive/easier/interesting case first. Sometimes these criteria conflict, but when they don’t, it’s a good rule of thumb to follow.

Certain programming constructs, like the ternary operator (: ?), the do/while loop, and goto often result in unreadable code. It’s usually best not to use them, as clearer alternatives almost always exist.

Nested code blocks require more concentration to follow along. Each new nesting requires more context to be “pushed onto the stack” of the reader. Instead, opt for more “linear” code to avoid deep nesting.

Returning early can remove nesting and clean up code in general. “Guard statements” (handling simple cases at the top of the function) are especially useful.

    Reset