If code had no conditionals, loops, or any other control flow statements, it would be very easy to read. These jumps and branches are the hard stuff, where code can get confusing quickly. This chapter is about making the control flow in your code easy to read.
Make all your conditionals, loops, and other changes to control flow as “natural” as possible—written in a way that doesn’t make the reader stop and reread your code.
Which of these two pieces of code is more readable:
if (length >= 10)
if (10 <= length)
To most programmers, the first is much more readable. But what about the next two:
while (bytes_received < bytes_expected)
while (bytes_expected > bytes_received)
Again, the first version is more readable. But
why? What’s the general rule? How do you decide whether it’s better to
write a < b
or b > a
?
Here’s a guideline we’ve found useful:
Left-hand side | Right-hand side |
---|---|
The expression “being interrogated,” whose value is more in flux. | The expression being compared against, whose value is more constant. |
This guideline matches English usage—it’s pretty natural to say, “if you make at least $100K/year” or “if you are at least 18 years old.” It’s unnatural to say, “if 18 years is less than or equal to your age.”
This explains why while
(bytes_received < bytes_expected)
is more readable. bytes_received
is the value that we’re checking
up on, and it’s increasing as the loop executes. bytes_expected
is the more “stable” value being
compared against.
When writing an if/else statement, you usually have the freedom to swap the order of the blocks. For instance, you can either write it like:
if (a == b) { // Case One ... } else { // Case Two ... }
if (a != b) {
// Case Two ...
} else {
// Case One ...
}
You may not have given much thought about this before, but in some cases there are good reasons to prefer one order over the other:
Prefer dealing with the positive case
first instead of the negative—e.g., if
(debug)
instead of if
(!debug)
.
Prefer dealing with the
simpler case first to get it out of the way. This
approach might also allow both the if
and the else
to be visible on the screen at the same
time, which is nice.
Prefer dealing with the more interesting or conspicuous case first.
Sometimes these preferences conflict, and you have to make a judgment call. But in many cases, there is a clear winner.
For example, suppose you have a web
server that’s building a response
based
on whether the URL contains the query parameter expand_all
:
if (!url.HasQueryParameter("expand_all")) { response.Render(items); ... } else { for (int i = 0; i < items.size(); i++) { items[i].Expand(); } ... }
When the reader glances at the first line, her
brain immediately thinks about the expand_all
case. It’s like when someone says,
“Don’t think of a pink elephant.” You can’t help but think about it—the
“don’t” is drowned out by the more unusual “pink elephant.”
Here, expand_all
is our pink elephant. Because it’s
the more interesting case (and it’s the positive case, too), let’s deal
with it first:
if (url.HasQueryParameter("expand_all")) { for (int i = 0; i < items.size(); i++) { items[i].Expand(); } ... } else { response.Render(items); ... }
On the other hand, here’s a situation where the negative case is the simpler and more interesting/dangerous one, so we deal with it first:
if not file: # Log the error ... else: # ...
Again, depending on the details, this may be a judgment call.
To summarize, our advice is simply to pay
attention to these factors and watch out for cases where your if/else
is in an awkward order.
In C-like languages, you can write a conditional expression as
cond ? a : b
, which is essentially a
compact way to write if (cond) { a } else { b
}
.
Its effect on readability is controversial. Proponents think it’s a nice way to write something in one line that would otherwise require multiple lines. Opponents argue that it can be confusing to read and difficult to step through in a debugger.
Here’s a case where the ternary operator is readable and compact:
time_str += (hour >= 12) ? "pm" : "am";
Avoiding the ternary operator, you might write:
if (hour >= 12) { time_str += "pm"; } else { time_str += "am"; }
which is a bit drawn out and redundant. In this case, a conditional expression seems reasonable.
However, these expressions can quickly become difficult to read:
return exponent >= 0 ? mantissa * (1 << exponent) : mantissa / (1 << -exponent);
Here, the ternary operator is no longer just choosing between two simple values. The motivation for writing code like this is usually to “squeeze everything on one line.”
Instead of minimizing the number of lines, a better metric is to minimize the time needed for someone to understand it.
Spelling out the logic with an if/else
statement makes the code more
natural:
if (exponent >= 0) { return mantissa * (1 << exponent); } else { return mantissa / (1 << -exponent); }
Many respected programming languages, as well as Perl, have a
do { expression } while (condition)
loop. The expression
is executed at
least once. Here’s an example:
// Search through the list, starting at 'node', for the given 'name'. // Don't consider more than 'max_length' nodes. public boolean ListHasNode(Node node, String name, int max_length) { do { if (node.name().equals(name)) return true; node = node.next(); } while (node != null && --max_length > 0); return false; }
What’s weird about a do/while
loop is that a block of code may or may
not be reexecuted based on a condition underneath it.
Typically, logical conditions are above the code they
guard—this is the way it works with if
,
while
, and for
statements. Because you typically read code
from top to bottom, this makes do/while
a bit unnatural. Many readers end up reading the code twice.
while
loops are easier to read because you know the condition for
all iterations before you read the block of code inside. But it would be
silly to duplicate code just to remove a do/while
:
// Imitating a do/while — DON'T DO THIS! body while (condition) { body (again) }
Fortunately, we’ve found that in practice most
do/while
loops could have been written
as while
loops to begin with:
public boolean ListHasNode(Node node, String name, int max_length) { while (node != null && max_length-- > 0) { if (node.name().equals(name)) return true; node = node.next(); } return false; }
This version also has the benefit that it still
works if max_length
is 0 or if node
is null
.
Another reason to avoid do/while
is that the continue
statement can be
confusing inside it. For instance, what does this code do?
do { continue; } while (false);
Does it loop forever or just once? Most programmers have to stop and think about it. (It should loop just once.)
Overall, Bjarne Stroustrup, the creator of C++, says it best (in The C++ Programming Language):
Some coders believe that functions should never have multiple
return
statements. This is nonsense.
Returning early from a function is perfectly fine—and often desirable. For
example:
public boolean Contains(String str, String substr) { if (str == null || substr == null) return false; if (substr.equals("")) return true; ... }
Implementing this function without these “guard clauses” would be very unnatural.
One of the motivations for wanting a single exit point is so that all the cleanup code at the bottom of the function is guaranteed to be called. But modern languages offer more sophisticated ways to achieve this guarantee:
In pure C, there is no mechanism to
trigger specific code when a function exits. So if there’s a large
function with a lot of cleanup code, returning early may be difficult to
do correctly. In this case, other options include refactoring the function
or even judicious use of goto
cleanup;
.
In languages other than C, there is little need for goto
because there are so many better ways to
get the job done. goto
s are also
notorious for getting out of hand quickly and making code difficult to
follow.
But you can still see goto
used in various C projects—most notably
the Linux kernel. Before you dismiss all use of goto
as blasphemy, it’s useful to dissect why
some uses of goto
are better than
others.
The simplest, most innocent use of
goto
is with a single
exit
at the bottom of a function:
if (p == NULL) goto exit; ... exit: fclose(file1); fclose(file2); ... return;
If this were the only form of goto
allowed, goto
wouldn’t be much of a problem.
The problems can come when there are
multiple goto
targets, especially when their paths cross. In particular, goto
s that go upward can
make for real spaghetti code, and they can surely be replaced with
structured loops. Most of the time, goto
should be avoided.
Deeply nested code is hard to understand. Each level of nesting
pushes an extra condition onto the reader’s “mental stack.” When the
reader sees a closing brace (}
) it can be hard to “pop”
the stack and remember what condition is underneath.
Here is a relatively simple example of this—see if you notice yourself looking back up to double-check which block conditions you’re in:
if (user_result == SUCCESS) { if (permission_result != SUCCESS) { reply.WriteErrors("error reading permissions"); reply.Done(); return; } reply.WriteErrors(""); } else { reply.WriteErrors(user_result); } reply.Done();
When you see that first closing
brace, you have to think to yourself, Oh, permission_result != SUCCESS
has just
ended, so now permission_result ==
SUCCESS
, and this is still inside the block
where user_result ==
SUCCESS
.
Overall, you have to keep the values
of user_result
and permission_result
in your head at all times. And
as each if { }
block closes, you have
to toggle the corresponding value in your mind.
This particular code is even worse
because it keeps alternating between the SUCCESS
and non-SUCCESS
situations.
Before we try to fix the previous example code, let’s talk about how it ended up the way it did. Originally, the code was simple:
if (user_result == SUCCESS) { reply.WriteErrors(""); } else { reply.WriteErrors(user_result); } reply.Done();
This code is perfectly
understandable—it figures out what error string to write, and then it’s
done with the reply
.
But then the programmer added a second operation:
if (user_result == SUCCESS) { if (permission_result != SUCCESS) { reply.WriteErrors("error reading permissions"); reply.Done(); return; } reply.WriteErrors(""); ...
This change makes sense—the programmer had a new chunk of code to insert, and she found the easiest place to insert it. This new code was fresh and mentally “bolded” in her mind. And the “diff” of this change is very clean—it looks like a simple change.
But when someone else comes across the code later, all that context is gone. This is the way it was for you when you first read the code at the beginning of this section—you had to take it in all at once.
Look at your code from a fresh perspective when you’re making changes. Step back and look at it as a whole.
Okay, so let’s improve the code. Nesting like this can be removed by handling the “failure cases” as soon as possible and returning early from the function:
if (user_result != SUCCESS) { reply.WriteErrors(user_result); reply.Done(); return; } if (permission_result != SUCCESS) { reply.WriteErrors(permission_result); reply.Done(); return; } reply.WriteErrors(""); reply.Done();
This code only has one level of
nesting, instead of two. But more importantly, the reader never has to
“pop” anything from his mental stack—every if
block ends in a return
.
The technique of returning early isn’t always applicable. For example, here’s a case of code nested in a loop:
for (int i = 0; i < results.size(); i++) {> if (results[i] != NULL) { non_null_count++; if (results[i]->name != "") { cout << "Considering candidate..." << endl; ... } } }
Inside a loop, the analogous technique
to returning early is to continue
:
for (int i = 0; i < results.size(); i++) { if (results[i] == NULL) continue; non_null_count++; if (results[i]->name == "") continue; cout << "Considering candidate..." << endl; ... }
In the same way that an if (...) return;
acts as a guard clause for a
function, these if (...) continue;
statements act as guard clauses for the loop.
In general, the continue
statement can be confusing, because
it bounces the reader around, like a goto
inside the loop. But in this case, each
iteration of the loop is independent (the loop is a “for each”), so the
reader can easily see that continue
just means “skip over this item.”
This chapter has been about low-level control flow: how to make
loops, conditionals, and other jumps easy to read. But you should also
think about the “flow” of your program at a high level. Ideally, it would
be easy to follow the entire execution path of your program—you’d start at
main()
and mentally step through the
code, as one function calls another, until the program exits.
In practice, however, programming languages and libraries have constructs that let code execute “behind the scenes” or make it difficult to follow. Here are some examples:
Some of these constructs are very useful, and they can even make your code more readable and less redundant. But as programmers, sometimes we get carried away and use them excessively without realizing how difficult it will be for readers to understand the code later. Also, these constructs make bugs much harder to track down.
The key is to not let too large a percentage of your code use these constructs. If you abuse these features, it can make tracing through your code like a game of Three-Card Monte (as in the cartoon).
There are a number of things you can do to make your code’s control flow easier to read.
When writing a comparison (while (bytes_expected > bytes_received)
),
it’s better to put the changing value on the left and the more stable
value on the right (while (bytes_received <
bytes_expected)
).
You can also reorder the blocks of an
if/else
statement. Generally, try to
handle the positive/easier/interesting case first. Sometimes these
criteria conflict, but when they don’t, it’s a good rule of thumb to
follow.
Certain programming constructs, like the
ternary operator (: ?
), the do/while
loop, and goto
often result in unreadable code. It’s
usually best not to use them, as clearer alternatives almost always
exist.
Nested code blocks require more concentration to follow along. Each new nesting requires more context to be “pushed onto the stack” of the reader. Instead, opt for more “linear” code to avoid deep nesting.
Returning early can remove nesting and clean up code in general. “Guard statements” (handling simple cases at the top of the function) are especially useful.