Chapter 8. Breaking Down Giant Expressions

The giant squid is an amazing and intelligent animal, but its near-perfect body design has one fatal flaw: it has a donut-shaped brain that wraps around its esophagus. So if it swallows too much food at once, it gets brain damage.

What does this have to do with code? Well, code that comes in “chunks” that are too big can have the same kind of effect. Recent research suggests that most of us can only think about three or four “things” at a time.^[1] Simply put, the larger an expression of code is, the harder it will be to understand.

Key Idea

Break down your giant expressions into more digestible pieces.

In this chapter, we’ll go through various ways you can manipulate and break down your code so that it’s easier to swallow.

Explaining Variables

The simplest way to break down an expression is to introduce an extra variable that captures a smaller subexpression. This extra variable is sometimes called an “explaining variable” because it helps explain what the subexpression means.

Here is an example:

if line.split(':')[0].strip() == "root":
    ...

Here is the same code, now with an explaining variable:

username = line.split(':')[0].strip()
if username == "root":
    ...

Summary Variables

Even if an expression doesn’t need explaining (because you can figure out what it means), it can still be useful to capture that expression in a new variable. We call this a summary variable if its purpose is simply to replace a larger chunk of code with a smaller name that can be managed and thought about more easily.

For example, consider the expressions in this code:

if (request.user.id == document.owner_id) {
    // user can edit this document...
}

...

if (request.user.id != document.owner_id) {
    // document is read-only...
}

The expression request.user.id == document.owner_id may not seem that big, but it has five variables, so it takes a little extra time to think about.

The main concept in this code is, “Does the user own the document?” That concept can be stated more clearly by adding a summary variable:

final boolean user_owns_document = (request.user.id == document.owner_id);

if (user_owns_document) {
    // user can edit this document...
}

...

if (!user_owns_document) {
    // document is read-only...
}

It may not seem like much, but the statement if (user_owns_document) is a little easier to think about. Also, having user_owns_document defined at the top tells the reader upfront that “this is a concept we’ll be referring to throughout this function.”

Using De Morgan’s Laws

If you ever took a course in circuits or logic, you might remember De Morgan’s laws. They are two ways to rewrite a boolean expression into an equivalent one:

1)   not (a or b or c)    ⇔   (not a) and (not b) and (not c)
2)   not (a and b and c)  ⇔    (not a) or (not b) or (not c)

If you have trouble remembering these laws, a simple summary is “Distribute the not and switch and/or.” (Or going the other way, you “factor out the not.”)

You can sometimes use these laws to make a boolean expression more readable. For instance, if your code is:

if (!(file_exists && !is_protected)) Error("Sorry, could not read file.");

It can be rewritten to:

if (!file_exists || is_protected) Error("Sorry, could not read file.");

Abusing Short-Circuit Logic

In most programming languages, boolean operators perform short-circuit evaluation. For example, the statement if (a || b) won’t evaluate b if a is true. This behavior is very handy but can sometimes be abused to accomplish complex logic.

Here is an example of a statement once written by one of the authors:

assert((!(bucket = FindBucket(key))) || !bucket->IsOccupied());

In English, what this code is saying is, “Get the bucket for this key. If the bucket is not null, then make sure it isn’t occupied.”

Even though it’s only one line of code, it really makes most programmers stop and think. Now compare it to this code:

bucket = FindBucket(key);
if (bucket != NULL) assert(!bucket->IsOccupied());

It does exactly the same thing, and even though it’s two lines of code, it’s much easier to understand.

So why was the code written as a single giant expression in the first place? At the time, it felt very clever. There’s a certain pleasure in paring logic down to a concise nugget of code. That’s understandable—it’s like solving a miniature puzzle, and we all like to have fun at work. The problem is that the code was a mental speed bump for anyone reading through the code.

Key Idea

Beware of “clever” nuggets of code—they’re often confusing when others read the code later.

Does this mean you should avoid making use of short-circuit behavior? No. There are plenty of cases where it can be used cleanly, for instance:

if (object && object->method()) ...

There is also a newer idiom worth mentioning: in languages like Python, JavaScript, and Ruby, the “or” operator returns one of its arguments (it doesn’t convert to a boolean), so code like :

x = a || b || c

can be used to pick out the first “truthy” value from a, b, or c.

Example: Wrestling with Complicated Logic

Suppose you’re implementing the following Range class:

struct Range {
    int begin;
    int end;

    // For example, [0,5) overlaps with [3,8)
    bool OverlapsWith(Range other);
};

The following figure shows some example ranges:

Note that end is noninclusive. So A, B, and C don’t overlap with each other, but D overlaps with all of them.

Here is one attempt at implementing OverlapsWith()—it checks if either endpoint of its range falls inside the other’s range:

bool Range::OverlapsWith(Range other) {
    // Check if 'begin' or 'end' falls inside 'other'.
    return (begin >= other.begin && begin <= other.end) ||
           (end >= other.begin && end <= other.end);
}

Even though the code is only two lines long, there’s a lot going on. The following figure shows all the logic involved.

There are so many cases and conditions to think about that it’s easy for a bug to slip by.

Speaking of which, there is a bug. The previous code will claim that the Range [0,2) overlaps with [2,4) when in fact it doesn’t.

The problem is that you have to be careful when comparing begin/end values using <= or just <. Here’s a fix to this problem:

return (begin >= other.begin && begin < other.end) ||
       (end > other.begin && end <= other.end);

Now it’s correct, right? Actually, there’s another bug. This code has ignored the case when begin/end completely surround other.

Here’s a fix that handles this case, too:

return (begin >= other.begin && begin < other.end) ||
       (end > other.begin && end <= other.end) || 
       (begin <= other.begin && end >= other.end);

Yikes—this code has become way too complicated. You can’t expect anyone to read this code and confidently know that it’s correct. So what do we do? How can we break down this giant expression?

Finding a More Elegant Approach

This is one of those times when you should stop and consider a different approach altogether. What started as a simple problem (checking whether two ranges overlap) turned into a surprisingly convoluted piece of logic. This is often a sign that there must be an easier way.

But finding a more elegant solution takes creativity. How do you go about it? One technique is to see if you can solve the problem the “opposite” way. Depending on the situation you’re in, this could mean iterating through arrays in reverse or filling in some data structure backward rather than forward.

Here, the opposite of OverlapsWith() is “doesn’t overlap.” Determining if two ranges don’t overlap turns out to be a much simpler problem, because there are only two possibilities:

The other range ends before this one begins.
The other range begins after this one ends.

We can turn this into code quite easily:

bool Range::OverlapsWith(Range other) {
    if (other.end <= begin) return false;  // They end before we begin
    if (other.begin >= end) return false;  // They begin after we end

    return true;  // Only possibility left: they overlap
}

Each line of code here is much simpler—it involves only a single comparison. That leaves the reader with enough brainpower to focus on whether <= is correct .

Breaking Down Giant Statements

This chapter is about breaking down individual expressions, but the same techniques apply to breaking down larger statements as well. For example, the following JavaScript code has a lot to take in at once:

var update_highlight = function (message_num) {
    if ($("#vote_value" + message_num).html() === "Up") {
        $("#thumbs_up" + message_num).addClass("highlighted");
        $("#thumbs_down" + message_num).removeClass("highlighted");
    } else if ($("#vote_value" + message_num).html() === "Down") {
        $("#thumbs_up" + message_num).removeClass("highlighted");
        $("#thumbs_down" + message_num).addClass("highlighted");
    } else {
        $("#thumbs_up" + message_num).removeClass("highighted"); 
        $("#thumbs_down" + message_num).removeClass("highlighted");
    }
};

The individual expressions in this code aren’t that big, but when placed all together, it forms one giant statement that hits you all at once.

Fortunately, a lot of the expressions are the same, which means we can extract them out as summary variables at the top of the function (this is also an instance of the DRY—Don’t Repeat Yourself—principle):

var update_highlight = function (message_num) {
    var thumbs_up = $("#thumbs_up" + message_num);
    var thumbs_down = $("#thumbs_down" + message_num);
    var vote_value = $("#vote_value" + message_num).html();
    var hi = "highlighted";

    if (vote_value === "Up") {
        thumbs_up.addClass(hi);
        thumbs_down.removeClass(hi);
    } else if (vote_value === "Down") {
        thumbs_up.removeClass(hi);
        thumbs_down.addClass(hi);
    } else {
        thumbs_up.removeClass(hi);
        thumbs_down.removeClass(hi);
    }
};

The creation of var hi = "highlighted" isn’t strictly needed, but as there were six copies of it, there were compelling benefits:

It helps avoid typing mistakes. (In fact, did you notice that in the first example, the string was misspelled as "highighted" in the fifth case?)
It shrinks the line width even more, making the code easier to scan through.
If the class name needed to change, there’s just one place to change it.

Another Creative Way to Simplify Expressions

Here’s another example with a lot going on in each expression, this time in C++:

void AddStats(const Stats& add_from, Stats* add_to) {
    add_to->set_total_memory(add_from.total_memory() + add_to->total_memory());
    add_to->set_free_memory(add_from.free_memory() + add_to->free_memory());
    add_to->set_swap_memory(add_from.swap_memory() + add_to->swap_memory());
    add_to->set_status_string(add_from.status_string() + add_to->status_string());
    add_to->set_num_processes(add_from.num_processes() + add_to->num_processes());
    ...
}

Once again, your eyes are faced with code that’s long and similar, but not exactly the same. After ten seconds of careful scrutiny, you might realize that each line is doing the same thing, just to a different field each time:

    add_to->set_XXX(add_from.XXX() + add_to->XXX());

In C++, we can define a macro to implement this:

void AddStats(const Stats& add_from, Stats* add_to) {
    #define ADD_FIELD(field) add_to->set_##field(add_from.field() + add_to->field())

    ADD_FIELD(total_memory);
    ADD_FIELD(free_memory);
    ADD_FIELD(swap_memory);
    ADD_FIELD(status_string);
    ADD_FIELD(num_processes);
    ...
    #undef ADD_FIELD
}

Now that we’ve stripped away all the clutter, you can look at the code and immediately understand the essence of what’s happening. It’s very clear that each line is doing the same thing.

Note that we’re not advocating using macros very often—in fact, we usually avoid them because they can make code confusing and introduce subtle bugs. But sometimes, as in this case, they’re simple and can provide a clear benefit to readability.

Summary

Giant expressions are hard to think about. This chapter showed a number of ways to break them down so the reader can digest them piece by piece.

One simple technique is to introduce “explaining variables” that capture the value of some large subexpression. This approach has three benefits:

It breaks down a giant expression into pieces.
It documents the code by describing the subexpression with a succinct name.
It helps the reader identify the main “concepts” in the code.

Another technique is to manipulate your logic using De Morgan’s laws—this technique can sometimes rewrite a boolean expression in a cleaner way (e.g., if (!(a && !b)) turns into if (!a || b)).

We showed an example where a complex logical condition was broken down into tiny statements like “if (a < b) ...”. In fact, all of the improved-code examples in this chapter had if statements with no more than two values inside them. This setup is ideal. It may not always seem possible to do this—sometimes it requires “negating” the problem or considering the opposite of your goal.

Finally, even though this chapter is about breaking down individual expressions, these same techniques often apply to larger blocks of code, too. So be aggressive in breaking down complex logic wherever you see it.

^[1]Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 97–185.