Whether you’re naming a variable, a function, or a class, a lot of the same principles apply. We like to think of a name as a tiny comment. Even though there isn’t much room, you can convey a lot of information by choosing a good name.
Pack information into your names.
A lot of the names we see in programs are vague,
like tmp
. Even words that may seem
reasonable, such as size
or get
, don’t pack much information. This chapter
shows you how to pick names that do.
This chapter is organized into six specific topics:
Choosing specific words
Avoiding generic names (or knowing when to use them)
Using concrete names instead of abstract names
Attaching extra information to a name, by using a suffix or prefix
Deciding how long a name should be
Using name formatting to pack extra information
Part of “packing information into names” is choosing words that are very specific and avoiding “empty” words.
For example, the word “get” is very unspecific, as in this example:
def GetPage(url):
...
The word “get” doesn’t really say much. Does
this method get a page from a local cache, from a database, or from the
Internet? If it’s from the Internet, a more specific name might be
FetchPage()
or DownloadPage()
.
Here’s an example of a BinaryTree
class:
class BinaryTree {
int Size();
...
};
What would you expect the Size()
method to return? The height of the tree,
the number of nodes, or the memory footprint of the tree?
The problem is that Size()
doesn’t convey much information. A more
specific name would be Height()
,
NumNodes()
, or MemoryBytes()
.
As another example, suppose you have some sort
of Thread
class:
class Thread {
void Stop();
...
};
The name Stop()
is okay, but depending on what exactly it
does, there might be a more specific name. For instance, you might call it
Kill()
instead, if it’s a heavyweight
operation that can’t be undone. Or you might call it Pause()
, if there is a way to Resume()
it.
Don’t be afraid to use a thesaurus or ask a friend for better name suggestions. English is a rich language, and there are a lot of words to choose from.
Here are some examples of a word, as well as more “colorful” versions that might apply to your situation:
Word | Alternatives |
---|---|
send | deliver, dispatch, announce, distribute, route |
find | search, extract, locate, recover |
start | launch, create, begin, open |
make | create, set up, build, generate, compose, add, new |
Don’t get carried away, though. In PHP, there
is a function to explode()
a string.
That’s a colorful name, and it paints a good picture of breaking
something into pieces, but how is it any different from split()
? (The two functions are different, but
it’s hard to guess their differences based on the name.)
Names like tmp
, retval
,
and foo
are usually cop-outs that mean “I can’t think of a name.”
Instead of using an empty name like this, pick a
name that describes the entity’s value or purpose.
For example, here’s a JavaScript function that
uses retval
:
var euclidean_norm = function (v) { var retval = 0.0; for (var i = 0; i < v.length; i += 1) retval += v[i] * v[i]; return Math.sqrt(retval); };
It’s tempting to use retval
when you can’t think of a better name for
your return value. But retval
doesn’t contain much information other than “I am a return value” (which
is usually obvious anyway).
A better name would describe the purpose of the variable or the
value it contains. In this case, the variable is accumulating the sum of
the squares of v
. So a better name is
sum_squares
. This would announce the
purpose of the variable upfront and might help catch a bug.
For instance, imagine if the inside of the loop were accidentally:
retval += v[i];
This bug would be more obvious if the name were
sum_squares
:
sum_squares += v[i]; // Where's the "square" that we're summing? Bug!
The name retval
doesn’t pack
much information. Instead, use a name that describes the variable’s
value.
There are, however, some cases where generic names do carry meaning. Let’s take a look at when it makes sense to use them.
Consider the classic case of swapping two variables:
if (right < left) { tmp = right; right = left; left = tmp; }
In cases like these, the name tmp
is perfectly fine. The variable’s sole
purpose is temporary storage, with a lifetime of only a few lines. The
name tmp
conveys specific meaning to
the reader—that this variable has
no other duties. It’s not being passed around to other functions or
being reset or reused multiple times.
But here’s a case where tmp
is just used out of laziness:
String tmp = user.name(); tmp += " " + user.phone_number(); tmp += " " + user.email(); ... template.set("user_info", tmp);
Even though this variable has a short
lifespan, being temporary storage isn’t the most important thing about
this variable. Instead, a name like user_info
would be more descriptive.
In the following case, tmp
should be in the name, but just as a
part of it:
tmp_file = tempfile.NamedTemporaryFile() ... SaveData(tmp_file, ...)
Notice that we named the variable tmp_file
and not just tmp
, because it is a file object. Imagine if
we just called it tmp
:
SaveData(tmp, ...)
Looking at just this one line of code, it
isn’t clear if tmp
is a file, a
filename, or maybe even the data being written.
The name tmp
should be used
only in cases when being short-lived and temporary is the most
important fact about that variable.
Names like i
, j
,
iter
, and it
are commonly used as indices and loop iterators. Even though these names are
generic, they’re understood to mean “I am an iterator.” (In fact, if you
used one of these names for some other purpose, it
would be confusing—so don’t do that!)
But sometimes there are better iterator names
than i
, j
, and k
.
For instance, the following loops find which users belong to which
clubs:
for (int i = 0; i < clubs.size(); i++) for (int j = 0; j < clubs[i].members.size(); j++) for (int k = 0; k < users.size(); k++) if (clubs[i].members[k] == users[j]) cout << "user[" << j << "] is in club[" << i << "]" << endl;
In the if
statement, members[]
and
users[]
are using the wrong index.
Bugs like these are hard to spot because that line of code seems fine in
isolation:
if (clubs[i].members[k] == users[j])
In this case, using more precise names may have helped. Instead of
naming the loop indexes (i
,j
,k
),
another choice would be (club_i
,
members_i
, users_i
) or, more succinctly (ci
, mi
,
ui
). This approach would help the bug
stand out more:
if (clubs[ci].members[ui] == users[mi]) # Bug! First letters don't match up.
When used correctly, the first letter of the index would match the first letter of the array:
if (clubs[ci].members[mi] == users[ui]) # OK. First letters match.
As you’ve seen, there are some situations where generic names are useful.
If you’re going to use a generic name like tmp
, it
,
or retval
, have a good reason for
doing so.
A lot of the time, they’re overused out of pure laziness. This is
understandable—when nothing better comes to mind, it’s easier to just
use a meaningless name like foo
and
move on. But if you get in the habit of taking an extra few seconds to
come up with a good name, you’ll find your “naming muscle” builds
quickly.
When naming a variable, function, or other element, describe it concretely rather than abstractly.
For example, suppose you have an internal
method named ServerCanStart()
, which
tests whether the server can listen on a given TCP/IP port. The name
ServerCanStart()
is somewhat abstract,
though. A more concrete name would be CanListenOnPort()
. This name directly describes
what the method will do.
The next two examples illustrate this concept in more depth.
Here’s an example from the codebase at Google. In C++, if you don’t define a copy constructor or assignment operator for your class, a default is provided. Although handy, these methods can easily lead to memory leaks and other mishaps because they’re executed “behind the scenes” in places you might not have realized.
As a result, Google has a convention to disallow these “evil” constructors, using a macro:
class ClassName {
private:
DISALLOW_EVIL_CONSTRUCTORS(ClassName);
public:
...
};
This macro was defined as:
#define DISALLOW_EVIL_CONSTRUCTORS(ClassName) \
ClassName(const ClassName&); \
void operator=(const ClassName&);
By placing this macro in the private:
section of a class, these two methods
become private, so that they can’t be used, even accidentally.
The name DISALLOW_EVIL_CONSTRUCTORS
isn’t very good,
though. The use of the word “evil” conveys an overly strong stance on a
debatable issue. More important, it isn’t clear what that macro is
disallowing. It disallows the operator=()
method, and that isn’t even a
“constructor”!
The name was used for years but was eventually replaced with something less provocative and more concrete:
#define DISALLOW_COPY_AND_ASSIGN(ClassName) ...
One of our programs had an optional command-line flag named --run_locally
. This flag would cause the program to print extra debugging
information but run more slowly. The flag was typically used when
testing on a local machine, like a laptop. But when the program was
running on a remote server, performance was important, so the flag
wasn’t used.
You can see how the name --run_locally
came about, but it has some
problems:
A new member of the team didn’t know what it did. He would use it when running locally (imagine that), but he didn’t know why it was needed.
Occasionally, we needed to print
debugging information while the program ran remotely. Passing
--run_locally
to a program that
is running remotely looks funny, and it’s just confusing.
Sometimes we would run a performance test
locally and didn’t want the logging slowing it down, so we wouldn’t
use --run_locally
.
The problem is that --run_locally
was named after the circumstance
where it was typically used. Instead, a flag name like --extra_logging
would be more direct and
explicit.
But what if --run_locally
needs to do more than just extra
logging? For instance, suppose that it needs to set up and use a special
local database. Now the name --run_locally
seems more tempting because it
can control both of these at once.
But using it for that purpose would be
picking a name because it’s vague and indirect,
which is probably not a good idea.
The better solution is to create a second flag named --use_local_database
.
Even though you have to use two flags now, these flags are much more
explicit; they don’t try to smash two orthogonal ideas into one, and
they give you the option of using just one and not the other.
As we mentioned before, a variable’s name is like a tiny comment. Even though there isn’t much room, any extra information you squeeze into a name will be seen every time the variable is seen.
So if there’s something very important about a variable that the reader must know, it’s worth attaching an extra “word” to the name. For example, suppose you had a variable that contained a hexadecimal string:
string id; // Example: "af84ef845cd8"
You might want to name it hex_id
instead, if it’s important for the reader
to remember the ID’s format.
If your variable is a measurement (such as an amount of time or a number of bytes), it’s helpful to encode the units into the variable’s name.
For example, here is some JavaScript code that measures the load time of a web page:
var start = (new Date()).getTime(); // top of the page ... var elapsed = (new Date()).getTime() - start; // bottom of the page document.writeln("Load time was: " + elapsed + " seconds");
There is nothing obviously wrong with this
code, but it doesn’t work, because getTime()
returns milliseconds, not
seconds.
By appending _ms
to our variables, we can make everything
more explicit:
var start_ms = (new Date()).getTime(); // top of the page ... var elapsed_ms = (new Date()).getTime() - start_ms; // bottom of the page document.writeln("Load time was: " + elapsed_ms / 1000 + " seconds");
Besides time, there are plenty of other units that come up in programming. Here is a table of unitless function parameters, and better versions that include the units:
This technique of attaching extra information to a name isn’t limited to values with units. You should do it any time there’s something dangerous or surprising about the variable.
For example, many security exploits come
from not realizing that some data your program receives is not yet in a
safe state. For this, you might want to use variable names like untrustedUrl
or unsafeMessageBody
. After calling functions
that cleanse the unsafe input, the resulting variables might be trustedUrl
or safeMessageBody
.
The following table shows additional examples of when extra information should be encoded in the name:
You shouldn’t use attributes like unescaped_
or _utf8
for every variable
in your program. They’re most important in places where a bug can easily
sneak in if someone mistakes what the variable is, especially if the
consequences are dire, as with a security bug. Essentially, if it’s a critical thing to understand, put
it in the name.
When picking a good name, there’s an implicit constraint that the name shouldn’t be too long. No one likes to work with identifiers like this:
newNavigationControllerWrappingViewControllerForDataSourceOfClass
The longer a name is, the harder it is to remember, and the more space it consumes on the screen, possibly causing extra lines to wrap.
On the other hand, programmers can take this
advice too far, using only single-word (or single-letter) names. So how
should you manage this trade-off? How do you decide between naming a
variable d
, days
, or days_since_last_update
?
This decision is a judgment call whose best answer depends on exactly how that variable is being used. But here are some guidelines to help you decide.
When you go on a short vacation, you typically pack less luggage than if you go on a long vacation. Similarly, identifiers that have a small “scope” (how many other lines of code can “see” this name) don’t need to carry as much information. That is, you can get away with shorter names because all that information (what type the variable is, its initial value, how it’s destroyed) is easy to see:
if (debug) { map<string,int> m; LookUpNamesNumbers(&m); Print(m); }
Even though m
doesn’t pack any information, it’s not a
problem, because the reader already has all the information she needs to
understand this code.
However, suppose m
were a class member or a global variable,
and you saw this snippet of code:
LookUpNamesNumbers(&m); Print(m);
This code is much less readable, as it’s
unclear what the type or purpose of m
is.
So if an identifier has a large scope, the name needs to carry enough information to make it clear.
There are many good reasons to avoid long names, but “they’re harder to type” is no longer one of them. Every programming text editor we’ve seen has “word completion” built in. Surprisingly, most programmers aren’t aware of this feature. If you haven’t tried this feature on your editor yet, please put this book down right now and try it:
It’s surprisingly accurate. It works on any type of file, in any language. And it works for any token, even if you’re typing a comment.
Programmers sometimes resort to acronyms and abbreviations to keep
their names small—for example, naming a class BEManager
instead of BackEndManager
. Is this shrinkage worth the
potential confusion?
In our experience, project-specific abbreviations are usually a bad idea. They appear cryptic and intimidating to those new to the project. Given enough time, they even start to appear cryptic and intimidating to the authors.
So our rule of thumb is: would a new teammate understand what the name means? If so, then it’s probably okay.
For example, it’s fairly common for
programmers to use eval
instead of
evaluation
, doc
instead of document
, str
instead of string
. So a new teammate seeing FormatStr()
will probably understand what that
means. However, he or she probably won’t understand what a BEManager
is.
The way you use underscores, dashes, and capitalization can also pack more information in a name. For example, here is some C++ code that follows the formatting conventions used for Google open source projects:
static const int kMaxOpenFiles = 100; class LogReader { public: void OpenFile(string local_file); private: int offset_; DISALLOW_COPY_AND_ASSIGN(LogReader); };
Having different formats for different entities is like a form of syntax highlighting—it helps you read the code more easily.
Most of the formatting in this example is
pretty common—using CamelCase
for class
names, and using lower_separated
for
variable names. But some of the other conventions may have surprised
you.
For instance, constant values are of the form
kConstantName
instead of CONSTANT_NAME
. This style has the benefit of
being easily distinguished from #define
macros, which are MACRO_NAME
by
convention.
Class member variables are like normal
variables, but must end with an underscore, like offset_
. At first, this convention may seem
strange, but being able to instantly distinguish members from other
variables is very handy. For instance, if you’re glancing through the code
of a large method, and see the line:
stats.clear();
you might ordinarily wonder,
Does stats
belong to this class? Is this code changing the internal state
of the class? If the member_
convention is used, you can quickly conclude, No,
stats
must be a local
variable. Otherwise it would be named stats_
.
Depending on the context of your project or language, there may be other formatting conventions you can use to make names contain more information.
For instance, in JavaScript: The
Good Parts (Douglas Crockford, O’Reilly, 2008), the
author suggests that “constructors” (functions intended to be called with new
) should be capitalized and that ordinary
functions should start with a lowercase letter:
var x = new DatePicker(); // DatePicker() is a "constructor" function var y = pageHeight(); // pageHeight() is an ordinary function
Here’s another JavaScript example: when calling the jQuery library
function (whose name is the single character $
), a useful convention is to prefix jQuery
results with $
as well:
var $all_images = $("img"); // $all_images is a jQuery object
var height = 250; // height is not
Throughout the code, it will be clear that
$all_images
is a jQuery result
object.
Here’s a final example, this time about
HTML/CSS: when giving an HTML tag an id
or class
attribute, both underscores and dashes
are valid characters to use in the value. One possible convention is to
use underscores to separate words in IDs and dashes to separate words in
classes:
<div id="middle_column" class="main-content"> ...
Whether you decide to use conventions like these is up to you and your team. But whichever system you use, be consistent across your project.
The single theme for this chapter is: pack information into your names. By this, we mean that the reader can extract a lot of information just from reading the name.
Here are some specific tips we covered:
Use specific
words—for example, instead of Get
, words like Fetch
or Download
might be better, depending on the
context.
Avoid generic
names like tmp
and
retval
, unless there’s a specific
reason to use them.
Use concrete
names that describe things in more detail—the name ServerCanStart()
is vague compared to
CanListenOnPort()
.
Attach important
details to variable names—for example, append _ms
to a variable whose value is in
milliseconds or prepend raw_
to an
unprocessed variable that needs escaping.
Use longer names for larger scopes—don’t use cryptic one- or two-letter names for variables that span multiple screens; shorter names are better for variables that span only a few lines.
Use capitalization, underscores, and so on in a meaningful way—for example, you can append “_” to class members to distinguish them from local variables.