What’s Wrong With GNU make?

ihacku 2015-05-30

展開(kāi)全文

GNU make is a widely used tool for automating software builds. It is the de facto standard build tool on Unix. It is less popular among Windows developers, but even there it has spawned imitators such as Microsoft’s nmake.

Despite its popularity, make is a deeply flawed tool. Its reliability is suspect; its performance is poor, especially for large projects; and its makefile language is arcane and lacks basic language features that we take for granted in other programming languages.

Admittedly, make is not the only automated build tool. Many other tools have been built to address make‘s limitations. Some of these tools are clearly better than make, but make‘s popularity endures. The goal of this document is, very simply, to educate you about some of the issues with make—to increase awareness of these problems.

Most of the points in this article apply to the original Unix make as well as GNU make. Most people using make today are probably using GNU make, though, so, where differences exist, when we refer to make and “makefiles” here we are speaking of GNU make.

This document assumes that the reader is already familiar at a basic level with make and understands concepts such as rules, targets, and dependencies.

Language Design

Anyone who has written a makefile has probably learned the hard way about one “feature” of its syntax: its use of tabs. Any line specifying a shell command for a rule must begin with a tab. Spaces will not do—it must be a tab.

Unfortunately, this is just one of the many strange aspects of the make language.

Recursive Make

“Recursive make” is a common makefile coding pattern where a rule invokes another session of make. Since each session of make only reads in one top-level makefile, this is a natural way to build a makefile for a project consisting of several submodules.

Recursive make causes so many problems that a classic article Recursive Make Considered Harmful was written describing what’s wrong with it. The article makes many valid points, some of which are discussed later in this document, but it’s genuinely difficult to write makefiles that do not use recursive make.

The Parser

Most programming language parsers follow a similar pattern. First, the input is “tokenized” or “scanned,” discarding comments and whitespace and converting freeform text into a stream of “tokens,” such as symbols, identifiers, and reserved words. The resulting token stream is “parsed” using a grammar that specifies what combinations and orderings of tokens are legal. Finally, the resulting “parse tree” is interpreted, compiled, etc.

make‘s parser does not follow this standard model. You can’t parse a makefile without also executing it. Variable substitution can happen almost anywhere, and if you don’t know the value of a variable, you can’t continue parsing. It is challenging to write other tools that can parse makefiles, because you must reimplement the whole language.

There’s no clear separation of tokens in make. Take the handling of commas. Sometimes a comma is part of a string and has no special meaning:

X = y,z

Sometimes a comma separates the strings being compared in an if statement:

ifeq ($(X),$(Y))

Sometimes a comma separates arguments to a function:

$(filter %.c,$(SRC_FILES))

But sometimes, even within arguments to a function, a comma is just part of a string:

$(filter %.c,a.c b.c c.cpp d,e.c)

(since filter only takes two arguments, that last comma doesn’t introduce a new argument; it’s just another character in the second argument)

Whitespace follows a similarly obscure set of rules. Sometimes whitespace matters; sometimes it doesn’t. Strings are not quoted, so it’s not visually clear which whitespaces matter. Since there is no “l(fā)ist” data type, only strings, whitespace must be used to separate elements in lists. This leads to a lot of complexity if a filename ever includes a space.

The following example illustrates the confusing treatment of whitespace. An obscure trick is required to create a variable whose value ends with a space. (Normally trailing whitespace is swallowed by the parser, but this happens before, not after variable substitution.)

NOTHING := SPACE := $(NOTHING) $(NOTHING) CC_TARGET_PREFIX := -o$(SPACE) # now I can write rules like $(CC_TARGET_PREFIX)$@

We’ve only scratched the surface with commas and whitespace. Few people understand all the intricacies of the make parser.

Uninitialized Variables and Environment Variables

If a makefile accesses an undefined variable, make does not generate an error. Instead, it obtains the variable’s initial value from the identically named environment variable in the calling shell. If the environment variable doesn’t exist, the variable starts as an empty string.

This leads to two types of problems. First, typos are not caught and flagged as errors. (You can pass make an argument to flag these as warnings, but this isn’t the default, and sometimes an uninitialized variable is used intentionally.) Second, environment variables may unexpectedly interfere with your makefile code. You can’t predict what environment variables the user might set, so to be safe you must carefully initialize every variable before you reference it or append to it using +=.

There is also a confusing distinction between the results of make FOO=1 vs. the results of export FOO=1 followed by make. In the former, a line in the makefile FOO = 0 will have no effect! Instead, you must write override FOO = 0.

Conditional Syntax

One major weakness of the make language is its limited support for “if” conditionals. (Conditional statements are especially important in cross-platform makefiles.) Recent versions of make have helped matters by introducing an “else if” syntax. However, there are still only four basic variants of “if”: ifeq, ifneq, ifdef, and ifndef. If your conditional is more complex, requiring “and”, “or”, and “not” clauses, very cumbersome code is required.

Suppose we want to detect the Linux/x86 target platform. The following hack is a common way of faking out the existence of an “and” conditional:

ifeq ($(TARGET_OS)-$(TARGET_CPU),linux-x86) foo = bar endif

“or” is not as easy. Suppose we want to detect x86 or x86_64, and suppose foo = bar is really a placeholder for 10 or more lines of code that we don’t want to replicate. We are left with unpleasant options such as:

# Terse but somewhat confusing ifneq (,$(filter x86 x86_64,$(TARGET_CPU)) foo = bar endif

# Verbose but easier to understand ifeq ($(TARGET_CPU),x86) TARGET_CPU_IS_X86 := 1 else ifeq ($(TARGET_CPU),x86_64) TARGET_CPU_IS_X86 := 1 else TARGET_CPU_IS_X86 := 0 endif ifeq ($(TARGET_CPU_IS_X86),1) foo = bar endif

A lot of makefile code could be simplified if the language supported a full-fledged expression syntax.

Two Types of Variables

There are two types of variable assignments in make. := evaluates its right-hand side immediately; = evaluates it later when the variable is referenced. The former is how most other programming languages work and tends to be more efficient, particularly if the expression is expensive to evaluate. The latter, however, is more common in most makefiles.

There are some valid reasons for using = (deferred evaluation), but often it can be eliminated with careful makefile design. Aside from the performance problem, deferred evaluation makes it more difficult to read and understand makefile code.

Normally, you can read a program from top to bottom—the same order that the statements are executed—and know exactly what the state of the program is at each point in time. With deferred evaluation, you cannot know the value of a variable without knowing what happens later in the program, too. A variable’s value can change without your directly modifying it. If you try to debug a makefile using “debug prints” such as:

$(warning VAR=$(VAR))

…you may not get the information you want.

Pattern Rules and Search Paths

Some rules use % characters to represent an arbitrary filename—the rule transformation one class of file to another. For example, a %.o: %.c rule compiles a .c source file into an .o object file.

Suppose we need to build an object file foo.o, but foo.c lives somewhere other than the current directory. make‘s vpath feature tells it where to look for these files. Unfortunately, if there are two files named foo.c in the vpath directories for %.c, it may select the wrong one.

The following standard makefile coding pattern falls apart if two source files have the same name—even if one of them is unused and just happens to live in the same directory as another one that you do use. The problem is that the mapping from source path to object path loses information, but make‘s design requires it to attempt to reverse this mapping.

O_FILES := $(patsubst %.c,%.o,$(notdir $(C_FILES))) vpath %.c $(sort $(dir $(C_FILES))) $(LIB): $(O_FILES)

Other Missing Language Features

make has no data types other than strings. There is no Boolean type, list type, or hash/dictionary type.

There is no scoping. All variables are global.

Support for looping is limited. $(foreach) will evaluate an expression several times and concatenate the results, but the resulting string is still just a variable expansion. For example, you can’t use $(foreach) to create a family of related rules.

User-defined functions exist but have the same limitations as $(foreach). They are just variable expansions and cannot use the full language syntax or create rules.

Reliability

make‘s reliability is poor, especially for larger or incremental builds. Sometimes a build fails with a strange error and you need to resort to “voodoo magic” like typing make clean, hoping this will fix things. Sometimes (more dangerous) it appears to succeed, but something wasn’t rebuilt and you’ll get mysterious crashes, etc. at runtime.

Missing Dependencies

You must tell make all of the dependencies of each target. If you don’t tell it about a dependency, it won’t rebuild the target when that dependency changes.

For C/C++, many compilers can output dependency information in a format that make understands. For other tools, though, dependencies are invariably incomplete. Consider a Python script that imports other Python modules. A change to the script may change the script’s output; this is easy to remember and code into the makefile. But a change to one of those modules may also change the script’s output. It’s challenging to list all these dependencies and keep them up to date.

Last-Modified Timestamps

make determines whether a target is out of date by comparing its last-modified timestamp against those of its dependencies. It does not examine the contents of the files, only their timestamps.

File system timestamps are not especially reliable, particularly in a networked environment. Systems’ clocks drift out of sync. Sometimes clocks go backwards in time. Sometimes programs explicitly set a file’s timestamp, wiping out the real last-modified time.

When these things happen, make doesn’t rebuild things that needed to be rebuilt, resulting in an incomplete build.

Command Line Dependencies

When a program’s command line arguments change, its output might also change. (An example: changing the -D options passed to the C preprocessor.) make doesn’t rebuild in this case, resulting in incorrect incremental builds.

You can work around this by having each rule depend on Makefile. This is error-prone, because you might forget to do this on a particular rule. Also, Makefile might include other makefiles, which in turn might include other makefiles; you must list all of them and keep the list up to date. Further, many makefile changes are innocuous. You probably don’t want to rebuild every target in a makefile just because you changed a comment.

Environment Variable Inheritance and Dependencies

Not only does every environment variable turn into a make variable, but also these environment variables are passed on to the programs make runs. Since every user has different environment variables set, two users running the same build may unexpectedly get different behavior.

Changing an environment variable exported to a child process might affect its output, so this ought to trigger a rebuild if you want to be completely safe. make does not rebuild when this happens.

Multiple Concurrent Sessions

If you run two instances of make in the same directory tree at the same time, they will collide with one another when they try to build the same files. Most likely, one or both will die with an error.

Editing Files During a Build

If you edit and save a file in the middle of a make session, the results are unpredictable. It may correctly pick up the changes; it may not, and you’ll have to type make again; or, if you are unlucky, depending on timing, you may end up with a tree where some targets are stale but cannot be fixed with another make.

Cleaning Up Old Files

Suppose your project used to have a source file foo.c, but that file was deleted and removed from the makefile. The object file foo.o built from foo.c will stay around. This is usually acceptable, but these old files may accumulate over time, and sometimes this can cause problems. For example, they may be erroneously picked up as part of a vpath search.

Another example: suppose that a file that was previously generated by the build is checked into revision control, and the rule that generated it is removed from the makefile. Revision control systems will usually not overwrite the old autogenerated file, out of fear that they might destroy something important. If you don’t notice this error message, delete the file manually, and re-update your tree, you will be using a stale version of the file.

Path Canonicalization

Files have more than one path. Even ignoring both hard and symbolic links, foo.c, ./foo.c, ../bar/foo.c, and /home/user/bar/foo.c might all be the same file. make should treat them the same for purposes of walking the dependency tree, but it doesn’t.

This problem is worse under Windows, where the file system is not case sensitive.

After a Failed or Cancelled Build

Once one build has failed, further incremental builds may be unsafe. In particular, after a command fails, make does not delete the partially built output file! If you type make again, it might conclude that the file is already up to date and try to use it. make has an option to delete these files, but it isn’t the default.

Hitting Ctrl-C during a build can also leave your tree in a suspect state.

Any time you run into a problem with an incremental build, your tree is suspect—if one file wasn’t rebuilt properly, who knows how many others weren’t? If this happens, you should probably start again from a clean tree.

A misbehaving process can also do a lot of harm to your tree. If a build step malfunctions and starts overwriting or deleting random files in your tree, make clean won’t be good enough. You will probably have to set up a brand new tree from scratch. (Hopefully the runaway process didn’t destroy your changes.)

Performance

make‘s performance scales poorly (nonlinearly) with project size.

Incremental Build Performance

One would hope that rebuilding a project would take time proportional to the number of targets that need to be rebuilt. Unfortunately, this is not the case.

Since make‘s incremental build reliability is suspect, users must do clean builds on a regular basis, either as necessary (whenever you hit a build error, try a clean build) or even all the time (out of paranoia). Better to be safe and wait for a clean build than to take the risk that a build appears to pass but is out of sync with the source code.

A file’s last-modified timestamp can change without its contents changing. This leads to unnecessary rebuilds.

A buggy makefile may list too many dependencies, so a target may be rebuilt even though none of its (real) dependencies have changed. Careless use of “phony” rules is another common problem (phony rules must always be rerun).

Even if your makefiles are bug-free and your incremental builds are perfectly reliable, performance is less than ideal. Suppose you edit a single .c file (not a header file) in a large project. If you type make from the top of the project, make must reparse all the makefiles, recursively invoking itself many times, and walk the dependency tree of all targets to see whether any of them need to be rebuilt. The time spent running the compiler may only be a small fraction of the total time.

Recursive Make and Performance

Sloppy use of recursive make is particularly dangerous. Suppose that your project consists of two executables A and B, which both depend on a library C. The top makefile needs to recurse into directories A and B, obviously. We’d also like to be able to type make from A or B if we only want to build A or only B, so we might have those makefiles recurse into sibling directory ../C. Now suppose we type make from the top of the tree; we’ll recurse into C twice!

This is mostly innocuous in this example, but in large projects, a single directory might be visited dozens of times. Each time, it has to be reparsed and its targets’ dependency trees must be walked. make has no built-in safeguards against this.

Parallel Make

make‘s “parallel make” promises large speedups, especially with the increasing popularity of multi-core CPUs. Unfortunately, the delivery falls short of the promise.

Parallel make‘s output is difficult to read. It’s hard to see which warnings, etc. are associated with which commands when several processes are running concurrently in the same shell.

Parallel make is exceptionally sensitive to correct dependency specification. If two rules are not connected through dependencies, make assumes that those rules can run in any order, or for that matter in parallel. When running a makefile serially, make tends to behave predictably: if A depends on B and C, then first B will be built, then C, then A. Of course make is free to build C before B, but with serial make, the order is deterministic.

With parallel make, B and C may (but are not guaranteed to) build in parallel. If C depends on B having run first, but this dependency was not spelled out in the makefile, C’s build will probably fail (but may not, depending on timing). Parallel make tends to flush out these missing dependencies from makefiles. That’s not a bad thing, since missing dependencies cause other problems and it’s good to find and fix them, but the practical effect is that using parallel make on a large project is frustrating.

Parallel make‘s interactions with recursive make are problematic. Each session of make is independent, so each one attempts to parallelize the build independently of the others and has an incomplete view of the overall dependency graph. We face a tradeoff between reliability and performance. On one hand we would like to parallelize builds not just within a single makefile, but across all of the makefiles. But since make doesn’t know about inter-makefile dependencies, fully parallelizing submakes does not work.

Certain submakes can be run in parallel, while others must be run serially. Specifying these dependencies is clumsy, and it’s easy to forget some of them. It’s tempting to fall back to the safe option of walking the makefile tree serially and parallelizing only within a single makefile at a time, but this greatly reduces parallelism, particularly in incremental builds.

Automatic Dependency Generation with Microsoft Visual C++

Many compilers, like gcc, can output dependency information in a format that make understands. Unfortunately, Microsoft Visual C++ cannot. It has a command line option /showIncludes, however, that can print this information, and another script can postprocess it into a form that make understands. This requires running an extra script for every C file. Launching (for example) the Python interpreter for each C file is not cheap.

Builtin Rules

make has numerous builtin rules. These slightly simplify coding small makefiles, but medium to large projects usually override them. They hurt performance because make walks these extra pattern rules trying to find ways to build files. Many of these rules are obsolete—for example, two of them are intended for use with the RCS and SCCS revision control systems, which very few people still use—and yet they slow down everyone’s builds.

You can disable them on the command line with make -r, but this is not the default. There is a line you can add to your makefile to remove them, but this is also not the default and many people forget to add it.

Miscellaneous

There are a few other issues with make that don’t fit cleanly into the categories above.

Silence is Golden

According to Eric Raymond, “One of Unix’s oldest and most persistent design rules is that when a program has nothing interesting or surprising to say, it should shut up. Well-behaved Unix programs do their jobs unobtrusively, with a minimum of fuss and bother. Silence is golden.” make does not follow this rule.

When you type make, by default, the log includes every program’s full command line and everything it prints to stdout and stderr. This is too much information. Important warning/error messages are buried in the output, and the text can scroll by so quickly as to make it unreadable.

You can suppress a lot of this output with make -s, but this is not the default. Also, there is no intermediate mode where make tells you what file it is currently building, without printing out the command lines.

Multi-Target Rules

Some tools produce more than one output file, but make rules can only have one target. If you try to write a dependency on an additional output file that wasn’t listed as the target of the rule, make will not recognize the connection between the two rules.

Warnings That Should Be Errors

make prints a warning but does not abort with an error if it detects a circular dependency. This likely indicates a serious makefile bug, but make treats it as a minor problem.

Likewise, make prints a warning but does not abort with an error if there are two rules describing how to build one target. It simply ignores one of them. Again, this is a serious makefile bug, but make does not treat it as such.

Creating Output Directories

It is useful to put output files for separate build configurations in separate output directories, so that you don’t have to rebuild from scratch when you switch build configurations. For example, you might put debug binaries in a debug directory and release binaries in a release directory. Before you can create files in these directories, you must first create the directories.

It would be nice if make did this automatically—clearly you can’t build a file when the directory it’s supposed to live in doesn’t exist yet—but it doesn’t.

It’s not very practical to put a mkdir -p $(dir $@)) command at the start of every rule. It would be inefficient, too, because you’d be running mkdir many times on the same directory. You’d also have to ignore the errors if the directory already exists.

You might try to solve this problem as follows:

debug/%.o: %.c debug $(CC) -c $< -o $@

debug: mkdir $@

This looks like it ought to work—if debug doesn’t exist, create it before trying to build debug/foo.o—but it doesn’t. Creating a new directory entry increases the directory’s last modified time. Suppose we are building both debug/foo.o and debug/bar.o. Creating debug/bar.o will increase debug‘s last-modified time. Now debug‘s last-modified time will be newer than debug/foo.o‘s, so the next time we type make, debug/foo.o will unnecessarily be rebuilt. If “rebuilding” it actually deletes it and then creates a new file, rather than truncating the existing file, you get a never-ending cycle of rebuilds.

The solution is to depend on a file rather than a directory (debug/dummy.txt rather than debug). This requires an extra unnecessary build step (touch debug/dummy.txt), and it can interact poorly with make‘s automatic deletion of “intermediate” files generated during a build. If you’re not careful to make every target depend on its dummy.txt file, you may also have problems with parallel make.

Conclusion

make is a popular but flawed tool. You could do worse than to use make, but you could also do better. If you are working on a large software project, you should consider using an alternative tool. If you must use make, you should be aware of its defects.

Did you find this whitepaper interesting or valuable? If so, you might want to subscribe to our blog, which features regular posts on software engineering topics. You might also want to check out our product Cascade, which helps automate builds and regression tests.

本站是提供個(gè)人知識(shí)管理的網(wǎng)絡(luò)存儲(chǔ)空間，所有內(nèi)容均由用戶發(fā)布，不代表本站觀點(diǎn)。請(qǐng)注意甄別內(nèi)容中的聯(lián)系方式、誘導(dǎo)購(gòu)買等信息，謹(jǐn)防詐騙。如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容，請(qǐng)點(diǎn)擊一鍵舉報(bào)。

轉(zhuǎn)藏 分享

QQ空間 QQ好友新浪微博微信

獻(xiàn)花（0） +1

來(lái)自： ihacku > 《我的圖書館》

舉報(bào)/認(rèn)領(lǐng)