April 20, 2015

Parallel make & optimizations

Here's a bunch of notes on writing makefiles that still work when run in parallel. The problem with writing makefiles and having them executed in parallel is that you have to know your source dependencies and state them, so make does not attempt to build your sources in the wrong order.

I have heard a lot about how bad make is at solving dependencies, and how other tools, like the build system in the latest Visual Studio, are much better. I strongly disagree. VC hides the dependencies behind a "pointy clicky" scheme. Make doesn't assume anything; it leaves the dependency problem up to the developers, meaning that the people working on the source should know how to solve their sources' dependencies!

As an example, the common idiom for writing recursive makefiles is that directory traversal, or any traversal for that matter, is implemented as a shell loop or with the make function "foreach". Both of these methods solve the traversal, but both have their drawbacks.

The drawback of the shell method is that you invoke a separate shell for each directory in the make process, and make cannot know what's going on in these subshells. Make therefore cannot resolve any dependencies between them, which in the end usually causes the build to break. The "foreach" method forces you to write a function-based makefile, which again makes writing dependencies hard(er).
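For reference, the shell-loop idiom usually looks something like the sketch below. It assumes GNU make and borrows the directory names used later in this post; recipe lines must start with a tab.

```make
# Hypothetical top-level makefile using the shell-loop idiom.
SUBDIRS := src lib interface doc

.PHONY: all
all:
	for dir in $(SUBDIRS); do \
		$(MAKE) -C $$dir all || exit 1; \
	done
```

Make sees only a single recipe for all here, so the directories are visited one after another in list order, and there is nowhere to state that one directory must wait for another.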

Make's primary way of handling dependencies is through targets. The usual way is to say, e.g.

all: prepare
prepare: depend

The above says that to build the all target, you first have to build prepare, which in turn depends on depend, meaning that make has to build depend before prepare. Simple and, if I may add, easy to read. For make this means that before starting anything else it will build depend, and any parts of depend that can be executed in parallel will be. The same goes for prepare and all. But how do you then traverse a set of subdirectories? Simple, you just make your subdirectories targets, e.g.

SUBDIRS:= src lib interface doc

.PHONY: $(SUBDIRS)
$(SUBDIRS):
	$(MAKE) --directory=$@ $(MAKECMDGOALS)

.PHONY: all
all: $(SUBDIRS)

The snippet above executes make in each directory of the SUBDIRS list, using a make target instead of the common "for" approach. Now, each of these SUBDIRS may have its own set of dependencies that need to be resolved; those are solved in the makefile of the respective directory, where they belong.
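As an illustration of dependencies living "where they belong", a makefile inside one of the subdirectories could look roughly like this. The file names (a.c, b.c, libfoo.a) are made up for the example, and GNU make with its default CC and AR variables is assumed.

```make
# lib/Makefile (hypothetical): dependencies local to this directory.
OBJS := a.o b.o

.PHONY: all
all: libfoo.a

# The archive needs every object file, so it is built last.
libfoo.a: $(OBJS)
	$(AR) rcs $@ $^

# The object files do not depend on each other and may compile in parallel.
%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@
```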

Let's say, for the sake of illustration, that lib depends on src and interface, meaning that you cannot build the lib part of the project before both src and interface are finished. All you have to do to tell make about this is add a new target line stating these dependencies, i.e.

lib: src interface

There, the lib target now depends on the src and interface targets, meaning that make cannot execute the lib target in parallel with either src or interface. So, because the dependencies must be resolved, make will execute src, interface and doc in parallel and, once src and interface are done, build the lib target. Just as you specified.
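Putting the pieces so far together, a minimal top-level makefile might look like the sketch below. It assumes GNU make and that each directory has its own makefile; to keep it short it leaves out $(MAKECMDGOALS), so every sub-make just builds its default goal.

```make
SUBDIRS := src lib interface doc

.PHONY: all $(SUBDIRS)
all: $(SUBDIRS)

# Each directory is a target of its own. Because the recipe uses $(MAKE),
# GNU make shares the -j job slots with the sub-make.
$(SUBDIRS):
	$(MAKE) --directory=$@

# lib has to wait for src and interface; doc has no such constraint.
lib: src interface
```

Run with something like make -j4, src, interface and doc can start together, while lib is held back until src and interface have finished.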

Let's say that the src directory is huge, containing roughly 100 different software components for the build, and that some of these components are interdependent. Then, again, you'll have to create a makefile containing a SUBDIRS target to traverse the various src components. Now, even though there are 100 components, you still have to know which ones depend on each other and which can be built in parallel.

Usually, solving your source dependencies is pretty easy, but it can be a major writing task, even though it may be only 4 of the 100 components that must be built first. You can solve this in various ways, for instance by creating a few target lists like SUBDIRS and making the higher-level components depend on the lower levels, e.g.

LOWLEVEL:=hw io drv os

These are the lowest-level components; they must be built prior to the other components in the src directory. For the sake of argument, hw contains the memory map, the interrupt layout etc.; io contains hardware io, you know, hardware ;); drv contains the drivers needed by the os to communicate with the hw, etc.

Next, the src directory contains a core components part, usually some components that are part of a framework running on top of the lower levels. This could again be a subset of the 100 directories.
 

CORE:=menu network error

And finally, the rest of the 100 components are applications, which must all be built after LOWLEVEL and CORE.

APPLICATIONS:=$(filter-out $(LOWLEVEL) $(CORE),$(patsubst ./%/,%,$(shell ls -d ./*/)))

The applications are found using a shell command, because listing every directory by hand would be a pain. But you should remember that automatically finding things like this can have its drawbacks. Once you have all the targets in your makefile, you'll have to resolve the dependencies, e.g.

$(APPLICATIONS): $(CORE)
$(CORE): $(LOWLEVEL)

That should do the trick. Here's a listing of the short makefile I used to explore these issues on my own.

SHELL:=/bin/sh
ALL_DIRS:=$(patsubst ./%/,%,$(shell ls -d ./*/))
ENV:=${ENV}
LOWLEVEL:=src elektra
STUFF:=Documents Music
CORE:=$(filter-out $(LOWLEVEL) $(STUFF),$(ALL_DIRS))
SUBDIRS:=$(STUFF) $(CORE) $(LOWLEVEL)

.PHONY: all
all: $(SUBDIRS)
	@echo $(ENV)

.PHONY: $(SUBDIRS)
$(SUBDIRS):

.PHONY: $(LOWLEVEL)
$(LOWLEVEL):
	@echo "lowlevel-> $@"

.PHONY: $(STUFF)
$(STUFF):
	@echo "stuff-> $@"

.PHONY: $(CORE)
$(CORE):
	@echo "core-> $@"

$(CORE):$(LOWLEVEL) $(STUFF)

.PHONY: lowlevel
lowlevel: $(LOWLEVEL)

.PHONY: core
core: $(CORE)

.PHONY: stuff
stuff: $(STUFF)


References: makefile_tricks, Recursive make considered harmful