Makefile Tip: Keep-Going
This is a short post on a Make pattern, written in response to a frustrating error: I would generate, say, 400 tasks for Make, but Make would complete only a small subset and quit. Here is a small example and an explanation.
A common workflow in our lab starts with a set of large paired-end fastq files that must be demultiplexed by barcode, after which each resulting file needs to be split.
### Simple Data Processing
###
###
SHELL := /bin/bash
.SECONDARY:
# obtain the sample names from the first column of
# a file. skip the first entry which is a column header
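# (mappingfile is expected to be set by the caller, e.g. on the command line)
samples := $(shell cut -f 1 ${mappingfile} | tail -n +2)
# register fastqs/% and denoised/%
fastqs := $(patsubst %,fastqs/%.fastq,$(samples))
denoised := $(patsubst %,denoised/%.fastq,$(samples))
# make `all` the default goal so a bare `make` builds everything
.DEFAULT_GOAL := all
.PHONY: all
# demultiplex individual samples/fastqs from the master
generatefastqs.txt:
	python generatefastqs.py big.fastq -outdir fastqs
	# stamp file marking that demultiplexing ran (harmless if the script already writes it)
	touch $@
# register the fastqs: each one is produced by the demultiplexing step
$(fastqs): generatefastqs.txt
# process each fastq (static pattern rule: denoised/X.fastq from fastqs/X.fastq)
$(denoised): denoised/%: fastqs/%
	denoise -in $< -out $@
all: $(denoised)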
Run The Sample and Encounter Premature Termination
# start make with 20 processes
# this stops early as soon as any single job fails
make all -j 20
Solution!
# -k keeps going after a failed job and still attempts every target
# that does not depend on the failed one.
make all -k -j 20
In my case some of the files may have too few sequences to generate an output, and this triggers an error in Make, which then lets the jobs that are already running finish but starts no new ones. Effectively this rendered Make useless as a build tool. Adding -k (or --keep-going) forces Make to evaluate every job that does not depend on a failed one. In my case that means a single low-quality or low-abundance sample won't hold up the entire processing job.
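To see the behaviour on a toy case, here is a minimal sketch. It is not the lab pipeline: it writes a throwaway Makefile with three hypothetical targets (a.out, b.out, c.out), where b.out is rigged to fail as a stand-in for a low-abundance sample. It assumes GNU Make and bash are installed, and it runs serially so the result does not depend on job scheduling.

```shell
#!/usr/bin/env bash
# Toy reproduction: three independent targets, one of which always fails.
set -u
workdir=$(mktemp -d)
cd "$workdir"

# Write the Makefile with printf so recipe lines start with a literal tab.
printf 'all: a.out b.out c.out\n'  > Makefile
printf '%%.out:\n'                >> Makefile
printf '\t@if [ "$*" = "b" ]; then echo "b failed" >&2; exit 1; fi\n' >> Makefile
printf '\t@touch $@\n'            >> Makefile

# Plain run: Make gives up at the first failing target.
make all >/dev/null 2>&1
status_plain=$?
built_plain=$(echo *.out)

# Keep-going run: Make still attempts targets unrelated to the failure.
rm -f *.out
make -k all >/dev/null 2>&1
status_keep=$?
built_keep=$(echo *.out)

echo "without -k: built [$built_plain] exit=$status_plain"
echo "with -k:    built [$built_keep] exit=$status_keep"
```

In the plain run c.out should be missing because Make stops once b.out fails, while the -k run should also produce c.out. Both runs report a non-zero exit status, so -k does not hide the failure; it only stops one bad target from blocking the unrelated ones.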