Monthly Archives: August 2015

I received some Illumina data from collaborators without knowing much about how it had been generated.

Inspecting the files, I found that the data had already been demultiplexed and stripped of barcodes, and that there were paired reads for each sample. I wasn’t familiar with how to deal with this sort of data, but Robert Edgar has a discussion here, with Example 2 being the appropriate case:

It’s a simple matter to adapt his helpful solution for the multiple file case, but I always find myself googling basic shell scripting so here’s my version.

First we need to get a list of all of the sample names. Assuming that your file names are in the standard form ‘SampleName_L001_R1_001.fastq’, this can be done by the following:

ls *.fastq | awk -F '_L001' '{print $1}' | uniq > sample_names.txt
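Before looping over the list, it may be worth confirming that every sample name really has both a forward and a reverse file. The following is a minimal sketch, not part of Edgar's recipe: check_pairs is a hypothetical helper name, and it assumes the same SampleName_L001_R1_001.fastq / SampleName_L001_R2_001.fastq naming as above.

```shell
# Hypothetical helper: report samples that lack a forward (R1) or reverse (R2) file.
# Usage: check_pairs sample_names.txt [directory]
# Returns 0 if every sample has both files, 1 otherwise.
check_pairs() {
    names_file="$1"
    dir="${2:-.}"
    missing=0
    while read -r p; do
        for r in R1 R2; do
            f="$dir/${p}_L001_${r}_001.fastq"
            if [ ! -f "$f" ]; then
                echo "Missing: $f" >&2
                missing=1
            fi
        done
    done < "$names_file"
    return "$missing"
}
```

Running check_pairs sample_names.txt prints any missing file to stderr, so unpaired samples can be fixed or removed from the list before the merge step is run.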

Then loop through all the samples and, for each one: 1) merge the forward and reverse reads; 2) filter the merged reads; 3) add the barcodelabel=SampleName annotation to each read header; 4) append the reads to a single file.

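while read -r p; do
    echo "Processing reads for $p"
    # 1) Merge the forward and reverse reads
    usearch61 -fastq_mergepairs "${p}_L001_R1_001.fastq" \
     -reverse "${p}_L001_R2_001.fastq" -fastqout "${p}_merged.fastq"
    # 2) Quality-filter the merged reads and write them as FASTA
    usearch61 -fastq_filter "${p}_merged.fastq" \
     -fastaout "${p}_filtered.fa" -fastq_maxee 1.0
    # 3) Append ;barcodelabel=SampleName; to every FASTA header
    sed -e 's/^>\(.*\)/>\1;barcodelabel='"$p"';/' "${p}_filtered.fa" > "${p}.fa"
    # 4) Add this sample's reads to the combined file
    cat "${p}.fa" >> reads.fa
done < sample_names.txt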
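
As a quick sanity check after the loop, you could count how many reads each sample contributed to reads.fa. This is only a sketch: count_reads_per_sample is a hypothetical helper name, and it assumes every header ends with the ;barcodelabel=SampleName; annotation added above and that sample names contain no whitespace.

```shell
# Hypothetical helper: count reads per barcodelabel in a combined FASTA file.
# Prints one "SampleName<TAB>count" line per sample, sorted by sample name.
count_reads_per_sample() {
    grep '^>' "$1" \
      | sed -e 's/.*;barcodelabel=\([^;]*\);.*/\1/' \
      | sort | uniq -c | awk '{print $2 "\t" $1}'
}
```

Calling count_reads_per_sample reads.fa makes it easy to spot a sample that lost most of its reads at the merging or filtering step. Note that reads.fa is appended to, so rerunning the loop without deleting it first will double the counts.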

When running I encountered the error:

Cannot find fastq-join. Is it installed? Is it in your path?

The solution was apparently to install ea-utils, which contains fastq-join.

So I tried that.

However, make failed with the same error as detailed here:!msg/ea-utils/nR5qvhgZKIY/yx5BSEta_dQJ

The trick here was that not everything in ea-utils was required to make fastq-join work, as Eric Aronesty pointed out on the above thread:

“Of course fastq-join doesn’t use sparse-hash… so if you ran “make fastq-join” … it would work, even on a Mac.”

Then, of course, you need to move the resulting fastq-join binary to somewhere in your PATH. After that it should work fine.
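
For the last step, something along these lines should do. It is a rough sketch rather than the official install procedure: install_binary is a hypothetical helper name, and ~/bin is just an assumed destination; any directory on your PATH will work.

```shell
# Hypothetical helper: copy a built binary into a directory and check it is on PATH.
# Usage: install_binary path/to/binary [destination_dir]
install_binary() {
    src="$1"
    dest="${2:-$HOME/bin}"      # ~/bin is an assumed default, not a requirement
    name=$(basename "$src")
    mkdir -p "$dest" || return 1
    cp "$src" "$dest"/ || return 1
    chmod +x "$dest/$name"
    if command -v "$name" >/dev/null 2>&1; then
        echo "$name is on PATH"
    else
        echo "$dest is not on PATH; add it with: export PATH=\"$dest:\$PATH\"" >&2
        return 1
    fi
}

# Usage, run by hand from the ea-utils source directory:
#   make fastq-join
#   install_binary fastq-join "$HOME/bin"
```

The command -v check only confirms that some fastq-join is found on PATH; if an older copy is installed elsewhere, which one wins depends on the order of the PATH entries.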