seqfu lanes
Note
This subcommand was called merge in a pre-release.
lanes is one of the core subprograms of SeqFu. It quickly merges the per-lane FASTQ files produced by Illumina sequencers into a single file per sample and read direction.
Usage: lanes [options] -o <outdir> <input_directory>
A program to merge Illumina lanes for a whole directory.
Options:
-o, --outdir DIR Output directory
-e, --extension STR File extension [default: .fastq]
-s, --file-separator STR Field separator in filenames [default: _]
--comment-separator STR String separating sequence name and its comment [default: TAB]
-v, --verbose Verbose output
-h, --help Show this help
Input
A directory containing files in the standard Illumina naming scheme, like:
ID1_S99_L001_R1_001.fastq.gz
ID1_S99_L001_R2_001.fastq.gz
ID1_S99_L002_R1_001.fastq.gz
ID1_S99_L002_R2_001.fastq.gz
ID1_S99_L003_R1_001.fastq.gz
ID1_S99_L003_R2_001.fastq.gz
ID1_S99_L004_R1_001.fastq.gz
ID1_S99_L004_R2_001.fastq.gz
ID2_S99_L001_R1_001.fastq.gz
ID2_S99_L001_R2_001.fastq.gz
ID2_S99_L002_R1_001.fastq.gz
ID2_S99_L002_R2_001.fastq.gz
ID2_S99_L003_R1_001.fastq.gz
ID2_S99_L003_R2_001.fastq.gz
ID2_S99_L004_R1_001.fastq.gz
ID2_S99_L004_R2_001.fastq.gz
ID3_S99_L001_R1_001.fastq.gz
ID3_S99_L001_R2_001.fastq.gz
ID3_S99_L002_R1_001.fastq.gz
ID3_S99_L002_R2_001.fastq.gz
ID3_S99_L003_R1_001.fastq.gz
ID3_S99_L003_R2_001.fastq.gz
ID3_S99_L004_R1_001.fastq.gz
ID3_S99_L004_R2_001.fastq.gz
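Each of these names is made of underscore-separated fields: the sample ID, the sample number (S99), the lane (L001 to L004), the read direction (R1 or R2) and a trailing 001. The sketch below illustrates how files can be grouped by sample and read direction before merging. It is not SeqFu's implementation: the function name lane_summary is hypothetical, and it assumes that sample IDs contain no underscore.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SeqFu): read Illumina file names from
# standard input and print, for each sample ID and read direction, how many
# lane files belong together.
# Assumption: the sample ID itself contains no underscore.
lane_summary() {
  awk -F_ '
    # Fields: $1 = sample ID, $2 = sample number, $3 = lane, $4 = read, $5 = 001.fastq.gz
    NF >= 5 && $3 ~ /^L[0-9]+$/ && $4 ~ /^R[12]$/ { count[$1 "\t" $4]++ }
    END { for (key in count) print key "\t" count[key] }
  ' | sort
}

# Usage: ls input_directory | lane_summary
```

For the listing above, this would report four lane files for each of the six sample and read direction combinations (ID1 to ID3, R1 and R2).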
Performance
Compared with an efficient Bash implementation (as described here), SeqFu is more than 10 times faster (about 12 times in the benchmark below).
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `seqfu merge -o /tmp/ data/lane` | 2.6 ± 0.9 | 1.6 | 10.4 | 1.00 |
| `merge_lanes.sh data/lane/` | 31.8 ± 4.0 | 25.4 | 49.5 | 12.42 ± 4.46 |
The merge_lanes.sh script is as follows:
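#!/usr/bin/env bash
# Usage: merge_lanes.sh <input_directory>
DIR=$PWD
cd "$1" || exit 1
# Unique sample IDs: first underscore-separated field of the R1 file names
ls *R1* | cut -d _ -f 1 | sort | uniq \
  | while read -r id; do
      # Concatenate all lanes of each read direction (gzip files can be joined with cat)
      cat "$id"_*R1*.fastq.gz > "$id.R1.fastq.gz"
      cat "$id"_*R2*.fastq.gz > "$id.R2.fastq.gz"
    done
cd "$DIR"/ || exit 1
# Delete the merged files again, leaving the input directory unchanged
rm "$1"/*.R{1,2}.*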
The test was performed with the hyperfine program against the /data/lane directory of the SeqFu repository. The seqfu merge command in the table is the pre-release name of seqfu lanes.