seqfu count
count (or cnt) is one of the core subprograms of SeqFu. It counts the sequences in FASTA/FASTQ files and is paired-end aware: for a pair of files it prints a single line with the count, after checking that both files contain the same number of sequences.
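The counting step can be illustrated with a minimal Python sketch. count_records and count_file are hypothetical helpers, not SeqFu code. The sketch assumes that every FASTA record starts with a > header and that FASTQ records are exactly four lines long (no wrapped, multi-line FASTQ). Files ending in .gz are decompressed transparently.

```python
import gzip
from typing import Iterable


def count_records(lines: Iterable[str]) -> int:
    """Count FASTA or FASTQ records in an iterable of text lines."""
    it = iter(lines)
    # Find the first non-blank line to detect the format.
    for first in it:
        if first.strip():
            break
    else:
        return 0  # empty input
    if first.startswith(">"):
        # FASTA: every header line opens a new record.
        return 1 + sum(1 for line in it if line.startswith(">"))
    if first.startswith("@"):
        # FASTQ (assumption: exactly 4 lines per record, no wrapping).
        n_lines = 1 + sum(1 for line in it if line.strip())
        if n_lines % 4:
            raise ValueError("truncated FASTQ: line count is not a multiple of 4")
        return n_lines // 4
    raise ValueError("input is neither FASTA nor FASTQ")


def count_file(path: str) -> int:
    """Count records in a plain or gzip-compressed file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_records(handle)
```

For example, count_records(["@r1", "ACGT", "+", "IIII"]) returns 1. Counting FASTQ lines instead of headers avoids mistaking a quality line that happens to start with @ for a new record.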
In version 1.5 the program has been redesigned to parse multiple files simultaneously.
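Parsing several files at once can be sketched in Python with a thread pool whose size plays the role of the -t/--threads option. This is an illustration under stated assumptions, not SeqFu's implementation: count_fastq and count_many are hypothetical names, every file is assumed to be four-line FASTQ, and in CPython the GIL limits how much pure-Python parsing gains from threads (a process pool would sidestep that).

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def count_fastq(path: str) -> int:
    """Count reads in a plain or gzipped FASTQ file (assumes 4 lines per record)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip()) // 4


def count_many(paths: List[str], threads: int = 4) -> Dict[str, int]:
    """Count several files concurrently; the result keeps the input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in the same order as `paths`.
        counts = list(pool.map(count_fastq, paths))
    return dict(zip(paths, counts))
```

Because pool.map preserves input order, the report can be printed in the order the files were given even though they are processed concurrently.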
Usage: count [options] [<inputfile> ...]
Options:
-a, --abs-path Print absolute paths
-b, --basename Print only filenames
-u, --unpair Print separate records for paired end files
-f, --for-tag R1 Forward tag [default: auto]
-r, --rev-tag R2 Reverse tag [default: auto]
-t, --threads INT Working threads [default: 4]
-v, --verbose Verbose output
-h, --help Show this help
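The -f/--for-tag and -r/--rev-tag options decide which files are treated as mates. The sketch below shows one way tag-based pairing could work. The candidate tags tried in auto mode (_R1/_R2, then _1/_2) are an assumption for illustration, SeqFu's actual auto-detection may differ, and pair_files is a hypothetical name.

```python
from typing import List, Optional, Tuple

# Hypothetical tag pairs tried when -f/-r are "auto"; SeqFu's actual rules may differ.
AUTO_TAGS = [("_R1", "_R2"), ("_1", "_2")]


def pair_files(
    paths: List[str], for_tag: Optional[str] = None, rev_tag: Optional[str] = None
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split input paths into (forward, reverse) pairs and single-end files."""
    tags = [(for_tag, rev_tag)] if for_tag and rev_tag else AUTO_TAGS
    available = set(paths)
    pairs: List[Tuple[str, str]] = []
    for path in paths:
        for fwd, rev in tags:
            # Swap the last occurrence of the forward tag for the reverse tag.
            head, sep, tail = path.rpartition(fwd)
            mate = head + rev + tail
            if sep and path in available and mate in available and mate != path:
                pairs.append((path, mate))
                available -= {path, mate}
                break
    singles = [p for p in paths if p in available]
    return pairs, singles
```

For example, pair_files(["a_R2.fq", "a_R1.fq", "b.fq"]) returns ([("a_R1.fq", "a_R2.fq")], ["b.fq"]), regardless of the order in which the mates are listed. With -u/--unpair, every path would simply be reported as its own record.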
Streaming
Input from stream is supported.
Example output
The output is tab-separated (TSV) text with three columns: sample name, number of reads, and type ("SE" for single-end, "Paired" for paired-end).
data/test.fastq 3 SE
data/comments.fastq 5 SE
data/test2.fastq 3 SE
data/qualities.fq 5 SE
data/illumina_1.fq.gz 7 Paired
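Since the output is plain TSV, it is easy to load in downstream scripts. A minimal Python parser follows. CountRecord and parse_count_output are hypothetical names, and skipping lines that start with # is a defensive assumption rather than documented behaviour.

```python
import csv
import io
from typing import List, NamedTuple


class CountRecord(NamedTuple):
    sample: str
    reads: int
    kind: str  # "SE" (single-end) or "Paired" (paired-end)


def parse_count_output(text: str) -> List[CountRecord]:
    """Parse the three-column TSV printed by `seqfu count`."""
    records: List[CountRecord] = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and '#' comment lines defensively.
        if not row or row[0].startswith("#"):
            continue
        sample, reads, kind = row
        records.append(CountRecord(sample, int(reads), kind))
    return records
```

For instance, sum(r.reads for r in parse_count_output(output)) gives the total number of reads across all samples.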
In case of errors, it prints a message such as:
ERROR: Different counts in data/longerone_R1.fq.gz and data/longerone_R2.fq.gz
# data/longerone_R1.fq.gz: 7
# data/longerone_R2.fq.gz: 2
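The paired-end check can be sketched as a comparison of the two counts that either emits the TSV line or reproduces the error format shown above. pair_report is a hypothetical helper, and sending the error to standard error is an assumption.

```python
import sys
from typing import Tuple


def pair_report(r1: str, r2: str, n1: int, n2: int) -> Tuple[bool, str]:
    """Return (True, TSV line) if both mates agree, else (False, error text)."""
    if n1 != n2:
        return False, (
            f"ERROR: Different counts in {r1} and {r2}\n"
            f"# {r1}: {n1}\n"
            f"# {r2}: {n2}"
        )
    # The pair is reported once, under the forward file's name.
    return True, f"{r1}\t{n1}\tPaired"


if __name__ == "__main__":
    ok, text = pair_report("data/longerone_R1.fq.gz", "data/longerone_R2.fq.gz", 7, 2)
    # Assumption: errors go to stderr, results to stdout.
    print(text, file=sys.stdout if ok else sys.stderr)
```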
Multithreading
Performance improvement measured on the MiSeq SOP dataset from mothur:
Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|
seqfu count ../mothur-sop/*.fastq -t 4 | 142.5 ± 5.8 | 127.3 | 152.3 | 1.00 |
seqfu count ../mothur-sop/*.fastq -t 1 | 416.5 ± 15.2 | 397.8 | 440.9 | 2.92 ± 0.16 |
seqfu count-legacy ../mothur-sop/*.fastq | 539.2 ± 16.6 | 519.6 | 577.4 | 3.78 ± 0.19 |
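The Relative column is each mean runtime divided by the fastest one (seqfu count with -t 4). The ± values in the table are consistent with standard error propagation for a ratio, where the relative standard deviations of the two means add in quadrature. The short Python check below recomputes them from the table; relative is a hypothetical helper name.

```python
import math


def relative(mean: float, sd: float, ref_mean: float, ref_sd: float):
    """Ratio of two mean runtimes and its propagated standard deviation."""
    ratio = mean / ref_mean
    # Relative errors of a quotient add in quadrature.
    err = ratio * math.sqrt((sd / mean) ** 2 + (ref_sd / ref_mean) ** 2)
    return ratio, err


ref = (142.5, 5.8)  # seqfu count -t 4 (mean and SD in ms)
for name, mean, sd in [("count -t 1", 416.5, 15.2), ("count-legacy", 539.2, 16.6)]:
    r, e = relative(mean, sd, *ref)
    print(f"{name}: {r:.2f} ± {e:.2f}")
```

This prints 2.92 ± 0.16 and 3.78 ± 0.19, matching the table: on this dataset four threads ran the new algorithm about 2.9 times faster than one thread.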
Legacy algorithm
Usage: count-legacy [options] [<inputfile> ...]
Options:
-a, --abs-path Print absolute paths
-b, --basename Print only filenames
-u, --unpair Print separate records for paired end files
-f, --for-tag R1 Forward string, like _R1 [default: auto]
-r, --rev-tag R2 Reverse string, like _R2 [default: auto]
-m, --multiqc FILE Save report in MultiQC format
-v, --verbose Verbose output
-h, --help Show this help
MultiQC output
Using the --multiqc OUTPUTFILE option (listed above for count-legacy), it's possible to save a MultiQC-compatible file; we recommend the projectname_mqc.tsv filename format. After collecting all the MultiQC files in a directory, running multiqc -f . in that directory will generate the MultiQC report. MultiQC itself can be installed via Bioconda with conda install -y -c bioconda multiqc.
If you have never used MultiQC before, check its excellent documentation.
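To show what a MultiQC custom-content table looks like, here is a minimal Python sketch that writes per-sample read counts to a *_mqc.tsv file. The YAML-style # header keys follow MultiQC's custom-content conventions as we understand them, and the id value is made up. This is not the exact file that seqfu count-legacy --multiqc writes, whose sections and columns may differ. write_mqc_table is a hypothetical name.

```python
from typing import Dict


def write_mqc_table(counts: Dict[str, int], path: str) -> None:
    """Write {sample: reads} as a tab-separated MultiQC custom-content table."""
    with open(path, "w") as out:
        # YAML-style header comments read by MultiQC's custom-content module
        # (the id value is a made-up example).
        out.write("# id: 'read_counts_example'\n")
        out.write("# plot_type: 'table'\n")
        # First row: column headers; first column: sample names.
        out.write("Sample\tReads\n")
        for sample, reads in counts.items():
            out.write(f"{sample}\t{reads}\n")
```

Calling write_mqc_table({"test": 3, "illumina": 7}, "myproject_mqc.tsv") and then running multiqc -f . in the same directory should pick the table up as a custom section.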