seqfu count

count (or cnt) is one of the core subprograms of SeqFu. It counts the sequences in FASTA/FASTQ files and is paired-end aware: for a pair of files it prints the count on a single line, after checking that both files contain the same number of sequences.

In version 1.5 the program has been redesigned to parse multiple files simultaneously.
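To illustrate what the paired-end check means, here is a minimal Python sketch. It is not SeqFu's implementation: the function names are hypothetical, and it assumes strict four-line FASTQ records (no wrapped sequences) read from already opened text handles.

```python
import io


def count_fastq_records(handle):
    """Count records, assuming every FASTQ record spans exactly four lines."""
    lines = sum(1 for _ in handle)
    if lines % 4 != 0:
        raise ValueError(f"truncated FASTQ: {lines} lines")
    return lines // 4


def paired_count(forward, reverse):
    """Return the shared record count, or fail if the two files differ."""
    n_fwd = count_fastq_records(forward)
    n_rev = count_fastq_records(reverse)
    if n_fwd != n_rev:
        raise ValueError(f"Different counts: {n_fwd} (forward) vs {n_rev} (reverse)")
    return n_fwd


# Two tiny in-memory files standing in for an R1/R2 pair.
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nTTGA\n+\nIIII\n")
r2 = io.StringIO("@a/2\nACGT\n+\nIIII\n@b/2\nTCAA\n+\nIIII\n")
print(paired_count(r1, r2))
```

The sketch reports one number for the pair and only when both files agree, which mirrors the single-line behaviour described above.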

Usage: count [options] [<inputfile> ...]

  -a, --abs-path         Print absolute paths
  -b, --basename         Print only filenames
  -u, --unpair           Print separate records for paired end files
  -f, --for-tag R1       Forward tag [default: auto]
  -r, --rev-tag R2       Reverse tag [default: auto]
  -t, --threads INT      Working threads [default: 4]
  -v, --verbose          Verbose output
  -h, --help             Show this help


Input from stream is supported.

Example output

The output is tab-separated text (TSV) with three columns: sample name, number of reads, and type ("SE" for single-end, "Paired" for paired-end).

data/test.fastq       3  SE
data/comments.fastq   5  SE
data/test2.fastq      3  SE
data/qualities.fq     5  SE
data/illumina_1.fq.gz 7  Paired
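Because the output is plain TSV, it is easy to post-process. The sketch below parses the three columns in Python; the CountRecord type and parse_count_output function are hypothetical names chosen for this example, and the input string is a shortened copy of the sample above with explicit tab characters.

```python
import csv
import io
from typing import NamedTuple


class CountRecord(NamedTuple):
    sample: str
    reads: int
    library: str  # "SE" or "Paired", as printed in the third column


def parse_count_output(text):
    """Parse three-column TSV text into a list of CountRecord tuples."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    return [CountRecord(row[0], int(row[1]), row[2]) for row in rows if row]


sample_output = "data/test.fastq\t3\tSE\ndata/illumina_1.fq.gz\t7\tPaired\n"
records = parse_count_output(sample_output)
paired = [r.sample for r in records if r.library == "Paired"]
print(paired)
```

Filtering on the third column, as in the last lines, is one way to separate single-end from paired-end samples in a larger listing.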

In case of errors, the program prints a warning:

ERROR: Different counts in data/longerone_R1.fq.gz and data/longerone_R2.fq.gz
# data/longerone_R1.fq.gz: 7
# data/longerone_R2.fq.gz: 2


Performance improvement measured on the MiSeq SOP dataset from mothur:

Command                                    Mean [ms]     Min [ms]  Max [ms]  Relative
seqfu count ../mothur-sop/*.fastq -t 4     142.5 ± 5.8   127.3     152.3     1.00
seqfu count ../mothur-sop/*.fastq -t 1     416.5 ± 15.2  397.8     440.9     2.92 ± 0.16
seqfu count-legacy ../mothur-sop/*.fastq   539.2 ± 16.6  519.6     577.4     3.78 ± 0.19
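The Relative column can be reproduced from the mean times: each mean is divided by the fastest mean. The short Python sketch below redoes that arithmetic with the values from the table; the dictionary keys are shorthand labels for the three commands, and the uncertainty terms are left out.

```python
# Mean run times in milliseconds, copied from the benchmark table.
mean_ms = {
    "count -t 4": 142.5,
    "count -t 1": 416.5,
    "count-legacy": 539.2,
}

# Each command's time relative to the fastest one.
fastest = min(mean_ms.values())
relative = {name: round(ms / fastest, 2) for name, ms in mean_ms.items()}
print(relative)
```

With these numbers the four-thread run is roughly 2.9 times faster than the single-thread run and about 3.8 times faster than the legacy algorithm.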

Legacy algorithm

Usage: count-legacy [options] [<inputfile> ...]

  -a, --abs-path         Print absolute paths
  -b, --basename         Print only filenames
  -u, --unpair           Print separate records for paired end files
  -f, --for-tag R1       Forward string, like _R1 [default: auto]
  -r, --rev-tag R2       Reverse string, like _R2 [default: auto]
  -m, --multiqc FILE     Save report in MultiQC format
  -v, --verbose          Verbose output
  -h, --help             Show this help

MultiQC output

With the --multiqc OUTPUTFILE option, the report can be saved as a MultiQC-compatible file (we recommend the projectname_mqc.tsv filename format). After collecting all the MultiQC files in a directory, running multiqc -f . generates the MultiQC report. MultiQC itself can be installed via Bioconda with conda install -y -c bioconda multiqc.

If you have never used MultiQC, check its excellent documentation.
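As a rough sketch of the workflow above, the Python helper below only assembles the two command lines and does not run anything. The multiqc_commands function, the project name, and the input file names are hypothetical. It assumes the --multiqc option belongs to count-legacy, since that is the only usage listing on this page that includes it.

```python
import shlex


def multiqc_commands(project, fastq_files):
    """Build the argument lists for the count step and the MultiQC step."""
    report = f"{project}_mqc.tsv"  # recommended projectname_mqc.tsv format
    count_cmd = ["seqfu", "count-legacy", "--multiqc", report, *fastq_files]
    multiqc_cmd = ["multiqc", "-f", "."]
    return [count_cmd, multiqc_cmd]


# Hypothetical project and file names, used only for illustration.
for cmd in multiqc_commands("myproject", ["reads_R1.fq.gz", "reads_R2.fq.gz"]):
    print(shlex.join(cmd))
```

Each argument list can be passed to subprocess.run as is, which avoids shell quoting problems with unusual file names.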

