seqfu stats reports, for each dataset, the number of sequences, the total number of bases, the average length, N50, N75, N90, auN, and the minimum and maximum sequence length. The output is available both as TSV and as a console-friendly table:
┌────────────────────┬───────┬──────────┬───────┬─────┬─────┬─────┬────────┬─────┬─────┐
│ File │ #Seq │ Total bp │ Avg │ N50 │ N75 │ N90 │ auN │ Min │ Max │
├────────────────────┼───────┼──────────┼───────┼─────┼─────┼─────┼────────┼─────┼─────┤
│ filt.fa.gz │ 78730 │ 24299931 │ 308.6 │ 316 │ 316 │ 220 │ 0.385 │ 180 │ 485 │
│ illumina_1.fq.gz │ 7 │ 630 │ 90.0 │ 90 │ 90 │ 90 │ 12.857 │ 90 │ 90 │
│ illumina_2.fq.gz │ 7 │ 630 │ 90.0 │ 90 │ 90 │ 90 │ 12.857 │ 90 │ 90 │
│ illumina_nocomm.fq │ 7 │ 630 │ 90.0 │ 90 │ 90 │ 90 │ 12.857 │ 90 │ 90 │
└────────────────────┴───────┴──────────┴───────┴─────┴─────┴─────┴────────┴─────┴─────┘
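To make the N50/N75/N90 and auN columns concrete, here is a minimal Python sketch of these metrics as they are commonly defined. It is an illustration only: the function names `nx` and `aun` are hypothetical, and seqfu's own implementation (including how it scales auN) may differ in details.

```python
def nx(lengths, percent):
    """Return the Nx value: the largest length L such that sequences of
    length >= L together contain at least `percent` % of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length
    return 0  # empty input


def aun(lengths):
    """Return the area under the Nx curve: sum(L^2) / sum(L)."""
    total = sum(lengths)
    return sum(length * length for length in lengths) / total if total else 0.0


if __name__ == "__main__":
    toy = [2, 3, 4, 5, 6]  # toy sequence lengths, 20 bases in total
    print(nx(toy, 50), nx(toy, 75), nx(toy, 90), aun(toy))
```

For the toy lengths above, the two longest sequences (6 + 5 = 11 bases) already cover half of the 20 bases, so N50 is 5.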
Interleaving and deinterleaving are very common tasks when working with Illumina paired-end reads.
seqfu interleave and seqfu deinterleave perform both operations quickly and with a lower risk of corrupting the data.
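As an illustration of what interleaving means, the following Python sketch alternates records from a forward and a reverse FASTQ file and then splits them again. It is not seqfu's code: the helpers `read_fastq`, `interleave` and `deinterleave` are hypothetical, and the parser assumes simple four-line FASTQ records held in memory.

```python
from itertools import zip_longest


def read_fastq(lines):
    """Yield (name, sequence, quality) from four-line FASTQ records.
    Blank lines are ignored; the whole input is loaded into memory."""
    kept = [line.rstrip("\n") for line in lines if line.strip()]
    if len(kept) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(kept), 4):
        header, seq, _plus, qual = kept[i:i + 4]
        yield header[1:], seq, qual


def interleave(forward, reverse):
    """Yield R1, R2, R1, R2, ... and fail if the inputs differ in length."""
    for r1, r2 in zip_longest(forward, reverse):
        if r1 is None or r2 is None:
            raise ValueError("inputs contain different numbers of reads")
        yield r1
        yield r2


def deinterleave(records):
    """Split interleaved records back into (forward, reverse) lists."""
    records = list(records)
    if len(records) % 2:
        raise ValueError("odd number of records in interleaved input")
    return records[0::2], records[1::2]


if __name__ == "__main__":
    r1 = ["@a/1\n", "ACGT\n", "+\n", "IIII\n"]
    r2 = ["@a/2\n", "TTGG\n", "+\n", "IIII\n"]
    for name, seq, qual in interleave(read_fastq(r1), read_fastq(r2)):
        print(f"@{name}\n{seq}\n+\n{qual}")
```

Checking that both inputs contain the same number of reads is one simple guard against the kind of silent corruption that ad hoc shell one-liners can introduce.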
Multiple lanes can be quickly merged with seqfu lanes.
seqfu sort can sort sequences by length.
seqfu derep dereplicates datasets and prints the number of identical sequences found for each unique sequence. It can also read these counts from the input dataset, so a set of already dereplicated files can be dereplicated again while keeping track of the original number of sequences.
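The idea of carrying counts through repeated dereplication can be sketched in Python as follows. This is a conceptual example, not seqfu's implementation: the `dereplicate` function is hypothetical, and the `;size=N` suffix in sequence names is an assumed annotation format used here only for illustration.

```python
import re
from collections import Counter

# Assumed annotation: a sequence name ending in ";size=N" (optionally ";size=N;").
SIZE_RE = re.compile(r";size=(\d+);?$")


def dereplicate(records):
    """Collapse identical sequences from an iterable of (name, sequence).

    A record whose name carries a ';size=N' annotation counts N times,
    any other record counts once. Returns (sequence, count) pairs sorted
    by decreasing count.
    """
    counts = Counter()
    for name, seq in records:
        match = SIZE_RE.search(name)
        counts[seq.upper()] += int(match.group(1)) if match else 1
    return counts.most_common()


if __name__ == "__main__":
    records = [
        ("a;size=3", "ACGT"),  # already dereplicated: stands for 3 reads
        ("b", "acgt"),         # same sequence, different case, 1 read
        ("c;size=2", "TTTT"),
        ("d", "GGGG"),
    ]
    for seq, count in dereplicate(records):
        print(f"{seq}\t{count}")
```

Here the first two records collapse into a single `ACGT` entry with a count of 4, because the existing count of 3 is added to rather than replaced.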