Overview
SeqFu is a compiled command-line toolkit for manipulating FASTA and FASTQ files. Its utilities range from format conversion to sequence extraction and statistics.
Written in Nim, SeqFu is fast and memory-efficient, making it suitable for both interactive use and large-scale bioinformatics pipelines.
Key Features
- seqfu stats – print sequence statistics from multiple files
- seqfu count – count sequences in FASTA/FASTQ files
- seqfu derep – dereplicate sequences
- seqfu interleave / deinterleave – interleave or deinterleave paired-end reads
- seqfu cat – concatenate FASTA/FASTQ files with filtering options
- seqfu head / tail – extract sequences from the start or end of a file
- seqfu grep – extract sequences by name or pattern
- seqfu rc – reverse complement sequences
- seqfu sort / tabulate / view – organise and inspect sequences
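For readers new to these operations, the short Python sketch below illustrates what "reverse complement" and "interleave / deinterleave" mean. It is only a conceptual illustration, not SeqFu's implementation: the function names are hypothetical, reads are represented as plain list items, and only the bases A, C, G, T and N are handled.

```python
# Conceptual illustration only; this is NOT how SeqFu is implemented.
# Assumption: sequences contain only A, C, G, T, N (upper or lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the resulting string."""
    return seq.translate(_COMPLEMENT)[::-1]


def interleave(forward, reverse):
    """Merge two equally long read lists into R1, R2, R1, R2, ... order."""
    if len(forward) != len(reverse):
        raise ValueError("paired inputs must contain the same number of reads")
    merged = []
    for r1, r2 in zip(forward, reverse):
        merged.extend([r1, r2])
    return merged


def deinterleave(records):
    """Split an interleaved list back into (forward, reverse) lists."""
    return records[0::2], records[1::2]
```

Deinterleaving an interleaved list returns the two original lists, which is why the two commands are listed as a pair.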
Installation
SeqFu can be installed via Bioconda:
conda install -c bioconda seqfu
Example Usage
Print statistics for a set of FASTQ files:
seqfu stats *.fastq.gz
Count sequences across multiple files:
seqfu count data/*.fq.gz
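When sanity-checking a count on a small file, it can help to know what is being counted. The Python sketch below is a rough, independent approximation and not SeqFu's own logic. It assumes FASTA records start with ">" and that FASTQ records occupy exactly four lines each; multi-line FASTQ is not handled. The helper names open_text and count_records are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_records(lines):
    """Count records in an iterable of FASTA or FASTQ lines.

    Assumption: FASTQ records span exactly four lines each.
    """
    lines = [ln for ln in lines if ln.strip()]  # ignore blank lines
    if not lines:
        return 0
    if lines[0].startswith(">"):
        # FASTA: one header line per record
        return sum(1 for ln in lines if ln.startswith(">"))
    if lines[0].startswith("@"):
        # FASTQ: header, sequence, separator, qualities
        if len(lines) % 4 != 0:
            raise ValueError("FASTQ input is not a multiple of four lines")
        return len(lines) // 4
    raise ValueError("input does not look like FASTA or FASTQ")
```

A file can be counted with, for example, count_records(open_text("reads.fq.gz")), where the file name is a placeholder.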
Dereplicate a FASTA file:
seqfu derep --min-size 2 input.fasta > unique.fasta
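To make the idea of dereplication with a minimum cluster size concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not SeqFu's algorithm: identical sequences are grouped case-insensitively, the first name seen is kept for each group, and groups are returned from most to least abundant. SeqFu's actual naming, ordering and case handling may differ. The functions parse_fasta and dereplicate are hypothetical.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def dereplicate(records, min_size=1):
    """Collapse identical sequences and keep groups with >= min_size copies.

    Assumption: comparison is case-insensitive; the first name seen is kept.
    Returns a list of (name, sequence, copies), most abundant first.
    """
    counts = Counter()
    first_name = {}
    for name, seq in records:
        key = seq.upper()
        counts[key] += 1
        first_name.setdefault(key, name)
    return [
        (first_name[seq], seq, copies)
        for seq, copies in counts.most_common()
        if copies >= min_size
    ]
```

With min_size=2, a sequence that occurs only once is dropped, which mirrors the intent of the --min-size 2 option in the command above.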