fu-split
Note: preliminary version.
Split a FASTQ or FASTA file into multiple files, by:
- Number of desired output files (--num-files INT)
- Maximum number of sequences per file (--num-seqs INT)
- Maximum number of bases per file (--num-bases INT)
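The two "max per file" modes can be illustrated with a minimal Python sketch. This is not the fu-split implementation: the function names are hypothetical, sequences are plain strings, and the rule used for the bases limit (start a new chunk when the next record would exceed the limit, and give an oversized record a chunk of its own) is an assumption.

```python
from typing import Iterable, Iterator, List


def split_by_seqs(seqs: Iterable[str], max_seqs: int) -> Iterator[List[str]]:
    """Yield chunks holding at most max_seqs sequences (cf. --num-seqs)."""
    chunk: List[str] = []
    for seq in seqs:
        chunk.append(seq)
        if len(chunk) == max_seqs:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def split_by_bases(seqs: Iterable[str], max_bases: int) -> Iterator[List[str]]:
    """Yield chunks whose total length stays within max_bases (cf. --num-bases)."""
    chunk: List[str] = []
    total = 0
    for seq in seqs:
        # Assumption: close the current chunk if this record would exceed the limit.
        if chunk and total + len(seq) > max_bases:
            yield chunk
            chunk, total = [], 0
        chunk.append(seq)
        total += len(seq)
    if chunk:
        yield chunk


reads = ["ACGT", "AC", "ACGTAC", "A", "ACG"]
by_seqs = list(split_by_seqs(reads, 2))
by_bases = list(split_by_bases(reads, 6))
```

With these toy reads, the first mode groups records two at a time, while the second groups them so that no chunk exceeds six bases.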
The "output file" string is an important part of configuring the program; see "Output file string" below.
usage: fu-split [-h] -i INPUT [-I INPUT_REV] -o OUTPUT [--for-tag FOR_TAG] [--rev-tag REV_TAG]
(-n NUM_FILES | -s NUM_SEQS | -b NUM_BASES) [--threads THREADS] [--number-char NUMBER_CHAR] [--compress]
[--verbose] [--debug] [--version]
Split FASTA/FASTQ files into multiple files
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input file
-I INPUT_REV, --input-rev INPUT_REV
Second input file (reverse pair)
-o OUTPUT, --output OUTPUT
Output file (add a stretch of 3+ zeroes to specify the progressive number), compression will be detected.
Example: parz_0000.fq.gz
--for-tag FOR_TAG Tag to use for the forward reads output (default: _R1)
--rev-tag REV_TAG Tag to use for the reverse reads output (default: _R2)
-n NUM_FILES, --num-files NUM_FILES
Number of desired files
-s NUM_SEQS, --num-seqs NUM_SEQS
Number of sequences per file
-b NUM_BASES, --num-bases NUM_BASES
Number of bases per file
--version show program's version number and exit
Other options:
--threads THREADS Number of threads (-n only) [default: 8]
--number-char NUMBER_CHAR
Character used to represent the progressive number in output string [default: 0]
--compress Force compression of the output files
--verbose Verbose mode
--debug Debug mode
Output file string
The recommended way to use fu-split is to pass --output a file name made of three parts: prefix, progressive number, suffix. The progressive number is written as a stretch of zeroes as long as the number of digits you want (the placeholder character can be changed with --number-char).
Examples:
- --output parz_0000.fq.gz: forces output in FASTQ format, compressed with Gzip, with a four-digit progressive number.
- --output parz_000.fa: forces output in FASTA format, uncompressed, with a three-digit progressive number.
- --output parz: the progressive number part is missing, so the string is used as a prefix followed by a four-digit progressive number (not recommended, as this behaviour may change in the future).
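The prefix / progressive number / suffix convention described above can be sketched in Python as follows. This is only an illustration of the naming scheme, not fu-split's own code: the helper names parse_template and output_name are hypothetical, and the sketch assumes that the first run of three or more placeholder characters is the one replaced, that the number is zero-padded, and that a string without a placeholder gets no suffix.

```python
import re
from typing import Tuple


def parse_template(output: str, number_char: str = "0",
                   default_width: int = 4) -> Tuple[str, int, str]:
    """Split an --output string into (prefix, number width, suffix)."""
    # Look for a run of three or more placeholder characters.
    match = re.search(re.escape(number_char) + "{3,}", output)
    if match is None:
        # No placeholder: the whole string is the prefix, four digits are used.
        return output, default_width, ""
    return output[:match.start()], len(match.group()), output[match.end():]


def output_name(output: str, index: int, number_char: str = "0") -> str:
    """Build the file name for the chunk numbered `index`."""
    prefix, width, suffix = parse_template(output, number_char)
    # Assumption: the progressive number is zero-padded to the placeholder width.
    return f"{prefix}{index:0{width}d}{suffix}"


names = [output_name("parz_0000.fq.gz", i) for i in (1, 2, 3)]
```

Here names holds parz_0001.fq.gz, parz_0002.fq.gz and parz_0003.fq.gz. Whether fu-split starts counting from 0 or 1 is not stated in this page, so the index is left to the caller.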
Paired end
Specify both --input and --input-rev to split a pair of paired-end FASTQ files. The forward and reverse output files are distinguished by the --for-tag and --rev-tag options (default: _R1 and _R2).
Binary
Export $SEQFU_BIN to specify the path to the SeqFu binary; otherwise fu-split defaults to seqfu, as found in $PATH.
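The lookup order described above (the environment variable first, then seqfu from $PATH) could be expressed in Python roughly like this. The function names are hypothetical, and treating an empty SEQFU_BIN as unset is an assumption rather than documented behaviour.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_seqfu(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the SeqFu command: $SEQFU_BIN if set, else plain 'seqfu'."""
    env = os.environ if env is None else env
    # Assumption: an empty SEQFU_BIN is treated the same as an unset one.
    return env.get("SEQFU_BIN") or "seqfu"


def locate_seqfu(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the full path of the executable, or None if it cannot be found."""
    return shutil.which(resolve_seqfu(env))


default_cmd = resolve_seqfu({})
custom_cmd = resolve_seqfu({"SEQFU_BIN": "/opt/seqfu/bin/seqfu"})
```

Calling locate_seqfu() before running a split is a cheap way to check that the binary fu-split relies on is actually reachable.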