seqfu orf
Extract open reading frames (ORFs) from nucleotide sequences in FASTA or FASTQ format; gzipped input is supported.
seqfu orf is the preferred command.
The legacy binary fu-orf is still available and accepts the same options.
orf - extract ORF from nucleotide sequences
Usage:
orf [options] <InputFile>
orf [options] -1 File_R1.fq
orf [options] -1 File_R1.fq -2 File_R2.fq
orf --help | --codes
Input files:
-1, --R1 FILE First paired end file
-2, --R2 FILE Second paired end file
ORF Finding and Output options:
-m, --min-size INT Minimum ORF size (aa) [default: 25]
-p, --prefix STRING Rename reads using this prefix
-r, --scan-reverse Also scan reverse complemented sequences
-c, --code INT NCBI Genetic code to use [default: 1]
-l, --min-read-len INT Minimum read length to process [default: 25]
-t, --translate Consider input CDS
Paired-end options:
-j, --join Attempt Paired-End joining
--min-overlap INT Minimum PE overlap [default: 12]
--max-overlap INT Maximum PE overlap [default: 200]
--min-identity FLOAT Minimum sequence identity in overlap [default: 0.80]
Other options:
--codes Print NCBI genetic codes and exit
--pool-size INT Reads per batch [default: 250]
--in-flight-batches INT Max buffered batches before flush; 0 = auto [default: 0]
--verbose Print verbose log
--debug Print debug log
--help Show help
Notes
- --max-overlap is a hard cap during the paired-end overlap scan.
- --in-flight-batches controls the memory/throughput balance:
  - lower values use less RAM
  - higher values can improve throughput
  - 0 enables automatic sizing
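To illustrate how --min-overlap, --max-overlap and --min-identity interact, here is a minimal Python sketch of a paired-end overlap scan. It is not seqfu's implementation: the function names (`join_pair`, `reverse_complement`) are hypothetical, and preferring the longest qualifying overlap is an assumption of this sketch. It only shows the general idea that candidate overlaps longer than the cap are never tried, and that an overlap is accepted once its identity reaches the threshold.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def join_pair(r1, r2, min_overlap=12, max_overlap=200, min_identity=0.80):
    """Hypothetical helper: merge R1 with the reverse complement of R2.

    Returns the merged sequence, or None when no overlap qualifies.
    Defaults mirror the documented option defaults; the scan order
    (longest overlap first) is an assumption, not seqfu's documented behaviour.
    """
    r2rc = reverse_complement(r2)
    # Hard cap: overlaps longer than max_overlap are never considered.
    longest = min(max_overlap, len(r1), len(r2rc))
    for size in range(longest, min_overlap - 1, -1):
        tail, head = r1[-size:], r2rc[:size]
        matches = sum(a == b for a, b in zip(tail, head))
        if matches / size >= min_identity:
            # Keep R1 whole and append the non-overlapping part of R2.
            return r1 + r2rc[size:]
    return None
```

With this sketch, setting `max_overlap` below `min_overlap` leaves no candidate overlap sizes, so every pair stays unjoined; that is one way to picture what "hard cap" means in the note above.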
Examples
Single input file:
seqfu orf --min-size 500 data/orf.fa.gz
Paired-end reads:
seqfu orf --min-size 29 -1 data/illumina_1.fq.gz -2 data/illumina_2.fq.gz
Paired-end with join and tighter memory budget:
seqfu orf -j --min-size 29 --in-flight-batches 4 -1 data/illumina_1.fq.gz -2 data/illumina_2.fq.gz
Legacy equivalent:
fu-orf --min-size 29 -1 data/illumina_1.fq.gz -2 data/illumina_2.fq.gz
Genetic codes
Use --code to select an NCBI genetic code.
Run seqfu orf --codes (or fu-orf --codes) to print supported codes.
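For readers unfamiliar with what --code, --min-size and --scan-reverse control, the following Python sketch shows the general technique of six-frame translation with a size filter. It is an illustration under stated assumptions, not seqfu's actual algorithm: the names (`CODON_TABLE`, `translate`, `find_orfs`) are hypothetical, only NCBI genetic code 1 (the standard code) is included, and an ORF is assumed here to be any stop-free stretch of translated sequence, with no start codon required.

```python
from itertools import product

# NCBI genetic code 1 (standard), amino acids listed in TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate one reading frame; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_orfs(seq: str, min_size: int = 25, scan_reverse: bool = False):
    """Return stop-free peptides of at least min_size amino acids.

    Assumption of this sketch: an ORF is any stretch between stop codons.
    """
    seq = seq.upper()
    strands = [seq]
    if scan_reverse:
        strands.append(reverse_complement(seq))
    orfs = []
    for strand in strands:
        for frame in range(3):
            # Stop codons translate to '*', so splitting on '*' yields ORFs.
            for peptide in translate(strand, frame).split("*"):
                if len(peptide) >= min_size:
                    orfs.append(peptide)
    return orfs
```

Note that the size threshold is counted in amino acids, matching the "(aa)" unit given for --min-size, so a threshold of 25 corresponds to at least 75 nucleotides.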