seqfu tabulate

Note

This tool has been introduced with seqfu 1.2.2

This program converts a sequence file (FASTA or FASTQ) to a tabular file, and vice versa.

Several Unix tools can process a stream of information line-by-line, and tabular file can be easily modified and filtered piping serveral programs.

This tool will allow to tabulate (convert to TSV) and detabulate (convert to FASTX) sequences.

Usage: tabulate [options] [<file>]

Convert FASTQ to TSV and viceversa. Single end is a 4 columns table (name, comment, seq, qual),
paired end have 4 columns for the R1 and 4 columns for the R2. 
Paired end reads need to be supplied as interleaved.
 

Options:
  -i, --interleaved        Input is interleaved (paired-end)
  -d, --detabulate         Convert TSV to FASTQ (if reading from file is autodetected) 
  -c, --comment-sep CHAR   Separator between name and comment (default: tab)
  -s, --field-sep CHAR     Field separator (default: tab)
  -v, --verbose            Verbose output
  -h, --help               Show this help

Tabular format

The conversion works as follows:

  • FASTA files are converted to a 3 columns table: name, comments and sequence
  • Single-End FASTQ files are converted to a 4 columns table: name, comments, sequence and quality
  • Paired-End FASTQ (interleaved) files are converted to 8 colums table: R1 name, comments, sequence and quality and R2 name, comments, sequence and quality

To allow an efficient use of streams, paired-end datasets need to be interleaved (see: seqfu interleave).

To make an example this FASTQ file:

@read_1 tag=A
ATTACAATTACAATTACAA
+
EAFFFFFABBBAAAAAAAC
@read_2 tag=B
GGACAGAATTACAATTTTT
+
FFFFFFFFABBBAAAAAAA

will become:

read_1    tag=A    ATTACAATTACAATTACAA    EAFFFFFABBBAAAAAAAC
read_2    tag=B    GGACAGAATTACAATTTTT    FFFFFFFFABBBAAAAAAA

Conversions

Sequence to table

A single file can be converted to tabular format.

NOTE: If the file is automatically detected as interleaved (the first and second read have the same name) you can omit -i (or --interleave), but we recommend to use it to make the command clearer.

seqfu tabulate file.fastq | gzip -c > tabular.tab.gz

Table to sequences

When a file is provided, the input format is automatically detected. Otherwise specify -d (or --detabulate to convert from table to FASTX).

seqfu tabulate file.tab > sequences.fq

or, via stream:

cat file.tab.gz | seqfu tabulate  --detabulate > sequences.fq

Pipeline

We designed the tool to provide a simple way to make ad hoc modifications via tabular lines, so a full workflow could be:

seqfu interleave ... | seqfu tabulate | CUSTOM_STEP | seqfu tabulate --detabulate | seqfu deinterleave -o basename -