fu-shred

Note

Paired end support is available since version 1.18.

A program to systematically shotgun a reference: it does not simulate a random shotgun library preparation, but produces reads of length L sliding over the reference chromosomes at a step S.

This tool is meant to test the effect of read size alone on alignment and classification methods, and was introduced in SeqFu 1.4.

Usage: fu-shred [options]  [<fastq-file>...]

  Systematically produce a "shotgun" of input sequences. Can read from standard input.

  Options:
    -l, --length INT           Segment length [default: 100]
    -s, --step INT             Distance from one segment start to the following [default: 10] 
    -q, --quality INT          Quality (constant) for the segment, if -1 is 
                               provided will be printed in FASTA [default: 40]
    -r, --add-rc               Print every other read in reverse complement
    -b, --basename             Prepend the file basename to the read name
    --split-basename STRING    Split the file basename at this character [default: .]
    --prefix-separator STRING  Join the basename with the rest of the read name with this [default: _]
    -f, --frag-len INT         Total fragment length [default: 500]
    -o, --out-prefix STR       If specified, will run in paired end mode, and will output two files
                               with this prefix, one for each end. If not specified, will output
                               to STDOUT in single end mode.

    -v, --verbose              Verbose output
    -h, --help                 Show this help
  

Input

One or more FASTA or FASTQ files. If no file is given, the program reads from STDIN.

Parameters

Main parameters:

the desired sequence length with --length INT
the distance between the starting site of each read, with --step INT
the quality value of each base, with --quality INT (if you supply -1, the output will be in FASTA format)
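
The three main parameters can be illustrated with a short Python sketch. This is not the code of fu-shred, only an approximation of the behaviour described above. The function names shred and format_record are hypothetical, and the sketch assumes that only full-length windows are emitted (how the tool treats the tail of a sequence is not stated here) and that qualities use the common Phred+33 encoding, where 40 corresponds to the character I.

```python
def shred(sequence, length=100, step=10):
    """Yield full-length windows of `length` bases, starting every `step` bases.

    Assumption: windows that would run past the end of the sequence are skipped.
    """
    for start in range(0, len(sequence) - length + 1, step):
        yield sequence[start:start + length]


def format_record(name, read, quality=40):
    """Format one read as FASTQ, or as FASTA when quality is -1."""
    if quality == -1:
        return f">{name}\n{read}\n"
    # Phred+33 encoding (assumed): the same character is repeated for every base
    qual_char = chr(quality + 33)
    return f"@{name}\n{read}\n+\n{qual_char * len(read)}\n"


# Toy example: a 12 bp sequence, 8 bp reads, one read every 2 bp
reads = list(shred("ACGTACGTACGT", length=8, step=2))
print(reads)
print(format_record("chr1_1", reads[0], quality=40), end="")
```

With a step smaller than the read length, consecutive reads overlap, so every position of the reference is covered by several reads.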

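If processing multiple files, it can be convenient to prepend the file basename to each read name with --basename. By default the basename is split at the first ., but this can be changed with --split-basename STRING. The string joining the basename to the rest of the read name (default: _) can be changed with --prefix-separator STRING.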

If a mix of forward and reverse reads is required, --add-rc will reverse complement every other read. If you want to test every read and its reverse complement, run the program without --add-rc and make a reverse complement of the whole dataset with seqfu rc.
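
The effect of --add-rc can be pictured with the following Python sketch. It is an illustration and not the tool's implementation: the helper names are hypothetical, and it assumes that the reads being reverse complemented are the second, fourth and so on.

```python
# Translation table for complementing DNA bases; other characters (e.g. N) are kept as-is
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def alternate_strands(reads):
    """Reverse complement every other read (assumed here: the 2nd, 4th, ...)."""
    return [
        reverse_complement(read) if index % 2 == 1 else read
        for index, read in enumerate(reads)
    ]


mixed = alternate_strands(["AACC", "AACC", "GATT"])
print(mixed)
```

Only half of the reads end up on the reverse strand this way, which is why a complete forward and reverse dataset requires a separate seqfu rc run on the whole output.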

Paired end mode

If you specify --out-prefix STR, the program will run in paired end mode, and will output two files with this prefix, one for each end. The first step is to generate a "read" as long as the fragment (--frag-len); then its first bases (as many as the read length, --length) are used as the first read, and its last bases as the second read. The second read is reverse complemented.
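
A minimal Python sketch of how a pair can be derived from one fragment, under the description above. The function names are hypothetical and this is not the tool's code. It assumes that both reads have the same length and that the fragment is at least as long as one read.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_reads(fragment, read_length):
    """Split one fragment into a forward read and a reverse complemented mate."""
    first = fragment[:read_length]                         # first bases of the fragment
    second = reverse_complement(fragment[-read_length:])   # last bases, reverse complemented
    return first, second


# Toy fragment of 10 bp with 3 bp reads
first, second = paired_reads("AACCGTTTGA", read_length=3)
print(first, second)
```

When the fragment is shorter than twice the read length, the two reads overlap, as happens with real short-insert libraries.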

Output

In single end mode the generated sequences are printed to the standard output (STDOUT); in paired end mode they are written to the two files named after --out-prefix. Each read has a progressive name composed of:

file basename (if --basename is specified)
a string separator (if --basename is specified)
the chromosome name
a string separator
a progressive number

@k141_1_1 
GTCGGAGTCGTTTATCCGCAACATCCTGCTTGCACAGGAGTTTTATAAAAAGGAGTTCGGCATCAAGTCGAAGGATATGTTCCTGCCCGACTGCTTCGGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@k141_1_2 
TCGGGCAGGAACATATCCTTCGACTTGATGCCGAACTCCTTTTTATAAAACTCCTGTGCAAGCAGGATGTTGCGGATAAACGACTCCGACGACGGCATGT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@k141_1_3 
AACGACCCGAACATGCCGTCGTCGGAGTCGTTTATCCGCAACATCCTGCTTGCACAGGAGTTTTATAAAAAGGAGTTCGGCATCAAGTCGAAGGATATGT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@k141_1_4 
CGACTTGATGCCGAACTCCTTTTTATAAAACTCCTGTGCAAGCAGGATGTTGCGGATAAACGACTCCGACGACGGCATGTTCGGGTCGTTGGCCTCGAAC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@k141_1_5 
CGGGGGCTTCGTTCGAGGCCAACGACCCGAACATGCCGTCGTCGGAGTCGTTTATCCGCAACATCCTGCTTGCACAGGAGTTTTATAAAAAGGAGTTCGG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
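
The naming scheme described in this section can be sketched in Python as follows. The function read_name is hypothetical and only mirrors the documented parts of the name. In particular, it assumes that the separator between the chromosome name and the progressive number is always an underscore, as in the example output, and that the basename is taken from the file name without its directory.

```python
import os


def read_name(chromosome, number, filename=None, split_at=".", prefix_separator="_"):
    """Compose a read name from chromosome, counter and (optionally) file basename."""
    # Chromosome name, separator (assumed "_"), progressive number
    name = f"{chromosome}_{number}"
    if filename is not None:
        # Keep only what precedes the first split character, e.g. sample1.fasta.gz -> sample1
        basename = os.path.basename(filename).split(split_at)[0]
        name = f"{basename}{prefix_separator}{name}"
    return name


print(read_name("k141_1", 1))
print(read_name("k141_1", 2, filename="data/sample1.fasta.gz"))
```

Prepending the basename keeps read names unique when several input files contain chromosomes with the same name.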
  

Shotgun simulation

If you need to simulate a whole genome shotgun, you will need alternative software like ART.