Usage

Tools

BamToCov is inspired by the UNIX philosophy: each tool is designed to perform one very specific task efficiently. Multiple samples and more specific tasks can be integrated with scripts, and we provide a set of scripts to demonstrate the process.

bamtocov will produce a coverage BED from a single BAM file, or a count matrix from a set of alignments and a target (in BED, GTF or GFF format). Used without a target, it is a drop-in replacement for covtobed, except that it discards invalid alignments by default. When a target is provided, it can produce coverage statistics for each region in the target, also for multiple BAM files.

bamtocounts will count the number of reads covering each target region, rather than the per-nucleotide coverage.

bamcountrefs is a shortcut to count the number of reads per chromosome, with filters on the read flags, length and quality.

covtotarget (legacy) is a utility to create a count table from the output of the original covtobed program.
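
The difference between read counts (bamtocounts) and nucleotide coverage (bamtocov) can be illustrated with a toy example. The sketch below does not use BamToCov: reads are modelled as hypothetical zero-based, half-open (start, end) intervals, and a read is assumed to count for a target when it overlaps it by at least one base, which may differ from the rules the tools actually apply.

```python
# Toy illustration only: reads are plain (start, end) intervals,
# zero-based and half-open. This is not BamToCov's implementation.

def count_reads(reads, target_start, target_end):
    """Number of reads overlapping the target by at least one base (assumed rule)."""
    return sum(1 for start, end in reads if start < target_end and end > target_start)

def mean_coverage(reads, target_start, target_end):
    """Mean per-base depth across the target."""
    covered = 0
    for start, end in reads:
        overlap = min(end, target_end) - max(start, target_start)
        covered += max(0, overlap)
    return covered / (target_end - target_start)

reads = [(0, 50), (40, 90), (95, 150)]
print(count_reads(reads, 0, 100))    # 3
print(mean_coverage(reads, 0, 100))  # 1.05
```

All three reads touch the target, so the count is 3, while the mean depth is close to 1 because the third read contributes only five bases.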

Quick start

bamtocov alignment.bam > coverage.bed

will produce a coverage BED file from the alignment file.

File formats

BED files

A BED file (.bed) is a tab-delimited text file that defines a feature track. In this context, the value attached to each interval is its nucleotide coverage.

The columns are chromosome name, start position (inclusive, zero-based), end position (non-inclusive, zero-based) and coverage. An example is:

seq1    0       9       0
seq1    9       109     5
seq1    109     189     0
seq1    189     200     2
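
To illustrate the layout, a coverage BED can be summarised with a few lines of Python. This is a minimal sketch and not part of BamToCov: the function name summarise_coverage is hypothetical, and the input is assumed to be whitespace-separated with the four columns described above.

```python
import io

def summarise_coverage(handle):
    """Return (total_bases, covered_bases, mean_coverage) for a coverage BED."""
    total = covered = depth_sum = 0
    for line in handle:
        if not line.strip():
            continue  # skip empty lines
        chrom, start, end, cov = line.split()[:4]
        length = int(end) - int(start)  # half-open interval: end is excluded
        total += length
        depth_sum += length * int(cov)
        if int(cov) > 0:
            covered += length
    mean = depth_sum / total if total else 0.0
    return total, covered, mean

# The example intervals shown above
example = io.StringIO(
    "seq1\t0\t9\t0\n"
    "seq1\t9\t109\t5\n"
    "seq1\t109\t189\t0\n"
    "seq1\t189\t200\t2\n"
)
print(summarise_coverage(example))  # (200, 111, 2.61)
```

Because the end position is not inclusive, the length of each interval is simply end minus start, and adjacent intervals never overlap.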

Target statistics

Warning: this format is not final.

For each sample, 5 columns are printed:

  • bam_bases
  • bam_mean
  • bam_min
  • bam_max
  • bam_length

An example is:

interval      bam_bases  bam_mean  bam_min  bam_max  bam_length
target1_8X    699        3.495     1        6        200
target2_0X    0          0.0       0        0        50
target3_1X    .          .         .        .        .
for_rev_10Xa  100        10.0      10       10       10
for_rev_10Xb  100        10.0      10       10       10
for_rev_10Xc  .          .         .        .        .
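
A table like the one above can be read with the Python standard library. The sketch below rests on several assumptions: columns are taken to be whitespace-separated, a `.` is treated as a missing value, and the helper name parse_target_stats is hypothetical. In the example rows, bam_mean equals bam_bases divided by bam_length; since the format is not final, this relation may not hold in general.

```python
import io

def parse_target_stats(handle):
    """Parse a target statistics table into {interval: {column: value or None}}."""
    header = handle.readline().split()
    table = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        # Assumption: "." marks a missing value
        values = [None if f == "." else float(f) for f in fields[1:]]
        table[fields[0]] = dict(zip(header[1:], values))
    return table

# A few rows copied from the example above
example = io.StringIO(
    "interval bam_bases bam_mean bam_min bam_max bam_length\n"
    "target1_8X 699 3.495 1 6 200\n"
    "target2_0X 0 0.0 0 0 50\n"
    "target3_1X . . . . .\n"
)
stats = parse_target_stats(example)
row = stats["target1_8X"]
print(row["bam_bases"] / row["bam_length"])  # 3.495
print(stats["target3_1X"]["bam_mean"])       # None
```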