Andrea Telatin
Andrea Telatin Senior bioinformatician at the Quadram Institute Bioscience, Norwich.

Running Kraken and Bracken with a script

A more complex script to perform a for loop is available from GitHub as shown below:

Some notes on how to make a script

This script requires a parameter from the user (the input directory). Bash will store the first variable in $1. If the script is launched without any argument, we should check that the first variable is not set, we should warn the user:

1
2
3
4
5
# Let's check if the user supplied a variable name
if [  -z ${1+x} ]; then
  echo "This script requires a parameter: directory (with paired reads)"
  exit 1
fi

Introducing some checks

If the database is not where we think it is, Kraken will fail. We can check that hash.k2d is present in the directory. This is one of the three files composing a Kraken database.

Bracken has some extra files, that are read length dependent, so we should also check that Bracken can run.

1
2
3
4
5
6
7
8
9
10
# Let's check the Database is present
if [[ ! -e "$DB"/hash.k2d ]]; then
  echo "Kraken2 database not found at $DB"
  exit 1
fi

if [[ ! -e "$DB"/database${READLEN}mers.kmer_distrib ]]; then
  echo "Bracken database not found at $DB/database${READLEN}mers.kmer_distrib"
  exit 1
fi

A for loop to run Kraken and Bracken

As usual we prepare a for loop, manipulating the variables:

  • Replacing “_R1” with “_R2” to get the second pair
    • :warning: this should be changed if our files have different tags
  • Truncating the basename on the first underscore to get the Sample name
1
2
3
4
5
6
for FILE1 in $1/*_R1*;
do
  FILE2=${FILE1/_R1/_R2}
  BASE=$(basename $FILE1  |  cut -f1 -d_)
  echo "Sample: $BASE (input files: $FILE1 and $FILE2)"
done   

Safety net

If kraken fails, what is the point of running Kraken? This is why we have the set -euo pipefail instruction.