Andrea Telatin, Senior bioinformatician at the Quadram Institute Bioscience, Norwich.

Kaiju

Kaiju can be installed, as usual, from Conda. As with Kraken2, pre-built databases are available; for this tutorial we used the nr database (2021-02-24 version, 52 GB).

kaiju -t $DB/kaiju/nodes.dmp -f $DB/kaiju/kaiju_db_nr.fmi -o kaiju.tsv -z 32 -v \
 -i subsampled_R1.fq.gz -j subsampled_R2.fq.gz

A typical line of Kaiju’s output looks like:

C       RL|S1|R549      55507   259     55507,  WP_072934244.1, NTMTAGLVASYIGRITAAWNAENIGTPPIELITRTWFNPNQTTRWAFLPG,
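
The tab-separated columns are:

  • Classified / Unclassified (C or U)
  • Read name
  • NCBI TaxID assigned to the read
  • Length or score of the best match
  • Comma-separated list of the TaxIDs of all database sequences with the best match
  • Comma-separated list of the accession numbers of those sequences
  • Comma-separated list of the matching amino acid fragments

The last four columns are printed only because of the -v (verbose) flag.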
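Since the first column of every line is either C or U, a quick tally of classified reads can be obtained with awk. The following is a minimal sketch; the function name kaiju_summary is hypothetical and not part of Kaiju.

```shell
# Hypothetical helper (not shipped with Kaiju): count classified and
# unclassified reads in a Kaiju output file.
# Column 1 is "C" or "U"; the file is tab-separated.
kaiju_summary() {
  awk -F'\t' '
    $1 == "C" { classified++ }
    $1 == "U" { unclassified++ }
    END {
      total = classified + unclassified
      pct = (total > 0) ? 100 * classified / total : 0
      printf "classified\t%d\n", classified
      printf "unclassified\t%d\n", unclassified
      printf "percent_classified\t%.2f\n", pct
    }
  ' "$1"
}
```

Calling kaiju_summary kaiju.tsv prints three tab-separated lines: the two counts and the percentage of classified reads.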

Generate a report

Kaiju does not generate a report on the fly, but it ships a program, kaiju2table, to produce one (the resulting table can be imported automatically by MultiQC).

# Phylum level
kaiju2table -t /data/db/kaiju/nodes.dmp -n /data/db/kaiju/names.dmp \
  -r phylum -o kaiju-phylum.tsv kaiju.tsv 

# Species level
kaiju2table -t /data/db/kaiju/nodes.dmp -n /data/db/kaiju/names.dmp \
  -r species -o kaiju-species.tsv kaiju.tsv 
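
The two commands above differ only in the rank, so they can be wrapped in a loop. This is a sketch under assumptions: the function name kaiju_reports and the default list of ranks are illustrative choices, and the output files follow the kaiju-RANK.tsv naming used above.

```shell
# Hypothetical helper: run kaiju2table once per taxonomic rank.
# Usage: kaiju_reports <db_dir> <kaiju_output> [rank ...]
# The body runs in a subshell, so its variables do not leak to the caller.
kaiju_reports() (
  db_dir=$1
  input=$2
  shift 2
  # Default ranks are an assumption of this sketch; pass others as arguments.
  [ "$#" -gt 0 ] || set -- phylum genus species
  for rank in "$@"; do
    kaiju2table -t "$db_dir/nodes.dmp" -n "$db_dir/names.dmp" \
      -r "$rank" -o "kaiju-$rank.tsv" "$input" || exit 1
  done
)
```

For example, kaiju_reports /data/db/kaiju kaiju.tsv phylum species reproduces the two reports generated above.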

The output is in TSV format:

file        percent     reads   taxon_id   taxon_name
kaiju.tsv   15.465896   308688  55507      Schwartzia succinivorans
kaiju.tsv   12.757531   254631  1004304    Hydrotalea sandarakina
kaiju.tsv   8.060514    160882  1736532    Massilia sp. Root418
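
Because the report is a plain TSV file, the most abundant taxa can be extracted with standard tools. The sketch below assumes the five-column layout shown above with a single header line; the function name kaiju_top_taxa and the default 1% cutoff are hypothetical choices.

```shell
# Hypothetical helper: list taxa at or above a minimum percentage,
# most abundant first.
# Input columns: file, percent, reads, taxon_id, taxon_name (tab-separated).
# Usage: kaiju_top_taxa <report.tsv> [min_percent]   (default cutoff: 1)
kaiju_top_taxa() {
  awk -F'\t' -v min="${2:-1}" '
    NR > 1 && ($2 + 0) >= (min + 0) { printf "%s\t%s\t%s\n", $2, $3, $5 }
  ' "$1" | LC_ALL=C sort -k1,1nr
}
```

Running kaiju_top_taxa kaiju-species.tsv 10 would print the percent, read count and name of every taxon accounting for at least 10% of the reads.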

Exporting to Krona

Kaiju also ships a small utility, kaiju2krona, that prepares a tabular file to be imported into Krona. To include the unclassified reads in the output, add the -u flag.

# Prepare the Krona input
kaiju2krona -t /data/db/kaiju/nodes.dmp -n /data/db/kaiju/names.dmp \
  -i kaiju.tsv -o kaiju.krona  -u

# Plot with Krona
ktImportText -o kaiju.out.html kaiju.krona
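
As a quick sanity check before plotting, the read counts in the Krona input can be summed. This sketch assumes kaiju.krona follows the text format read by ktImportText (a count in the first column, followed by the tab-separated lineage); the function name krona_total is hypothetical.

```shell
# Hypothetical helper: sum the counts (first column) of a Krona text file.
# Assumes one record per line: count, then the tab-separated lineage.
krona_total() {
  awk -F'\t' '{ total += $1 } END { printf "%d\n", total }' "$1"
}
```

When -u is used, one would expect krona_total kaiju.krona to be close to the number of reads in kaiju.tsv, although that is an assumption worth verifying on your own data.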

A complete script

In keeping with the rest of the workshop: