Andrea Telatin, Senior bioinformatician at the Quadram Institute Bioscience, Norwich.

Using MultiQC to plot Kraken2 data

Installing MultiQC

We installed MultiQC via conda, but in case we need to install it again:

conda install -c conda-forge -c bioconda multiqc

What is MultiQC

MultiQC is a fantastic tool that aggregates the outputs of different bioinformatics programs into a single report. It can parse the output of more than a hundred tools (including fastp, cutadapt, prokka, kaiju, quast…).

:mag: To see how flexible MultiQC is, have a look at a complete example report.

We will combine our fastp reports and Kraken2 classifications into a single report.

It works with a bit of magic: it scans all the files in the directories you give it and checks whether any of them look like the output of a supported tool. Sometimes the filename matters as well: for fastp, MultiQC uses the .json report, which should follow the *.fastp.json naming pattern. We therefore rename our fastp reports accordingly:

cd ~/kraken-ws
for i in reports/*.json; do
  # skip reports that already have the .fastp.json suffix
  [[ $i == *.fastp.json ]] && continue
  mv "$i" "${i%.json}.fastp.json"
done
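If you want to check what a rename loop does before touching your real reports, the sketch below runs one in a temporary directory. The file names (`SampleA.json`, `SampleB.json`) and the helper function `rename_reports` are hypothetical placeholders, not part of the workshop data. `${i%.json}` strips only a trailing `.json`, so every file ends up matching the *.fastp.json pattern, and files that already carry the suffix are skipped, so running the loop twice is harmless.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory with two hypothetical fastp reports (placeholder names)
demo=$(mktemp -d)
mkdir "$demo/reports"
touch "$demo/reports/SampleA.json" "$demo/reports/SampleB.json"

# Hypothetical helper wrapping the rename loop
rename_reports() {
  local i
  for i in "$demo"/reports/*.json; do
    # Skip files that already end in .fastp.json, so re-running is harmless
    [[ $i == *.fastp.json ]] && continue
    # ${i%.json} strips the trailing ".json"; then append ".fastp.json"
    mv "$i" "${i%.json}.fastp.json"
  done
}

rename_reports
rename_reports   # second run: every file is skipped

# Keep only the base names of the resulting files, then clean up
files=("$demo"/reports/*)
renamed="${files[*]##*/}"
echo "$renamed"   # SampleA.fastp.json SampleB.fastp.json
rm -rf "$demo"
```

`${i%.json}` only touches the extension, whereas a pattern substitution such as `${i/json/fastp.json}` replaces the first `json` found anywhere in the path, and on a second run would turn `Sample.fastp.json` into `Sample.fastp.fastp.json`.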

We can first create a report based on fastp alone:

multiqc -o fastp-report reports/

With -o we specify the output directory; the final argument tells MultiQC where to scan for known files. The output should be similar to this one. If you check your report, you will notice that MultiQC thinks our samples are called Samplename_1, because the sample name is taken from the first file of each read pair.

We want to remove the _1 so that the sample names match those in the Kraken2 reports, allowing MultiQC to merge the two:

sed -i 's/_1//' reports/*.json
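Note that `s/_1//` deletes the first `_1` on every line of each JSON file, wherever it occurs. The sketch below, using a hypothetical read file name for a sample called `Sample_10`, shows how that can mangle a sample name that itself contains `_1`, and how anchoring the pattern to the read-file extension avoids it. The `.fastq` extension is an assumption; adapt the pattern to your own read file names (for example `.fq`). On macOS, BSD sed needs `sed -i ''` instead of `sed -i` when editing files in place.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical first-pair read file name for a sample called "Sample_10"
read1="Sample_10_1.fastq.gz"

# Unanchored: sed removes the FIRST "_1" it finds, which is inside "_10"
loose=$(printf '%s\n' "$read1" | sed 's/_1//')

# Anchored: only remove "_1" when it is directly followed by ".fastq"
strict=$(printf '%s\n' "$read1" | sed 's/_1\.fastq/.fastq/')

echo "loose:  $loose"    # loose:  Sample0_1.fastq.gz
echo "strict: $strict"   # strict: Sample_10.fastq.gz
```

With sample names that never contain `_1`, both expressions give the same result; the anchored one simply fails safe when they do.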

Now we can combine fastp and Kraken2:

multiqc -o multiqc reports/ kraken/

:mag: The output should look like this one.

Citation

Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller, MultiQC: Summarize analysis results for multiple tools and samples in a single report, Bioinformatics (2016)