Using MultiQC to plot Kraken2 data
Installing MultiQC
We installed MultiQC via conda, but in case we need to install it again:
conda install -c bioconda multiqc
What is MultiQC
MultiQC is a fantastic tool that aggregates the outputs of different bioinformatics programs into a single report. It understands the output of more than a hundred tools (including fastp, cutadapt, prokka, kaiju, quast…).
To see how flexible MultiQC is, have a look at a complete example report.
We will combine our fastp and Kraken2 classifications to have a single report.
It works with a bit of magic: it scans all your files to check whether any of them looks like the output of a bioinformatics tool. Sometimes the filename matters as well: for fastp it will use the .json file, which should be named in the *.fastp.json format.
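cd ~/kraken-ws
# Rename each fastp report from NAME.json to NAME.fastp.json
for i in reports/*.json;
do
  case "$i" in *.fastp.json) continue ;; esac  # skip files already renamed
  mv "$i" "${i%.json}.fastp.json"
done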
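Before renaming anything, it can help to preview what the loop would do. The sketch below is a dry run, assuming the fastp reports sit in reports/ as in this workshop; fastp_name is a hypothetical helper written for this example, not part of fastp or MultiQC. It only prints the mv commands and leaves files that already end in .fastp.json alone.

```shell
# Hypothetical helper: print the name a fastp JSON report should have.
fastp_name() {
  case "$1" in
    *.fastp.json) printf '%s\n' "$1" ;;                      # already in the expected format
    *.json)       printf '%s\n' "${1%.json}.fastp.json" ;;   # swap the trailing .json only
    *)            printf '%s\n' "$1" ;;                      # not a JSON file: leave untouched
  esac
}

# Dry run: show what would be renamed, without touching any file.
for f in reports/*.json; do
  [ -e "$f" ] || continue            # the glob matched nothing
  new=$(fastp_name "$f")
  [ "$f" = "$new" ] || echo "mv $f $new"
done
```

Stripping the suffix with ${1%.json} only touches the end of the name, so a sample whose name happens to contain "json" is still renamed correctly.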
We can first create a report based on fastp alone with:
multiqc -o fastp-report reports/
With -o we specify the output directory; the remaining arguments tell MultiQC where to scan for known files. The output should be similar to this one. If you check your report, you will notice that MultiQC thinks our samples are called Samplename_1, because the name is taken from the first file of each read pair.
We want to remove the _1 so that the sample names match those in the Kraken2 reports:
sed -i 's/_1//' reports/*.json
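The plain s/_1// pattern removes the first _1 it finds on each line, which may hit the wrong spot if a sample name itself contains _1 (for example Sample_10). A more targeted variant could look like the sketch below. It assumes the sample name comes from the input file name recorded in the report and that the reads end in .fastq or .fq (optionally .gz); strip_pair_suffix is a hypothetical helper and the sample line is made up for illustration.

```shell
# Hypothetical helper: drop the _1 read-pair suffix only when it sits right
# before a FASTQ extension (.fastq or .fq, optionally followed by .gz).
strip_pair_suffix() {
  sed -E 's/_1(\.(fastq|fq)(\.gz)?)/\1/g'
}

# Try it on an illustrative line before editing any real report.
printf '%s\n' '"command": "fastp -i Sample_10_1.fastq.gz -I Sample_10_2.fastq.gz"' \
  | strip_pair_suffix
```

Here the helper reads from standard input and writes to standard output, so nothing is modified in place; once the result looks right, the same expression can be passed to sed -i.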
Now we can combine fastp and Kraken2:
multiqc -o multiqc reports/ kraken/
The output should look like this one.
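A quick way to confirm that the run produced something is to look for the HTML report in the output directory. This is a minimal sketch assuming the default report name multiqc_report.html (it changes if a custom title or filename is given); has_multiqc_report is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the directory holds a non-empty MultiQC HTML report.
has_multiqc_report() {
  [ -s "$1/multiqc_report.html" ]
}

if has_multiqc_report multiqc; then
  echo "Report ready: multiqc/multiqc_report.html"
else
  echo "No report found in multiqc/ (did the multiqc command finish?)"
fi
```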
Citation
Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller, MultiQC: Summarize analysis results for multiple tools and samples in a single report, Bioinformatics (2016)