QC Notes
On our first day we will cover the concepts behind taxonomic classification using Kraken2 (and Bracken), and see how to remove host reads and perform quality checks and filtering.
Our dataset
We have a set of samples from an ongoing gut metagenomics study by Aimee Parker (Quadram Institute) and co-workers. Part of her study has been described in a pre-print; we took some samples to practice our classification skills.
- Sample_3
- Sample_6
- Sample_30
- Sample_4
- Sample_22
- Sample_25
- Sample_13
- Sample_31
Initial QC
A well-known tool for initial QC of FASTQ reads is FastQC.
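# Create the output directory (no error if it already exists)
mkdir -p fastqc-reports
# Run FastQC on all gzipped FASTQ files, using 4 threads
fastqc --outdir fastqc-reports --threads 4 reads/*.fastq.gz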
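Alongside the FastQC reports, a quick read count per file can be a handy sanity check. The sketch below is only an illustration and not part of the original workflow: it builds a tiny made-up FASTQ file so it can run on its own, and the helper name `count_reads` is hypothetical. It assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
# Build a tiny two-read gzipped FASTQ file (made-up data) so the sketch is self-contained.
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > "$tmpdir/example_R1.fastq.gz"

# Hypothetical helper: decompress, count lines, divide by four (one record = four lines).
count_reads() {
  echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

n_reads=$(count_reads "$tmpdir/example_R1.fastq.gz")
echo "example_R1.fastq.gz: $n_reads reads"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

On real data, the same helper could be called in a loop over `reads/*.fastq.gz`; the R1 and R2 files of a sample should report the same number of reads.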
Quality filtering
fastp was designed to have “good defaults”, but it’s always a good idea to check which options are available in case we want to tweak the parameters.
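# Make sure the output directory exists before fastp writes to it
mkdir -p filt
# Loop over every forward (R1) file; the reverse (R2) mate is derived by replacing _R1 with _R2
for i in reads/*_R1*gz;
do
  fastp -w 16 -i "$i" -I "${i/_R1/_R2}" \
    -o filt/"$(basename "$i")" -O filt/"$(basename "${i/_R1/_R2}")" \
    --detect_adapter_for_pe \
    --length_required 100 \
    --overrepresentation_analysis;
done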
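The loop relies on Bash parameter expansion to derive the R2 file name from the R1 file name, and on `basename` to drop the `reads/` directory. The following is a minimal sketch of just that step; the file name `Sample_3_R1.fastq.gz` is an assumption made for illustration (the actual naming of the course files may differ), and the `${var/pattern/replacement}` syntax requires Bash rather than plain `sh`.

```shell
# Hypothetical file name, used only to illustrate the substitution.
r1="reads/Sample_3_R1.fastq.gz"

# ${var/pattern/replacement} replaces the first match of the pattern (Bash syntax).
r2="${r1/_R1/_R2}"

# basename strips the directory, so the outputs keep their names but land in filt/.
out1="filt/$(basename "$r1")"
out2="filt/$(basename "$r2")"

echo "$r1 -> $out1"
echo "$r2 -> $out2"
```

Matching on `_R1` rather than `R1` reduces the chance of accidentally rewriting another part of the file name that happens to contain "R1".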
Host removal
Reads filtering