Andrea Telatin, Senior bioinformatician at the Quadram Institute Bioscience, Norwich.

QC Notes

On our first day we will cover the concepts behind taxonomic classification using Kraken2 (and Bracken), and see how to remove host reads and perform quality checks (and filtering).

Our dataset

We have a set of samples from an ongoing gut metagenomics study by Aimee Parker (Quadram Institute) and co-workers. Part of her study has been described in a pre-print; we took some of its samples to practice our classification skills.

  • Sample_3
  • Sample_6
  • Sample_30
  • Sample_4
  • Sample_22
  • Sample_25
  • Sample_13
  • Sample_31

Initial QC

A well-known tool for initial QC of FASTQ reads is FastQC.

mkdir fastqc-reports
fastqc --outdir fastqc-reports --threads 4 reads/*.fastq.gz
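
After the run it can be useful to confirm that every input file produced a report. The sketch below assumes bash, input files ending in .fastq.gz, and FastQC's usual naming (the input name with the FASTQ extension removed, followed by _fastqc.html). The helper name expected_report is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected FastQC report path for a FASTQ file.
# Assumes the input ends in .fastq.gz and reports are in fastqc-reports/.
expected_report() {
  local name
  name="$(basename "$1" .fastq.gz)"
  printf 'fastqc-reports/%s_fastqc.html\n' "$name"
}

# List inputs whose report is missing; prints nothing when all reports exist.
for fq in reads/*.fastq.gz; do
  [ -e "$fq" ] || continue   # skip the literal pattern when no file matches
  report="$(expected_report "$fq")"
  [ -e "$report" ] || echo "missing report for $fq"
done
```

An empty output means each file in reads/ has a matching HTML report in fastqc-reports/.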

Quality filtering

fastp was designed to have “good defaults”, but it’s always a good idea to check what our options are to tweak the parameters.

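mkdir -p filt
for i in reads/*R1*gz;
do
  # ${i/_R1/_R2} swaps the first "_R1" in the path for "_R2" to get the mate file
  fastp -w 16 -i "$i" -I "${i/_R1/_R2}" \
   -o filt/"$(basename "$i")" -O filt/"$(basename "${i/_R1/_R2}")" \
   --detect_adapter_for_pe \
   --length_required 100 \
   --overrepresentation_analysis
done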
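
The loop finds the R2 file with bash pattern substitution: ${i/_R1/_R2} replaces the first "_R1" in the path with "_R2". Before launching fastp on all samples, it may help to print the derived names as a dry run. This is a minimal sketch assuming bash; the file name Sample_3_R1.fastq.gz and the helper pair_for are illustrative only, and your files may be named differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the R2 path from an R1 path.
pair_for() {
  local r1="$1"
  # Replace the first occurrence of "_R1" with "_R2" (bash-specific expansion)
  printf '%s\n' "${r1/_R1/_R2}"
}

# Illustrative file name, not necessarily one from the dataset
r1="reads/Sample_3_R1.fastq.gz"
r2="$(pair_for "$r1")"

echo "input:  $r1 $r2"
echo "output: filt/$(basename "$r1") filt/$(basename "$r2")"
```

If the printed R2 name does not match a real file, adjust the pattern (for example when files use "_1" and "_2" instead of "_R1" and "_R2").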

Host removal

Reads filtering