A prescreening pipeline for GMH Metagenome studies
MultiQC reports summarise analysis results about host contamination and QC
- Contact E-mail
- Application Type: GMH Metagenomics
- Project Type: Whole Metagenome Sequencing
- Sequencing Setup: 2x150bp
Report generated on 2023-01-06, 11:20 UTC
General Statistics
Showing 3/3 rows and 7/9 columns.
Sample Name | % [Ruminococcus] gnavus | % Top 5 Species | % Unclassified | % Duplication | GC content | % PF | % Adapter |
bificoli | 0.0% | 100.0% | 0.0% | 0.0% | 54.0% | 98.6% | 2.1% |
phicov | 0.0% | 100.0% | 0.0% | 38.0% | 100.0% | 0.0% | |
phicovrumi | 99.1% | 99.1% | 0.9% | 0.0% | 42.9% | 97.7% | 3.5% |
Software Versions
Versions of the programs used in the pipeline.
Program | Version |
cleanup/pipeline | 1.5 |
fastp | 0.23.2 |
seqfu | 1.14.0 |
kraken2 | 2.1.0 |
pigz | 2.6 |
Illumina indexes
Index data as found in the FASTQ files.
Sample | PASS | Index | Ratio | Instrument | Run | Flowcell |
phicov | PASS | GCTATCCT+AACAGGTG | 1.00 | A00709 | n_421 | H3TGLDSX5 |
bificoli | PASS | ACGACAGA+CGCAACTA | 1.00 | A00709 | n_421 | H3TGLDSX5 |
phicovrumi | PASS | GAGCTTGT+CCTCGAAT | 1.00 | A00709 | n_421 | H3TGLDSX5 |
Host removal
Summary of the host removal and read filtering steps.
Sample | Raw sequences | Host (%) | Non-host reads | PF (%) | Cleaned reads | Contaminants reads | HG-Check |
bificoli | 23423 | 0.00% | 23423 | 98.63% | 23103 | N/A | No chr contam |
phicov | 118 | 15.25% | 100 | 84.75% | 100 | N/A | No chr contam |
phicovrumi | 11949 | 0.15% | 11931 | 97.54% | 11655 | N/A | No chr contam |
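The percentages in this table appear consistent with simple ratios over the raw read count: Host (%) as the fraction of raw reads removed as host, and PF (%) as cleaned reads over raw reads. This is an inference from the numbers shown, not a documented formula of the pipeline. A minimal sketch that recomputes both columns from the counts above (the function names are hypothetical):

```python
# Counts copied from the host removal table:
# (sample, raw sequences, non-host reads, cleaned reads)
rows = [
    ("bificoli", 23423, 23423, 23103),
    ("phicov", 118, 100, 100),
    ("phicovrumi", 11949, 11931, 11655),
]


def host_pct(raw, non_host):
    """Percentage of raw reads removed as host (assumed definition)."""
    return 100.0 * (raw - non_host) / raw


def pf_pct(raw, cleaned):
    """Percentage of raw reads left after cleaning (assumed definition)."""
    return 100.0 * cleaned / raw


for name, raw, non_host, cleaned in rows:
    print(f"{name}: host {host_pct(raw, non_host):.2f}%, "
          f"PF {pf_pct(raw, cleaned):.2f}%")
```

Under these assumed definitions, the recomputed values match the Host (%) and PF (%) columns for all three samples to two decimal places.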
Kraken is a taxonomic classification tool that uses exact k-mer matches to find the lowest common ancestor (LCA) of a given sequence. DOI: 10.1186/gb-2014-15-3-r46.
Top taxa
The number of reads falling into the top 5 taxa across different ranks.
To make this plot, the percentage of each sample assigned to a given taxon is summed across all samples. The counts for the top five taxa are then plotted for each of the 9 taxonomic ranks. The unclassified count is always shown across all ranks.
The total number of reads is approximated by dividing the number of unclassified reads by the percentage of the library that they account for.
Note that this is only an approximation, and that kraken percentages don't always add to exactly 100%.
The category "Other" shows the difference between the above total read count and the sum of the read counts in the top 5 taxa shown + unclassified. This should cover all taxa not in the top 5, +/- any rounding errors.
Note that any taxon that does not exactly fit a taxon rank (e.g. - or G2) is ignored.
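The total-read approximation and the "Other" category described above can be sketched as follows. This is an illustration of the arithmetic only, not the report generator's actual code. The function names and the example numbers are hypothetical, and raising an error when the unclassified percentage is zero is a choice made for this sketch, since the text does not say how that case is handled.

```python
def approx_total_reads(unclassified_reads, unclassified_pct):
    """Approximate the library size from the unclassified count and its percentage."""
    if unclassified_pct <= 0:
        # Assumption for this sketch: the estimate is undefined without
        # a non-zero unclassified percentage.
        raise ValueError("unclassified percentage must be > 0")
    return unclassified_reads / (unclassified_pct / 100.0)


def other_reads(total, top5_counts, unclassified_reads):
    """Reads in neither the top 5 taxa nor the unclassified bin."""
    return total - sum(top5_counts) - unclassified_reads


# Hypothetical example values, not taken from this report.
unclassified = 90
total = approx_total_reads(unclassified, 0.9)
top5 = [9800, 50, 20, 10, 5]
print(round(total), round(other_reads(total, top5, unclassified)))
```

Because the total is itself an estimate built from a rounded percentage, "Other" can come out slightly off, or even slightly negative, for small libraries.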
fastp is an ultra-fast all-in-one FASTQ preprocessor (QC, adapters, trimming, filtering, splitting...). DOI: 10.1093/bioinformatics/bty560.
Filtered Reads
Filtering statistics of sampled reads.
Insert Sizes
Insert size estimation of sampled reads.
Sequence Quality
Average sequencing quality over each base of all reads.
GC Content
Average GC content over each base of all reads.
N content
Average N content over each base of all reads.