Andrea Telatin
Senior bioinformatician at the Quadram Institute Bioscience, Norwich.

Extra Qiime2 notes

Getting the data

The samples are a subset of the ECAM study, which consists of monthly fecal samples collected from children from birth up to 24 months of life, together with corresponding fecal samples collected from their mothers over the same period.

wget -O dataset.zip "https://qiita.ucsd.edu/public_artifact_download/?artifact_id=81253"
unzip dataset.zip

Tip: it is good practice to put quotes around URLs, as they can contain special characters, such as &, that the shell would otherwise interpret as instructions.
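To see why the quotes matter, the following minimal sketch compares a quoted and an unquoted URL. The URL is a hypothetical example with two query parameters (it is not the dataset link used above), and `bash -c` is used only to contain the unquoted case.

```shell
# Hypothetical URL with two query parameters (illustration only)
url='https://example.org/download/?artifact_id=81253&format=zip'

# Quoted: the whole URL reaches the command as a single argument
quoted=$(printf '%s\n' "$url")

# Unquoted: the shell reads & as "run in the background", so the command
# only receives the text before it, and "format=zip" is parsed as a
# separate variable assignment
unquoted=$(bash -c 'printf "%s\n" https://example.org/download/?artifact_id=81253&format=zip; wait')

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

With wget, the unquoted form would request a truncated URL, which may fail or silently return a different file.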

Understanding Qiime2 workflow

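# Trim the start of each read pair with fastp (files named with _R1/_R2)
mkdir -p reads-trimmed
for fileR1 in per_sample_FASTQ/98546/*R1*
do
  fileR2="${fileR1/_R1/_R2}"
  name="$(basename "$fileR1" | cut -f1 -d.)"
  echo "$fileR1"
  # -Q: no quality filtering; -f/-F: trim 17/21 bases from the front of R1/R2; -w: threads
  fastp -i "$fileR1" -I "$fileR2" \
    -o "reads-trimmed/$(basename "$fileR1")" \
    -O "reads-trimmed/$(basename "$fileR2")" -Q -f 17 -F 21 -w 8 \
    -h "per_sample_FASTQ/${name}.html" \
    -j "per_sample_FASTQ/${name}.json"
done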
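
The loop relies on Bash pattern substitution to derive the reverse-read file name from the forward one. A minimal dry run of that naming logic, using a hypothetical file name and without calling fastp, might look like this:

```shell
# Hypothetical forward-read path (illustration only)
fileR1="per_sample_FASTQ/98546/sampleA_R1.fastq.gz"

# Replace the first occurrence of _R1 with _R2 to get the mate file
fileR2="${fileR1/_R1/_R2}"

# Report prefix: the file name up to the first dot
report="$(basename "$fileR1" | cut -f1 -d.)"

echo "R1:     $fileR1"
echo "R2:     $fileR2"
echo "report: $report"
```

Printing the derived names first is a cheap way to catch a naming scheme that does not follow the _R1/_R2 convention before any trimming is done.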

Preparing a manifest

echo -e 'sample-id\tabsolute-filepath' > manifest.tsv
for i in per_sample_FASTQ/81253/*gz
do
  # Sample ID is the file name without the .fastq.gz extension
  n=$(basename "$i")
  echo -e "${n%.fastq.gz}\t$PWD/$i" >> manifest.tsv
done

qiime tools import \
       --input-path manifest.tsv \
       --type 'SampleData[SequencesWithQuality]' \
       --input-format SingleEndFastqManifestPhred33V2 \
       --output-path raw-reads.qza

qiime demux summarize \
      --i-data raw-reads.qza \
      --o-visualization raw-reads.qzv
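
Before running the import, it can be worth checking that every path listed in the manifest is absolute and points to an existing file. The sketch below is one way to do that, demonstrated on a throwaway manifest built in a temporary directory; the `check_manifest` function and the sample names are hypothetical helpers, not part of Qiime2.

```shell
tab=$(printf '\t')

# check_manifest FILE: print the number of data rows whose path is
# not absolute or does not exist (hypothetical helper)
check_manifest() {
  bad=0
  {
    read -r _header                      # skip the header line
    while IFS="$tab" read -r sample path; do
      case "$path" in
        /*) [ -f "$path" ] || bad=$((bad + 1)) ;;   # absolute: must exist
        *)  bad=$((bad + 1)) ;;                     # relative or empty path
      esac
    done
  } < "$1"
  echo "$bad"
}

# Toy manifest: one existing file and one missing file
tmpdir=$(mktemp -d)
touch "$tmpdir/sampleA.fastq.gz"
printf 'sample-id\tabsolute-filepath\n' > "$tmpdir/manifest.tsv"
printf 'sampleA\t%s\n' "$tmpdir/sampleA.fastq.gz" >> "$tmpdir/manifest.tsv"
printf 'sampleB\t%s\n' "$tmpdir/missing.fastq.gz" >> "$tmpdir/manifest.tsv"

problems=$(check_manifest "$tmpdir/manifest.tsv")
echo "rows with problems: $problems"
rm -r "$tmpdir"
```

On real data the same function could be called as `check_manifest manifest.tsv`; a result of 0 means every listed file was found.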

Sequence quality control and feature table construction

qiime quality-filter q-score \
       --i-demux raw-reads.qza \
       --o-filtered-sequences demux-filtered.qza \
       --o-filter-stats demux-filter-stats.qza

Denoising with deblur

qiime deblur denoise-16S \
       --i-demultiplexed-seqs demux-filtered.qza \
       --p-trim-length 150 \
       --p-sample-stats \
       --p-jobs-to-start 4 \
       --o-stats deblur-stats.qza \
       --o-representative-sequences rep-seqs-deblur.qza \
       --o-table table-deblur.qza

qiime deblur visualize-stats \
       --i-deblur-stats deblur-stats.qza \
       --o-visualization deblur-stats.qzv

qiime feature-table tabulate-seqs \
       --i-data rep-seqs-deblur.qza \
       --o-visualization rep-seqs-deblur.qzv

qiime feature-table summarize \
       --i-table table-deblur.qza \
       --m-sample-metadata-file metadata.tsv \
       --o-visualization table-deblur.qzv
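
A note on the `--p-trim-length 150` setting used in the denoising command: reads are truncated to that length, and Deblur is documented to discard reads shorter than it, so it can be useful to know how many reads reach 150 nt. The sketch below counts them with awk on a toy FASTQ file generated on the fly; the read names and lengths are made up for illustration.

```shell
trim_length=150
tmpfq=$(mktemp)

# Build a toy FASTQ with three reads of 160, 150 and 120 bases
awk 'BEGIN {
  n = split("160 150 120", lens, " ")
  for (i = 1; i <= n; i++) {
    s = sprintf("%" lens[i] "s", ""); gsub(/ /, "A", s)   # sequence line
    q = s; gsub(/A/, "I", q)                               # quality line
    printf "@read%d\n%s\n+\n%s\n", i, s, q
  }
}' > "$tmpfq"

# Sequence lines are every 4th line starting from line 2
kept=$(awk -v min="$trim_length" \
  'NR % 4 == 2 && length($0) >= min { n++ } END { print n + 0 }' "$tmpfq")
total=$(awk 'END { print NR / 4 }' "$tmpfq")

echo "$kept of $total reads are at least $trim_length nt long"
rm "$tmpfq"
```

For a compressed file, the same awk command can read from `gzip -dc sample.fastq.gz` through a pipe instead of from a file name.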

wget -O "sepp-refs-gg-13-8.qza" \
    "https://data.qiime2.org/2019.10/common/sepp-refs-gg-13-8.qza"

We will use the fragment-insertion tree-building method described by Janssen et al. (2018), via the sepp action of the q2-fragment-insertion plugin; this method has been shown to outperform traditional alignment-based methods with short 16S amplicon data. It aligns our unknown short fragments to full-length sequences in a known reference database and then places them onto a fixed tree. Note that this plugin has only been tested and benchmarked on 16S data against the Greengenes reference database (McDonald et al., 2012), so if you are using different data types you should consider alternative tree-building methods.

qiime fragment-insertion sepp \
        --i-representative-sequences rep-seqs-deblur.qza \
        --i-reference-database sepp-refs-gg-13-8.qza \
        --p-threads 48 \
        --o-tree insertion-tree.qza \
        --o-placements insertion-placements.qza
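
The `--p-threads 48` value above suits a large server and will usually need adjusting. A minimal sketch for choosing a value that matches the local machine, assuming `getconf _NPROCESSORS_ONLN` is available (it is on most Linux and macOS systems) and falling back to a single thread otherwise:

```shell
# Number of online CPUs, or 1 if it cannot be determined
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Guard against an empty or non-numeric answer
case "$threads" in
  ''|*[!0-9]*) threads=1 ;;
esac

echo "Detected $threads CPU(s); this value could be passed to --p-threads"
```

On a shared machine it is usually more considerate to use fewer threads than the detected maximum.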

Once the insertion tree is created, you must filter your feature table so that it only contains fragments that are in the insertion tree. This step is needed because SEPP might reject the insertion of some fragments, such as erroneous sequences or those that are too distantly related to the reference alignment and phylogeny.

Features in your feature table without a corresponding phylogeny will cause diversity computation to fail, because branch lengths cannot be determined for sequences not in the tree.

qiime fragment-insertion filter-features \
       --i-table table-deblur.qza \
       --i-tree insertion-tree.qza \
       --o-filtered-table filtered-table-deblur.qza \
       --o-removed-table removed-table.qza
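
Conceptually, the filtering step keeps the features that were placed in the insertion tree and sets the others aside. The toy sketch below mimics that set logic with plain text lists and `grep`; the feature identifiers are made up, and this only illustrates the idea, not how the plugin is implemented.

```shell
tmpdir=$(mktemp -d)

# Hypothetical feature IDs in the table and tips present in the tree
printf '%s\n' featA featB featC featD > "$tmpdir/table-features.txt"
printf '%s\n' featA featB featD > "$tmpdir/tree-tips.txt"

# -F fixed strings, -x whole-line match, -f read patterns from file
kept=$(grep -Fxf "$tmpdir/tree-tips.txt" "$tmpdir/table-features.txt")
# -v inverts the match: features with no tip in the tree
removed=$(grep -vFxf "$tmpdir/tree-tips.txt" "$tmpdir/table-features.txt")

echo "kept:    $(echo $kept)"
echo "removed: $removed"
rm -r "$tmpdir"
```

In the real workflow the kept features end up in filtered-table-deblur.qza and the rejected ones in removed-table.qza.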

Alternative taxonomy assignment with Qiime2

wget -O "sepp-refs-gg-13-8.qza" \
    "https://data.qiime2.org/2021.4/common/sepp-refs-gg-13-8.qza"

As above, the tree is built with the sepp action of the q2-fragment-insertion plugin (Janssen et al., 2018), which places the short fragments onto the fixed Greengenes reference tree (McDonald et al., 2012).

qiime fragment-insertion sepp \
        --i-representative-sequences repseqs.qza \
        --i-reference-database sepp-refs-gg-13-8.qza \
        --p-threads 1 \
        --o-tree insertion-tree.qza \
        --o-placements insertion-placements.qza

As above, filter the feature table so that it only contains fragments that are in the insertion tree; features without a corresponding phylogeny would cause diversity computation to fail.

qiime fragment-insertion filter-features \
       --i-table table.qza \
       --i-tree insertion-tree.qza \
       --o-filtered-table filtered-table-deblur.qza \
       --o-removed-table removed-table.qza

Taxonomic classification

wget https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/human-stool.qza
wget https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-seqs.qza
wget https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-tax.qza

qiime feature-classifier fit-classifier-naive-bayes \
       --i-reference-reads ref-seqs.qza \
       --i-reference-taxonomy ref-tax.qza \
       --i-class-weight human-stool.qza \
       --o-classifier gg138_v4_human-stool_classifier.qza

qiime feature-classifier classify-sklearn \
       --i-reads rep-seqs-deblur.qza \
       --i-classifier gg138_v4_human-stool_classifier.qza \
       --o-classification bespoke-taxonomy.qza

qiime metadata tabulate \
       --m-input-file bespoke-taxonomy.qza \
       --m-input-file rep-seqs-deblur.qza \
       --o-visualization bespoke-taxonomy.qzv

Bibliography

  • Estaki, M., Jiang, L., Bokulich, N. A., McDonald, D., González, A., Kosciolek, T., Martino, C., Zhu, Q., Birmingham, A., Vázquez-Baeza, Y., Dillon, M. R., Bolyen, E., Caporaso, J. G., & Knight, R. (2020). QIIME 2 enables comprehensive end-to-end analysis of diverse microbiome data and comparative studies with publicly available data. Current Protocols in Bioinformatics. doi: 10.1002/cpbi.100