Getting the data
The samples are a subset of the ECAM study, which consists of monthly fecal samples collected from children from birth to 24 months of age, as well as corresponding fecal samples collected from their mothers throughout the same period.
wget -O dataset.zip "https://qiita.ucsd.edu/public_artifact_download/?artifact_id=81253"
unzip dataset.zip
It’s good practice to put quotes around URLs, because they can contain special characters, such as &, that the shell would otherwise interpret as instructions.
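To see why the quotes matter, the sketch below passes a made-up URL through the shell twice, once quoted and once unquoted (simulated with `eval`). The example.org address and its `format` parameter are illustrative assumptions and are not part of the dataset.

```shell
# Hypothetical URL with two query parameters joined by "&"
url='https://example.org/download/?artifact_id=1&format=zip'

# Quoted: the whole string reaches the command as a single argument
quoted=$(printf '%s' "$url")

# Unquoted (simulated with eval): the shell reads "&" as "run in the
# background", so the command only receives the text before it, and
# "format=zip" is parsed as a separate variable assignment
unquoted=$(eval "printf '%s' $url")

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

In a typical bash or POSIX shell the unquoted variant should print only the part of the URL before the `&`, which is how a download can silently request the wrong resource.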
Understanding the QIIME 2 workflow
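# Make sure the output directory exists before fastp writes to it
mkdir -p reads-trimmed
for fileR1 in per_sample_FASTQ/98546/*R1*;
do
  echo "$fileR1"
  # -Q disables quality filtering; -f and -F trim 17 and 21 bases from the
  # start of read 1 and read 2; -w sets the number of worker threads
  fastp -i "$fileR1" -I "${fileR1/_R1/_R2}" -o "reads-trimmed/$(basename "$fileR1")" \
    -O "reads-trimmed/$(basename "${fileR1/_R1/_R2}")" -Q -f 17 -F 21 -w 8 \
    -h "per_sample_FASTQ/$(basename "$fileR1" | cut -f1 -d.).html" \
    -j "per_sample_FASTQ/$(basename "$fileR1" | cut -f1 -d.).json";
done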
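The loop derives the reverse-read file and the report names from each forward-read path. As a dry run, the sketch below prints those names for a single made-up path (sampleA is hypothetical) without calling fastp. It uses `sed` rather than bash's `${var/pattern/replacement}` so that it also runs in a plain POSIX shell.

```shell
# Hypothetical forward-read path; real names come from the downloaded archive
fileR1="per_sample_FASTQ/98546/sampleA_R1.fastq.gz"

# Reverse-read path: replace the first "_R1" with "_R2"
fileR2=$(printf '%s\n' "$fileR1" | sed 's/_R1/_R2/')

# Report prefix: file name up to the first dot
prefix=$(basename "$fileR1" | cut -f1 -d.)

echo "forward: $fileR1"
echo "reverse: $fileR2"
echo "trimmed: reads-trimmed/$(basename "$fileR1")"
echo "reports: per_sample_FASTQ/$prefix.html per_sample_FASTQ/$prefix.json"
```

Checking the derived names on one file first makes it easier to spot a naming pattern that does not match before processing every sample.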
Preparing a manifest
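# Header expected by the SingleEndFastqManifestPhred33V2 format
printf 'sample-id\tabsolute-filepath\n' > manifest.tsv
for i in per_sample_FASTQ/81253/*gz;
do
  n=$(basename "$i");
  # The sample ID is the file name without the .fastq.gz suffix
  printf '%s\t%s\n' "${n%.fastq.gz}" "$PWD/$i" >> manifest.tsv;
done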
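Before importing, it can help to confirm that every path listed in the manifest exists and that no sample ID appears twice. The sketch below runs that check against a throwaway manifest built from two empty placeholder files (s1 and s2 are made-up names); pointing `manifest` at the real manifest.tsv should work the same way.

```shell
# Throwaway working directory with two placeholder files (hypothetical names)
workdir=$(mktemp -d)
touch "$workdir/s1.fastq.gz" "$workdir/s2.fastq.gz"

manifest="$workdir/manifest.tsv"
printf 'sample-id\tabsolute-filepath\n' > "$manifest"
for f in "$workdir"/*.fastq.gz; do
  n=$(basename "$f")
  printf '%s\t%s\n' "${n%.fastq.gz}" "$f" >> "$manifest"
done

# Count listed paths that do not exist (the header line is skipped)
tab=$(printf '\t')
missing=0
while IFS="$tab" read -r id path; do
  [ -f "$path" ] || { echo "missing file for $id: $path"; missing=$((missing + 1)); }
done <<EOF
$(tail -n +2 "$manifest")
EOF

# Count sample IDs that occur more than once
duplicates=$(tail -n +2 "$manifest" | cut -f1 | sort | uniq -d | wc -l | tr -d ' ')

echo "missing files: $missing, duplicated IDs: $duplicates"
```

Both counts should be zero for a manifest that is ready to import.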
qiime tools import \
--input-path manifest.tsv \
--type 'SampleData[SequencesWithQuality]' \
--input-format SingleEndFastqManifestPhred33V2 \
--output-path raw-reads.qza
qiime demux summarize \
--i-data raw-reads.qza \
--o-visualization raw-reads.qzv
Sequence quality control and feature table construction
qiime quality-filter q-score \
--i-demux raw-reads.qza \
--o-filtered-sequences demux-filtered.qza \
--o-filter-stats demux-filter-stats.qza
Denoising with deblur
qiime deblur denoise-16S \
--i-demultiplexed-seqs demux-filtered.qza \
--p-trim-length 150 \
--p-sample-stats \
--p-jobs-to-start 4 \
--o-stats deblur-stats.qza \
--o-representative-sequences rep-seqs-deblur.qza \
--o-table table-deblur.qza
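With `--p-trim-length 150`, Deblur truncates every read to 150 nt and, to our understanding, discards reads that are shorter than that. The sketch below mimics only this length rule with `awk` on three made-up sequences, using a trim length of 5 to keep the example readable. It performs no denoising and is not how Deblur is implemented.

```shell
trim_length=5   # stand-in for --p-trim-length; the tutorial uses 150

# Three made-up reads of length 8, 5 and 3
trimmed=$(printf '%s\n' ACGTACGT ACGTA ACG |
  awk -v n="$trim_length" 'length($0) >= n { print substr($0, 1, n) }')

echo "$trimmed"
```

Only the first two reads survive, both cut to five bases. This is why the trim length is usually chosen by looking at the quality and length summary in raw-reads.qzv first.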
# Summarize Deblur's per-sample statistics
qiime deblur visualize-stats \
--i-deblur-stats deblur-stats.qza \
--o-visualization deblur-stats.qzv

# List the representative sequences
qiime feature-table tabulate-seqs \
--i-data rep-seqs-deblur.qza \
--o-visualization rep-seqs-deblur.qzv

# Summarize the feature table together with the sample metadata
qiime feature-table summarize \
--i-table table-deblur.qza \
--m-sample-metadata-file metadata.tsv \
--o-visualization table-deblur.qzv
wget -O "sepp-refs-gg-13-8.qza" \
"https://data.qiime2.org/2019.10/common/sepp-refs-gg-13-8.qza"
We will use the fragment-insertion tree-building method as described by
Janssen et al. (2018) using the sepp action of the q2-fragment-insertion
plugin,
which has been shown to outperform traditional alignment-based methods with
short 16S amplicon data. This method aligns our unknown short fragments to
full-length sequences in a known reference database and then places them onto
a fixed tree.
Note that this plugin has only been tested and benchmarked on 16S data against
the Greengenes reference database (McDonald et al., 2012),
so if you are using different data types you should consider
the alternative methods mentioned below.
qiime fragment-insertion sepp \
--i-representative-sequences rep-seqs-deblur.qza \
--i-reference-database sepp-refs-gg-13-8.qza \
--p-threads 48 \
--o-tree insertion-tree.qza \
--o-placements insertion-placements.qza
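The value `--p-threads 48` presumes a machine with that many cores. A small sketch, assuming `getconf _NPROCESSORS_ONLN` is available (it is on most Linux and macOS systems), takes the thread count from the machine instead and prints the resulting command rather than running it.

```shell
# Number of online cores; fall back to 1 if getconf cannot report it
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Dry run: print the command; remove "echo" to execute it
echo qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs-deblur.qza \
  --i-reference-database sepp-refs-gg-13-8.qza \
  --p-threads "$threads" \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza
```

On a shared server it may be more considerate to cap the value below the full core count.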
Once the insertion tree is created, you must filter your feature table so that
it only contains fragments that are in the insertion tree.
This step is needed because SEPP might reject the insertion of some fragments,
such as erroneous sequences or those that are too distantly related to the
reference alignment and phylogeny.
Features in your feature table without a
corresponding phylogeny will cause diversity computation to fail, because
branch lengths cannot be determined for sequences not in the tree.
qiime fragment-insertion filter-features \
--i-table table-deblur.qza \
--i-tree insertion-tree.qza \
--o-filtered-table filtered-table-deblur.qza \
--o-removed-table removed-table.qza
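Conceptually, the filtering step splits the feature IDs into those that are present in the tree and those that are not. The sketch below illustrates that split with plain text lists and `comm`. The IDs (featA to featD) and file names are made up, the reference tips that a real insertion tree also contains are ignored, and the actual plugin works on .qza artifacts rather than text files.

```shell
workdir=$(mktemp -d)

# Made-up feature IDs in the table, and the subset that was placed in the tree
printf '%s\n' featA featB featC featD | sort > "$workdir/table_ids.txt"
printf '%s\n' featA featC featD       | sort > "$workdir/tree_ids.txt"

# IDs in both lists correspond to the filtered table
kept=$(comm -12 "$workdir/table_ids.txt" "$workdir/tree_ids.txt")
# IDs found only in the table correspond to the removed table
removed=$(comm -23 "$workdir/table_ids.txt" "$workdir/tree_ids.txt")

echo "kept:    $(echo $kept)"
echo "removed: $(echo $removed)"
```

Here featB plays the role of a fragment that SEPP rejected, so it ends up in the removed set and would not reach the diversity calculations.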
Alternative taxonomy assignment with QIIME 2
wget -O "sepp-refs-gg-13-8.qza" \
"https://data.qiime2.org/2021.4/common/sepp-refs-gg-13-8.qza"
This section repeats the fragment-insertion workflow described above with the 2021.4 release of the SEPP reference database. The sepp call here runs on a single thread and reads repseqs.qza, so substitute the names of your own artifacts where they differ.
qiime fragment-insertion sepp \
--i-representative-sequences repseqs.qza \
--i-reference-database sepp-refs-gg-13-8.qza \
--p-threads 1 \
--o-tree insertion-tree.qza \
--o-placements insertion-placements.qza
As before, filter the feature table so that it only contains fragments that are in the insertion tree.
qiime fragment-insertion filter-features \
--i-table table.qza \
--i-tree insertion-tree.qza \
--o-filtered-table filtered-table-deblur.qza \
--o-removed-table removed-table.qza
Taxonomic classification
wget "https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/human-stool.qza"
wget "https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-seqs.qza"
wget "https://github.com/BenKaehler/readytowear/raw/master/data/gg_13_8/515f-806r/ref-tax.qza"
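# Train a naive Bayes classifier on the Greengenes 13_8 515f-806r reference,
# using the human-stool class weights
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs.qza \
--i-reference-taxonomy ref-tax.qza \
--i-class-weight human-stool.qza \
--o-classifier gg138_v4_human-stool_classifier.qza

# Classify the Deblur representative sequences with the trained classifier
qiime feature-classifier classify-sklearn \
--i-reads rep-seqs-deblur.qza \
--i-classifier gg138_v4_human-stool_classifier.qza \
--o-classification bespoke-taxonomy.qza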
qiime metadata tabulate \
--m-input-file bespoke-taxonomy.qza \
--m-input-file rep-seqs-deblur.qza \
--o-visualization bespoke-taxonomy.qzv
Bibliography
- Estaki, M., Jiang, L., Bokulich, N. A., McDonald, D., González, A., Kosciolek, T., Martino, C., Zhu, Q., Birmingham, A., Vázquez-Baeza, Y., Dillon, M. R., Bolyen, E., Caporaso, J. G., & Knight, R. (2020). QIIME 2 enables comprehensive end-to-end analysis of diverse microbiome data and comparative studies with publicly available data. Current Protocols in Bioinformatics. doi: 10.1002/cpbi.100