Andrea Telatin, Senior bioinformatician at the Quadram Institute Bioscience, Norwich.

A primer on Dadaist2

What is Dadaist2

Dadaist2 is a pipeline built around DADA2, optimized to streamline the workflow from the raw reads to R.

It’s both a monolithic pipeline (one command to perform the whole analysis) and a set of tools.

The easiest way to install Dadaist2 is currently with mamba (conda is too slow for this):

mamba create -n dadaist -c conda-forge -c bioconda dadaist2
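Once the environment has been created and activated (usually with conda activate dadaist, depending on your setup), it is worth confirming that the tools are reachable before going further. The following is a minimal sketch: require_tool is a hypothetical helper written for this post, not part of Dadaist2.

```shell
# Hypothetical helper: succeed only if a command is available on the PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found (is the environment active?)" >&2
        return 1
    fi
}
```

With the environment active, require_tool dadaist2 && require_tool dadaist2-getdb should report both tools as found.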

A dedicated tool, dadaist2-getdb, is used to download reference databases.

A list of the available databases is obtainable via dadaist2-getdb --list.

For this example we will download SILVA and store it in a refs directory under our home directory:

dadaist2-getdb -d decipher-silva-138 -o ~/refs/

To get a first feel for the workflow, we can run the pipeline with:

dadaist2 -i reads/ -o dadaist-output -t 12 -d ~/refs/SILVA_SSU_r138_2019.RData

where:

-i is the input directory with paired-end reads, identified by R1 and R2 in the file names (the tags can be changed)
-o is the output directory (will be created)
-t is the number of computing cores
-d is the reference database to use
-m is the path to the metadata file (optional: if not supplied, a blank one will be generated and used)
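Since the pipeline expects paired-end reads, a quick check that every forward file has a reverse mate can save a failed run. The sketch below assumes the tags appear in the file names as _R1 and _R2 (for example Sample_R1.fastq.gz and Sample_R2.fastq.gz). check_pairs is a hypothetical helper, not a Dadaist2 command.

```shell
# Hypothetical helper: verify that each *_R1* file in a directory has an _R2 mate.
# Assumes the tags are literally "_R1" and "_R2" in the file names.
check_pairs() {
    cp_dir=$1
    cp_found=0
    cp_missing=0
    for cp_r1 in "$cp_dir"/*_R1*; do
        [ -e "$cp_r1" ] || continue                 # the glob matched nothing
        cp_found=$((cp_found + 1))
        cp_base=${cp_r1##*/}                        # strip the directory part
        cp_mate=$(printf '%s\n' "$cp_base" | sed 's/_R1/_R2/')
        if [ ! -e "$cp_dir/$cp_mate" ]; then
            echo "No R2 mate for $cp_base" >&2
            cp_missing=$((cp_missing + 1))
        fi
    done
    echo "$cp_found forward files, $cp_missing without a mate"
    # Succeed only if at least one pair was found and none is incomplete
    [ "$cp_found" -gt 0 ] && [ "$cp_missing" -eq 0 ]
}
```

It can be chained before the pipeline, for instance check_pairs reads/ && dadaist2 -i reads/ -o dadaist-output, so that the run starts only when the input looks complete.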

Notable files in the output directory:

rep-seqs.fasta representative sequences (ASVs) in FASTA format
rep-seqs-tax.fasta representative sequences (ASVs) in FASTA format, with taxonomy labels as comments
feature-table.tsv table of raw counts (after cross-talk removal if specified)
taxonomy.tsv a text file with the taxonomy of each ASV (used to add the labels to the rep-seqs-tax.fasta)
metadata.tsv a copy of the metadata file
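As a quick sanity check on these files, the number of ASVs can be read off the FASTA headers. This is a minimal sketch, assuming one header line (starting with >) per representative sequence. count_asvs is a hypothetical helper, not part of Dadaist2.

```shell
# Hypothetical helper: count the records in a FASTA file by counting header lines.
count_asvs() {
    grep -c '^>' "$1"
}
```

For the run above, count_asvs dadaist-output/rep-seqs.fasta would print the number of representative sequences.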

Subdirectories:

MicrobiomeAnalyst a set of files formatted for MicrobiomeAnalyst, which is available online and also offline as an R package
Rhea a directory with files to be used with the Rhea pipeline, as well as some pre-calculated outputs (normalization and alpha diversity are run by default, as they don’t require knowledge of the metadata categories)
R a directory with the phyloseq object
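To confirm that a run produced everything listed above, the output directory can be checked in one pass. The sketch below assumes that the files and subdirectories sit directly inside the output directory under the names given in this post. check_outputs is a hypothetical helper, not part of Dadaist2.

```shell
# Hypothetical helper: report which of the expected outputs are absent.
# Assumes the files and subdirectories live at the top level of the output directory.
check_outputs() {
    co_dir=$1
    co_missing=0
    for co_f in rep-seqs.fasta rep-seqs-tax.fasta feature-table.tsv taxonomy.tsv metadata.tsv; do
        if [ ! -f "$co_dir/$co_f" ]; then
            echo "missing file: $co_f"
            co_missing=$((co_missing + 1))
        fi
    done
    for co_d in MicrobiomeAnalyst Rhea R; do
        if [ ! -d "$co_dir/$co_d" ]; then
            echo "missing directory: $co_d"
            co_missing=$((co_missing + 1))
        fi
    done
    # Succeed only if nothing is missing
    [ "$co_missing" -eq 0 ]
}
```

Running check_outputs dadaist-output prints nothing and returns success when all items are present.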

01 Feb 2021
