A primer on Dadaist2
What is Dadaist2
Dadaist2 is a pipeline built around DADA2, optimized to streamline the workflow from the raw reads to R.
It’s both a monolithic pipeline (one command to perform the whole analysis) and a set of tools.
Installing Dadaist2
The easiest way is currently to install it with mamba (conda is too slow for this):
mamba create -n dadaist -c conda-forge -c bioconda dadaist2
Getting one reference database
A dedicated tool, dadaist2-getdb, is used to download reference databases.
A list of the available databases is obtainable via dadaist2-getdb --list.
For this example we will download the SILVA database and store it in a refs directory in our home directory:
dadaist2-getdb -d decipher-silva-138 -o ~/refs/
Running the pipeline with default parameters
To get a first feel for the workflow, we can run the pipeline with:
dadaist2 -i reads/ -o dadaist-output -t 12 -d ~/refs/SILVA_SSU_r138_2019.RData
where:
- -i is the input directory with paired-end reads, identified with R1 and R2 (can be changed)
- -o is the output directory (will be created)
- -t is the number of computing cores
- -d is the reference database to use
- -m is the link to the metadata file (if not supplied, a blank one will be generated and used)
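Since the input directory is expected to hold paired-end reads tagged R1 and R2, it can help to confirm that every forward file has a mate before starting a long run. The following is a minimal sketch, not part of Dadaist2: the function name check_pairs is hypothetical, and it assumes the file names contain the literal strings _R1 and _R2, which may not match your naming scheme.

```shell
# Sketch: verify that each *_R1* file in a directory has a matching *_R2* file.
# Assumption: forward/reverse reads are tagged "_R1" / "_R2" in the file name.
check_pairs() {
    dir="$1"
    found=0     # number of R1 files seen
    missing=0   # number of R1 files without an R2 mate
    for r1 in "$dir"/*_R1*; do
        # If nothing matches, the glob stays literal: skip it.
        [ -e "$r1" ] || continue
        found=$((found + 1))
        # Replace the last "_R1" in the path with "_R2" to get the mate's name.
        r2=$(printf '%s\n' "$r1" | sed 's/\(.*\)_R1/\1_R2/')
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $r1" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if at least one pair was found and none is incomplete.
    [ "$found" -gt 0 ] && [ "$missing" -eq 0 ]
}
```

With this defined, the check can gate the pipeline, for example: check_pairs reads/ && dadaist2 -i reads/ -o dadaist-output -t 12 -d ~/refs/SILVA_SSU_r138_2019.RData -m metadata.tsv (here metadata.tsv stands for your own metadata file).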
The output directory
Notable files:
- rep-seqs.fasta representative sequences (ASVs) in FASTA format
- rep-seqs-tax.fasta representative sequences (ASVs) in FASTA format, with taxonomy labels as comments
- feature-table.tsv table of raw counts (after cross-talk removal if specified)
- taxonomy.tsv a text file with the taxonomy of each ASV (used to add the labels to the rep-seqs-tax.fasta)
- metadata.tsv a copy of the metadata file
Subdirectories:
- MicrobiomeAnalyst a set of files formatted for use with MicrobiomeAnalyst, which is available online and also offline as an R package.
- Rhea a directory with files to be used with the Rhea pipeline, as well as some pre-calculated outputs (Normalization and Alpha diversity are done by default, as they don’t require knowledge about metadata categories)
- R a directory with the phyloseq object
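As a quick sanity check after a run, the number of ASVs can be read off the representative sequences file by counting FASTA header lines. This is a small sketch using standard shell tools only; count_asvs is a hypothetical helper name, and the path in the usage line assumes the output directory from the example command.

```shell
# Sketch: count the records in a FASTA file.
# Every record starts with a header line beginning with ">".
count_asvs() {
    grep -c '^>' "$1"
}
```

For example, count_asvs dadaist-output/rep-seqs.fasta prints the number of representative sequences produced by the run.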