Gathering the (virome) reads
The goal of this section is to obtain a set of reads to test our programs with.
An example dataset can be gathered from the paper by Liang et al. “The stepwise assembly of the neonatal virome is modulated by breastfeeding” (2020).
The reads are available from the NCBI SRA under the accession number PRJNA524703.
From the study, we selected 10 samples (5 C-section delivery and 5 vaginal delivery), having the following IDs (and partial metadata):
Sample | Feeding_type | Formula_type | Delivery_type | Gender |
---|---|---|---|---|
SRR8653245 | Formula | cow-milk | C-Section | Female |
SRR8653218 | Formula | cow-milk | C-Section | Male |
SRR8653221 | Formula | cow-milk | C-Section | Male |
SRR8653248 | Formula | cow-milk | C-Section | Male |
SRR8653247 | Formula | soy-protein | C-Section | Female |
SRR8653084 | Formula | cow-milk | Spontaneous delivery | Female |
SRR8652914 | Formula | cow-milk | Spontaneous delivery | Female |
SRR8652969 | Formula | cow-milk | Spontaneous delivery | Male |
SRR8652861 | Formula | cow-milk | Spontaneous delivery | Female |
SRR8653090 | Formula | cow-milk | Spontaneous delivery | Female |
They are all stool samples from 4-month-old infants.
These 10 samples were re-analysed in our MetaPhage pipeline paper, and we will call them the “full” dataset.
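As a quick check of the selection, the table above can be saved as a comma-separated file and summarised from the command line. This is only a minimal sketch: the file names `samples.csv`, `delivery_counts.txt` and `full_list.txt` are hypothetical, chosen for this example and not part of the original dataset.

```shell
# Hypothetical file: the metadata table above, saved as CSV
cat > samples.csv <<'EOF'
Sample,Feeding_type,Formula_type,Delivery_type,Gender
SRR8653245,Formula,cow-milk,C-Section,Female
SRR8653218,Formula,cow-milk,C-Section,Male
SRR8653221,Formula,cow-milk,C-Section,Male
SRR8653248,Formula,cow-milk,C-Section,Male
SRR8653247,Formula,soy-protein,C-Section,Female
SRR8653084,Formula,cow-milk,Spontaneous delivery,Female
SRR8652914,Formula,cow-milk,Spontaneous delivery,Female
SRR8652969,Formula,cow-milk,Spontaneous delivery,Male
SRR8652861,Formula,cow-milk,Spontaneous delivery,Female
SRR8653090,Formula,cow-milk,Spontaneous delivery,Female
EOF

# Count samples per delivery type (column 4), skipping the header line
awk -F',' 'NR > 1 { n[$4]++ } END { for (d in n) print d ": " n[d] }' samples.csv \
  | sort > delivery_counts.txt
cat delivery_counts.txt

# Extract the run accessions (column 1) of the "full" dataset, one per line
tail -n +2 samples.csv | cut -d',' -f1 > full_list.txt
```

The counts should confirm the 5 + 5 split between the two delivery types, and `full_list.txt` holds the ten accessions in the one-per-line format used for the download list.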
Downloading the reads using Docker
Create a file called list.txt containing the desired SRA accessions, one per line.
An example of the content can be:
SRR8653245
SRR8653218
SRR8653221
SRR8653084
SRR8652914
SRR8652969
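One possible way to create this file from the terminal is a here-document, followed by a small sanity check. This is a sketch, not a required step: the check assumes that every entry is a run accession starting with `SRR`, as in the example above, and the variable names are made up for illustration.

```shell
# Write the six example accessions to list.txt, one per line
cat > list.txt <<'EOF'
SRR8653245
SRR8653218
SRR8653221
SRR8653084
SRR8652914
SRR8652969
EOF

# Sanity check (assumption: all entries are SRR run accessions)
total=$(grep -c '' list.txt)               # number of lines in the file
valid=$(grep -cE '^SRR[0-9]+$' list.txt)   # lines that look like SRR accessions
if [ "$total" -eq "$valid" ]; then
  echo "list.txt looks fine: $valid accessions"
else
  echo "list.txt contains $((total - valid)) unexpected line(s)" >&2
fi
```

A check like this catches blank lines or stray characters before the download starts.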
For the EBAME workshop, the reads are pre-downloaded.
Then we can use a Nextflow pipeline to download the reads (and the needed tools) automatically. If we use Miniconda as the dependency manager, we can run the following command:
nextflow run telatin/getreads -r main \
--list list.txt -profile conda
If Docker is available, we can replace -profile conda with -profile docker.
The programme
- EBAME-22 notes: EBAME-7 specific notes
- Gathering the reads: downloading and subsampling reads from public repositories (optional)
- Gathering the tools: we will use Miniconda to manage our dependencies
- Reads by reads profiling: using Phanta to quickly profile the bacterial and viral components of a microbial community
- De novo mining: assembly based approach, using VirSorter as an example miner
- Viral taxonomy: ab initio taxonomy profiling using vConTACT2
- MetaPhage overview: what is MetaPhage, a reads to report pipeline for viral metagenomics