Andrea Telatin, Senior bioinformatician at the Quadram Institute Bioscience, Norwich.

Gathering the (virome) reads

The goal of this section is to get a set of reads to test our programs with.

An example dataset can be gathered from the paper by Liang et al. “The stepwise assembly of the neonatal virome is modulated by breastfeeding” (2020).

The reads are available from the NCBI SRA under the BioProject accession PRJNA524703.

From the study, we selected 10 samples (5 C-section delivery and 5 vaginal delivery), having the following IDs (and partial metadata):

| Sample     | Feeding_type | Formula_type | Delivery_type        | Gender |
|------------|--------------|--------------|----------------------|--------|
| SRR8653245 | Formula      | cow-milk     | C-Section            | Female |
| SRR8653218 | Formula      | cow-milk     | C-Section            | Male   |
| SRR8653221 | Formula      | cow-milk     | C-Section            | Male   |
| SRR8653248 | Formula      | cow-milk     | C-Section            | Male   |
| SRR8653247 | Formula      | soy-protein  | C-Section            | Female |
| SRR8653084 | Formula      | cow-milk     | Spontaneous delivery | Female |
| SRR8652914 | Formula      | cow-milk     | Spontaneous delivery | Female |
| SRR8652969 | Formula      | cow-milk     | Spontaneous delivery | Male   |
| SRR8652861 | Formula      | cow-milk     | Spontaneous delivery | Female |
| SRR8653090 | Formula      | cow-milk     | Spontaneous delivery | Female |
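
If it helps to work with this metadata from the command line, one option is to save the table as a tab-separated file and filter it with `awk`. The following is only a minimal sketch: the file name `metadata.tsv` is a hypothetical choice, not something provided by the study or by the pipeline.

```bash
# Save the sample table as a tab-separated file.
# NOTE: metadata.tsv is a hypothetical file name chosen for this example.
# printf reuses the five-column format for every group of five arguments.
printf '%s\t%s\t%s\t%s\t%s\n' \
  Sample Feeding_type Formula_type Delivery_type Gender \
  SRR8653245 Formula cow-milk C-Section Female \
  SRR8653218 Formula cow-milk C-Section Male \
  SRR8653221 Formula cow-milk C-Section Male \
  SRR8653248 Formula cow-milk C-Section Male \
  SRR8653247 Formula soy-protein C-Section Female \
  SRR8653084 Formula cow-milk "Spontaneous delivery" Female \
  SRR8652914 Formula cow-milk "Spontaneous delivery" Female \
  SRR8652969 Formula cow-milk "Spontaneous delivery" Male \
  SRR8652861 Formula cow-milk "Spontaneous delivery" Female \
  SRR8653090 Formula cow-milk "Spontaneous delivery" Female \
  > metadata.tsv

# Print the IDs of the C-section samples (column 4), skipping the header line
awk -F'\t' 'NR > 1 && $4 == "C-Section" { print $1 }' metadata.tsv
```

Changing the string compared against column 4 to `Spontaneous delivery` would list the other five samples instead.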

They are all stool samples from 4-month-old infants.

These 10 samples were re-analysed in our MetaPhage pipeline paper, and we will call them the “full” dataset.

Downloading the reads using Docker

Create a file called list.txt listing the desired SRA run accessions, one per line. For example:

SRR8653245
SRR8653218
SRR8653221
SRR8653084
SRR8652914
SRR8652969
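
One possible way to create this file directly from the terminal is a here-document. This is a sketch that assumes the list should contain one accession per line, as in the example above; the `grep` check at the end is an optional addition and not part of the pipeline.

```bash
# Write the six example accessions to list.txt, one per line
cat > list.txt <<'EOF'
SRR8653245
SRR8653218
SRR8653221
SRR8653084
SRR8652914
SRR8652969
EOF

# Optional sanity check: count the lines that look like SRA run accessions
grep -c -E '^SRR[0-9]+$' list.txt
```

If the count printed by `grep` is lower than the number of lines in the file, some entries probably contain typos or stray whitespace.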

:warning: For the EBAME workshop the reads are pre-downloaded

Then we can use a Nextflow pipeline to automatically download the reads (and the tools it needs). If we use Miniconda as the dependency manager, we can run the following command:

nextflow run telatin/getreads -r main \
   --list list.txt -profile conda

:bulb: If Docker is available, we can replace -profile conda with -profile docker.


The programme

  • :zero: EBAME-22 notes: EBAME-7 specific notes
  • :one: Gathering the reads: downloading and subsampling reads from public repositories (optional)
  • :two: Gathering the tools: we will use Miniconda to manage our dependencies
  • :three: Reads by reads profiling: using Phanta to quickly profile the bacterial and viral components of a microbial community
  • :four: De novo mining: assembly based approach, using VirSorter as an example miner
  • :five: Viral taxonomy: ab initio taxonomy profiling using vConTACT2
  • :six: MetaPhage overview: what is MetaPhage, a reads to report pipeline for viral metagenomics

:arrow_left: Back to the main page