Category Jekyll

Powerful things you can do with the Markdown editor

There are lots of powerful things you can do with the Markdown editor. If you’ve gotten pretty comfortable with writing in Markdown, then you may enjoy some more advanced tips...

Press and education

Even the press, the classroom, the platform, and the pulpit in many instances do not give us objective and unbiased truths. To save man from the morass of propaganda, in...

External Featured Image

Education must also train one for quick, resolute and effective thinking. To think incisively and to think for one’s self is very difficult.

Microbiology from the Command Line

This website collects some notes, tutorials and handouts used for microbial bioinformatics training events.

Category bash

IGV: A quick overview

The Integrative Genomics Viewer (IGV) from the Broad Institute can be considered the microscope for the bioinformatician.

Connecting to a remote server

How to use SSH to connect to a remote server

Pipes

Pipes: how to combine commands to perform complex tasks
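As a minimal sketch of the idea, the pipeline below chains five small commands to count the distinct sequence names in a FASTA-like text. The three records are made up for illustration and are generated inline, so no input file is assumed.

```shell
# Three made-up FASTA records (seq1 appears twice), generated inline
fasta=$(printf '>seq1\nACGT\n>seq2\nGGCC\n>seq1\nACGT\n')

# grep keeps the header lines, sort groups identical names,
# uniq drops the repeats, wc counts what is left, tr strips padding spaces
unique_names=$(printf '%s\n' "$fasta" | grep '^>' | sort | uniq | wc -l | tr -d ' ')

echo "unique sequence names: ${unique_names}"
```

Each command only reads standard input and writes standard output, which is what lets them be combined freely.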

Redirection

Redirection: how to save to a file the output from shell commands
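A short sketch of the most common redirection operators follows. The file names and the missing path are hypothetical, and everything is written to a temporary directory so nothing in the working directory is touched.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)

# ">" creates the file (or overwrites it), ">>" appends to it
echo "first line"  >  "$workdir/out.txt"
echo "second line" >> "$workdir/out.txt"

# "2>" saves error messages separately from the regular output;
# the path is made up so that ls fails and writes to standard error
ls /hypothetical/missing/path > "$workdir/listing.txt" 2> "$workdir/errors.log" || true

# "<" feeds a file to a command as its standard input
line_count=$(wc -l < "$workdir/out.txt" | tr -d ' ')
echo "lines saved: ${line_count}"
```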

Text files and the command line

Here we introduce some new commands and concepts to work with the command line, in particular for text file parsing and manipulation.

Linux Command Line for Bioinformatics

A tutorial on the Linux command line (CLI) and its use in bioinformatics.

The first commands

A very first primer on the Linux command line for bioinformatics. Open your terminal and start typing the first commands! There are several pages dedicated to Bash on this website,...

Install Miniconda

The problem and its solution

Bash script getting parameters from the users

A basic introduction to passing parameters to shell scripts
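To give a flavour of positional parameters, here is a minimal sketch. The function name describe_args and the default values are hypothetical; in a real script the same "$1", "$2" and "$#" refer to the arguments typed after the script name.

```shell
# Hypothetical helper: positional parameters work the same way inside a function
describe_args() {
    local input_file="${1:-reads.fastq}"   # first argument, with a made-up default
    local threads="${2:-1}"                # second argument, defaulting to 1
    echo "input=${input_file} threads=${threads} given=$#"
}

describe_args sample.fastq 4   # both parameters supplied
describe_args                  # no parameters: the defaults are used
```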

Bash script safety net

We can instruct our scripts to stop when a problem is found with the set -euo pipefail directive.
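The effect of each option can be sketched as follows. Every check runs in a child bash so that the failure does not end the demonstration itself, and the variable name UNDEFINED_SAMPLE_NAME is hypothetical.

```shell
# -e: stop at the first failing command, so "after" is never printed
out_e=$(bash -c 'set -e; false; echo after' || true)

# -u: expanding an unset variable is an error instead of an empty string
out_u=$(bash -c 'set -u; echo "$UNDEFINED_SAMPLE_NAME"; echo after' 2>/dev/null || true)

# pipefail: a pipeline fails if any stage fails, not only the last one
status_pipefail=0
bash -c 'set -o pipefail; false | cat' || status_pipefail=$?
status_default=0
bash -c 'false | cat' || status_default=$?

echo "printed after -e failure: '${out_e}', after -u failure: '${out_u}'"
echo "pipeline status with pipefail: ${status_pipefail}, without: ${status_default}"
```

In a real script the three options are usually combined in a single line near the top, right after the shebang.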

If, then. Condition checking for Bash scripts

The idea to check if some condition is true before executing some commands is very simple and useful. The Bash syntax, for historical reasons, is probably the ugliest you’ll ever...

for loops in Bash scripting

After a small introduction to Bash scripting, we finally create a first bioinformatics script… introducing one of the loops we can use with the shell. A loop is a structure...

A small introduction to Bash scripting

After using the Linux terminal for a while, everyone wants to be able to write simple “scripts” to perform repetitive tasks. They look like recipes, with the main difference of...

Category tutorial

IGV: A quick overview

The Integrative Genomics Viewer (IGV) from the Broad Institute can be considered the microscope for the bioinformatician.

Connecting to a remote server

How to use SSH to connect to a remote server

Pipes

Pipes: how to combine commands to perform complex tasks

Redirection

Redirection: how to save to a file the output from shell commands

Text files and the command line

Here we introduce some new commands and concepts to work with the command line, in particular for text file parsing and manipulation.

Linux Command Line for Bioinformatics

A tutorial on the Linux command line (CLI) and its use in bioinformatics.

The first commands

A very first primer on the Linux command line for bioinformatics. Open your terminal and start typing the first commands! There are several pages dedicated to Bash on this website,...

MetaPhage, automated reads-to-report pipeline

MetaPhage is a complete reads-to-report pipeline for viral metagenomics. It is a Nextflow pipeline that can be run on a local machine or on a cluster. It is designed to...

Taxonomy placement of viral OTUs

vConTACT2 is a tool to infer the taxonomic relationship of viral sequences with a network-based approach, based on whole genome gene-sharing profiles.

De novo assembly of viromes

To identify putative viral sequences without using a classifier, we can use a de novo assembly approach. We first need to perform a standard metagenome assembly, then we need to...

Profiling viromes with Phanta

Phanta profiling requires these steps:

Gathering the (virome) tools

A short overview of tools used and how to get them

Gathering the (virome) reads

The goal of this section is to get a set of reads to test our programs with

EBAME-7 Viromics notes

This “virome primer” is designed to provide a short overview on viromics and the tools available to analyse viral metagenomes. The tutorial is agnostic to the environment, and can be...

Virome: a primer

“Virome refers to the assemblage of viruses that is often investigated and described by metagenomic sequencing of viral nucleic acids that are found associated with a particular ecosystem, organism or...

Nextflow: implementing a simple pipeline

First, let’s run the pipeline!

Nextflow: first steps

Installing Nextflow

Taxonomic profiling of whole metagenome shotgun

CLIMB Workshop on Metagenomics taxonomic profiling

16S Analysis with Qiime2

CLIMB Workshop on Metabarcoding (16S) analysis

QC Notes

On our first day we will cover the concepts behind taxonomic classification using Kraken2 (and Bracken), and see how to remove host reads and perform the quality checks (and filtering)....

Kaiju

Kaiju can be installed, as usual, from Conda. Like Kraken2, we have access to pre-built databases, and for this tutorial we used the nr 2021-02-24 (52 GB).

Join tabular files

When performing multiple operations on the same dataset, as some of you pointed out during the workshop, we often want to collate metadata. R is the ideal choice for doing...

Profile the community using kraken2

One method to determine the microbial composition of your samples is taxonomic read profiling. Here you compare your reads to a database of interest. Kraken2 is...

Running Kraken and Bracken with a script

A more complex script to perform a for loop is available from GitHub as shown below:

Profile the community using kraken2

One method to determine the microbial composition of your samples is taxonomic read profiling. Here you compare your reads to a database of interest. Kraken2 is...

Build a custom host database for Kraken2

In our workshop we provided a kraken2 database for you to use. However, most of the time, you will need to create a database for your own host. For the...

A first experiment with Kraken2

Our goal here is to create a small and artificial set of sequences to be classified using Kraken2, to practice with its parameters and its output formats.

Warming up: welcome to our server

First steps: log in to the server. You should find a directory called sequences in your home. List the files in that directory (for example with ls -l ~/sequences). The...

Remove host contamination from shotgun data

If you are working with host-associated microbiomes, you are usually only interested in the microbial fraction and not the host DNA. In fact, host DNA can interfere with the read...

Taxonomic profiling of whole metagenome shotgun (day 3)

An introduction to the “Tidyverse”

Taxonomic profiling of whole metagenome shotgun (day 2)

Today we will use Bracken to recalibrate our estimations, and a set of scripts to merge multiple samples in a single table, and see how to filter it. We will...

Taxonomic profiling of whole metagenome shotgun (day 1)

On our first day we will cover the concepts behind taxonomic classification using Kraken2 (and Bracken), and see how to remove host reads and perform the quality checks (and filtering)....

Extra Qiime2 notes

Getting the data

Metabarcoding workshop (day 3)

From the Command line to R

Metabarcoding workshop (day 2)

Yesterday we introduced the whole workflow to analyze 16S reads, and the powerful framework that Qiime2 introduced that:

A primer on Dadaist2

What is Dadaist2

Denoising with Deblur

Deblur is an alternative method to produce a set of denoised sequences. While DADA2 can natively support paired-end reads, Deblur can only manage single-end reads.

Install Miniconda

The problem and its solution

Bash script getting parameters from the users

A basic introduction to passing parameters to shell scripts

Bash script safety net

We can instruct our scripts to stop when a problem is found with the set -euo pipefail directive.

If, then. Condition checking for Bash scripts

The idea to check if some condition is true before executing some commands is very simple and useful. The Bash syntax, for historical reasons, is probably the ugliest you’ll ever...

for loops in Bash scripting

After a small introduction to Bash scripting, we finally create a first bioinformatics script… introducing one of the loops we can use with the shell. A loop is a structure...

A small introduction to Bash scripting

After using the Linux terminal for a while, everyone wants to be able to write simple “scripts” to perform repetitive tasks. They look like recipes, with the main difference of...

Category singularity

Singularity for bioinformatics

A basic introduction to Singularity containers

Category metabarcoding

16S Analysis with Qiime2

CLIMB Workshop on Metabarcoding (16S) analysis

Extra Qiime2 notes

Getting the data

Metabarcoding workshop (day 3)

From the Command line to R

Metabarcoding workshop (day 2)

Yesterday we introduced the whole workflow to analyze 16S reads, and the powerful framework that Qiime2 introduced that:

A primer on Dadaist2

What is Dadaist2

Denoising with Deblur

Deblur is an alternative method to produce a set of denoised sequences. While DADA2 can natively support paired-end reads, Deblur can only manage single-end reads.

Category 16S

Extra Qiime2 notes

Getting the data

Metabarcoding workshop (day 3)

From the Command line to R

Metabarcoding workshop (day 2)

Yesterday we introduced the whole workflow to analyze 16S reads, and the powerful framework that Qiime2 introduced that:

A primer on Dadaist2

What is Dadaist2

Denoising with Deblur

Deblur is an alternative method to produce a set of denoised sequences. While DADA2 can natively support paired-end reads, Deblur can only manage single-end reads.

Category metagenomics

Taxonomic profiling of whole metagenome shotgun

CLIMB Workshop on Metagenomics taxonomic profiling

QC Notes

On our first day we will cover the concepts behind taxonomic classification using Kraken2 (and Bracken), and see how to remove host reads and perform the quality checks (and filtering)....

Kaiju

Kaiju can be installed, as usual, from Conda. Like Kraken2, we have access to pre-built databases, and for this tutorial we used the nr 2021-02-24 (52 GB).

Join tabular files

When performing multiple operations on the same dataset, as some of you pointed out during the workshop, we often want to collate metadata. R is the ideal choice for doing...

Profile the community using kraken2

One method to determine the microbial composition of your samples is taxonomic read profiling. Here you compare your reads to a database of interest. Kraken2 is...

Running Kraken and Bracken with a script

A more complex script to perform a for loop is available from GitHub as shown below:

Profile the community using kraken2

One method to determine the microbial composition of your samples is taxonomic read profiling. Here you compare your reads to a database of interest. Kraken2 is...

Build a custom host database for Kraken2

In our workshop we provided a kraken2 database for you to use. However, most of the time, you will need to create a database for your own host. For the...

A first experiment with Kraken2

Our goal here is to create a small and artificial set of sequences to be classified using Kraken2, to practice with its parameters and its output formats.

Warming up: welcome to our server

First steps: log in to the server. You should find a directory called sequences in your home. List the files in that directory (for example with ls -l ~/sequences). The...

Remove host contamination from shotgun data

If you are working with host-associated microbiomes, you are usually only interested in the microbial fraction and not the host DNA. In fact, host DNA can interfere with the read...

Taxonomic profiling of whole metagenome shotgun (day 3)

An introduction to the “Tidyverse”

Taxonomic profiling of whole metagenome shotgun (day 2)

Today we will use Bracken to recalibrate our estimations, and a set of scripts to merge multiple samples in a single table, and see how to filter it. We will...

Taxonomic profiling of whole metagenome shotgun (day 1)

On our first day we will cover the concepts behind taxonomic classification using Kraken2 (and Bracken), and see how to remove host reads and perform the quality checks (and filtering)....

Category nextflow

Nextflow: implementing a simple pipeline

First, let’s run the pipeline!

Nextflow: first steps

Installing Nextflow

MetaPhage

What is MetaPhage

Category docker

Category conda

Category virome

MetaPhage, automated reads-to-report pipeline

MetaPhage is a complete reads-to-report pipeline for viral metagenomics. It is a Nextflow pipeline that can be run on a local machine or on a cluster. It is designed to...

Taxonomy placement of viral OTUs

vConTACT2 is a tool to infer the taxonomic relationship of viral sequences with a network-based approach, based on whole genome gene-sharing profiles.

De novo assembly of viromes

To identify putative viral sequences without using a classifier, we can use a de novo assembly approach. We first need to perform a standard metagenome assembly, then we need to...

Profiling viromes with Phanta

Phanta profiling requires these steps:

Gathering the (virome) tools

A short overview of tools used and how to get them

Gathering the (virome) reads

The goal of this section is to get a set of reads to test our programs with

EBAME-7 Viromics notes

This “virome primer” is designed to provide a short overview on viromics and the tools available to analyse viral metagenomes. The tutorial is agnostic to the environment, and can be...

Virome: a primer

“Virome refers to the assemblage of viruses that is often investigated and described by metagenomic sequencing of viral nucleic acids that are found associated with a particular ecosystem, organism or...

Category ebame

EBAME-7 Viromics notes

This “virome primer” is designed to provide a short overview on viromics and the tools available to analyse viral metagenomes. The tutorial is agnostic to the environment, and can be...

Category tools

Taxonomy placement of viral OTUs

vConTACT2 is a tool to infer the taxonomic relationship of viral sequences with a network-based approach, based on whole genome gene-sharing profiles.

De novo assembly of viromes

To identify putative viral sequences without using a classifier, we can use a de novo assembly approach. We first need to perform a standard metagenome assembly, then we need to...

Profiling viromes with Phanta

Phanta profiling requires these steps:

Gathering the (virome) tools

A short overview of tools used and how to get them

Category metaphage

MetaPhage, automated reads-to-report pipeline

MetaPhage is a complete reads-to-report pipeline for viral metagenomics. It is a Nextflow pipeline that can be run on a local machine or on a cluster. It is designed to...

Category formats

FASTA files

Here we introduce the FASTA format, to store DNA and protein sequences.

SAM format

Here we introduce the SAM format, to store the result of a mapping of one or more DNA sequences against a reference genome

Bioinformatics file formats

Here we introduce some of the most used bioinformatics formats.

Category python

A primer on Python data structures (2)

More data structures in Python using standard libraries.

Python data structures (1)

Built-in data structures in Python: lists, tuples, sets, and dictionaries.

Word distribution, an example project for Python beginners

Counting the number of occurrences of each word in a text is a common task in computational linguistics. It is also a good example of how to use Python to...

Bash learning resources

Some YouTube videos and tutorials to learn how to use the command line.

Category distribution

Word distribution, an example project for Python beginners

Counting the number of occurrences of each word in a text is a common task in computational linguistics. It is also a good example of how to use Python to...

Bash learning resources

Some YouTube videos and tutorials to learn how to use the command line.

Category project

Word distribution, an example project for Python beginners

Counting the number of occurrences of each word in a text is a common task in computational linguistics. It is also a good example of how to use Python to...

Bash learning resources

Some YouTube videos and tutorials to learn how to use the command line.

Category climb

A very short bioinformatics tutorial using CLIMB notebooks

This walkthrough can be performed entirely on any Linux terminal where Miniconda/Mamba have been installed, including the web-based terminal offered by the new CLIMB BIG DATA notebooks.

Mounting CLIMB S3 buckets in a Linux Virtual Machine (VM)

The MRC CLIMB BIG DATA project offers modern storage in the form of S3 buckets, an object storage introduced by Amazon Web Services (AWS).

Category data-structures

A primer on Python data structures (2)

More data structures in Python using standard libraries.

Python data structures (1)

Built-in data structures in Python: lists, tuples, sets, and dictionaries.