Andrea Telatin, Senior bioinformatician at the Quadram Institute Bioscience, Norwich.

for loops in Bash scripting

After a short introduction to Bash scripting, we finally create our first bioinformatics script… and introduce one of the loops available in the shell. A loop is a structure that lets us run a set of commands a number of times.

The for loop, specifically, repeats the commands once for each term in a list. Here is an example of the syntax:
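
A minimal sketch, with three placeholder names standing in for the list (the names and the variable name Name are illustrative assumptions):

```shell
# Loop over a list of three names (placeholder values)
for Name in Alice Bob Carol
do
    # $Name holds one item of the list at each iteration
    echo "Hello, $Name!"
done
```

Once the loop has finished, the variable still holds the last item of the list.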

The loop is built from four keywords: for, in, do and done. It works with a list of elements (in the example, three names), a variable that holds one item of the list at each iteration, and finally a set of instructions (commands) to be executed, placed between do and done.

Indenting these commands is not required, but it makes the code clearer.

A real world example

When you have a list of SAM files and you want to convert all of them to (sorted) BAM format, you have a good example of when a for loop comes in handy:
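
A dry-run sketch of such a script follows. The conversion command (samtools sort), the variable name FileCount and the two empty placeholder files are assumptions made for illustration. The sketch only prints the commands it would run; removing echo from the command line would execute them. It is laid out so that the file count sits on line 5 and the loop starts on line 8:

```shell
#!/bin/bash
# Dry run in a scratch directory holding two empty placeholder SAM files
cd "$(mktemp -d)" && touch sample1.sam sample2.sam
# Count the .sam files in the current directory
FileCount=$(ls *.sam | wc -l)
echo "Found $FileCount SAM files"
# Loop over every file name matching *.sam
for SamFile in *.sam
do
    # ${SamFile}.bam appends ".bam" to the name; echo only prints the command
    echo samtools sort -o "${SamFile}.bam" "$SamFile"
done
```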

Line 5 assigns the total number of .sam files in the current directory to a variable (see previous post). Line 8 declares the for loop, using SamFile as the variable (read back as $SamFile) and *.sam in place of the list. This works because the shell expands this pattern to a list of file names¹.

In this script we see a new way of retrieving the content of a variable: ${Variable} instead of $Variable, which lets us concatenate the content with other strings².

“Find and replace” inside a variable

The script has an annoying bug: if we have a file called alignment.sam, it will create a BAM file called alignment.sam.bam. This happens because we simply added “.bam” at the end of the file name.

Bash has a feature called variable substitution, which replaces the first match of a pattern inside the content of a variable. It uses the syntax ${VariableName/WhatToFind/Replacement}:

variable='Hello World!'
# prints: Hello Universe!
echo "${variable/World/Universe}"

To see this in action we have a small example:
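
A minimal sketch of the same substitution applied to a file name (it needs Bash; the variable names FileName and NewName and the file name itself are made-up placeholders):

```shell
#!/bin/bash
# A hypothetical file name stored in a variable
FileName='sample_reads.fastq'

# Replace the first occurrence of ".fastq" with ".fq"
NewName="${FileName/.fastq/.fq}"

echo "$FileName becomes $NewName"
```

The original variable is left untouched: the substitution only changes the value that is read out of it.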

Now try it yourself!

Use variable substitution as shown in the example above to fix the “all_sam_to_bam.sh” script, and have it create nicer output file names! If you want to see the solution, have a look here.

Norwich, 2018–02–22


¹ This script has a problem here: if there are no .sam files in the directory, the shell leaves the pattern *.sam unexpanded, so the loop runs once on that literal text. We will fix this later!
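
One possible guard, offered as a sketch and not necessarily the fix planned for the later post, is to skip any item that does not exist as a file. The counter Converted and the empty scratch directory are assumptions used only to make the behaviour visible:

```shell
#!/bin/bash
# Work in an empty scratch directory, so that no .sam file exists
cd "$(mktemp -d)"

Converted=0
for SamFile in *.sam
do
    # With no matches, $SamFile holds the literal text "*.sam": skip it
    [ -e "$SamFile" ] || continue
    Converted=$((Converted + 1))
done

echo "Files processed: $Converted"
```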

² If we have a variable called Variable and its content is “NAME” and we want to print the string “NAME2”, how can we do this? If we type:

echo "$Variable2"

The shell will look for the content of a variable called “Variable2”, which does not exist. Here is the correct version:

echo "${Variable}2"