for loops in Bash scripting
After a small introduction to Bash scripting, we finally create our first bioinformatics script, introducing one of the loops we can use in the shell. A loop is a structure that lets us run a set of commands a number of times.
The for loop, specifically, repeats the commands once for each term in a list. Here is an example of the syntax:
Note the keywords: for, in, do, done. The loop works using a list of elements (in the example three names), a variable that holds one item of the list at each iteration, and finally a set of instructions (commands) to be executed, between do and done.
Indenting these commands is not required, but it makes the code clearer.
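A minimal sketch of such a loop, assuming three illustrative names (Alice, Bob, Carol) as the list and Name as the loop variable:

```shell
#!/bin/bash
# The list holds three illustrative names; Name takes each value in turn
for Name in Alice Bob Carol
do
    # Everything between do and done runs once per item
    echo "Hello, $Name!"
done
```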
A real world example
When you have a list of SAM files and you want to convert all of them to (sorted) BAM format, you have a good example of when a for loop comes in handy:
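A sketch of what such a script could look like. It is a dry run: echo prints each command instead of running it. The samtools sort call with -o assumes samtools 1.3 or later, and the variable name TotalFiles is an assumption of ours:

```shell
#!/bin/bash
# Sketch of all_sam_to_bam.sh: dry run that prints each samtools command

# Count the .sam files in the current directory
TotalFiles=$(ls *.sam 2>/dev/null | wc -l)
echo "SAM files found: $TotalFiles"

for SamFile in *.sam
do
    # ${SamFile}.bam appends ".bam" to the whole file name
    # Remove "echo" to actually run the conversion (requires samtools)
    echo samtools sort -o "${SamFile}.bam" "${SamFile}"
done
```

Saved as all_sam_to_bam.sh and run in a directory of SAM files, it prints one samtools command per file.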
Line 5 assigns to a variable the total number of .sam files in the current directory (see previous post).
Line 8 declares the for loop, using $SamFile as the variable and *.sam in place of the list. This works because the shell expands this pattern into a list of file names¹.
In this script we see a new way of retrieving the content of a variable: ${Variable} instead of $Variable, which allows us to concatenate the content with other strings².
“Find and replace” inside a variable
The script has an annoying bug: if we have a file called alignment.sam, it will create a BAM file called alignment.sam.bam. This happens because we simply added “.bam” at the end of the filename.
Bash has a feature called variable substitution, which replaces the first match of a pattern inside the content of a variable. It works with this syntax: ${VariableName/WhatToFind/Replacement}
variable='Hello World!'
echo "${variable/World/Universe}"
To see this in action we have a small example:
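As a sketch, the substitution can be tried on a file name. The name sample_R1.fastq and the variables FileName and MateName are made up for illustration:

```shell
#!/bin/bash
# Hypothetical file name, used only for illustration
FileName='sample_R1.fastq'

# Replace the first occurrence of "R1" with "R2"
MateName="${FileName/R1/R2}"

echo "Original: $FileName"
echo "Replaced: $MateName"

# When the pattern is not found, the content is left unchanged
echo "No match: ${FileName/XYZ/abc}"
```

The original variable is never modified: the substitution only affects the text that is printed or assigned.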
Now try it yourself!
Use the variable substitution as shown in the above example to fix the “all_sam_to_bam.sh” script, and have it create nicer output file names! If you want to see the solution, have a look here.
Norwich, 2018–02–22
¹ This script has a problem here: if there are no .sam files in the directory, the shell leaves the pattern unexpanded and the loop runs once with the literal text *.sam. We will fix this later!
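A small sketch of the problem, assuming Bash with its default globbing options. It runs in an empty scratch directory and uses an existence test as one possible guard, not necessarily the fix adopted later. ScratchDir and Processed are names chosen for this example:

```shell
#!/bin/bash
# Work in an empty scratch directory so that no .sam file exists
ScratchDir=$(mktemp -d)
cd "$ScratchDir" || exit 1

Processed=0
for SamFile in *.sam
do
    # Without a match, SamFile holds the literal text "*.sam": skip it
    if [ ! -e "$SamFile" ]; then
        echo "Skipping '$SamFile': no such file"
        continue
    fi
    Processed=$((Processed + 1))
done
echo "Files processed: $Processed"
```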
² If we have a variable called Variable and its content is “NAME” and we want to print the string “NAME2”, how can we do this? If we type:
echo "$Variable2"
The shell will look for the content of a variable called “Variable2”, which does not exist. Here is the correct version:
echo "${Variable}2"