Warming up: welcome to our server
First steps
- Log in into the server
- You should find a directory called sequencesin your home.
- List the files in that directory (for example with ls -l ~/sequences)
- The files are compressed, we will try not to decompress them as it’s a good practice in metagenomics (to save lots of disk space and time)
- To count the sequences in FASTA files, we can use grep or dedicated tools.
    - Try with grepfirst (for example with zcat FILE | grep -c ‘>’, or cat FILE | gzip -d | grep -c \> if zcat is not available)
- We have SeqFu installed in our machine, so we can test seqfu stats --nice sequences/*gz.
 
- Try with 
A finished bacterial genome
- The file GCF_000027325.fasta.gzcontains a finished bacterial genome, as we saw from the stats it’s a single sequence.
- Print the name of the first sequence,for example with zcat ./sequences/GCF_000027325.fasta.gz | grep \>
- (Optional) Now print the first lines of the file, and copy a small portion (~200 nucleotides) in a new file in FASTA format, 
calling it ~/sequences/myco-bit.fa, and calling the sequence itselfSeq1.- You can use the nanoeditor - that is widely available in most servers - or an improved alternative calledmicro.
 
- You can use the 
Making symbolic links
- Sometimes it’s useful to make in our favourite locations a link to files existing elsewhere. Remember that the link will break if we move/remove the source file.
We have an Illumina paired end sample stored in /data/cami/simple_R1.fq.gz(and its corresponding R2). The command to make symbolic links isln -s SOURCE DESTINATION.
Try:
1
ln -s /data/cami/simple_* ~/sequences/
- With ls -l, note how symbolic links are rendered.
- SeqFu has tools resembling the GNU commands but for sequences, like seqfu head,seqfu tail,seqfu grep, plus other utilities. Tryseqfu head.
- Count the reads present in the files,for example with seqfu count sequences/simple*- This command can take some time, why don’t we save its output to a file instead: seqfu count sequences/simple_* > sequences/simple_counts.txt &(the final&will send the process to the background)
 
- This command can take some time, why don’t we save its output to a file instead: 
- We now want to make a subsample. To do this we can use seqfu head --skip 10 FILE > SUBSAMPLED. Let’s try a “for loop”:
1
2
3
4
5
6
7
mkdir subsampled
for FQFILE in sequences/simple_R*; 
do 
   echo $i; 
   seqfu head --skip 10 $FQFILE > subsampled/$(basename $FQFILE); 
done