About SeqFu

Citing

Telatin A, Fariselli P, Birolo G. SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering 2021, 8, 59. https://doi.org/10.3390/bioengineering8050059

Why

Several tools exist for the analysis of FASTQ/FASTA files. My personal choice has been (and still is) SeqKit, a general-purpose toolkit.

Like many other bioinformaticians, I found myself coding small ad hoc scripts, for example:

  • A tool to extract the index from Illumina FASTQ files (taking the most common occurrence from the first 1000 reads)
  • A tool to extract contigs using a list from a predictor
  • Scripts to interleave/deinterleave FASTQ files
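
To make the first and third items concrete, here is a minimal Python sketch using only the standard library. It is not SeqFu's actual code. It assumes strict four-line FASTQ records and the Illumina header layout in which the index is the last colon-separated field of the comment (for example `@M01:1:FC:1:1101:1:1 1:N:0:ACGTAC`). The function names are hypothetical.

```python
from collections import Counter
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumption: strict four-line records (no wrapped sequences).
    """
    it = (line.rstrip("\n") for line in lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return  # end of input, or a truncated trailing record
        header, seq, _plus, qual = record
        yield header[1:], seq, qual  # drop the leading "@"


def most_common_index(lines, max_reads=1000):
    """Return the most frequent index among the first max_reads reads, or None."""
    counts = Counter()
    for header, _seq, _qual in islice(read_fastq(lines), max_reads):
        # Assumed layout: "<name> <read>:<filtered>:<control>:<index>"
        _name, _, comment = header.partition(" ")
        if ":" in comment:
            counts[comment.rsplit(":", 1)[1]] += 1
    return counts.most_common(1)[0][0] if counts else None


def interleave(lines_r1, lines_r2):
    """Yield FASTQ lines, alternating one record from R1 and one from R2."""
    pairs = zip(read_fastq(lines_r1), read_fastq(lines_r2))
    for (h1, s1, q1), (h2, s2, q2) in pairs:
        yield from (f"@{h1}", s1, "+", q1, f"@{h2}", s2, "+", q2)
```

With real files, the functions can be given open file handles, for example `most_common_index(open("reads_R1.fastq"))` (the file name is a placeholder); gzip-compressed input would need `gzip.open(path, "rt")` instead.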

The problem was distributing a very small script to users lacking the library I was using (like the excellent pyfastx or our FASTX::Reader).

Distributing self-contained binaries was an attractive option: it both improved the performance of the programs and avoided dependency hell for minor applications.

This led to the start of the project.

How

The main parsing library is klib.nim by Heng Li (lh3/biofast), which provides good performance.

For some utilities the readfq library is used (andreas-wilm/nimreadfq). It is based on the C version of Heng Li's parser, wrapped in an object-oriented module.
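
For readers unfamiliar with the readfq approach, the Python sketch below shows the general idea: a single generator that reads both FASTA and FASTQ, tolerates wrapped sequence lines, and reads the quality string by length, so a quality line starting with `@` is not mistaken for a header. It is an illustration written for this page, not the Nim code used in SeqFu and not Heng Li's original; the name `read_fastx` is hypothetical.

```python
def read_fastx(lines):
    """Yield (name, sequence, quality) tuples; quality is None for FASTA."""
    it = iter(lines)
    last = None  # header (or "+") line read ahead while scanning a record
    while True:
        if last is None:
            # Skip anything that precedes the next header line
            for line in it:
                if line[:1] in (">", "@"):
                    last = line.rstrip("\r\n")
                    break
            if last is None:
                return  # no more records
        fields = last[1:].split(None, 1)
        name = fields[0] if fields else ""
        last = None
        seq_parts = []
        for line in it:
            if line[:1] in (">", "@", "+"):
                last = line.rstrip("\r\n")
                break
            seq_parts.append(line.rstrip("\r\n"))
        seq = "".join(seq_parts)
        if last is None or last[0] != "+":
            yield name, seq, None  # FASTA record
            if last is None:
                return
            continue  # 'last' already holds the next header
        # FASTQ record: read quality lines until they cover the sequence
        last = None
        qual_parts, qual_len = [], 0
        for line in it:
            chunk = line.rstrip("\r\n")
            qual_parts.append(chunk)
            qual_len += len(chunk)
            if qual_len >= len(seq):
                break
        if qual_len < len(seq):
            yield name, seq, None  # input ended before the quality was complete
            return
        yield name, seq, "".join(qual_parts)
```

The generator accepts any iterable of lines, so an open file handle works as well as a list of strings.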

About the name

The name of the program was modeled after "Script-Fu", a set of macros built into the image manipulation program GIMP.

Apparently, in the US this might sound offensive.

How to get away

So, after a consultation with Microsoft (and thanks to their previous experience with Minesweeper), we offer a tool to rename the program to SeqFlower.

SeqFlower

The renaming script, flower.sh, is located in the src directory.

Developer's details

Perl module

A Perl version of the parser is available both from MetaCPAN and from Bioconda:

conda install -c bioconda perl-fastx-reader

Templates

The repository contains some templates to quickly write FASTX parser-based applications (in Nim or in Perl).

:package: seqfu2/templates

Outreach

  • Slides
  • [Slides, PDF](http://seqfu.it/slides.pdf)