About SeqFu
Citing
Telatin A, Fariselli P, Birolo G. SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files. Bioengineering 2021, 8, 59. https://doi.org/10.3390/bioengineering8050059
Why
There are several tools for the analysis of FASTQ/FASTA files. My personal choice has been (and still is) SeqKit, a general-purpose toolkit.
Like many other bioinformaticians, I found myself coding small ad hoc scripts, for example:
- A tool to extract the index from Illumina FASTQ files (taking the most common occurrence from the first 1000 reads)
- A tool to extract contigs using a list from a predictor
- Scripts to interleave/deinterleave FASTQ files
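To give an idea of how small these scripts are, here is a minimal Python sketch of the first and third items. It is not SeqFu's implementation (SeqFu is written in Nim). The function names are hypothetical, and the sketch assumes four-line FASTQ records with no blank lines and Illumina-style headers whose comment ends with the index (for example `@read1 1:N:0:ACGT`).

```python
import io
from collections import Counter
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples.

    Assumes strict four-line records and no blank lines (illustrative only).
    """
    while True:
        header = handle.readline().strip()
        if not header:
            return  # end of file (or a blank line, which this sketch does not support)
        seq = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().strip()
        yield header, seq, qual


def most_common_index(handle, max_reads=1000):
    """Return the most frequent index among the first max_reads records, or None."""
    counts = Counter()
    for header, _seq, _qual in islice(read_fastq(handle), max_reads):
        parts = header.split()
        if len(parts) < 2:
            continue  # no comment field, so no index to read
        # Assumed layout: the index is the last colon-separated field of the comment
        counts[parts[-1].split(":")[-1]] += 1
    return counts.most_common(1)[0][0] if counts else None


def interleave(forward, reverse):
    """Yield records alternately from two paired FASTQ streams (R1, R2, R1, R2, ...)."""
    for r1, r2 in zip(read_fastq(forward), read_fastq(reverse)):
        yield r1
        yield r2
```

Both functions take any file-like object, so they can be tried on an `io.StringIO` holding a few records before pointing them at real files. Note that `zip` stops at the shorter input, so a truncated mate file goes unnoticed in this sketch.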
The problem was distributing a very small script to users lacking the library I was using (like the excellent pyfastx or our FASTX::Reader).
Distributing self-contained binaries was an attractive option: it both boosted the performance of the programs and solved the dependency hell for minor applications.
This led to the start of the project.
How
The main parsing library is klib.nim by Heng Li (lh3/biofast), which provides good performance.
For some utilities the readfq library has been used (andreas-wilm/nimreadfq). This is based on the C version of Heng Li's parser, wrapped in an object-oriented module.
About the name
The name of the program was modeled after "ScriptFu", a set of macros built into the image manipulation program GIMP.
Apparently, in the US this might sound offensive.
So, after a consultation with Microsoft (and thanks to their previous experience with Minesweeper), we offer a tool to rename the program to SeqFlower.
The renaming script is called flower.sh and is located in the src directory.
Developer's details
Perl module
A Perl version of the parser is available both from MetaCPAN and from Bioconda:
conda install -c bioconda perl-fastx-reader
Templates
The repository contains some templates to quickly write FASTX parser-based applications (in Nim or in Perl).
Outreach
- Slides
- [Slides, PDF](http://seqfu.it/slides.pdf)