fu-16Sregion
This utility is still in development, but feedback is welcome.
Align paired-end, or single-end, reads against a 16S reference sequence to determine the hypervariable regions sequences via local alignment.
Can read one FASTA/FASTQ file or data from standard input.
Usage: fu-16Sregion [options] [<FASTQ-File>]
Options:
-r --reference FILE FASTA file with a reference sequence, E. coli 16S by default
-j --regions FILE Regions names in JSON format, E. coli variable regions by default
-m --min-fraction FLOAT Minimum fraction of reads classified to report a region as detected [default: 0.5]
--min-score INT Minimum alignment score (approx. %id * readlen * matchScore) [default: 2000]
--max-reads INT Parse up to INT reads then quit [default: 500]
Smith-Waterman:
--score-match INT Score for a match [default: 10]
--score-mismatch INT Score for a mismatch [default: -5]
--score-gap INT Score for a gap [default: -10]
Other options:
--pool-size INT Number of sequences/pairs to process per thread [default: 25]
-v --verbose Verbose output
--debug Enable diagnostics
-h --help Show this help
Example output
A tabular report, per read, is provided with the following columns
- Read name
- Primary target region
- Alignment score
- Region of the reference sequence covered (coordinates)
The main output is given as two columns:
- regions detected in the same read (e.g. "V3,V4")
- fraction of reads with the combination
Example:
V3-V4 0.83
This is only reported when the number of reads / total reads is above --min-fraction
.
When activating the --verbose
switch, some more information will be printed to the standard error, but also a read by read output will be printed (read name, score, target region covered, detected regions and Pass/Fail):
M05517:39:000000000-CNNWR:1:1105:7840:22808 score:2955 alignment:340..805 regions:V3,V4 Pass
M05517:39:000000000-CNNWR:1:1105:24801:23102 score:3480 alignment:340..805 regions:V3,V4 Pass
M05517:39:000000000-CNNWR:1:1105:9773:23284 score:2835 alignment:340..805 regions:V3,V4 Pass
M05517:39:000000000-CNNWR:1:1105:9791:23288 score:2835 alignment:340..805 regions:V3,V4 Pass
M05517:39:000000000-CNNWR:1:1105:11642:23566 score:1650 alignment:NA regions: Fail
M05517:39:000000000-CNNWR:1:1105:13198:23694 score:1535 alignment:NA regions: Fail
M05517:39:000000000-CNNWR:1:1105:15837:23728 score:2240 alignment:340..641 regions:V3,V4 Pass
M05517:39:000000000-CNNWR:1:1105:14501:23779 score:2225 alignment:340..641 regions:V3,V4 Pass
Example use with paired-reads
To check paired-end reads, they can be processed independently or merged (if they overlap) before. SeqFu merge can be used for this purpose:
seqfu merge -1 read_R1.fq -2 read_R2.fq | fu-16region
Region names
By default the following regions are used, but a new set can be fed in JSON format (each region should contain the start
and end
values as show in the example)
{
"V1": {
"start": 68,
"end": 99
}, "V2": {
"start": 136,
"end": 242
}, "V3": {
"start": 338,
"end": 533
}, "V4": {
"start": 576,
"end": 682
}, "V5": {
"start": 821,
"end": 879
}, "V6": {
"start": 970,
"end": 1046
}, "V7": {
"start": 1117,
"end": 1294
}, "V8": {
"start": 1435,
"end": 1465
}
}