seqfu metadata

Given one (or more) directories containing sequencing reads, will produce a metadata file extracting the ID from the filename and optionally adding the file paths or read counts.

Usage: metadata [options] [<dir>...]

Prepare mapping files from directory containing FASTQ files

Options:
  -1, --for-tag STR      String found in filename of forward reads [default: _R1]
  -2, --rev-tag STR      String found in filename of forward reads [default: _R2]
  -s, --split STR        Separator used in filename to identify the sample ID [default: _]
  --pos INT...           Which part of the filename is the Sample ID [default: 1]

  -f, --format TYPE      Output format: dadaist, irida, manifest, metaphage, qiime1, qiime2  [default: manifest]
  --pe                   Enforce paired-end reads (not supported)
  -p, --add-path         Add the reads absolute path as column 
  -c, --counts           Add the number of reads as a property column
  -t, --threads INT      Number of simultaneously opened files [default: 2]

  FORMAT SPECIFIC OPTIONS
  -P, --project INT      Project ID (only for irida)
  --meta-split STR       Separator in the SampleID to extract metadata, used in MetaPhage [default: _]
  --meta-part INT        Which part of the SampleID to extract metadata, used in MetaPhage [default: 1]
  --meta-default STR     Default value for metadata, used in MetaPhage [default: Cond]

  -v, --verbose          Verbose output
  -h, --help             Show this help

Output formats

  • manifest (used as import manifest for Qiime2 artifacts)
  • qiime1, qiime2 (forward-compatible qiime1 mapping file; a dedicated Qiime2 metadata file is under development)
  • dadaist (Dadaist2 compatible metadata)
  • lotus (Lotus mapping file - tested with Lotus1)
  • irida (IRIDA uploader sample sheet. Requires -P PROJECTID)
  • metaphage (MetaPhage, use --meta-split, --meta-part and --meta-default to customize a Treatment column)

Examples

Manifest

seqfu metadata ./MiSeq_SOP/

Will produce this output:

sample-id	forward-absolute-filepath	reverse-absolute-filepath
F3D0	/Users/telatin/MiSeq_SOP/F3D0_S188_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D0_S188_L001_R2_001.fastq.gz
F3D1	/Users/telatin/MiSeq_SOP/F3D1_S189_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D1_S189_L001_R2_001.fastq.gz
F3D141	/Users/telatin/MiSeq_SOP/F3D141_S207_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D141_S207_L001_R2_001.fastq.gz
F3D142	/Users/telatin/MiSeq_SOP/F3D142_S208_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D142_S208_L001_R2_001.fastq.gz
F3D143	/Users/telatin/MiSeq_SOP/F3D143_S209_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D143_S209_L001_R2_001.fastq.gz
F3D144	/Users/telatin/MiSeq_SOP/F3D144_S210_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D144_S210_L001_R2_001.fastq.gz
F3D145	/Users/telatin/MiSeq_SOP/F3D145_S211_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D145_S211_L001_R2_001.fastq.gz
F3D146	/Users/telatin/MiSeq_SOP/F3D146_S212_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D146_S212_L001_R2_001.fastq.gz
F3D147	/Users/telatin/MiSeq_SOP/F3D147_S213_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D147_S213_L001_R2_001.fastq.gz
F3D148	/Users/telatin/MiSeq_SOP/F3D148_S214_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D148_S214_L001_R2_001.fastq.gz
F3D149	/Users/telatin/MiSeq_SOP/F3D149_S215_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D149_S215_L001_R2_001.fastq.gz
F3D150	/Users/telatin/MiSeq_SOP/F3D150_S216_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D150_S216_L001_R2_001.fastq.gz
F3D2	/Users/telatin/MiSeq_SOP/F3D2_S190_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D2_S190_L001_R2_001.fastq.gz
F3D3	/Users/telatin/MiSeq_SOP/F3D3_S191_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D3_S191_L001_R2_001.fastq.gz
F3D5	/Users/telatin/MiSeq_SOP/F3D5_S193_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D5_S193_L001_R2_001.fastq.gz
F3D6	/Users/telatin/MiSeq_SOP/F3D6_S194_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D6_S194_L001_R2_001.fastq.gz
F3D7	/Users/telatin/MiSeq_SOP/F3D7_S195_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D7_S195_L001_R2_001.fastq.gz
F3D8	/Users/telatin/MiSeq_SOP/F3D8_S196_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D8_S196_L001_R2_001.fastq.gz
F3D9	/Users/telatin/MiSeq_SOP/F3D9_S197_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/F3D9_S197_L001_R2_001.fastq.gz
Mock	/Users/telatin/MiSeq_SOP/Mock_S280_L001_R1_001.fastq.gz	/Users/telatin/MiSeq_SOP/Mock_S280_L001_R2_001.fastq.gz

Qiime mapping file

Note that -f qiime2 will add a second header line.

seqfu metadata MiSeq_SOP -f qiime1 --add-path --counts

Output:

#SampleID	Counts	Paths
F3D0	7793	F3D0_S188_L001_R1_001.fastq.gz,F3D0_S188_L001_R2_001.fastq.gz
F3D1	5869	F3D1_S189_L001_R1_001.fastq.gz,F3D1_S189_L001_R2_001.fastq.gz
F3D141	5958	F3D141_S207_L001_R1_001.fastq.gz,F3D141_S207_L001_R2_001.fastq.gz
F3D142	3183	F3D142_S208_L001_R1_001.fastq.gz,F3D142_S208_L001_R2_001.fastq.gz
F3D143	3178	F3D143_S209_L001_R1_001.fastq.gz,F3D143_S209_L001_R2_001.fastq.gz
F3D144	4827	F3D144_S210_L001_R1_001.fastq.gz,F3D144_S210_L001_R2_001.fastq.gz
F3D145	7377	F3D145_S211_L001_R1_001.fastq.gz,F3D145_S211_L001_R2_001.fastq.gz
F3D146	5021	F3D146_S212_L001_R1_001.fastq.gz,F3D146_S212_L001_R2_001.fastq.gz
F3D147	17070	F3D147_S213_L001_R1_001.fastq.gz,F3D147_S213_L001_R2_001.fastq.gz
F3D148	12405	F3D148_S214_L001_R1_001.fastq.gz,F3D148_S214_L001_R2_001.fastq.gz
F3D149	13083	F3D149_S215_L001_R1_001.fastq.gz,F3D149_S215_L001_R2_001.fastq.gz
F3D150	5509	F3D150_S216_L001_R1_001.fastq.gz,F3D150_S216_L001_R2_001.fastq.gz
F3D2	19620	F3D2_S190_L001_R1_001.fastq.gz,F3D2_S190_L001_R2_001.fastq.gz
F3D3	6758	F3D3_S191_L001_R1_001.fastq.gz,F3D3_S191_L001_R2_001.fastq.gz
F3D5	4448	F3D5_S193_L001_R1_001.fastq.gz,F3D5_S193_L001_R2_001.fastq.gz
F3D6	7989	F3D6_S194_L001_R1_001.fastq.gz,F3D6_S194_L001_R2_001.fastq.gz
F3D7	5129	F3D7_S195_L001_R1_001.fastq.gz,F3D7_S195_L001_R2_001.fastq.gz
F3D8	5294	F3D8_S196_L001_R1_001.fastq.gz,F3D8_S196_L001_R2_001.fastq.gz
F3D9	7070	F3D9_S197_L001_R1_001.fastq.gz,F3D9_S197_L001_R2_001.fastq.gz
Mock	4779	Mock_S280_L001_R1_001.fastq.gz,Mock_S280_L001_R2_001.fastq.gz

IRIDA uploader

seqfu metadata -f irida  -P 123 data/pe/

Output:

Sample_Name,Project_ID,File_Forward,File_Reverse
sample1,123,sample1_R1.fq.gz,sample1_R2.fq.gz
sample2,123,sample2_R1.fq.gz,sample2_R2.fq.gz

Screenshot

Screenshot of