NovoUtil BGZF

Blocked Multi-threaded File Compression Utility

This utility uses a multithreaded, blocked compression algorithm to compress files according to the BAM BGZF format. The resulting files are also compatible with gzip and tabix.
It can reduce the run time of most programs that produce BAM files: select the program's no-compression option and pipe its output to novoutil bgzf.
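
The gzip compatibility of a blocked format can be illustrated with standard tools alone. The sketch below does not invoke novoutil and does not produce real BGZF (which adds an extra header field to each block and a terminating empty block); it only assumes that gzip, split and awk are installed. It shows that blocks compressed independently, as separate threads could do, still concatenate into one valid gzip stream.

```shell
# Sketch: independently compressed gzip blocks, concatenated in order,
# decompress as a single file. novoutil itself is not used here.
work=$(mktemp -d)

# Generate roughly 220 KB of sample text.
awk 'BEGIN { for (i = 1; i <= 40000; i++) print i }' > "$work/input.txt"

# Cut the input into 64 KB blocks, the BGZF-style block size.
split -b 65536 "$work/input.txt" "$work/block."

# Compress every block on its own; each one becomes a complete gzip member.
for f in "$work"/block.*; do
    gzip -6 "$f"
done

# Concatenate the members and decompress them as one stream.
cat "$work"/block.*.gz > "$work/blocked.gz"
gunzip -c "$work/blocked.gz" > "$work/roundtrip.txt"

blocks=$(ls "$work"/block.*.gz | wc -l | tr -d ' ')
if cmp -s "$work/input.txt" "$work/roundtrip.txt"; then
    result="identical"
else
    result="different"
fi
echo "$blocks blocks, round trip $result"
rm -rf "$work"
```
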


picard …. O=report.bam
picard …. O=/dev/stdout COMPRESSION_LEVEL=0 QUIET=true | novoutil bgzf >report.bam



novoutil bgzf [options]



Reads stdin and writes compressed file to stdout.
If the input is compressed (gzip, tabix or BAM) it will be expanded and recompressed according to the options. This allows a gzipped file to be recompressed to tabix format, or a BAM file previously compressed at a low compression level to be recompressed at a higher level.
Novoutil version and build date are written to stderr.



-t 9 Specifies the number of compressor threads. Defaults to the number of cores on the server.
-b 99 Specifies the block size in KBytes. Defaults to 64 KBytes as per the BAM specification. Block sizes greater than 64K may give better compression but are not BAM or tabix compatible, though they can still be decompressed with gunzip.
-1 to -9 Sets the compression level from 1 to 9 as per gzip standards, for example -5. Defaults to the gzip default (6). Note to those benchmarking this utility: Picard defaults to level 5 compression and samtools to level 6.
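
The options can be combined on one command line. The helper below is a hypothetical sketch, not part of novoutil: bgzf_cmd is an invented name, and it only prints the command it would run. It assumes the three options may be given together, and it warns when the chosen block size gives up BAM and tabix compatibility.

```shell
# Hypothetical helper (not part of novoutil): print the command line for a
# given thread count, block size in KBytes and compression level.
# Nothing is executed.
bgzf_cmd() {
    threads=$1
    block_kb=$2
    level=$3
    if [ "$block_kb" -gt 64 ]; then
        # Larger blocks remain readable by gunzip only.
        echo "warning: ${block_kb}K blocks are not BAM or tabix compatible" >&2
    fi
    echo "novoutil bgzf -t $threads -b $block_kb -$level"
}

bgzf_cmd 8 64 5      # BAM-compatible settings
bgzf_cmd 4 1024 6    # prints a warning on stderr first
```
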


Exit Status:

0 Normal completion
-1 An error occurred and message is written to stderr
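
A script can test the exit status after each run. One detail worth knowing is that the shell sees exit statuses as 8-bit values, so a program that exits with -1 is normally reported as 255. The sketch below uses a hypothetical wrapper, run_checked, and a stand-in command that exits with 255 so that it runs without novoutil.

```shell
# Hypothetical wrapper: run a command and report a non-zero exit status.
run_checked() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "command failed with status $status: $*" >&2
    fi
    return "$status"
}

# Stand-in for a failing run: an exit status of -1 shows up as 255.
run_checked sh -c 'exit 255' 2>/dev/null
observed=$?
echo "observed status: $observed"
```

In a pipeline, the status seen this way belongs to the last command only, so a failure in the program feeding the compressor has to be checked separately.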




Convert SAM to BAM and compress…

samtools view -uS report.sam | novoutil bgzf >report.bam
Sort and compress a BAM file…
picard SortSam TMP_DIR=./tmp SO=coordinate I=report.bam O=/dev/stdout COMPRESSION_LEVEL=0 QUIET=true | novoutil bgzf -5 >report.bam
Results from sorting a 50 GByte BAM file…
Wall Time (secs)   CPU Time (secs)   Command
2298               2211              picard SortSam SO=coordinate I=Test.unsorted.bam O=Test.bam
1997               3517              picard SortSam SO=coordinate I=Test.unsorted.bam COMPRESSION_LEVEL=0 QUIET=true O=/dev/stdout | novoutil bgzf >Test.bam
1658               2828              picard SortSam SO=coordinate I=Test.unsorted.bam COMPRESSION_LEVEL=0 QUIET=true O=/dev/stdout | novoutil bgzf -5 >Test.bam

A 28% saving in wall time is possible at the Picard default compression level.
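
The saving follows directly from the wall times in the table. A small sketch of the arithmetic, using awk for the floating-point division (wall_saving is a name chosen for this example):

```shell
# Percentage of wall time saved relative to the plain Picard run.
wall_saving() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

default_level=$(wall_saving 2298 1997)   # novoutil bgzf at its default level
level5=$(wall_saving 2298 1658)          # novoutil bgzf -5

echo "default level: ${default_level}% saved"
echo "level 5:       ${level5}% saved"
```
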

Compress a fastq file…
novoutil bgzf -b 1024 <reads.fastq >reads.fastq.gz
Create a compressed tar file…
tar -cf - reads_1.fq reads_2.fq | novoutil bgzf -b 1024 >reads.tar.gz
Create a bgzipped, tabix-indexed VCF file from a large database of genome-sorted variants:
cat large_number_of_variants.vcf | novoutil bgzf > large_number_of_variants.vcf.gz && tabix -p vcf large_number_of_variants.vcf.gz
