Blocked Multi-threaded File Compression Utility
This utility uses a multithreaded blocked compression algorithm to compress files according to the BAM BGZF format. The files are also compatible with gzip and tabix.
This program can reduce the run time of most programs that produce BAM files: select that program's no-compression option and pipe its output to novoutil bgzf.
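The idea of blocked, multithreaded compression can be sketched in Python. This is a minimal illustration and not the novoutil implementation: the names `bgzf_block`, `compress_bgzf` and `CHUNK` are hypothetical, and the chunk size of 0xFF00 input bytes is an assumption chosen to keep every compressed block under the 64 KiB limit. Each chunk is compressed independently into a gzip member carrying the BGZF `BC` extra field, so worker threads can run in parallel and the output stays readable by gzip tools.

```python
import gzip
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

# 28-byte empty BGZF block that marks end of file (as given in the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)

CHUNK = 0xFF00  # assumed input bytes per block; leaves headroom under 64 KiB


def bgzf_block(data: bytes, level: int = 6) -> bytes:
    """Compress one chunk into a single self-contained BGZF block."""
    deflater = zlib.compressobj(level, zlib.DEFLATED, -15)  # raw deflate stream
    cdata = deflater.compress(data) + deflater.flush()
    total = 18 + len(cdata) + 8  # header + payload + CRC32/ISIZE trailer
    if total > 0x10000:
        raise ValueError("block exceeds the 64 KiB BGZF limit")
    header = struct.pack(
        "<4BI2BH2B2H",
        0x1F, 0x8B, 8, 4,    # gzip magic, CM=deflate, FLG=FEXTRA
        0,                   # MTIME
        0, 0xFF,             # XFL, OS=unknown
        6,                   # XLEN: one 6-byte extra subfield
        ord("B"), ord("C"),  # subfield identifier
        2, total - 1,        # SLEN, BSIZE (total block size minus 1)
    )
    trailer = struct.pack("<2I", zlib.crc32(data), len(data))
    return header + cdata + trailer


def compress_bgzf(data: bytes, threads: int = 4, level: int = 6) -> bytes:
    """Split data into chunks, compress them on a thread pool, keep input order."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        blocks = pool.map(lambda c: bgzf_block(c, level), chunks)  # map preserves order
        return b"".join(blocks) + BGZF_EOF


if __name__ == "__main__":
    payload = b"read1\t0\tchr1\t100\n" * 50000
    packed = compress_bgzf(payload, threads=4, level=5)
    # The result is a series of gzip members, so plain gzip can read it back.
    assert gzip.decompress(packed) == payload
```

Because every block is independent, the only ordering requirement is that blocks are written in input order, which is why the sketch collects results through an order-preserving map.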
Usage:
novoutil bgzf [options]
Description:
Reads stdin and writes the compressed file to stdout.
If the input is compressed (gzip, tabix or BAM) it will be expanded and recompressed based on the options. This allows a gzipped file to be recompressed to tabix format, or a BAM file previously compressed with a low compression level to be recompressed at a higher level.
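How a tool might tell compressed input from plain input can be sketched by inspecting the first bytes of the stream. This is an illustrative guess at the general technique, not a description of how novoutil detects its input. The function name `classify` is hypothetical, and the sketch assumes the BGZF `BC` subfield is the first extra subfield in the header, which is the usual layout but not the only one a gzip header permits.

```python
import gzip


def classify(head: bytes) -> str:
    """Return 'bgzf', 'gzip' or 'raw' from the leading bytes of a stream."""
    if head[:2] != b"\x1f\x8b":          # gzip magic number
        return "raw"
    has_extra = len(head) >= 18 and bool(head[3] & 4)   # FLG.FEXTRA set
    # Assumption: the 'BC' subfield (length 2) comes first in the extra field.
    if has_extra and head[12:14] == b"BC" and head[14:16] == b"\x02\x00":
        return "bgzf"
    return "gzip"


if __name__ == "__main__":
    print(classify(gzip.compress(b"hello")))   # ordinary gzip member
    print(classify(b"@HD\tVN:1.6\n"))          # uncompressed SAM text
```

BAM and tabix-indexed files both use BGZF blocks, so a check like this distinguishes blocked from plain gzip input but not BAM from other BGZF content.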
Novoutil version and build date are written to stderr.
Options:
-t 9 | Specifies the number of compressor threads. Defaults to the number of cores on the server. |
-b 99 | Specifies the block size in KBytes. Defaults to 64 KBytes as per the BAM specification. Block sizes greater than 64K may give better compression but are not BAM or tabix compatible, though they can still be decompressed with gunzip. |
-1 to -9 | Sets the compression level from 1 to 9 as per gzip standards. Defaults to the gzip default (6). Note for those benchmarking this utility: Picard defaults to level 5 compression and samtools to level 6. |
Exit Status:
0 | Normal completion |
-1 | An error occurred and a message was written to stderr |
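A calling script can act on the exit status as sketched below. The helper name `run_filter` is hypothetical. The sketch tests for any non-zero status rather than for -1 specifically, since on POSIX systems a status of -1 is normally reported to the parent as 255. Because the version and build date are always written to stderr, output on stderr alone is not treated as a failure.

```python
import subprocess


def run_filter(cmd, data: bytes) -> bytes:
    """Pipe data through cmd and return its stdout; raise on a non-zero exit status."""
    proc = subprocess.run(
        cmd, input=data, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    if proc.returncode != 0:
        message = proc.stderr.decode(errors="replace").strip()
        raise RuntimeError(
            f"{cmd[0]} failed with status {proc.returncode}: {message}"
        )
    return proc.stdout


# Example call (requires novoutil on PATH; sam_bytes is a placeholder for real input):
# bam_bytes = run_filter(["novoutil", "bgzf", "-5"], sam_bytes)
```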
Examples:
Convert SAM to BAM and compress…
Results from sorting a 50Gbyte bam file…
Wall Time(Secs.) | CPU Time(Secs.) | Command |
2298 | 2211 | picard SortSam SO=coordinate I=Test.unsorted.bam O=Test.bam |
1997 | 3517 | picard SortSam SO=coordinate I=Test.unsorted.bam COMPRESSION_LEVEL=0 QUIET=true O=/dev/stdout | novoutil bgzf >Test.bam |
1658 | 2828 | picard SortSam SO=coordinate I=Test.unsorted.bam COMPRESSION_LEVEL=0 QUIET=true O=/dev/stdout | novoutil bgzf -5 >Test.bam |
A 28% saving in wall time (1658 seconds versus 2298 seconds) is possible at the Picard default compression level, at the cost of higher total CPU time.
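The percentages can be checked from the figures in the table. The short calculation below uses only the wall and CPU times listed above; the labels and the helper name `saving_pct` are chosen for illustration.

```python
# (wall seconds, CPU seconds) taken from the results table
runs = {
    "picard only": (2298, 2211),
    "picard + novoutil bgzf": (1997, 3517),
    "picard + novoutil bgzf -5": (1658, 2828),
}


def saving_pct(baseline: float, value: float) -> float:
    """Percentage reduction of value relative to baseline (negative means an increase)."""
    return 100.0 * (baseline - value) / baseline


base_wall, base_cpu = runs["picard only"]
for name, (wall, cpu) in runs.items():
    print(
        f"{name}: wall saving {saving_pct(base_wall, wall):.1f}%, "
        f"CPU saving {saving_pct(base_cpu, cpu):.1f}%"
    )
```

The wall-time saving comes from spreading the compression work across threads, which is why the CPU figures rise while elapsed time falls.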