NovoAlign Command Options

[table width =”100%” style =”table-bordered” responsive =”false”]
[row_column]Defaults and options vary across major versions of Novoalign, so please refer to the manual for the version you are using. The reference manual is included in the download archive. Entering the command novoalign --help should also display a summary of the available options and all the defaults for your installed version.[/row_column]

Specifying the Reference Index

-d dbname Full pathname of indexed reference sequence created by novoindex

For more details refer to:

  1. Novoindex – Indexing the Reference Genome


Options for Read processing:

-f read1 read2 Filenames for the read sequences for Side 1 & 2. If only one file is specified then single end reads are processed. If two files are specified then the program will operate in paired end mode.
File formats allowed include Solexa PRB, Sanger FASTQ, FASTA, Solexa FASTQ, Illumina FASTQ, and Illumina qseq_txt.
-F format Specifies a read file format. For Fastq ‘_sequence.txt’ files from Illumina Pipeline 1.3 please specify -F ILMFQ.
Other values for the -F option are:
FA Fasta format read files without base qualities
SLXFQ Fastq format with Solexa style quality values. 10log10(P/(1-P)) + ‘@’
STDFQ Fastq format with Sanger coding of quality values. -10log10(Perr) + ‘!’
ILMFQ Fastq with Illumina coding of quality values. -10log10(Perr) + ‘@’
PRB Illumina _prb.txt format.
PRBnSEQ Illumina _prb.txt with _seq.txt files.
QSEQ Illumina *_qseq.txt format files from Bustard.
NovoalignCS can use ABI colour space fasta with qualities or colour space fastq files. Detection of file formats should be automatic; however, you can still specify the format using the -F option.
For csfastq, paired end reads should be in two separate files.
CSFASTA ABI Solid colour space fasta format with optional _QV.qual file.
CSFASTQ Colour space FASTQ format as used in BFAST.
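The three FASTQ quality encodings listed for -F above differ in their offset character and, for Solexa, in using an odds-based scale. The sketch below decodes a single quality character to an error probability. It is an illustration only, not Novoalign code: the function name error_probability is hypothetical, and it assumes that P in the SLXFQ formula is the probability that the base call is correct.

```python
def error_probability(qual_char: str, fmt: str) -> float:
    """Decode one FASTQ quality character to an error probability."""
    if fmt == "STDFQ":
        # Sanger coding: Q = -10log10(Perr), offset '!'
        q = ord(qual_char) - ord("!")
        return 10 ** (-q / 10)
    if fmt == "ILMFQ":
        # Illumina coding: Q = -10log10(Perr), offset '@'
        q = ord(qual_char) - ord("@")
        return 10 ** (-q / 10)
    if fmt == "SLXFQ":
        # Solexa coding: Q = 10log10(P/(1-P)), offset '@'.
        # Assumption: P is the probability of a correct call, so Perr = 1 - P.
        q = ord(qual_char) - ord("@")
        return 1 / (1 + 10 ** (q / 10))
    raise ValueError(f"unsupported format: {fmt}")
```

For example, ‘I’ under STDFQ and ‘h’ under ILMFQ both decode to Q40, an error probability of 0.0001, which is why choosing the wrong -F value shifts every quality in the file.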
-l 99 Sets the minimum number of good quality bases for a read. Alignment will not be attempted for reads with fewer bases. Default log4(Ng) + 5 where Ng is the length of the reference genome. The measure uses base qualities to determine the information content of the read in bits and divides by 2 to get the effective length in bases.
-n 99 Truncate reads to the specified length before alignment. Default is 950.
-s 9 Turns on read trimming for single end reads only. Reads that fail to align will be progressively shortened by the specified amount (defaults to 2) until they either align or their length falls below the length set by the -l option, in which case the shortened read fails quality control checks. This option only applies to single end reads. Use at your own discretion.
-a [adapter1] [adapter2] Strips adapter sequences from 3′ end of reads before aligning. Default is not to strip adapters. Default adapter sequence is TCGTATGCCGTCTTCTGCTTG. This is usually used when sequencing small RNA. With paired end reads it can be used to strip adapter off fragments that are shorter than the read length. In this case you can specify two adapter sequences, the first for read 1 of each pair and the second for read 2 of each pair. Default adapter sequences for paired end reads are: Read1: AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG Read2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
-h 99 [“99”] Sets homo-polymer and, optionally, the di-nucleotide filter scores. Any read that matches a homopolymer or dinucleotide sequence with a score less than or equal to this threshold will be skipped. Default 20/20.
-5 sequence Strips primer sequences from 5′ end of reads before aligning. Default is not to strip 5′ primers.
-# 999[K|M] Sets a limit on the number of reads that will be processed. This is useful for test runs.
-p 99,99 [0.9,99] Sets the thresholds for the polyclonal read filter. This filter is designed to remove reads that may come from polyclonal clusters or beads. Please refer to the paper: Filtering error from SOLiD Output, Ariella Sasson and Todd P. Michael.
The first pair of values (n,t) sets the number of bases and threshold for the first 20 base pairs of each read. If there are n or more bases with phred quality below t then the read is flagged as polyclonal and will not be aligned. The alignment status is ‘QC’. The second pair applies to the entire read rather than just the first 20bp and is specified as the fraction of bases in the read below the given quality. Setting -p -1 disables the filter.
Default for Novoalign is off.
Default for NovoalignCS is -p 7,10 0.3,10, i.e. 7 of the first 20bp below Q10 or 30% of all bases below Q10 will be flagged as a low quality read.
Low quality reads may still be used in paired end mode if the mate is not low quality.
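The two tests of the polyclonal filter described for -p above can be sketched as follows. This is a minimal illustration, not the Novoalign implementation: the function name is_polyclonal is hypothetical, the defaults mirror the NovoalignCS setting -p 7,10 0.3,10, and treating the whole-read test as "fraction or more" is an assumption.

```python
def is_polyclonal(quals, n=7, t=10, frac=0.3, t_all=10):
    """Return True if a read would be flagged by either test of the -p filter.

    quals: list of phred qualities, one per base of the read.
    """
    if not quals:
        return False
    # Test 1: n or more bases below quality t within the first 20 bases.
    low_in_head = sum(1 for q in quals[:20] if q < t)
    if low_in_head >= n:
        return True
    # Test 2: fraction of bases below t_all over the entire read.
    # Assumption: the read is flagged at "frac or more" of its bases.
    low_overall = sum(1 for q in quals if q < t_all)
    return low_overall >= frac * len(quals)
```

A 50bp read with seven Q5 bases at its start is flagged by the first test, while a read whose first 20 bases are clean can still be flagged by the second test if enough of the remaining bases fall below Q10.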
-H Enables hard clipping of low quality bases from the 3′ end of reads.
--Q2Off Disables treatment of bases with quality 2 as the Illumina “Read Segment Quality Control Indicator”. Setting Q2Off will treat Q=2 bases as normal bases with a quality of 2. When the treatment is off, Q=2 bases are included in quality calibration. By default it is off in NovoalignCS.
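The default for -l given above, log4(Ng) + 5, can be evaluated for a particular reference as in the sketch below. The function name default_min_good_bases is hypothetical, and whether Novoalign rounds the result is not stated here, so the value is returned unrounded.

```python
import math


def default_min_good_bases(genome_length: int) -> float:
    """Evaluate log4(Ng) + 5 for a reference of genome_length bases."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return math.log(genome_length, 4) + 5


# Example: a reference of about 3 billion bases gives roughly 20.7.
print(default_min_good_bases(3_000_000_000))
```

Because the term is logarithmic, the default grows slowly: multiplying the reference length by 4 raises it by only one base.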

For more details refer to:

  1. Adapter Stripping


Options for alignment scoring:

-t 99 Sets the maximum alignment score acceptable for the best alignment. Default Automatic. In automatic mode the threshold is set based on read length, genome size and other factors (see manual). For pairs the threshold applies to the fragment and includes both ends and the length penalty. A mismatch at a base with a high phred quality will score 30 points so a threshold of 90 would allow at least 3 mismatches.
-g 99 Sets the gap opening penalty. Default 40
-x 99 Sets the gap extend penalty. Default 6
-u 99 Penalty for unconverted CHG or CHH cytosine in bisulfite alignment mode. Default 0. For plants 6 may be a good value.
-b mode Sets Bisulphite alignment mode. Values for mode are: 4 – Aligns in 4 possible combinations of direction and index. (Default) 2 – Aligns reads in forward direction using CT index and in reverse complement using the GA index.
-N 999 Sets the number of bp of source DNA that are not represented in the reference sequences (index). This value is used in calculation of prior probability that the read originated in sequence that we cannot align to because it is not in the reference sequence set. By default we use the number of bases coded as N’s in the reference genome. Set to zero to disable inclusion of this in quality calculations.
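As a rough illustration of the arithmetic in the -t description above, the sketch below counts how many high-quality mismatches fit under a given threshold. The function name is hypothetical, and it considers only mismatches scoring 30 points each as stated above; gaps, lower-quality mismatches and the paired-end length penalty are ignored.

```python
def max_high_quality_mismatches(threshold: int, mismatch_score: int = 30) -> int:
    """Number of mismatches at mismatch_score each that fit within threshold."""
    if mismatch_score <= 0:
        raise ValueError("mismatch_score must be positive")
    return threshold // mismatch_score


print(max_high_quality_mismatches(90))  # three mismatches at 30 points each
```

Mismatches at lower-quality bases score less than 30, which is why a threshold of 90 allows at least, rather than exactly, 3 mismatches.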

For more details refer to:

1. Alignment Scoring

2. Bisulphite treated reads


Options for reporting:

-o format Specifies the report format. Native, Pairwise & SAM. Default is Native.
-R 99 Sets score difference between best and second best alignment for calling a repeat. Default 5.
-r strategy [limit] Sets strategy for reporting repeats. ‘None’, ‘Random’, ‘All’, ‘Exhaustive’, or a posterior probability limit. Default None.
An optional limit on the maximum number of alignments to report can also be set. Default is no limit.
When using the ‘Exhaustive’ option a limit and an alignment threshold must be set.
-Q 99 Sets lower limit on alignment quality for reporting. Default 0.
-e 999 Sets a limit on number of alignments for a single read. This limit applies to the number of alignments with score equal to that of the best alignment. Alignment process will stop when the limit is reached. Default 1000 in default report mode, off for other modes.
-q 9 Sets number of decimal places for quality score. Default zero.
-K [file] Collects mismatch statistics for quality calibration by position in the read and called base quality. Mismatch counts are written to the named file after all reads are processed. When used with -k option the mismatch counts include any read from the input quality calibration file.
--3Prime Reports 3′ alignment location using SAM tag Z3:i: or as an extra column in Native format.

For more details refer to:

  1. Report Formats
  2. Reporting Multiple Alignments per Read
  3. Quality Calibration


Paired End Options:

-i [MP|PE|++|+-|-+] 99[,|-]99 Sets approximate fragment length and standard deviation. Default for Novoalign -i PE 250,30, NovoalignCS -i MP 2500,500
PE (or ‘+-’) is for paired end mode, which usually means short contiguous fragments where the two reads are on opposite strands, like -------->......<--------
MP is for mate pair reads from long fragments and jumping libraries. In Illumina (Novoalign) the read alignments are on opposite strands, like <--------......-------->; for ABI SOLiD (NovoalignCS) the reads should align on the same strand, like <--------......<--------
454 Paired end reads can also be aligned using Novoalign and specifying the orientation as ++
Using a ‘-’ as the delimiter in the fragment length specifies fragment lengths as a range rather than as a distribution, e.g. -i PE 100-300 sets a range from 100 to 300bp for fragment lengths. If a range is used then no fragment length penalties are applied.
-i MP 99[,|-]99  99[,|-]99 For Illumina mate pairs in Novoalign it’s possible to set a second fragment length and standard deviation for the secondary fragmentation step. This allows alignment of the paired end reads that are left after Biotin enrichment and also enables alignment of reads where the circularisation junction is within one read of the pair.
-v 99 Sets the structural variation penalty for chimera fragments. Default 70
-v 99 99 Sets the structural variation penalty for chimera fragments. 1) Penalty for SVs within one sequence 2) Penalty for SVs across different sequences.
-v 99 99 99 regex Sets the structural variation penalty for chimera fragments. The three values are: 1) Penalty for SVs within a group of sequences as defined by the regular expression. 2) Penalty for SVs within a single sequence. 3) Penalty for SVs across a different sequence and group. The regex argument defines a regular expression applied to headers of indexed sequences. The regular expression should define one field that is used to define sequence groups.
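The two forms of the -i length argument described above, a comma for mean and standard deviation or a hyphen for a range, can be told apart as in this sketch. The helper parse_fragment_spec and its return layout are hypothetical and only illustrate the syntax; they are not part of Novoalign.

```python
def parse_fragment_spec(spec: str) -> dict:
    """Interpret the length part of -i: 'mean,sd' or 'min-max'."""
    if "," in spec:
        mean, sd = (int(x) for x in spec.split(","))
        return {"kind": "distribution", "mean": mean, "sd": sd}
    if "-" in spec:
        low, high = (int(x) for x in spec.split("-"))
        if low > high:
            raise ValueError("range minimum exceeds maximum")
        # With a range, no fragment length penalties are applied.
        return {"kind": "range", "min": low, "max": high}
    raise ValueError(f"expected 'mean,sd' or 'min-max', got {spec!r}")


print(parse_fragment_spec("250,30"))
print(parse_fragment_spec("100-300"))
```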

For more details refer to:

  1. RNASeq analysis: mRNA and the Spliceosome
  2. Paired Read Modes


Single End Options:

-m Sets miRNA mode. In this mode each alignment to a read is given an additional score based on nearby alignment to the opposite strand of the read. Setting miRNA mode changes the default report mode to ‘All’.
-s 9 Turns on read trimming and sets trimming step size. Default step size is 2bp. Unaligned reads are trimmed until they align or fail the QC tests.

For more details refer to:

  1. Micro RNA


Base Quality Calibration Options:

-k [infile] Enables quality calibration. The quality calibration data are either read from the named file or accumulated from actual alignments. Quality calibration does not work with reads in prb format. Default is no Calibration.
-K [file] Collects mismatch statistics for quality calibration by position in the read and called base quality. Mismatch counts are written to the named file after all reads are processed. When used with -k option the mismatch counts include any read from the input quality calibration file.

For more details refer to:

1. Quality Calibration


Performance Options:

-c 99 Sets maximum number of threads to use. Defaults to one thread per CPU as reported by sysinfo(). This is usually the number of cores or twice the number of cores if hyper-threading is turned on.
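To show how several of the options above combine on one command line, the sketch below assembles an argument list in Python without running it. It uses only the -d, -f, -o and -c options documented on this page; the function name build_novoalign_command and the file names ref.nix, reads_1.fq and reads_2.fq are hypothetical placeholders.

```python
import os


def build_novoalign_command(index, reads, threads=None, report_format="SAM"):
    """Assemble a novoalign argument list from options documented above."""
    if not 1 <= len(reads) <= 2:
        raise ValueError("expected one (single end) or two (paired end) read files")
    cmd = ["novoalign", "-d", index, "-f", *reads, "-o", report_format]
    if threads is not None:
        # -c caps the thread count; omit it to keep the program's own default.
        cmd += ["-c", str(threads)]
    return cmd


# os.cpu_count() reports logical CPUs and may return None, in which case
# -c is simply left off.
cmd = build_novoalign_command(
    "ref.nix", ["reads_1.fq", "reads_2.fq"], threads=os.cpu_count()
)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run once the index and read files exist; passing two read files selects paired end mode as described for -f.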
