Novoalign Command Line Options

Specifying the Reference Index

-d dbname    Full pathname of the indexed reference sequence created by novoindex.

For more details refer to:

  1. Novoindex - Indexing the Reference Genome

Options for Read processing:

-f read1 read2    Filenames for the read sequences for side 1 and side 2. If only one file is specified, single end reads are processed. If two files are specified, the program operates in paired end mode.
File formats allowed include Solexa PRB, Sanger FASTQ, FASTA, Solexa FASTQ, Illumina FASTQ, and Illumina qseq_txt.
-F format    Specifies the read file format. For FASTQ '_sequence.txt' files from Illumina Pipeline 1.3, specify -F ILMFQ.
Other values for the -F option are:
  FA         FASTA format read files without base qualities.
  SLXFQ      FASTQ format with Solexa style quality values. 10log10(P/(1-P)) + '@'
  STDFQ      FASTQ format with Sanger coding of quality values. -10log10(Perr) + '!'
  ILMFQ      FASTQ format with Illumina coding of quality values. -10log10(Perr) + '@'
  PRB        Illumina _prb.txt format.
  PRBnSEQ    Illumina _prb.txt with _seq.txt files.
  QSEQ       Illumina *_qseq.txt format files from Bustard.
NovoalignCS can use ABI colour space FASTA with qualities or colour space FASTQ files. Detection of file formats should be automatic; however, you can still specify the format using the -F option.
For csfastq, paired end reads should be in two separate files.
  CSFASTA    ABI SOLiD colour space FASTA format with optional _QV.qual file.
  CSFASTQ    Colour space FASTQ format as used in BFAST.
-l 99    Sets the minimum number of good quality bases for a read. Alignment will not be attempted for reads with fewer bases. Default is log4(Ng) + 5, where Ng is the length of the reference genome. The measure uses base qualities to determine the information content of the read in bits and divides by 2 to get the effective length in bases.
-n 99    Truncates reads to the specified length before alignment. Default is 80.
-s 9    Turns on read trimming for single end reads only. Reads that fail to align are progressively shortened by the specified amount (default 2) until they either align or their length falls below the length set by the -l option, in which case the shortened read fails quality control checks. Use at your own discretion.
-a [adapter1] [adapter2]    Strips adapter sequences from the 3' end of reads before aligning. Default is not to strip adapters. Default adapter sequence is TCGTATGCCGTCTTCTGCTTG. This is usually used when sequencing small RNA. With paired end reads it can be used to strip adapter off fragments that are shorter than the read length. In this case you can specify two adapter sequences, the first for read 1 of each pair and the second for read 2 of each pair. Default adapter sequences for paired end reads are:
  Read1: AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
  Read2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
-h 99 [99]    Sets the homopolymer and, optionally, the dinucleotide filter scores. Any read that matches a homopolymer or dinucleotide sequence with a score less than or equal to this threshold will be skipped. Default 20/20.
-5 sequence    Strips primer sequences from the 5' end of reads before aligning. Default is not to strip 5' primers.
-# 999[K|M]    Sets a limit on the number of reads that will be processed. This is useful for test runs.
-p 99,99 [0.9,99]    Sets the thresholds for the polyclonal read filter. This filter is designed to remove reads that may come from polyclonal clusters or beads. Please refer to the paper: Filtering error from SOLiD Output, Ariella Sasson and Todd P. Michael.
The first pair of values (n,t) sets the number of bases and the quality threshold for the first 20 base pairs of each read. If there are n or more bases with phred quality below t, the read is flagged as polyclonal and will not be aligned. The alignment status is 'QC'. The second pair applies to the entire read rather than just the first 20bp and is specified as the fraction of bases in the read below the given quality. Setting -p -1 disables the filter.
Default for Novoalign is off.
Default for NovoalignCS is -p 7,10 0.3,10, i.e. reads with 7 of the first 20bp below Q10, or 30% of all bases below Q10, are flagged as low quality.
Low quality reads may still be used in paired end mode if the mate is not low quality.
-H    Enables hard clipping of low quality bases from the 3' end of reads.
--Q2Off    Disables treatment of bases with quality 2 as the Illumina "Read Segment Quality Control Indicator". Setting Q2Off treats Q=2 bases as normal bases with a quality of 2. When off, Q=2 bases are included in quality calibration. By default it is off in NovoalignCS.
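
The Phred quality codings listed under -F and the two tests of the -p filter can be illustrated together. The Python sketch below is not Novoalign's implementation: the function names and example reads are hypothetical, only the Phred-scaled formats (STDFQ with offset '!' and ILMFQ with offset '@') are handled, and the use of "greater than or equal" for the whole-read fraction test is an assumption, since the text above does not state the boundary case.

```python
# Illustrative sketch only; not Novoalign's actual code.
# Function names and example reads are hypothetical.

# ASCII offsets for the two Phred-scaled FASTQ codings: '!' = 33, '@' = 64.
OFFSETS = {"STDFQ": ord("!"), "ILMFQ": ord("@")}


def decode_phred(quality_string, fmt="STDFQ"):
    """Convert a FASTQ quality string to a list of Phred scores."""
    offset = OFFSETS[fmt]
    return [ord(ch) - offset for ch in quality_string]


def is_polyclonal(quals, head=(7, 10), whole=(0.3, 10), head_len=20):
    """Apply the two -p style tests to a list of Phred scores.

    head  = (n, t): flag if n or more of the first head_len bases are below Q t.
    whole = (f, t): flag if the fraction of all bases below Q t reaches f.
    Assumption: the whole-read test uses >= at the boundary.
    """
    if not quals:
        return False
    n, t_head = head
    frac, t_whole = whole
    low_head = sum(1 for q in quals[:head_len] if q < t_head)
    if low_head >= n:
        return True
    low_all = sum(1 for q in quals if q < t_whole)
    return low_all / len(quals) >= frac


if __name__ == "__main__":
    good = "I" * 50            # 'I' decodes to Q40 in STDFQ
    bad = "#" * 8 + "I" * 42   # '#' decodes to Q2; eight low bases at the start
    print(is_polyclonal(decode_phred(good)))
    print(is_polyclonal(decode_phred(bad)))
```

The defaults in `is_polyclonal` mirror the NovoalignCS default of -p 7,10 0.3,10. SLXFQ qualities use a log-odds scale rather than the Phred scale and would need converting before a test like this could be applied.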

For more details refer to:

  1. Adapter Stripping
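
As a worked example of the -l default given in the read processing options, the following Python sketch evaluates log4(Ng) + 5 and the bits-to-bases conversion. The function names are hypothetical, and whether Novoalign rounds the result is not stated in the text, so no rounding is applied here.

```python
import math


def default_min_length(genome_length):
    """Default for -l as stated in the option text: log4(Ng) + 5."""
    return math.log(genome_length, 4) + 5


def effective_length(information_bits):
    """Effective read length in bases: information content in bits divided by 2."""
    return information_bits / 2


if __name__ == "__main__":
    # A genome of 3,000,000,000 bases gives a default of about 20.7 bases.
    print(round(default_min_length(3_000_000_000), 2))
    # A read carrying 60 bits of information counts as 30 effective bases.
    print(effective_length(60))
```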

Options for alignment scoring:

-t 99    Sets the maximum alignment score acceptable for the best alignment. Default Automatic. In automatic mode the threshold is set based on read length, genome size and other factors (see manual). For pairs the threshold applies to the fragment and includes both ends and the length penalty. A mismatch at a base with a high phred quality will score 30 points, so a threshold of 90 would allow at least 3 mismatches.
-g 99    Sets the gap opening penalty. Default 40.
-x 99    Sets the gap extend penalty. Default 15.
-u 99    Penalty for unconverted CHG or CHH cytosine in bisulphite alignment mode. Default 0. For plants 6 may be a good value.
-b mode    Sets Bisulphite alignment mode. Values for mode are:
  4    Aligns in 4 possible combinations of direction and index. (Default)
  2    Aligns reads in forward direction using CT index and in reverse complement using the GA index.
-N 999    Sets the number of bp of source DNA that are not represented in the reference sequences (index). This value is used in the calculation of the prior probability that the read originated in sequence that we cannot align to because it is not in the reference sequence set. By default we use the number of bases coded as N's in the reference genome. Set to zero to disable inclusion of this in quality calculations.
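
The arithmetic behind the -t example (30 points per high quality mismatch, so a threshold of 90 admits at least 3 mismatches) can be sketched as follows. This is a simplified illustration with hypothetical function names: it counts mismatch penalties only and ignores gap penalties (-g, -x) and the fragment length penalty for pairs.

```python
# Score stated in the -t description for a mismatch at a high quality base.
HIGH_QUALITY_MISMATCH = 30


def max_high_quality_mismatches(threshold, mismatch_score=HIGH_QUALITY_MISMATCH):
    """Largest number of high quality mismatches whose total stays within threshold."""
    return threshold // mismatch_score


def passes_threshold(mismatch_scores, threshold):
    """True if the summed mismatch penalties do not exceed the threshold.

    Assumption: only mismatches are scored; gaps and pair length
    penalties are left out of this sketch.
    """
    return sum(mismatch_scores) <= threshold


if __name__ == "__main__":
    print(max_high_quality_mismatches(90))
    # Mismatches at lower quality bases score less, so more of them fit.
    print(passes_threshold([30, 30, 15, 15], 90))
```

Because mismatches at lower quality bases score fewer than 30 points, the count from `max_high_quality_mismatches` is a lower bound on the mismatches a threshold allows, which is why the text says "at least".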

For more details refer to:

  1. Alignment Scoring
  2. Bisulphite treated reads

Options for reporting:

-o format    Specifies the report format: Native, Pairwise or SAM. Default is Native.
-R 99    Sets the score difference between the best and second best alignment for calling a repeat. Default 5.
-r strategy [limit]    Sets the strategy for reporting repeats: 'None', 'Random', 'All', 'Exhaustive', or a posterior probability limit. Default None.
An optional limit on the maximum number of alignments to report can also be set. Default is no limit.
When using the 'Exhaustive' option a limit and an alignment threshold must be set.
-Q 99    Sets the lower limit on alignment quality for reporting. Default 0.
-e 999    Sets a limit on the number of alignments for a single read. This limit applies to the number of alignments with a score equal to that of the best alignment. The alignment process stops when the limit is reached. Default 1000 in default report mode, off for other modes.
-q 9    Sets the number of decimal places for the quality score. Default zero.
-K [file]    Collects mismatch statistics for quality calibration by position in the read and called base quality. Mismatch counts are written to the named file after all reads are processed. When used with the -k option the mismatch counts include any read from the input quality calibration file.
--3Prime    Reports the 3' alignment location using SAM tag Z3:i: or as an extra column in Native format.

For more details refer to:

  1. Report Formats
  2. Reporting Multiple Alignments per Read
  3. Quality Calibration

Paired End Options:

-i [MP|PE|++|+-|-+] 99[,|-]99    Sets approximate fragment length and standard deviation. Default for Novoalign is -i PE 250,30; for NovoalignCS, -i MP 2500,500.
PE (or '+-') is for paired end mode, which usually means short contiguous fragments where the two reads are on opposite strands, like -------->......<--------
MP is for mate pair reads from long fragments and jumping libraries. For Illumina (Novoalign) the read alignments are on opposite strands, like <--------......-------->; for ABI SOLiD (NovoalignCS) the reads should align on the same strand, like <--------......<--------
454 paired end reads can also be aligned using Novoalign by specifying the orientation as ++.
Using a - as the delimiter in the fragment length specifies fragment lengths as a range rather than as a distribution, e.g. -i PE 100-300 sets a range from 100 to 300bp for fragment lengths. If a range is used then no fragment length penalties are applied.
-i MP 99[,|-]99 99[,|-]99    For Illumina mate pairs in Novoalign it is possible to set a second fragment length and standard deviation for the secondary fragmentation step. This allows alignment of the paired end reads that are left after Biotin enrichment and also enables alignment of reads where the circularisation junction is within one read of the pair.
-v 99    Sets the structural variation penalty for chimera fragments. Default 70.
-v 99 99    Sets the structural variation penalty for chimera fragments. 1) Penalty for SVs within one sequence. 2) Penalty for SVs across different sequences.
-v 99 99 99 regex    Sets the structural variation penalty for chimera fragments. The three values are: 1) Penalty for SVs within a group of sequences as defined by the regular expression. 2) Penalty for SVs within a single sequence. 3) Penalty for SVs across a different sequence and group. regex defines a regular expression applied to headers of indexed sequences. The regular expression should define one field that is used to define sequence groups.
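
The two forms of the fragment length argument to -i (a comma for mean and standard deviation, a hyphen for a range with no length penalty) can be told apart as in the Python sketch below. The function name and the returned dictionary layout are hypothetical, and integer values are assumed; this only illustrates the syntax described above, not how Novoalign parses it.

```python
def parse_fragment_spec(spec):
    """Parse '250,30' (mean, standard deviation) or '100-300' (range).

    Assumption: both values are non-negative integers.
    """
    if "," in spec:
        mean, sd = (int(part) for part in spec.split(","))
        # A distribution: fragment length penalties apply.
        return {"kind": "distribution", "mean": mean, "sd": sd,
                "length_penalty": True}
    if "-" in spec:
        low, high = (int(part) for part in spec.split("-"))
        if low > high:
            raise ValueError("range start exceeds range end: " + spec)
        # A range: no fragment length penalties are applied.
        return {"kind": "range", "min": low, "max": high,
                "length_penalty": False}
    raise ValueError("expected 'mean,sd' or 'min-max', got: " + spec)


if __name__ == "__main__":
    print(parse_fragment_spec("250,30"))
    print(parse_fragment_spec("100-300"))
```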

For more details refer to:

  1. RNASeq analysis: mRNA and the Spliceosome
  2. Paired Read Modes

Single End Options:

-m    Sets miRNA mode. In this mode each alignment to a read is given an additional score based on nearby alignment to the opposite strand of the read. Setting miRNA mode changes the default report mode to 'All'.
-s 9    Turns on read trimming and sets the trimming step size. Default step size is 2bp. Unaligned reads are trimmed until they align or fail the QC tests.

For more details refer to:

  1. Micro RNA

Base Quality Calibration Options:

-k [infile]    Enables quality calibration. The quality calibration data are either read from the named file or accumulated from actual alignments. Quality calibration does not work with reads in prb format. Default is no calibration.
-K [file]    Collects mismatch statistics for quality calibration by position in the read and called base quality. Mismatch counts are written to the named file after all reads are processed. When used with the -k option the mismatch counts include any read from the input quality calibration file.

For more details refer to:

  1. Quality Calibration

Performance Options:

-c 99    Sets the maximum number of threads to use. Defaults to one thread per CPU as reported by sysinfo(). This is usually the number of cores, or twice the number of cores if hyper-threading is turned on.
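
To show how the options on this page combine, the Python sketch below assembles an argument list for a paired end run. It only uses flags documented above (-d, -f, -F, -i, -o, -c); the helper name, the executable name `novoalign` and all file names are assumptions made for illustration, and the sketch builds the command without running it.

```python
import shlex


def build_novoalign_command(index, reads, read_format=None, fragment=None,
                            report_format=None, threads=None):
    """Assemble an argument list from options documented on this page.

    Hypothetical helper: 'novoalign' as the executable name is an assumption.
    """
    if not 1 <= len(reads) <= 2:
        raise ValueError("-f takes one file (single end) or two (paired end)")
    if fragment and len(reads) != 2:
        raise ValueError("-i only applies in paired end mode")
    cmd = ["novoalign", "-d", index, "-f", *reads]
    if read_format:
        cmd += ["-F", read_format]
    if fragment:
        cmd += ["-i", *fragment]      # e.g. ("PE", "250,30")
    if report_format:
        cmd += ["-o", report_format]
    if threads:
        cmd += ["-c", str(threads)]
    return cmd


if __name__ == "__main__":
    # File names below are placeholders, not real data sets.
    cmd = build_novoalign_command(
        "reference.ndx", ["reads_1.txt", "reads_2.txt"],
        read_format="ILMFQ", fragment=("PE", "250,30"),
        report_format="SAM", threads=4,
    )
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each argument separate, which is the form `subprocess.run` accepts without shell quoting issues.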

Created by system. Last Modification: Tuesday 07 of May, 2013 15:42:29 MYT by colin.