Novoalign is quite flexible when it comes to reporting reads that have multiple alignment loci.
The default option is to report no locus if the posterior alignment probability of the best alignment is less than 0.7.
Other options are:
- Randomly selecting a single alignment locus from the set of loci. This behaviour is similar to MAQ and BWA.
- Reporting all alignments where alignment score is within a certain range of the best alignment.
- Reporting all alignments below a score threshold.
For the options above that report multiple alignments, you can also set a limit on the maximum number of alignment loci to report.
Command Line Options
-r method [limit] [-t threshold]
Sets the rules for handling reads with multiple alignment locations. Values are:
|Method||Limit||Threshold (-t)||Description|
|None||NA||Optional||No alignments will be reported. The read will be reported as a type R with no alignment locations. A reporting “limit” should not be set.|
|Random||NA||Optional||A single alignment location is randomly chosen from amongst all the alignment results. A reporting “limit” should not be set.|
|All||Optional||Optional||All alignment locations are reported. The ‘All’ method can optionally specify a limit on the number of lines reported; for example, ‘-r All 10’ will report at most 10 randomly selected alignments.|
|Exhaustive||Required||Required||Reports all alignments with an alignment score, P(R|Ai), less than or equal to the threshold plus the -R setting. The ‘Exhaustive’ method requires a limit on the number of lines reported; for example, ‘-r E 10’ will report at most 10 randomly selected alignments per read. This avoids situations where high-copy-number repeats would otherwise result in millions of alignments being reported for a single read. The alignment threshold (-t option) must also be set when using the -r Exhaustive option.|
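The limit and threshold rules in the table above can be captured in a small validation helper. This is a minimal sketch, not part of Novoalign: `build_report_args` and `LIMIT_RULES` are hypothetical names, only the `-r`, `-t` and `-R` flags come from this page, and the sketch uses the full method names rather than the abbreviations (‘E’, ‘Ex’) seen in the examples.

```python
from typing import Optional

# Hypothetical encoding of the "Limit" column for each -r method.
LIMIT_RULES = {
    "None": "NA",
    "Random": "NA",
    "All": "Optional",
    "Exhaustive": "Required",
}


def build_report_args(method: str, limit: Optional[int] = None,
                      threshold: Optional[int] = None,
                      repeat_diff: Optional[int] = None) -> list:
    """Return novoalign reporting flags, enforcing the table's rules (illustrative only)."""
    if method not in LIMIT_RULES:
        raise ValueError(f"unknown -r method: {method}")
    rule = LIMIT_RULES[method]
    if rule == "NA" and limit is not None:
        raise ValueError(f"-r {method} should not be given a limit")
    if rule == "Required" and limit is None:
        raise ValueError(f"-r {method} requires a limit")
    if method == "Exhaustive" and threshold is None:
        raise ValueError("-r Exhaustive requires -t threshold")

    args = ["-r", method]
    if limit is not None:
        args.append(str(limit))
    if threshold is not None:
        args += ["-t", str(threshold)]
    if repeat_diff is not None:
        args += ["-R", str(repeat_diff)]
    return args


print(build_report_args("Exhaustive", 10, threshold=180))
```

Validating before launching the aligner surfaces a forgotten limit or threshold immediately, instead of after a long alignment run.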
-R difference
Specifies the score difference between the first two alignments that is used to decide whether a read aligns to a repeat. If the difference is less than this value, the read is treated as aligning to a repeat and the ‘-r method’ setting applies. If the difference is greater than this value, the best alignment is reported as a unique alignment. The default is 5, which corresponds approximately to the first alignment having a probability of 0.75 and the second a probability of 0.25.
When exactly two alignments have been found, each -R setting corresponds to approximate posterior probabilities for the first and second alignments, P(1st) and P(2nd). If the posterior alignment probability of the first alignment is below the P(1st) value, the alignment is classed as a repeat.
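The correspondence between -R and P(1st) can be worked out directly. The sketch below assumes that alignment scores are Phred-scaled, i.e. score = -10·log10 P(R|Ai), and that both alignments carry equal prior weight; under those assumptions P(1st) = 1 / (1 + 10^(-R/10)), and a difference of 5 gives roughly 0.76 and 0.24, close to the 0.75/0.25 quoted for the default. The function name is hypothetical.

```python
def two_alignment_posteriors(score_diff: float) -> tuple:
    """Posterior probabilities of the best and second-best alignment.

    Assumes scores are -10*log10(P(R|Ai)) and exactly two alignments
    were found with equal prior weight.
    """
    ratio = 10 ** (-score_diff / 10)  # P(2nd) / P(1st)
    p_first = 1 / (1 + ratio)
    return p_first, 1 - p_first


for r in (3, 5, 10, 20, 30):
    p1, p2 = two_alignment_posteriors(r)
    print(f"-R {r:>2}: P(1st) = {p1:.3f}, P(2nd) = {p2:.3f}")
```

For the default of 5 this prints `P(1st) = 0.760, P(2nd) = 0.240`. Because the repeat rule compares only the first two scores, these probabilities are approximate whenever more than two alignments are found.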
Examples
|Command||Behaviour|
|novoalign -r Ex 10 -t 180 -d …||Reports up to 10 alignments per read having an alignment score less than 180. If more than 10 alignments are found, they are chosen randomly from the lowest-scoring alignments.|
|novoalign -r All -d …||Reports all alignments with a score within 5 points of the best alignment.|
|novoalign -r All -R 30 -d …||Reports all alignments with a score within 30 points of the best alignment.|
|novoalign -r All 20 -R 30 -d …||Reports no more than 20 alignments with a score within 30 points of the best alignment.|
|novoalign -r Random -R 30 -d …||Reports a randomly selected alignment from all alignments with a score within 30 points of the best alignment. The probability of an alignment being selected is proportional to its posterior alignment probability.|
|novoalign -d …||The default reporting option is to report only unique alignments. A unique alignment has a posterior probability > 0.75.|
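The following sketch shows how the -R repeat rule combines with the None, All and Random methods in the examples above. It is illustrative only, not Novoalign's implementation: it assumes lower scores are better and Phred-scaled (so posterior weight is 10^(-score/10)), treats "within R points" as a strict difference of less than R, omits Exhaustive mode, and uses hypothetical function names.

```python
import random


def posterior_weights(scores):
    # Assumption: score = -10*log10(P(R|Ai)); lower score = better alignment.
    return [10 ** (-s / 10) for s in scores]


def is_repeat(scores, repeat_diff=5):
    """True if the two best scores differ by less than -R."""
    ranked = sorted(scores)
    return len(ranked) > 1 and ranked[1] - ranked[0] < repeat_diff


def report(scores, method="None", repeat_diff=5, limit=None, rng=random):
    """Return the scores reported for one read under the stated assumptions."""
    ranked = sorted(scores)
    if not is_repeat(ranked, repeat_diff):
        return ranked[:1]  # unique alignment: report the best one only
    # Repeat: candidate loci are those within -R points of the best.
    near = [s for s in ranked if s - ranked[0] < repeat_diff]
    if method == "None":
        return []  # read reported as a repeat with no alignment locus
    if method == "Random":
        # Selection probability proportional to posterior probability.
        return rng.choices(near, weights=posterior_weights(near), k=1)
    if method == "All":
        if limit is not None and len(near) > limit:
            return sorted(rng.sample(near, limit))
        return near
    raise ValueError(f"method not covered by this sketch: {method}")


scores = [40, 43, 60, 75]
print(report(scores))                         # default: repeat, nothing reported
print(report(scores, "All"))                  # within 5 points of the best
print(report(scores, "All", repeat_diff=30))  # within 30 points of the best
```

Under these assumptions the three calls return `[]`, `[40, 43]` and `[40, 43, 60]`, mirroring the default, ‘-r All’ and ‘-r All -R 30’ rows.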