
Questions about novoalign mismatch and output reporting

Hello,

Going to be using Novoalign for the first time. I searched the forum and manual but didn't find an appropriate answer to my questions:

1. Is there a way to set the number of mismatches allowed and reported, or does Novoalign report all mismatches up to a certain hard coded number?

2. Is there a way to get novoalign to output only reads that can align (i.e. don't report any NM reads), or do I have to just extract these from the output files post-alignment? It seems all reads are reported to output aligned or unaligned (like the .sam format).

I have over 700 million reads of 100 nt length, and several reference sequences of variable length, so I want to keep output file size minimal and avoid trial and error.

Thanks,
ken


Hi Ken,

We limit mismatches indirectly using the alignment score and the -t option. In Novoalign the alignment penalty for a mismatch depends on the base quality, so a mismatch at a base with Q=10 is penalised 15. The penalty is ~q+5 with an upper limit of 30. Because of the upper limit, any mismatch at a base with quality >25 gets a penalty of 30, so if you want to allow 5 mismatches you could use a threshold of 150, with the proviso that you may get more than 5 mismatches if some are at low quality bases.
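As a rough illustration of the arithmetic above, here is a small bash sketch. It assumes the penalty is exactly q+5 capped at 30 (the reply only says "~q+5"), and it ignores gap penalties. The function names mismatch_penalty and threshold_for are made up for this example and are not part of Novoalign.

```shell
#!/usr/bin/env bash
# Approximate mismatch penalty for a base of quality q, assuming the
# rule of thumb "q + 5, capped at 30" (an approximation, not exact scoring).
mismatch_penalty() {
    local q=$1
    local p=$(( q + 5 ))
    if (( p > 30 )); then
        p=30
    fi
    echo "$p"
}

# Threshold (value for -t) that admits n mismatches at high quality bases,
# where each mismatch costs the capped penalty of 30.
threshold_for() {
    local n=$1
    echo $(( n * 30 ))
}

mismatch_penalty 10   # Q=10 base
mismatch_penalty 40   # high quality base, hits the cap
threshold_for 5       # threshold for 5 high quality mismatches
```

With a threshold of 150, a read with mismatches only at Q=10 bases (15 each) could carry up to 10 of them before exceeding the threshold, which is the proviso mentioned above.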

We don't have an option to discard the NM records. We normally do this by piping the reads to samtools view and filtering based on the flag value.

For single end:

novoalign -d .. -o SAM 2>novo.log | samtools view -S -1 -F 4 - >novo.bam

For paired end we need to make sure we always output both reads of the pair, so this will filter out any pair where one read failed to align:

novoalign -d .. -o SAM 2>novo.log | samtools view -S -1 -F 12 - >novo.bam
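To see why 4 and 12 are the right masks, here is a minimal bash sketch of the test that -F applies: a record is dropped if any bit of the mask is set in its SAM flag. Bit 4 means the read itself is unmapped and bit 8 means its mate is unmapped, so 12 covers both. The helper name passes_exclude_mask and the example flag values are only for illustration.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) if a record with the given FLAG would survive
# "samtools view -F mask", i.e. none of the mask bits are set in the flag.
passes_exclude_mask() {
    local mask=$1 flag=$2
    (( (flag & mask) == 0 ))
}

# Example flags:
#   99  = paired, proper pair, mate reverse, first in pair (both mapped)
#   73  = paired, mate unmapped, first in pair
#   133 = paired, read unmapped, second in pair
#   77  = paired, read and mate both unmapped, first in pair
for flag in 99 73 133 77; do
    if passes_exclude_mask 12 "$flag"; then
        echo "flag $flag: kept by -F 12"
    else
        echo "flag $flag: dropped by -F 12"
    fi
done
```

Note that with -F 4 alone the flag 73 read would be kept while its unmapped mate (flag 133) is dropped, which is why the paired end command uses 12.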


Colin

