
Long Single Reads

Long Single End Reads

Novoalign version 3.0 brings support for reads up to 900bp long. This should be adequate for most current NGS platforms.

Long reads have a higher information content than short reads, so it is possible to allow more mismatches and longer gaps. With short 32bp or 36bp reads aligned against a large genome such as human, the number of mismatches was effectively limited to 3: beyond 3, most reads align to multiple locations and have a low alignment quality. The extra information content in long reads allows us to go beyond 3 mismatches and to longer inserts and deletes.

Performance on 75bp reads against an 86Mbp Fungus genome

Test Data – 10,000 75bp reads


Chart 1. Run time as a function of the alignment threshold for several different versions of Novoalign. V2.03.12 was still running at over 1000 reads/sec at 7 mismatches.


Chart 2. The number of uniquely aligned reads as a function of the number of mismatches allowed for different Novoalign versions. At 7 mismatches we have more than twice as many good alignments as when allowing only 2 mismatches.

Note. Novoalign uses base qualities when scoring alignments. A base with a Phred score >=30 scores 30 points for a mismatch, while a base with a Phred score of 10 scores 15 points. A threshold of 30 points could therefore be 1 mismatch at a high quality base position or 2 mismatches at bases with a Phred score of 10, so there may be more mismatches than a simple threshold/30 calculation indicates.
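
The arithmetic in the note can be sketched as follows. This is a minimal illustration, not Novoalign's scoring code. The only penalties taken from the text are 30 points for a mismatch at Phred >=30 and 15 points at Phred 10. The function name `max_mismatches` and the example thresholds are hypothetical, and gap penalties are ignored.

```python
# Mismatch penalties quoted in the note above (points per mismatch).
# Penalties for other Phred scores are not given in the text and are not modeled.
PENALTY_HIGH_QUALITY = 30  # base with Phred score >= 30
PENALTY_PHRED_10 = 15      # base with Phred score of 10


def max_mismatches(threshold: int, penalty: int) -> int:
    """Largest number of mismatches, each costing `penalty`, that fit within `threshold`."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    return threshold // penalty


if __name__ == "__main__":
    # Hypothetical thresholds, chosen only to show the spread.
    for threshold in (30, 90, 210):
        high = max_mismatches(threshold, PENALTY_HIGH_QUALITY)
        low = max_mismatches(threshold, PENALTY_PHRED_10)
        print(f"threshold {threshold}: {high} high quality mismatches, or {low} at Phred 10")
```

For any threshold, the true mismatch count lies between these two figures when all mismatched bases have Phred scores between 10 and 30 or higher, which is why threshold/30 is only a lower bound.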

Performance of 60bp Reads from Mus musculus

Test Data – 10,000 60bp reads


Chart 3. Run time as a function of the alignment threshold for several different versions of Novoalign. V2.03.12 allows one extra mismatch at the same run time as V2.3.01, or runs about three times faster at the same threshold.


Chart 4. The number of uniquely aligned reads as a function of the number of mismatches allowed for different Novoalign versions.

Default Threshold

Novoalign calculates a default threshold for reads: the highest threshold, and so the largest number of mismatches, that would still allow a high quality alignment.

Typically this allows for alignments with 85% identity or better.
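
As a rough guide to what 85% identity means in mismatch terms, the sketch below converts an identity percentage into a mismatch count for a given read length. It assumes an ungapped alignment in which identity is the number of matching bases divided by the read length. That definition is an assumption made for illustration, not Novoalign's internal threshold calculation, and `mismatches_at_identity` is a hypothetical helper.

```python
def mismatches_at_identity(read_length: int, identity_percent: int = 85) -> int:
    """Most mismatches an ungapped read can carry while staying at or above the identity."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding at exact boundaries.
    return read_length * (100 - identity_percent) // 100


if __name__ == "__main__":
    # Read lengths mentioned on this page: the two test sets and the 900bp maximum.
    for length in (60, 75, 900):
        print(f"{length}bp read: up to {mismatches_at_identity(length)} mismatches at 85% identity")
```

Under this assumption a 75bp read could carry up to 11 mismatches and a 60bp read up to 9, before base qualities and gaps are taken into account.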

Note. All tests were run on a dual-core AMD Athlon with 8 GB of RAM.
