Long Single Reads

Long Single End Reads

Novoalign version 3.0 brings support for reads up to 900bp long. This should be adequate for most current NGS platforms.

Long reads have a higher information content than short reads, so it is possible to allow more mismatches and longer gaps. With short 32bp or 36bp reads aligned against a large genome such as human, the number of mismatches was effectively limited to 3: beyond 3, most reads align to multiple locations and have a low alignment quality. The extra information content in long reads allows us to go beyond 3 mismatches and to longer insertions and deletions.

Performance on 75bp reads against an 86Mbp Fungus genome

Test Data – 10,000 75bp reads

Chart 1. Run time as a function of the alignment threshold for several different versions of Novoalign. V2.03.12 was still running at over 1000 reads/sec at 7 mismatches.

Chart 2. The number of uniquely aligned reads as a function of the number of mismatches allowed for different Novoalign versions. At 7 mismatches we have more than double the number of good alignments obtained when allowing only 2 mismatches.

Note. Novoalign uses base qualities when scoring alignments. A mismatch at a base with a Phred score >=30 scores 30 points, while a mismatch at a base with a Phred score of 10 scores 15 points. A threshold of 30 points could therefore be 1 mismatch at a high quality base position or 2 mismatches at bases with a Phred score of 10, so there may be more mismatches than indicated by a simple threshold/30 calculation.
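The arithmetic in the note can be sketched in a few lines of Python. This is only an illustration of the threshold/penalty calculation, not Novoalign's scoring code: the function name and constants are made up for the example, and only the two penalties quoted above (30 and 15 points) are used, since the note does not give penalties for other Phred scores.

```python
# Mismatch penalties quoted in the note above. Penalties for other Phred
# scores are not given in this document, so they are not modelled here.
HIGH_QUALITY_PENALTY = 30  # mismatch at a base with Phred score >= 30
LOW_QUALITY_PENALTY = 15   # mismatch at a base with Phred score of 10


def max_mismatches(threshold: int, penalty: int) -> int:
    """Largest number of equal-penalty mismatches that fits within the threshold.

    Hypothetical helper for illustration; it ignores gaps and mixed qualities.
    """
    return threshold // penalty


# Compare the simple threshold/30 estimate with the low-quality case.
for threshold in (30, 90, 210):
    high = max_mismatches(threshold, HIGH_QUALITY_PENALTY)
    low = max_mismatches(threshold, LOW_QUALITY_PENALTY)
    print(f"threshold {threshold}: {high} high quality or {low} low quality mismatches")
```

A real read usually mixes base qualities, so the actual number of mismatches at a given threshold falls somewhere between the two figures.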

Performance of 60bp Reads from Mus musculus

Test Data – 10,000 60bp reads

Chart 3. Run time as a function of the alignment threshold for several different versions of Novoalign. V2.03.12 allows one extra mismatch for the same performance as V2.3.01, or a threefold performance improvement at the same threshold.

Chart 4. The number of uniquely aligned reads as a function of the number of mismatches allowed for different Novoalign versions.

Default Threshold

Novoalign calculates a default threshold for reads, chosen to allow the highest number of mismatches that would still give a high quality alignment.

Typically this allows for alignments with 85% identity or better.
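As a rough guide to what 85% identity means for the read lengths tested above, the sketch below counts how many positions may differ. It assumes identity is simply matching positions divided by read length in an ungapped alignment, which is an assumption for illustration and not necessarily how Novoalign derives its default threshold; the function name is hypothetical.

```python
def max_differences(read_length: int, min_identity_percent: int = 85) -> int:
    """Most mismatched positions an ungapped alignment can have
    while keeping at least the given percent identity.

    Assumes identity = matching positions / read length (illustrative only).
    Integer arithmetic avoids floating point rounding surprises.
    """
    return read_length * (100 - min_identity_percent) // 100


# Read lengths used in the two tests on this page.
for read_length in (60, 75):
    print(f"{read_length}bp read: up to {max_differences(read_length)} differing positions")
```

Because mismatch penalties depend on base quality, the same identity can correspond to different alignment scores.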

Note. All tests were run on a dual-core AMD Athlon with 8GB of RAM.
