Long Single End Reads
Novoalign version 3.0 brings support for reads up to 900bp long. This should be adequate for most current NGS platforms.
Long reads have a higher information content than short reads, so it’s possible to allow more mismatches and longer gaps. With short 32bp or 36bp reads aligned against a large genome such as human, the number of mismatches was effectively limited to 3: beyond that, most reads align to multiple locations and have a low alignment quality. The extra information content in long reads allows us to go beyond 3 mismatches and to longer insertions and deletions.
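The effect of read length on information content can be illustrated with a rough back-of-envelope estimate. The sketch below is not Novoalign's method: it assumes a genome of uniformly random, independent bases (an idealisation; real genomes contain repeats, so practical limits are tighter), ignores gaps, and uses a hypothetical helper name, expected_chance_hits. It counts the sequences within a given number of mismatches of a read and scales by the number of positions on both strands.

```python
from math import comb


def expected_chance_hits(read_len, max_mismatches, genome_size):
    """Expected number of chance placements of a read in a random genome.

    Assumption (not from Novoalign): bases are uniform and independent,
    both strands are searched, and only substitutions are considered.
    """
    # Number of distinct sequences within max_mismatches substitutions of the read.
    neighbours = sum(comb(read_len, i) * 3**i for i in range(max_mismatches + 1))
    # Each of the 2 * genome_size positions matches one given sequence
    # with probability 1 / 4**read_len.
    return 2 * genome_size * neighbours / 4**read_len


if __name__ == "__main__":
    human = 3_100_000_000  # approximate genome size in bp, for illustration only
    for read_len in (32, 36, 75):
        for mismatches in (3, 7):
            hits = expected_chance_hits(read_len, mismatches, human)
            print(f"{read_len}bp, {mismatches} mismatches: {hits:.3g}")
```

Under this model the expected number of chance hits drops steeply as the read gets longer, which is why a longer read can tolerate more mismatches before its placement becomes ambiguous.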
Performance on 75bp reads against an 86Mbp Fungus genome
Test Data – 10,000 75bp reads
Chart 1. Run time as a function of the alignment threshold for several different versions of Novoalign. V2.03.12 was still running at over 1000 reads/sec at 7 mismatches.
Chart 2. The number of uniquely aligned reads as a function of the number of mismatches allowed for different Novoalign versions. At 7 mismatches we have more than double the number of good alignments obtained when allowing only 2 mismatches.
Performance of 60bp Reads from Mus musculus
Test Data – 10,000 60bp reads
Chart 3. Run time as a function of the alignment threshold for several different versions of Novoalign. V2.03.12 allows one extra mismatch for the same performance as V2.3.01, or gives a threefold performance improvement at the same threshold.
Chart 4. The number of uniquely aligned reads as a function of the number of mismatches allowed for different Novoalign versions.
Novoalign calculates a default threshold for reads: the highest number of mismatches that still allows a high quality alignment.
Typically this allows for alignments with 85% identity or better.
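To put the 85% figure in terms of mismatch counts, the sketch below does the simple arithmetic for a few read lengths. It is only an illustration of what 85% identity means for an ungapped alignment, not Novoalign's own threshold calculation, and the function name max_mismatches_at_identity is hypothetical.

```python
def max_mismatches_at_identity(read_len, min_identity_pct=85):
    """Largest mismatch count that keeps an ungapped alignment at or above
    min_identity_pct percent identity.

    Integer arithmetic is used so the result is not affected by
    floating point rounding.
    """
    return read_len * (100 - min_identity_pct) // 100


if __name__ == "__main__":
    for read_len in (36, 60, 75):
        print(read_len, max_mismatches_at_identity(read_len))
```

By this measure a 75bp read could carry up to 11 mismatches and a 60bp read up to 9 while staying at 85% identity or better.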