This release contains two significant changes that affect the operation of Novoalign:
- The default gap extend penalty has been reduced to 6. We feel this is a more appropriate default for use with Human Whole Genome and Exome alignments. To get the previous default, use option -x15.
- We found a problem with NovoalignCS and its calculation of base qualities for the aligned sequence. The position of colour errors was off by one base, which resulted in incorrect base qualities near the colour error. The position of alignments is not affected, but the fix may have a small effect on SNP calls and should improve accuracy. The colour error problem also impacted quality calibration.
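To illustrate how a lower gap extend penalty changes the cost of longer indels, here is a minimal sketch assuming a conventional affine gap model (penalty = gap open + gap extend * gap length). The exact formula Novoalign uses is not stated in these notes. The function name and the gap open value of 40 are assumptions chosen only for illustration; only the extend values 6 and 15 come from the notes.

```python
def affine_gap_penalty(gap_length: int, gap_open: int, gap_extend: int) -> int:
    """Penalty for a single gap under a simple affine model.

    This is an assumed, textbook-style model and not Novoalign's
    documented scoring formula.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0  # no gap, no penalty
    return gap_open + gap_extend * gap_length


# Hypothetical gap open penalty, used only to make the comparison concrete.
GAP_OPEN = 40

if __name__ == "__main__":
    for length in (1, 5, 20):
        new_cost = affine_gap_penalty(length, GAP_OPEN, 6)   # new default extend
        old_cost = affine_gap_penalty(length, GAP_OPEN, 15)  # previous default extend
        print(f"gap length {length}: extend 6 costs {new_cost}, extend 15 costs {old_cost}")
```

Under this assumed model the difference between the two settings grows linearly with gap length, so longer indels become considerably cheaper to align with the new default.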
This release also introduces the ability to use unaligned BAM files for the reads.
- FIX(V2.07.14): When using unaligned BAM input and SAM output the first character was being erased from read headers.
- Limit alignment quality to a maximum value of 70 except for -r Exhaustive.
- Added an option to lock the reference genome index in RAM. Use option --LockIdx. This only applies when using a memory mapped index.
- For paired end reads, changed the default insert size option from -i 250,30 to -i 250,50.
- Fix Picard validation error where mate alignment locations didn't match the mate.
- To avoid Picard validation errors adapter trimming will always leave at least 1bp in a read.
- FIX: In paired end mode with all fragments exactly the same length (usually simulated data), floating point errors could cause the square root of a negative number to be taken in the calculation of the standard deviation of the fragment lengths.
- Fixed issue with iterative alignment that was doing unnecessary iterations for one read when the other read of the pair failed to align with maximum possible alignment score or had been flagged as a low quality read. The problem was evident when aligning pairs against incomplete genomes with many contigs. This change has also improved the accuracy of alignment quality.
- Changed the default gap extend penalty to 6. Set -x15 to get the previous behaviour. Using the new default of -x6 may be slower than using -x15.
- Added support for unaligned BAM input. Use option -F BAMSE or -F BAMPE. For paired end, mates are expected to be adjacent. If there is an @RG record in the BAM file it will be used for the report. Any @RG given with the -o SAM option overrides the @RG in the BAM file. Not supported for Colour Space reads.
- Fix: In SAM report format the NM tag was counting a mismatch between a '.' in the read and an 'N' in the reference. This could result in Picard ValidateSamFile errors.
- Fix: Novoalign uses memory mapping to load the index file and used the MAP_POPULATE option to force loading of the index into RAM. Some older Linux kernels do not support MAP_POPULATE, with the result that the index pages were not loaded at startup. This could cause slow operation while index pages gradually faulted into memory. We now touch each page at startup to force loading of the index.
- Improved run time performance of Bi-Seq strand specific alignments when run with option -b2.
- FIX: Changes in V2.07.14 for unaligned BAM support caused errors when passing @RG record to slave processes.
- Fix: A Seg Fault could occur when using quality calibration (-k or -K option). The error was using the wrong index when counting mismatches to colours coded as periods '.' and was also getting the position of all colour errors off by one base. The correction improves the accuracy of mismatch counting and the quality calibration function. It also changes the quality of base calls near colour errors and hence may improve SNP calls, especially in low coverage projects. Alignment location and base calls are not affected.
- Added a checksum attribute to the index file. This is used to validate correct save/load of the index.
- Added conversion for alignment score tag AS:i:99
- Add option (--GZIP) to write demuxed reads in gzip format.
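The fragment length standard deviation fix listed above can be illustrated with a short sketch. It assumes the variance is computed with the common "mean of squares minus square of mean" shortcut, which can round to a tiny negative value when every fragment has the same length. Whether Novoalign uses this formula or this remedy is not stated in these notes; the function name and the clamping approach are assumptions for illustration only.

```python
import math


def fragment_length_stddev(lengths):
    """Population standard deviation using the sum-of-squares shortcut.

    The variance is clamped at zero before taking the square root, so
    rounding error on identical lengths cannot produce a domain error.
    This is an assumed remedy, not Novoalign's actual code.
    """
    n = len(lengths)
    if n == 0:
        raise ValueError("need at least one fragment length")
    mean = sum(lengths) / n
    mean_of_squares = sum(x * x for x in lengths) / n
    # Rounding can leave this slightly below zero when all lengths are equal.
    variance = mean_of_squares - mean * mean
    return math.sqrt(max(variance, 0.0))


if __name__ == "__main__":
    # Without the clamp, a slightly negative variance would fail like this:
    try:
        math.sqrt(-1e-12)
    except ValueError as err:
        print("unclamped square root failed:", err)

    # Simulated data: every fragment exactly the same length.
    print(fragment_length_stddev([250.3] * 1000))
```

Clamping is the simplest guard; a numerically more stable alternative would be a two-pass or running (Welford-style) variance calculation.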