Novo Aligners
Questions
- What is Novoalign?
- Why novoindex?
- Can I map reads that align to multiple locations on the genome?
- Does Novoalign support both Illumina Genome Analyzer platforms?
- What alignment formats are supported by Novoalign?
- Can I map RNASeq transcriptome reads/tags using Novoalign?
- I have Illumina sequencing data from microRNA experiments and there is a conserved adaptor sequence at the three-prime end of each read. Do I need to remove it before using Novoalign?
- What extra features are available in Novoalign?
- Does Novoalign support Sanger and Illumina FASTQ?
- Can I use novoalign for methylation experiments?
- The Maq (maq.sourceforge.net) program uses a mapping quality score and quality values for read alignments. How does the Novoalign scoring system compare to this?
- How does Novoalign compare to ELAND in terms of performance and finding alignment locations?
- Can I assemble my genome with Novoalign?
- Does Novoalign use heuristics?
- Which genomes can I use Novoalign for?
- Does Novoalign map reads to genomes with more than 4 IUPAC codes?
- How does Novoalign handle lowercase masking?
- Can I use Novoalign for ABI SOLiD and 454 reads?
- What are the advantages of using these programs over other tools?
- Can I identify alignments with gaps or mismatches using Novoalign?
- Do I need an expensive high-end server to run Novoalign?
- Do I need to have any proprietary software installed on my computer/server before using Novoalign?
- Is Novoalign open source?
- Which platforms are currently supported?
- I need technical support, who do I contact?
- Can I use Novoalign with named pipes e.g. mkfifo?
- Is it possible to use Novoalign output with the Picard and GATK programs?
- What is the difference between Novoalign and NovoalignCS?
- How does Novoalign compare to programs like BWA, Bowtie, ELAND and BFAST?
- Does Novoalign support SAM/BAM format?
- How could changing my current aligner to Novoalign impact SNP/Indel calling?
- What is the polyclonal filter used in Novoalign?
- What does a full, paid-for version of Novoalign/NovoalignCS entitle me to?
- What is the best way to use Novoalign on a computing cluster?
- Is Novoalign able to read gzip-compressed read files?
- When aligning SOLiD colourspace reads with NovoalignCS, do I need to convert the colourspace FASTA (CSFASTA) to colourspace FASTQ (CSFASTQ) format first?
- Can I use Novoalign on reads with variable lengths?
- Can I use Novoalign on a cloud computing system?
- What is the difference between hairpin score, alignment score and alignment/mapping quality reported by Novoalign?
- What is the meaning of the step size (-s parameter) in novoindex, and how does it affect index creation?
Answers
Novoalign is a short-read mapper designed to be fast and sensitive on small to large genome databases. Its design is driven primarily by the use of read quality information and by the need to assemble genomes from resequencing experiments.
Novoalign supports fragment, paired-end and mate-pair reads from major sequencing platforms such as Illumina, SOLiD and Roche 454.
NovoalignCS is the version of Novoalign developed for SOLiD colourspace reads.
Novoalign reports results in a single-line, tab-delimited format for easy parsing and sorting of large numbers of alignments. Novoalign also supports the widely used SAM format specification for sequence alignments. Add the "-o SAM" option to your command line for SAM output. Note that there is also a command-line utility script, novo2sam.pl, that converts native Novoalign format to SAM format.
A pairwise alignment format is also supported.
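Because SAM is a plain tab-delimited text format with eleven mandatory columns, output produced with "-o SAM" can be processed with a few lines of code. The Python sketch below parses alignment lines and sorts them by reference name and position. The function names are hypothetical, and header lines (those starting with "@") are simply skipped.

```python
from typing import Iterable, List

# The eleven mandatory SAM columns, in order (SAM format specification).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line: str) -> dict:
    """Split one SAM alignment line into a dict; optional tags are kept as raw strings."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    rec = dict(zip(SAM_FIELDS, cols))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    rec["TAGS"] = cols[len(SAM_FIELDS):]
    return rec


def read_alignments(lines: Iterable[str]) -> List[dict]:
    """Parse every non-empty line that is not an '@' header line."""
    return [parse_sam_line(l) for l in lines if l.strip() and not l.startswith("@")]


def sort_by_position(records: List[dict]) -> List[dict]:
    """Sort by reference name (as a string), then by 1-based leftmost position."""
    return sorted(records, key=lambda r: (r["RNAME"], r["POS"]))
```

Sorting on the reference name string is a simplification: tools such as samtools order references as they appear in the @SQ header lines instead.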
Novoalign calculates a mapping quality score that takes into account factors similar to those used by Maq. However, the resulting alignment quality values may differ from Maq's.
NovoalignCS is the version of Novoalign specifically written to deal with SOLiD colourspace reads.
Novoalign can be used with 454 paired-end reads.
Use of base qualities and gaps in single-end and paired-end reads. Full Needleman-Wunsch dynamic programming alignment of short reads.
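The idea of quality-aware, gapped dynamic programming can be illustrated with a textbook Needleman-Wunsch global alignment written as a minimum-penalty recurrence, so that lower scores are better. This is a sketch, not Novoalign's actual scoring: charging a mismatch the Phred quality of the read base and using a fixed linear gap penalty of 40 are assumptions chosen for illustration.

```python
from typing import Sequence


def nw_penalty(read: str, quals: Sequence[int], ref: str, gap: int = 40) -> int:
    """Minimum total penalty of a global (Needleman-Wunsch) alignment of read to ref.

    Mismatch cost = Phred quality of the read base (assumption);
    every gap position costs `gap` (hypothetical linear gap penalty).
    """
    if len(quals) != len(read):
        raise ValueError("need one quality value per read base")
    n, m = len(read), len(ref)
    # D[i][j]: best penalty for aligning read[:i] against ref[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap  # read bases aligned to gaps
    for j in range(1, m + 1):
        D[0][j] = j * gap  # reference bases aligned to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if read[i - 1] == ref[j - 1] else quals[i - 1]
            D[i][j] = min(D[i - 1][j - 1] + sub,  # match or mismatch
                          D[i - 1][j] + gap,      # read base against a gap
                          D[i][j - 1] + gap)      # reference base against a gap
    return D[n][m]
```

For example, nw_penalty("ACGT", [30, 30, 10, 30], "ACCT") returns 10, because the only mismatch falls on a low-quality base, whereas any gapped alignment of these two sequences would cost at least 80.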
Novoalign may also be used when you are expecting low identity (95% or less) with small genomes, as it can align reads with up to 8 mismatches at high-quality bases. It is also one of the fastest gapped aligners available when run with the threshold set for two or more mismatches.
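The mismatch limit above counts mismatches at high-quality bases. A minimal sketch of that count for an ungapped alignment is shown below; it assumes a hypothetical quality cut-off of 20, since the exact definition of "high quality" used by Novoalign is not given here, and the function names are likewise hypothetical.

```python
from typing import Sequence


def high_quality_mismatches(read: str, quals: Sequence[int], ref: str, min_q: int = 20) -> int:
    """Count positions where read and reference differ and the read base has quality >= min_q.

    min_q = 20 is a hypothetical cut-off, not a documented Novoalign setting.
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, qualities and reference window must have equal length")
    return sum(1 for b, r, q in zip(read, ref, quals) if b != r and q >= min_q)


def within_limit(read: str, quals: Sequence[int], ref: str,
                 max_mismatches: int = 8, min_q: int = 20) -> bool:
    """True if the read stays within the mismatch limit (8, as stated in the FAQ)."""
    return high_quality_mismatches(read, quals, ref, min_q) <= max_mismatches
```

At 95% identity, a 100 bp read carries about 5 mismatches on average, which is within this limit.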
First, you need an AMD/Intel x86-64 CPU running 64-bit Linux or Mac OS X; most modern Intel/AMD servers and workstations should be suitable.
The memory requirements depend on the size of the genome you are aligning to. For the human NCBI 36 assembly, the index can be built in 7 GB of RAM, so 8 GB of RAM will be sufficient. Colourspace indexes are approximately the same size.
Smaller genomes require proportionally less memory.
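Since memory scales roughly in proportion to genome size, the human figure can be turned into a back-of-the-envelope estimate. The sketch below assumes strict linear scaling from 7 GB for NCBI 36 and takes about 3.1 billion bases as the assembly length; both the length constant and the linearity are assumptions, so the result is only a rough guide.

```python
# From the FAQ text: about 7 GB of RAM to build the human NCBI 36 index.
HUMAN_INDEX_GB = 7.0
# Assumption: approximate length of the NCBI 36 human assembly in bases.
HUMAN_NCBI36_BP = 3.1e9


def estimate_index_ram_gb(genome_bp: float) -> float:
    """Linearly scale the human figure; a rough guide, not a documented formula."""
    if genome_bp <= 0:
        raise ValueError("genome length must be positive")
    return HUMAN_INDEX_GB * genome_bp / HUMAN_NCBI36_BP


if __name__ == "__main__":
    # A genome of about 4.6 Mbp comes out at roughly 0.01 GB.
    print(f"{estimate_index_ram_gb(4.6e6):.3f} GB")
```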
One PC or server should be able to align reads faster than the Genome Analyzer (GA) can produce them.
64-bit Linux 2.6 on x86-64 CPUs and Mac OS X are supported at this time.
Novoalign was designed to be an accurate short-read aligner that combines fast k-mer index searching with dynamic programming. In terms of speed, Novoalign is slower than Burrows-Wheeler transform aligners such as BWA and Bowtie, and in some cases faster than BFAST.
In terms of accuracy, Novoalign is in most cases more sensitive than these tools because it uses full dynamic programming to find the best alignment of a short read to a genome sequence.
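The general "k-mer index, then dynamic programming" strategy can be sketched as follows: index every k-mer of the reference, let each k-mer of the read vote for the read start position it implies, and hand the best-supported candidates to a dynamic programming aligner. This is a simplified illustration under stated assumptions (exact k-mer matches, every reference position indexed, forward strand only) and does not reproduce novoindex's actual index structure; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the 0-based positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_locations(read: str, index: Dict[str, List[int]], k: int) -> List[int]:
    """Rank candidate read start positions by the number of exact k-mer hits supporting them."""
    votes: Dict[int, int] = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset  # where the read would begin if this hit is genuine
            if start >= 0:
                votes[start] += 1
    # Most votes first; ties broken by position for deterministic output.
    return sorted(votes, key=lambda s: (-votes[s], s))
```

Each candidate start would then be scored with a full dynamic programming alignment of the read against the reference window at that location; a single exact k-mer hit is enough to nominate a location even when the rest of the read contains mismatches.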
The full version of Novoalign/NovoalignCS enables a number of additional features, such as multithreading, adaptor trimming and bisulfite alignment. Dedicated support is also included for the term of the annual license.
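Adaptor trimming, which is what the microRNA question above is about, comes down to finding where a known adaptor begins at the three-prime end of a read. The sketch below uses exact matching only and accepts a partial adaptor at the very end of the read if at least min_overlap bases agree. The function name, the min_overlap default and the example adaptor used in testing are hypothetical, and Novoalign's own trimming method is not described here and may differ.

```python
def trim_3prime_adaptor(read: str, adaptor: str, min_overlap: int = 3) -> str:
    """Remove an adaptor, or a partial adaptor of at least min_overlap bases, from the 3' end.

    Exact matching only; the read is returned unchanged if no adaptor start is found.
    """
    if not adaptor:
        return read
    for i in range(len(read) - min_overlap + 1):
        tail = read[i:]
        overlap = min(len(tail), len(adaptor))
        # Either the full adaptor starts at i, or the read ends partway through the adaptor.
        if tail[:overlap] == adaptor[:overlap]:
            return read[:i]
    return read
```

A larger min_overlap reduces spurious trimming of reads that merely happen to end in a short adaptor prefix, at the cost of leaving very short adaptor remnants in place.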
Depending on your level of subscription, Novoalign can be used by an unlimited number of users/computers within a site; e.g. a laboratory license allows multiple users in that lab to use the software on any number of computers within that lab.
No. NovoalignCS supports direct alignment of CSFASTA files and their associated quality files in fragment and mate-pair mode. Files are recognized automatically by their file extensions, e.g. .csfasta and _QV.qual. These files may also be gzipped and mapped directly with NovoalignCS.
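Recognising SOLiD files by extension and reading them whether or not they are gzipped is easy to sketch in Python. The helper names below are hypothetical, and the record reader assumes each record is a ">" header line followed by a single data line, the usual CSFASTA/QUAL layout, which this FAQ does not itself guarantee.

```python
import gzip
from typing import IO, Iterable, Iterator, Optional, Tuple


def classify_solid_file(name: str) -> str:
    """Classify by extension, ignoring a trailing .gz (e.g. reads.csfasta.gz)."""
    base = name[:-3] if name.endswith(".gz") else name
    if base.endswith("_QV.qual"):
        return "qualities"
    if base.endswith(".csfasta"):
        return "colourspace reads"
    return "unknown"


def open_text(path: str) -> IO[str]:
    """Open a plain or gzip-compressed file for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, data) pairs; blank lines and '#' comment lines are skipped."""
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            header = line[1:]
        elif header is not None:
            yield header, line
            header = None
```

The same reader works for a CSFASTA file and its quality file, e.g. `with open_text("run1_F3.csfasta.gz") as fh:` followed by a loop over `iter_records(fh)` (the file name is only an example).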
The alignment score is the log probability that the mapped sequence is the same as the fragment read by the sequencer.
The alignment/mapping quality is the log probability that this is the correct alignment location, given that the DNA fragment came from the reference genome. It is a Bayesian posterior probability calculated from all the alignments found for the read. Lower scores are always better, as the score is -10log10(P), where P is the probability of the alignment; P = 1.0 gives a score of zero.
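The -10log10(P) relationship, and the idea of a posterior probability computed from all alignments found for a read, can be made concrete with a short sketch. It assumes that each alignment score is a Phred-scaled likelihood and that every candidate location is equally likely a priori; Novoalign's actual priors and normalisation are not described in this FAQ, so posterior_of_best (a hypothetical name) is illustrative only.

```python
import math
from typing import Sequence


def phred(p: float) -> float:
    """Phred scaling, equivalent to -10 * log10(p): p = 1.0 gives 0, p = 0.001 gives 30."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return 10.0 * math.log10(1.0 / p)


def prob_from_phred(score: float) -> float:
    """Inverse of phred(): 10 ** (-score / 10)."""
    return 10.0 ** (-score / 10.0)


def posterior_of_best(scores: Sequence[float]) -> float:
    """Posterior probability that the lowest-scoring alignment is the true location,
    assuming equal prior probability for every candidate location (an assumption)."""
    likelihoods = [prob_from_phred(s) for s in scores]
    return max(likelihoods) / sum(likelihoods)
```

For two equally good candidate locations, posterior_of_best([30, 30]) is 0.5, so the read cannot be placed confidently; if the second-best alignment scores 30 worse than the best, the posterior rises to about 0.999.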
Hairpin scoring is similar to the alignment score, but is calculated for a nearby reverse complement of the read. Low scores might help locate the source of the miRNA. A mismatch will score around 30, depending on base quality. If there is a perfect complementary match, the hairpin score should be zero.
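A hairpin check of this kind can be sketched by sliding the reverse complement of the read along a nearby stretch of reference and charging each mismatch the quality of the read base involved. This is an ungapped illustration under assumptions: the function names, the flanking sequence supplied by the caller, and the quality-weighted mismatch cost are all hypothetical, not Novoalign's implementation.

```python
from typing import List, Optional, Sequence

# Complement table for unambiguous bases plus N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def hairpin_score(read: str, quals: Sequence[int], flank: str) -> Optional[int]:
    """Lowest quality-weighted mismatch total of the read's reverse complement slid
    (without gaps) along a nearby reference stretch; None if flank is too short."""
    rc = reverse_complement(read)
    rc_quals: List[int] = list(quals)[::-1]  # base i of rc came from read base len-1-i
    flank = flank.upper()
    best: Optional[int] = None
    for start in range(len(flank) - len(rc) + 1):
        window = flank[start:start + len(rc)]
        score = sum(q for b, w, q in zip(rc, window, rc_quals) if b != w)
        if best is None or score < best:
            best = score
    return best
```

A return value of 0 corresponds to the perfect complementary match described above, while a single mismatch on a quality-30 base gives 30.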


