NovoWORX is a user-friendly and complete workbench built to cater to scientists without the need to acquire special technical skills in bioinformatics and computer science. NovoWORX comes with 6 standard pipelines, as well as customizable ones to suit your research needs.
No more working on the terminal for you! Gone is the black box with those teeny tiny letters! With NovoWORX, you will have the pleasure of working with its easy-to-use GUI and view your results directly from the workbench. Your projects and your output files will now be stored in one place and one place only.
All you need to do is to select your input files, the reference genome and the pipeline that you want to run the project with. Then you can sit back and relax while NovoWORX does your job for you.
100GB Disk Space
Very! All you need is your sequence files or input files and leave the rest to NovoWORX. If we do not have the reference genome that you require, you can also add in your own!
We currently support FASTA, FASTQ, COLOURSPACE, BAM, VCF and many other types of files. And our list is expanding every day.
Absolutely! All outputs from our pipelines will be viewable in NovoWORX, presented as tables, images, charts and PDFs, and you can even browse the results on our genome browser. You will also be able to filter out results and download them as necessary.
We can provide you with the reference genomes that you need. You can also opt to use the functionality that is included in NovoWORX to add any reference genome files yourself.
We currently have Whole Genome Analysis, Exome Analysis, RNA Analysis, MiRNA Analysis, Methylation Analysis, and Basic NGS Alignment.
Absolutely! Your projects will be submitted to the queue and will be run as soon as there is a slot available. Do note that the response time will depend on your computer / server’s hardware.
There is absolutely no need to keep checking back with the workbench for the status of your project. You can opt to receive emails on the status of your project and NovoWORX will automatically send you status updates once it has completed its run.
Absolutely! Contact us at [email protected] to discuss your requirements!
The tools will depend on the pipelines that are installed in your copy of NovoWORX. If you require the list, feel free to contact us at [email protected]
All basic licences are included in NovoWORX’s fees.
Novoalign is a short-read mapper designed to be fast and sensitive on small to large genome databases. Its primary design is based on the use of read quality information and the need to assemble genomes from resequencing experiments. Novoalign supports fragment, paired-end and mate-pair reads from major sequencing platforms such as Illumina, SOLiD and Roche 454. NovoalignCS is the version of Novoalign developed for SOLiD colourspace reads.
All reference genome databases to which short reads are mapped need to be preformatted, much as sequences are processed with formatdb before using NCBI BLAST. Novoindex has been designed for this purpose. A masked human genome can be indexed in 3-5 minutes on a single-CPU server with at least 8 GB of RAM for an Illumina or SOLiD colourspace index.
Yes, most definitely with the “-r” parameter. You have several options for reads with multiple alignments including reporting all, none, or a randomly chosen one. See the documentation for more information.
Yes, Novoalign supports reads from all Genome Analyzer platforms, including the HiSeq model. Novoalign can handle Illumina mate-pairs, single-end fragments and paired-end reads.
Novoalign reports results in a single-line tab-delimited format for easy parsing and sorting of large numbers of alignments. Novoalign also supports the widely used SAM format specification for sequence alignments. Add the “-o SAM” option to your command line for SAM output. Note that there is also a command line utility script called novo2sam.pl that converts native Novoalign format to SAM format. A pairwise alignment format is also supported.
Yes. Transcriptome reads may be used with a genome or transcriptome database. Note that in cases where reads map to multiple locations the “-r All” option should be used to report all alignment locations.
I have Illumina sequencing data from microRNA experiments and there is a conserved adaptor sequence at the three-prime terminal end of the read. Do I need to remove these before using novoalign?
The current version of Novoalign can strip these adaptors off for you. You can use the “-a” option to specify the adaptor sequence.
Set the “-m” option for miRNA alignment mode to look for potentially new microRNAs when mapping reads to a reference genome. Novoalign is also capable of mapping bisulfite-sequencing (BS-Seq) reads from bisulfite conversion protocols.
Yes. Sanger and Illumina FASTQ formats are both supported. The quality values are converted to phred values using the Sanger method and used in subsequent alignment routines.
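As a rough illustration of what converting quality characters to phred values involves, here is a minimal Python sketch. It assumes the usual ASCII offsets (33 for Sanger FASTQ, 64 for Illumina 1.3+ FASTQ); the function names are our own for this example and do not describe Novoalign's internal code.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of phred scores.

    offset=33 is assumed for Sanger FASTQ, offset=64 for Illumina 1.3+ FASTQ.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    if any(q < 0 for q in scores):
        raise ValueError("quality character below the chosen offset")
    return scores


def error_probability(phred):
    """Probability that a base call is wrong for a given phred score."""
    return 10 ** (-phred / 10)


print(decode_qualities("II5+!"))       # Sanger-encoded qualities
print(decode_qualities("hhTJ@", 64))   # the same scores, Illumina 1.3+ encoding
```

Both calls decode to the same list of phred scores, which is why reads from either encoding can be treated identically once converted.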
Methylation alignment is on our roadmap at the time of writing and should be available soon. In the meantime, you can approximate it by preprocessing the genome sequence to convert Cs to Ts.
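The C-to-T preprocessing mentioned above could be sketched in Python as follows. This is only an illustration under simple assumptions: the reference is handled as a list of FASTA lines, only the forward strand is converted, and the helper names are hypothetical.

```python
def convert_c_to_t(sequence):
    """Replace every C with T, preserving upper and lower case."""
    return sequence.translate(str.maketrans("Cc", "Tt"))


def convert_fasta_lines(lines):
    """Yield FASTA lines with sequence lines converted and headers untouched."""
    for line in lines:
        yield line if line.startswith(">") else convert_c_to_t(line)


fasta = [">chr1 example", "ACGTCCGA", "ccgtAC"]
print(list(convert_fasta_lines(fasta)))
```

Header lines are skipped deliberately so that sequence names such as "chr1" are not altered by the conversion.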
The Maq (maq.sourceforge.net) program uses a mapping quality score and quality values for read alignments. How does the Novoalign scoring system compare to this?
Novoalign calculates a mapping quality score that takes similar factors into account as Maq does. However, the alignment quality may differ from Maq's.
Novoalign should find all the same unambiguous mapping locations for short reads as compared to Eland. However, as it uses base qualities and can align with gaps it usually finds more alignments.
No, only short read alignment is supported at this time. The assembly component is currently in development.
The alignment process is non-heuristic: it will find the optimum alignment, or report no alignment if none with a score below the threshold can be found.
There is absolutely no restriction on which genome you map your short reads to with Novoalign. It’s best to use the genome from which your sample originated, or at least a closely related genome, i.e. one with greater than 90% sequence similarity.
Yes, certain genomes contain ambiguous IUPAC codes and novoalign has been designed with this in mind.
The indexing process can optionally ignore lowercase nucleotides in the reference genome or nucleotide database. Lowercase letters then cannot be used to initiate an alignment; however, the Needleman-Wunsch alignment process can extend into lowercase sequence as long as the alignment was initiated in indexed uppercase sequence.
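To make the effect of lowercase masking concrete, the sketch below lists the positions at which a k-mer could seed an alignment when lowercase sequence is ignored. It assumes that any k-mer containing a lowercase letter is skipped; how Novoindex treats k-mers that only partly overlap masked sequence is not stated here, and the function name is hypothetical.

```python
def seed_positions(reference, k):
    """Return start positions of k-mers that contain no lowercase letters."""
    return [
        start
        for start in range(len(reference) - k + 1)
        if not any(base.islower() for base in reference[start:start + k])
    ]


reference = "ACGTacgtACGTA"
print(seed_positions(reference, 4))
```

In this toy reference, no seed starts inside or overlaps the lowercase block, but an alignment seeded in the uppercase flanks would still be free to extend across it.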
NovoalignCS is the version of Novoalign specifically written to deal with SOLiD colorspace reads.
Novoalign can be used with 454 paired-end reads.
The short answer is yes. The alignment threshold can be increased to allow for finding more mismatches and/or gaps. Novoalign reports the locations of mismatches and gaps in the alignments. When using SAM format, mismatches, indels and other edit operations, e.g. soft-clipping, are presently coded into the CIGAR string.
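If you want to pull edit operations out of SAM output yourself, a CIGAR string can be split into (length, operation) pairs. The following Python sketch uses the operation letters defined in the SAM specification; the example CIGAR string and the function names are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation letter from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    if cigar == "*":  # SAM uses "*" when no CIGAR is available
        return []
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def summarize(cigar):
    """Count inserted, deleted and soft-clipped bases and the reference span."""
    ops = parse_cigar(cigar)
    return {
        "inserted": sum(n for n, op in ops if op == "I"),
        "deleted": sum(n for n, op in ops if op == "D"),
        "soft_clipped": sum(n for n, op in ops if op == "S"),
        # M, D, N, = and X are the operations that consume reference bases.
        "reference_span": sum(n for n, op in ops if op in "MDN=X"),
    }


print(parse_cigar("3S20M1I10M2D5M"))
print(summarize("3S20M1I10M2D5M"))
```

Note that in the SAM specification an M operation covers aligned bases whether or not they match the reference.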
Use of base qualities and gaps in single ended and paired end reads. Full Needleman Wunsch dynamic programming alignment of short reads.
Novoalign may also be used when you are expecting low identity (95% or less) with small genomes, as it can align with up to 8 mismatches to high quality bases. It is also one of the fastest gapped aligners available when run with the threshold set for two or more mismatches.
First, you need an AMD/Intel x86-64 CPU running 64-bit Linux or Mac OS X; most modern Intel/AMD servers or workstations should be usable. The memory requirements depend on the size of the genome you are aligning to. For Human NCBI 36, the index can be built in 7 GB of RAM, so 8 GB of RAM will be sufficient. Colourspace indexes are approximately the same size. Smaller genomes require proportionally less memory. One PC or server should be able to align reads faster than the GA can produce them.
Only standard C and C++ libraries are required for Novoalign to work. There are no expensive database/library requirements.
The applications are not open source in that the source code is not available for download. However, anybody may download and use these programs free of charge for their research and any other non-profit activities as long as results are published in peer-reviewed journals or some other medium. Other users may download and evaluate the software using a license key before purchase.
Linux 2.6 64-bit on X86-64 CPUs and Mac OS X operating systems are supported at this time.
You may contact “support at novocraft dot com” for more information. Priority is given to holders of support contracts. You could also try posting a question on our support forum.
Yes. As of version 2.07 Novoalign is able to handle named pipes for aligning short reads to a reference genome.
NovoalignCS maps SOLiD colourspace reads to a reference genome, while Novoalign handles nucleotide-space reads. We collectively refer to both tools as Novoalign in cases where this differentiation is not required. The command line options for both programs are the same, with some parameters specific to each tool.
Novoalign was designed to be an accurate short read aligner that combines fast K-mer index searching with dynamic programming. In terms of speed Novoalign will be slower than Burrows-Wheeler transform aligners e.g. BWA, Bowtie and in some cases faster than BFAST.
In terms of accuracy Novoalign is in most cases more sensitive than these tools because it uses full dynamic programming to find the best alignment of a short read to a genome sequence.
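For readers unfamiliar with dynamic programming alignment, here is a textbook Needleman-Wunsch sketch in Python. It is not Novoalign's implementation: it assumes simple fixed match, mismatch and gap values (all chosen for illustration), ignores base qualities, and maximizes a score, whereas Novoalign reports penalty-style scores where lower is better.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: a aligned against gaps
        score[i][0] = i * gap
    for j in range(1, cols):          # first row: b aligned against gaps
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Because every cell of the matrix is filled, the best-scoring alignment under the chosen scoring scheme is always found, which is the property that heuristic seed-and-extend shortcuts give up in exchange for speed.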
Yes. Use the “-o SAM” option or convert native Novoalign output to SAM format using novo2sam.pl.
The choice of aligner is an important one when considering SNP/Indel calling pipelines. The most sensitive and specific aligner will produce the most reliable sequence pileup for SNP calling against a reference genome. Novoalign has been shown to display high sensitivity according to other independent studies. See Homer and Li, 2010 and Krawitz et al., 2010.
NovoalignCS and Novoalign have a built-in polyclonal filter based on the method published by Sasson and Michael (2010). The polyclonal filter removes short reads based on the number of high phred-quality bases above T within the first N bases of a read. T and N can be set using Novoalign’s “-p” option.
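The idea behind such a filter can be sketched in Python as below. The exact decision rule used by Novoalign is not spelled out here, so this example assumes a read is kept when at least a minimum number of its first N bases have quality above T; the min_high_quality parameter and the function name are hypothetical and are not part of the “-p” option as described above.

```python
def passes_polyclonal_filter(qualities, n_bases, threshold, min_high_quality):
    """Keep a read if enough of its first n_bases have phred quality above threshold.

    min_high_quality is an assumed cut-off introduced for this illustration.
    """
    high_quality = sum(1 for q in qualities[:n_bases] if q > threshold)
    return high_quality >= min_high_quality


read_qualities = [30, 32, 8, 35, 5, 31, 30, 2]
print(passes_polyclonal_filter(read_qualities, n_bases=6, threshold=20, min_high_quality=4))
print(passes_polyclonal_filter(read_qualities, n_bases=6, threshold=20, min_high_quality=5))
```

Only the first N qualities are inspected, so low-quality bases towards the end of a read do not affect the outcome.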
The full version of Novoalign/NovoalignCS enables a multitude of features such as multithreading, adaptor trimming, bisulfite alignment, etc. Dedicated support is also covered for the term of the annual license.
Depending on your level of subscription, Novoalign can be used by an unlimited number of users/computers within a site e.g. a laboratory license allows multiple users in that lab to use the software on as many computers within that lab.
Novoalign and NovoalignCS both have a message passing interface (MPI) counterpart. These programs are called NovoalignMPI and NovoalignCSMPI. The MPI versions of Novoalign are more beneficial to organizations with large computing infrastructures e.g. genome sequencing centers and pharmaceutical companies.
Yes. Novoalign is able to read gzip-compressed short read files. Note that input files must have a “.gz” extension.
When aligning SOLiD colourspace reads with NovoalignCS, do I need to convert the colourspace FASTA (CSFASTA) to colourspace FASTQ (CSFASTQ) format first?
No. NovoalignCS supports direct alignment of CSFASTA files and their associated quality files in fragment and mate-pair mode. Files are automatically recognized by their file extensions, e.g. .csfasta and _QV.qual. These files may be gzipped and mapped with NovoalignCS.
Yes. Reads not meeting the minimum read length requirement will be filtered and flagged as QC. Otherwise Novoalign is capable of mapping variable length reads.
Yes, Novoalign is available on the cloudbiolinux 64-bit image accessible from Amazon Web Services (AWS).
What is the difference between hairpin score, alignment score and alignment/mapping quality reported by Novoalign?
Alignment score is the log probability that the mapped sequence is the same as the fragment read by the sequencer.
Alignment/Mapping quality is the log probability that this is the correct alignment location given that the DNA fragment came from the reference genome. It is a Bayesian posterior probability that is calculated from all the alignments found for the read. Therefore lower scores are always better, as the score is -10log(P), where P is the probability of the alignment. P=1.0 gives a score of zero.
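The -10log(P) relationship can be tried out with a few lines of Python. This sketch assumes the logarithm is base 10, as in the phred convention; the function name is our own.

```python
import math


def score_from_probability(probability):
    """Return -10 * log10(P), written as 10 * log10(1 / P) so that P = 1.0 gives 0.0."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return 10.0 * math.log10(1.0 / probability)


for p in (1.0, 0.1, 0.001):
    print(p, round(score_from_probability(p), 6))
```

Each tenfold drop in probability adds 10 to the score, and a probability of 1.0 gives a score of zero.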
Hairpin scoring is similar to alignment score but for nearby reverse complement of the read. Lower scores might help locate the source of the miRNA. Any mismatches will score around 30 depending on base quality. If there was a perfect complementary match the hairpin score should be zero.
What is the meaning of the step size (-s param) in novoindex and how does it affect the index creation?
When indexing genomes we create an index that lists all the k-mers and where they exist in the reference genome. If the step size is one then every k-mer is indexed, with step size 2 only every second k-mer is indexed, and so on. This reduces the size of the index considerably, especially for large genomes. Novoalign automatically adjusts for any drop in sensitivity that might result from using a sparse index.
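The effect of the step size can be illustrated with a toy k-mer index in Python. This is a simplified sketch and not how Novoindex stores its index; the function name and the tiny reference string are invented for the example.

```python
from collections import defaultdict


def build_kmer_index(reference, k, step=1):
    """Map each indexed k-mer to the list of positions where it starts."""
    index = defaultdict(list)
    # With step=1 every k-mer is indexed; with step=2 only every second one, and so on.
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


reference = "ACGTACGTAC"
full = build_kmer_index(reference, k=4, step=1)
sparse = build_kmer_index(reference, k=4, step=2)

print(full)
print(sparse)
print(sum(len(v) for v in full.values()), "positions vs", sum(len(v) for v in sparse.values()))
```

With a step size of 2 the toy index stores roughly half as many positions, which is where the memory saving for large genomes comes from.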