What is NovoWORX?
NovoWORX is a user-friendly, complete workbench for scientists that requires no special technical skills in bioinformatics or computer science. NovoWORX comes with 6 standard pipelines, as well as customizable ones to suit your research needs.
How does NovoWORX help me with my work?
No more working on the terminal for you! Gone is the black box with those teeny tiny letters! With NovoWORX, you will have the pleasure of working with its easy-to-use GUI and viewing your results directly from the workbench. Your projects and your output files will now be stored in one place and one place only.
All you need to do is to select your input files, the reference genome and the pipeline that you want to run the project with. Then you can sit back and relax while NovoWORX does your job for you.
Minimum System Requirement for NovoWORX:
100GB Disk Space
How easy is it to use NovoWORX?
Very! All you need is your sequence files or input files; leave the rest to NovoWORX. If we do not have the reference genome that you require, you can also add your own!
What kind of files can I use with NovoWORX?
We currently support FASTA, FASTQ, COLOURSPACE, BAM, VCF and many other types of files, and our list is expanding every day.
Can I view the results in NovoWORX?
Absolutely! All outputs from our pipelines are viewable in NovoWORX, presented as tables, images, charts and PDFs, and you can even browse the results in our genome browser. You will also be able to filter results and download them as necessary.
Do you have this <reference> genome in your library?
We can provide you with the reference genomes that you need. You can also opt to use the functionality that is included in NovoWORX to add any reference genome files yourself.
What kind of pipelines are included in NovoWORX?
We currently have Whole Genome Analysis, Exome Analysis, RNA Analysis, MiRNA Analysis, Methylation Analysis, and Basic NGS Alignment.
Can I run multiple projects / pipelines simultaneously in NovoWORX?
Absolutely! Your projects will be submitted to the queue and will be run as soon as there is a slot available. Do note that the response time depends on your computer / server’s hardware.
Do I need to keep a constant vigilance on the status of my project?
There is absolutely no need to keep checking back with the workbench for the status of your project. You can opt to receive emails on the status of your project and NovoWORX will automatically send you status updates once it has completed its run.
Can I request additional pipelines to be added to NovoWORX?
Absolutely! Contact us at firstname.lastname@example.org to discuss your requirements!
What kind of tools are included in NovoWORX?
The tools will depend on the pipelines that are installed in your copy of NovoWORX. If you require the list, feel free to contact us at email@example.com
Do I need to pay for the licences for the tools used in NovoWORX?
All basic licences are included in the NovoWORX fees.
Sounds great! Where can I get a copy of NovoWORX?
Contact us at firstname.lastname@example.org and we will get you set up as soon as we can!
What is novoalign?
Novoalign is a short-read mapper designed to be fast and sensitive on small to large genome databases. Its primary design is based on the use of read quality information and the need to assemble genomes from resequencing experiments. Novoalign supports fragment, paired-end and mate-pair reads from major sequencing platforms such as Illumina, SOLiD and Roche 454. NovoalignCS is the version of Novoalign developed for SOLiD colourspace reads.
All reference genome databases to which short reads are mapped need to be formatted in a special way, much as sequences must be processed with formatdb before using NCBI BLAST. Novoindex has been designed for this purpose, to preformat the DNA database. A masked human genome can be indexed in 3-5 minutes on a single CPU server with at least 8 GB of RAM for an Illumina/SOLiD colourspace index.
Can I map reads that align to multiple locations on the genome?
Yes, most definitely with the “-r” parameter. You have several options for reads with multiple alignments including reporting all, none, or a randomly chosen one. See the documentation for more information.
Does novoalign support both Illumina Genome Analyzer platforms?
Yes, novoalign supports reads for all Genome Analyzer platforms, including the HiSeq model. Novoalign can handle Illumina mate-pairs, single-end fragments and paired-end reads.
What alignment formats are supported by Novoalign?
Novoalign reports results in a single-line tab-delimited format for easy parsing and sorting of large numbers of alignments. Novoalign also supports the widely used SAM format specification for sequence alignments. Add the “-o SAM” option to your command line for SAM output. Note that there is also a command line utility script called novo2sam.pl that converts native novoalign format to SAM format. A pairwise alignment format is also supported.
Can I map RNASeq transcriptome reads/tags using novoalign?
Yes. Transcriptome reads may be used with a genome or transcriptome database. Note that in cases where reads map to multiple locations the “-r All” option should be used to report all alignment locations.
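As a rough illustration of how the options mentioned so far combine, here is a minimal Python sketch that only assembles a command line and does not run it. The “-o SAM” and “-r All” options come from the answers above; the “-d” (index) and “-f” (read files) options and the file names are assumptions made for the example, so check them against the Novoalign documentation for your version.

```python
import shlex


def novoalign_command(index, reads, report="All", sam=True):
    """Assemble a novoalign argument list (hypothetical helper, not part of Novoalign)."""
    # "-d" and "-f" are assumed here; "-r" and "-o SAM" are described in this FAQ.
    args = ["novoalign", "-d", index, "-f", *reads, "-r", report]
    if sam:
        args += ["-o", "SAM"]
    return args


# Placeholder file names, purely for illustration.
command = novoalign_command("genome.nix", ["reads_1.fastq", "reads_2.fastq"])
print(shlex.join(command))
```

Keeping the arguments as a list makes it easy to hand the command to a job scheduler or to subprocess.run without worrying about shell quoting.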
I have Illumina sequencing data from microRNA experiments and there is a conserved adaptor sequence at the three-prime terminal end of the read. Do I need to remove these before using novoalign?
The current version of novoalign can strip these adaptors off for you. You can use the “-a” option to specify the adaptor sequence.
What extra features are available in Novoalign?
Set the “-m” option for miRNA alignment mode to look for potentially new microRNAs when mapping reads to a reference genome. Novoalign is also capable of mapping bisulfite-sequencing (BS-Seq) reads from bisulfite conversion protocols.
Does Novoalign support Sanger and Illumina FASTQ?
Yes. Sanger and Illumina FASTQ formats are both supported. The quality values are converted to phred values using the Sanger method and used in subsequent alignment routines.
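To make the quality conversion concrete, here is a minimal Python sketch of decoding a FASTQ quality string into phred values. It assumes the common conventions that Sanger FASTQ uses an ASCII offset of 33 and Illumina 1.3 to 1.7 FASTQ uses an offset of 64; it illustrates the encoding only and is not Novoalign's own code.

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of phred scores.

    offset=33 is the Sanger encoding; offset=64 is the Illumina 1.3-1.7
    encoding. Both store phred scores, only the ASCII offset differs.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    if min(scores, default=0) < 0:
        raise ValueError("quality character below the chosen offset")
    return scores


def error_probability(phred):
    """A phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


sanger = decode_qualities("II5+!", offset=33)
illumina = decode_qualities("hhTJ@", offset=64)
print(sanger, illumina)
```

Both example strings decode to the same phred values, which is why reads in either encoding can be used in the same downstream alignment once converted.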
Can I use novoalign for methylation experiments?
Methylation alignment is on our roadmap at the time of writing this document and should be available soon. In the meantime, you can do it by preprocessing the genome sequence to convert Cs to Ts.
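The C to T preprocessing of the genome sequence can be sketched in a few lines of Python. This is a minimal illustration that assumes a FASTA reference held as a list of lines; the function names are hypothetical, and a complete bisulfite workflow would normally also need matching treatment of the reads and of the reverse strand, which is not shown here.

```python
def convert_c_to_t(sequence):
    """Return the sequence with every C replaced by T, preserving case."""
    return sequence.translate(str.maketrans("Cc", "Tt"))


def convert_fasta(lines):
    """Yield FASTA lines with sequence lines converted and header lines untouched."""
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield convert_c_to_t(line)


converted = list(convert_fasta([">chr1", "ACGTccga", "CCAT"]))
print(converted)
```

Header lines are passed through unchanged so that sequence names still match the original reference.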
The Maq (maq.sourceforge.net) program uses a mapping quality score and quality values for read alignments. How does the Novoalign scoring system compare to this?
Novoalign calculates a mapping quality score that takes similar factors into account as Maq does. However, the alignment quality values may differ from those reported by Maq.
How does novoalign compare to Eland in terms of performance and finding alignment locations?
Novoalign should find all the same unambiguous mapping locations for short reads as compared to Eland. However, as it uses base qualities and can align with gaps it usually finds more alignments.
Can I assemble my genome with Novoalign?
No, only short read alignment is supported at this time. The assembly component is currently in development.
Does Novoalign use heuristics?
The alignment process is non-heuristic: it will find the optimum alignment, or report no alignment if one with a score less than the threshold cannot be found.
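The idea of an exhaustive search that returns either the optimum alignment or nothing can be sketched with a small dynamic programming example in Python. This is a simplified illustration, not Novoalign's scoring: it assumes a global alignment, a mismatch penalty equal to the phred quality of the read base, and a single linear gap penalty; the function name and default values are hypothetical.

```python
def best_alignment_score(read, qualities, reference, gap_penalty=40, threshold=60):
    """Global alignment of read against reference, minimising the total penalty.

    A mismatch costs the phred quality of the read base and each gapped base
    costs gap_penalty. Returns the optimum score, or None if that score is
    not below the threshold.
    """
    rows, cols = len(read) + 1, len(reference) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap_penalty
    for j in range(1, cols):
        score[0][j] = j * gap_penalty
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = 0 if read[i - 1] == reference[j - 1] else qualities[i - 1]
            score[i][j] = min(
                score[i - 1][j - 1] + mismatch,    # match or mismatch
                score[i - 1][j] + gap_penalty,     # read base against a gap
                score[i][j - 1] + gap_penalty,     # reference base against a gap
            )
    best = score[-1][-1]
    return best if best < threshold else None


print(best_alignment_score("ACGT", [30, 30, 10, 30], "ACCT"))
```

Because every cell of the matrix is filled, the result is the true minimum under this scoring scheme; a mismatch at a low-quality base is penalised less than one at a high-quality base.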
Which genomes can I use Novoalign for?
There is absolutely no restriction on the genome to which you map your short reads with Novoalign. It’s best to use the genome from which your sample originated, or at least a closely related genome, i.e. one with greater than 90% sequence similarity.
Does novoalign map reads to genomes with more than 4 IUPAC codes?
Yes, certain genomes contain ambiguous IUPAC codes and novoalign has been designed with this in mind.
How does novoalign handle lowercase masking?
The indexing process can optionally ignore lowercase nucleotides in the reference genome/nucleotide database. This means that lowercase letters cannot be used to initiate an alignment. However, the Needleman-Wunsch alignment process can extend into lowercase sequence as long as the alignment was initiated in indexed uppercase sequence.
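The effect of ignoring lowercase bases at indexing time can be illustrated with a short Python sketch. It assumes, for illustration only, that a k-mer is indexed only when all of its bases are uppercase; the function name is hypothetical and the actual rule used by novoindex may differ.

```python
def seed_positions(reference, k):
    """Return start positions of k-mers made up entirely of uppercase bases."""
    return [
        start
        for start in range(len(reference) - k + 1)
        if reference[start:start + k].isupper()
    ]


positions = seed_positions("ACGTacgtACGTA", k=4)
print(positions)
```

Only the uppercase regions contribute seed positions, so an alignment can start there and then be extended across the masked lowercase stretch by the dynamic programming step.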
Can I use Novoalign for ABI SOLiD and 454 reads?
NovoalignCS is the version of Novoalign specifically written to deal with SOLiD colorspace reads.
Novoalign can be used with 454 paired-end reads.
What are the advantages of using these programs over other tools?
Novoalign uses base qualities and gapped alignment for both single-end and paired-end reads, and performs a full Needleman-Wunsch dynamic programming alignment of short reads.
Novoalign may also be used when you are expecting low identity (95% or less) with small genomes, as it can align with up to 8 mismatches to high-quality bases. It is also one of the fastest gapped aligners available when run with the threshold set for two or more mismatches.
Can I identify alignments with gaps or mismatches using Novoalign?
The short answer is yes. The alignment threshold can be increased to allow for finding more mismatches and/or gaps. Novoalign reports the locations of mismatches and gaps in the alignments. When using SAM format, mismatches, indels and other edit operations (e.g. soft-clipping) are currently coded into the CIGAR string.
Do I need an expensive high-end server to run Novoalign?
No. First, you need an AMD/Intel X86-64 CPU with 64-bit Linux or Mac OS X; most modern Intel/AMD servers or workstations should be usable. The memory requirements depend on the size of the genome you are aligning to. For Human NCBI 36, the index can be built in 7 GB of RAM, so 8 GB of RAM will be sufficient. Colourspace indexes are approximately the same size. Smaller genomes require proportionally less memory. One PC or server should be able to align reads faster than the GA can produce them.
Do I need to have any proprietary software installed on my computer/server before using Novoalign?
Only standard C and C++ libraries are required for Novoalign to work. There are no expensive database/library requirements.
Is Novoalign open source?
The applications are not open source in that the source code is not available for download. However, anybody may download and use these programs free of charge for their research and any other non-profit activities as long as results are published in peer-reviewed journals or some other medium. Other users may download and evaluate the software using a license key before purchase.
Which platforms are currently supported?
Linux 2.6 64-bit on X86-64 CPUs and Mac OS X operating systems are supported at this time.
I need technical support, who do I contact?
You may contact “support at novocraft dot com” for more information. Priority is given to holders of support contracts. You could also try posting a question on our support forum.
Can I use Novoalign with named pipes e.g. mkfifo?
Yes. As of version 2.07 Novoalign is able to handle named pipes for aligning short reads to a reference genome.
Is it possible to use Novoalign output with the Picard and GATK programs?
Yes. Novoalign follows the full SAM format specification and it has been tested with Picard and the Genome Analysis Toolkit (GATK). See the Novoalign documentation guide for converting novoalign SAM output to BAM format. Novoalign has options for creating the SAM read group names required by Picard.
What is the difference between Novoalign and NovoalignCS?
NovoalignCS maps SOLiD colourspace reads to a reference genome, while Novoalign maps nucleotide-space reads. We collectively refer to both tools as Novoalign in cases where this differentiation is not required. The command line options for both programs are the same, apart from some parameters specific to each tool.
How does Novoalign compare to programs like BWA, Bowtie, ELAND and BFAST?
Novoalign was designed to be an accurate short read aligner that combines fast k-mer index searching with dynamic programming. In terms of speed, Novoalign will be slower than Burrows-Wheeler transform aligners such as BWA and Bowtie, and in some cases faster than BFAST.
In terms of accuracy Novoalign is in most cases more sensitive than these tools because it uses full dynamic programming to find the best alignment of a short read to a genome sequence.
Does Novoalign support SAM/BAM format?
Yes. Use the “-o SAM” option or convert native Novoalign output to SAM format using novo2sam.pl.
How could changing my current aligner to Novoalign impact SNP/Indel calling?
The choice of aligner is an important one when considering SNP/Indel calling pipelines. The most sensitive and specific aligner will produce the most reliable sequence pileup for SNP calling against a reference genome. Novoalign has been shown to display high sensitivity according to other independent studies. See Homer and Li, 2010 and Krawitz et al., 2010.
What is the polyclonal filter used in Novoalign?
NovoalignCS and Novoalign have a built-in polyclonal filter based on the method published by Sasson and Michael (2010). The polyclonal filter removes short reads based on the number of high phred-quality bases above T within the first N bases of a read. T and N can be set using Novoalign’s “-p” option.
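A quality-based filter of this kind can be sketched in Python as follows. This is only an illustration of counting high-quality bases in the first N bases of a read: the min_good cut-off, the use of "at or above T", and the function name are assumptions made for the example and are not taken from the Novoalign documentation or the published method.

```python
def passes_polyclonal_filter(qualities, t, n, min_good):
    """Keep a read if at least min_good of its first n bases have quality >= t.

    qualities is a list of phred scores; min_good is a hypothetical cut-off
    introduced for this sketch.
    """
    good = sum(1 for q in qualities[:n] if q >= t)
    return good >= min_good


clean_read = [35, 34, 36, 33, 30, 32, 31, 35, 20, 15]
noisy_read = [12, 35, 8, 10, 33, 9, 11, 7, 30, 30]
print(passes_polyclonal_filter(clean_read, t=30, n=8, min_good=6))
print(passes_polyclonal_filter(noisy_read, t=30, n=8, min_good=6))
```

Restricting the count to the first N bases matters because quality normally degrades towards the end of a read, so poor quality at the start is a stronger sign of a problematic read.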
What does a full, paid-for version of Novoalign/NovoalignCS entitle me to?
The full version of Novoalign/NovoalignCS enables a multitude of features such as multithreading, adaptor trimming, bisulfite alignment, etc. Dedicated support is also covered for the term of the annual license.
Depending on your level of subscription, Novoalign can be used by an unlimited number of users/computers within a site, e.g. a laboratory license allows multiple users in that lab to use the software on as many computers within that lab as they need.
What is the best way to use Novoalign on a computing cluster?
Novoalign and NovoalignCS both have a message passing interface (MPI) counterpart. These programs are called NovoalignMPI and NovoalignCSMPI. The MPI versions of Novoalign are more beneficial to organizations with large computing infrastructures e.g. genome sequencing centers and pharmaceutical companies.
Is Novoalign able to read gzip-compressed read files?
Yes. Novoalign is able to read gzip-compressed short read files. Note that input files must have a “.gz” extension.
When aligning SOLiD colourspace reads with NovoalignCS, do I need to convert the colourspace FASTA (CSFASTA) to colourspace FASTQ (CSFASTQ) format first?
No. NovoalignCS supports direct alignment of CSFASTA files and their associated quality files in fragment and mate-pair mode. Files are automatically recognized by their file extensions, e.g. .csfasta and _QV.qual. These files may be gzipped and mapped with NovoalignCS.
Can I use Novoalign on reads with variable lengths?
Yes. Reads not meeting the minimum read length requirement will be filtered and flagged as QC. Otherwise Novoalign is capable of mapping variable length reads.
Can I use Novoalign on a cloud computing system?
Yes, Novoalign is available on the cloudbiolinux 64-bit image accessible from Amazon Web Services (AWS).
What is the difference between hairpin score, alignment score and alignment/mapping quality reported by Novoalign?
Alignment score is the log probability that the mapped sequence is the same as the fragment read by the sequencer.
Alignment/Mapping quality is the log probability that this is the correct alignment location given that the DNA fragment came from the reference genome. It is a Bayesian posterior probability that is calculated from all the alignments found for the read. Therefore lower scores are always better, as the score is -10log(P), where P is the probability of the alignment. P=1.0 gives a score of zero.
Hairpin scoring is similar to alignment score, but for a nearby reverse complement of the read. Lower scores might help locate the source of the miRNA. Any mismatches will score around 30 depending on base quality. If there is a perfect complementary match, the hairpin score should be zero.
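The relationship between probabilities and scores can be illustrated with a small Python sketch. It assumes the logarithm in -10log(P) is base 10, as in the phred convention, and it shows a textbook Bayesian normalisation over candidate locations with equal priors; this is an illustration of the idea only, not Novoalign's exact calculation, and the function names are hypothetical.

```python
import math


def phred_score(probability):
    """Score = -10 * log10(P): P = 1.0 gives zero, smaller P gives a larger score."""
    return -10 * math.log10(probability)


def posterior_probabilities(alignment_scores):
    """Turn per-location alignment scores into posterior probabilities.

    Each score s is mapped back to a likelihood 10^(-s/10) and the likelihoods
    are normalised over all candidate locations (equal priors assumed).
    """
    likelihoods = [10 ** (-s / 10) for s in alignment_scores]
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


# One perfect hit (score 0) and one hit with a single mismatch (score 30).
print(posterior_probabilities([0, 30]))
print(phred_score(0.001))
```

With one perfect location and one location carrying a single high-quality mismatch, almost all of the posterior probability goes to the perfect location; two equally good locations would each receive 0.5.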
What is the meaning of the step size (-s param) in novoindex and how does it affect the index creation?
When indexing genomes we create an index that lists all the k-mers and where they exist in the reference genome. If the step size is one then every k-mer is indexed, with step size 2 only every second k-mer is indexed, and so on. This reduces the size of the index considerably, especially for large genomes. Novoalign automatically adjusts for any drop in sensitivity that might result from using a sparse index.
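The effect of the step size on the index can be shown with a minimal Python sketch of a k-mer index. It is a toy dictionary-based illustration of the idea described above, not the data structure novoindex actually builds; the function name is hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k, step=1):
    """Map each indexed k-mer to the list of reference positions where it starts."""
    index = defaultdict(list)
    for start in range(0, len(reference) - k + 1, step):
        index[reference[start:start + k]].append(start)
    return dict(index)


full = build_kmer_index("ACGTACGTAC", k=4, step=1)
sparse = build_kmer_index("ACGTACGTAC", k=4, step=2)
print(len(full), sum(len(v) for v in full.values()))
print(len(sparse), sum(len(v) for v in sparse.values()))
```

With a step size of 2, only the k-mers starting at even positions are stored, so the number of indexed positions is roughly halved at the cost of fewer potential seed hits per read.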