- Introduction
- Requirements
- Getting Started
- Running Novoalign
- Using Novoalign on Bisulphite treated DNA
- Using Novoalign on SOLiD(TM) Colorspace reads
The key programs that come with the Novoalign package are:
novoindex | A utility to construct an index for the reference sequences. Typically creates a k-mer index that can be loaded into shared memory for access by multiple search processes. |
novoalign | An alignment tool for aligning short sequences against an indexed set of reference sequences. Typically used for aligning Illumina single end and paired end reads. |
novoalignCS | An alignment tool for aligning color space sequences against an indexed set of reference sequences. Typically used for aligning SOLiD(TM) single end and mate-pair reads. |
A computer with an Intel/AMD x86-64 CPU running 64-bit Linux with a 2.6 kernel, or Mac OS X.
RAM requirements depend on the size of the genome you are aligning against. As a general rule, allow about 3 times the size of the reference genome. A minimum of 8 GB of RAM is recommended for alignments against the human genome.
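The 3x rule of thumb above can be turned into a quick check before indexing. The following is a minimal sketch, assuming the on-disk size of the reference file is a fair stand-in for genome size (header lines and newlines inflate it slightly). `estimate_ram_bytes` is a hypothetical helper, not something shipped with Novoalign.

```shell
# estimate_ram_bytes is a hypothetical helper (not part of Novoalign).
# It applies the "about 3 times the reference size" rule of thumb to the
# size in bytes of the reference sequence file given as $1.
estimate_ram_bytes() {
    [ -r "$1" ] || { echo "cannot read reference: $1" >&2; return 1; }
    # wc -c may pad its output with spaces on some systems, so strip them
    ref_bytes=$(wc -c < "$1" | tr -d ' ')
    echo $(( ref_bytes * 3 ))
}

# Example with the sample reference used later in this guide
# (only runs if the sample data has already been unpacked):
if [ -r ./sampledata/S_suis.dna ]; then
    estimate_ram_bytes ./sampledata/S_suis.dna
fi
```

The result is only a rough lower bound for planning; actual memory use depends on the index options chosen.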
- Download the Novoalign tar file from www.novocraft.com: click the "I agree" button, then look for the latest release, named like “Novo Package V2.xx.xx for X86-64 Linux (Static Link)”.
- Untar with command:
tar -xzf novo*tar.gz
This will create a folder ./novocraft containing the Novoalign programs, some documentation files, and Perl scripts.
- Download the Sample Data tar file that is attached to this page.
- Untar with command:
tar -xzf sampledata.tar.gz
This will create a folder ./sampledata with some files that we’ll use to run Novoalign.
- Build the indexed genome:
./novocraft/novoindex ssuis.nix ./sampledata/S_suis.dna
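Since the index only needs rebuilding when the reference changes, the build step can be guarded. This is a sketch under stated assumptions: `needs_index` is a hypothetical helper, and comparing file modification times is just one simple way to decide whether a rebuild is due.

```shell
# needs_index is a hypothetical helper (not part of Novoalign).
# It succeeds (exit 0) when the index file $1 is missing or older than
# the reference file $2, i.e. when the index should be (re)built.
needs_index() {
    idx=$1
    ref=$2
    [ ! -e "$idx" ] && return 0
    # find prints the reference path only if it is newer than the index
    [ -n "$(find "$ref" -newer "$idx" 2>/dev/null)" ]
}

# Rebuild only when needed; the -x test skips the step entirely if the
# novoindex binary has not been unpacked in ./novocraft yet.
if [ -x ./novocraft/novoindex ] && needs_index ssuis.nix ./sampledata/S_suis.dna; then
    ./novocraft/novoindex ssuis.nix ./sampledata/S_suis.dna
fi
```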
- Run Novoalign for single-end reads:
./novocraft/novoalign -d ssuis.nix -f ./sampledata/s_1_sequence.txt
- Run Novoalign for paired-end reads:
./novocraft/novoalign -d ssuis.nix -f ./sampledata/s_1_0000.1.fastq ./sampledata/s_1_0000.2.fastq
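The single-end and paired-end commands above differ only in how many read files follow `-f`, so both can be covered by one small wrapper. The sketch below assumes that novoalign writes its alignment report to standard output (check the manual for your release); `align_reads` and the `NOVOALIGN` variable are hypothetical names introduced here for illustration.

```shell
# Path to the novoalign binary; set NOVOALIGN beforehand to override it.
NOVOALIGN=${NOVOALIGN:-./novocraft/novoalign}

# align_reads is a hypothetical wrapper (not part of Novoalign).
# Usage: align_reads INDEX OUTPUT READS1 [READS2]
# One read file means single end, two read files mean paired end.
align_reads() {
    if [ $# -lt 3 ] || [ $# -gt 4 ]; then
        echo "usage: align_reads INDEX OUTPUT READS1 [READS2]" >&2
        return 1
    fi
    index=$1
    output=$2
    shift 2
    # Assumption: alignments go to standard output, so redirect to a file.
    "$NOVOALIGN" -d "$index" -f "$@" > "$output"
}
```

With the sample data, a paired-end run would then be `align_reads ssuis.nix paired.out ./sampledata/s_1_0000.1.fastq ./sampledata/s_1_0000.2.fastq`, where `paired.out` is an arbitrary output file name.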
- Build the indexed genome for bisulphite alignment:
./novocraft/novoindex -b ssuis.nbx ./sampledata/S_suis.dna
- Run Novoalign for single-end Bi-Seq reads:
./novocraft/novoalign -d ssuis.nbx -f ./sampledata/sim_biseq.1.fastq
- Run Novoalign for paired-end Bi-Seq reads:
./novocraft/novoalign -d ssuis.nbx -f ./sampledata/sim_biseq.1.fastq ./sampledata/sim_biseq.2.fastq
- Build the indexed genome for SOLiD colorspace alignment:
./novocraft/novoindex -c ssuis.ncx ./sampledata/S_suis.dna
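The three index builds in this guide differ only in the novoindex option (none, `-b`, `-c`) and in the file extension used for the index (`.nix`, `.nbx`, `.ncx`). The sketch below collects that mapping in one place; `index_cmd` and its mode names are hypothetical, and it only prints the command rather than running it.

```shell
# index_cmd is a hypothetical helper (not part of Novoalign).
# Usage: index_cmd MODE NAME REFERENCE
# Prints the novoindex command used in this guide for the given mode.
index_cmd() {
    mode=$1
    name=$2
    ref=$3
    case $mode in
        normal)     echo "./novocraft/novoindex $name.nix $ref" ;;
        bisulphite) echo "./novocraft/novoindex -b $name.nbx $ref" ;;
        colorspace) echo "./novocraft/novoindex -c $name.ncx $ref" ;;
        *)
            echo "index_cmd: unknown mode '$mode'" >&2
            return 1
            ;;
    esac
}

# Print the three index commands for the sample reference
for mode in normal bisulphite colorspace; do
    index_cmd "$mode" ssuis ./sampledata/S_suis.dna
done
```

The extensions follow the naming used in this guide's examples; they help keep the three index types apart on disk.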
- Run Novoalign for single-end SOLiD reads:
./novocraft/novoalignCS -d ssuis.ncx -f reads.csfastq
- Run Novoalign for mate-pair SOLiD reads:
./novocraft/novoalignCS -d ssuis.ncx -f file_F3.csfastq file_R3.csfastq
- Specify the library insert size and standard deviation (here 3000 and 200) when working with mate-pair libraries:
./novocraft/novoalignCS -d ssuis.ncx -f file_F3.csfastq file_R3.csfastq -i 3000 200
Note that novoalignCS accepts reads in .csfasta and .csfastq formats.
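Because novoalignCS expects colorspace input, it can help to check file names before starting a long run. This is a minimal sketch that looks only at the file extension, not the file contents; `is_colorspace_reads` is a hypothetical helper.

```shell
# is_colorspace_reads is a hypothetical helper (not part of Novoalign).
# It succeeds only for file names ending in .csfasta or .csfastq,
# the two formats named above. File contents are not inspected.
is_colorspace_reads() {
    case $1 in
        *.csfasta|*.csfastq) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn about any read file whose name does not look like colorspace input
for f in file_F3.csfastq file_R3.csfastq; do
    is_colorspace_reads "$f" || echo "not a colorspace read file: $f" >&2
done
```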