
Getting Started

The three key programs that come with the Novoalign package are:

novoindex A utility to construct an index for the reference sequences. Typically creates a k-mer index that can be loaded into shared memory for access by multiple search processes.
novoalign An alignment tool for aligning short sequences against an indexed set of reference sequences. Typically used for aligning Illumina single end and paired end reads.
novoalignCS An alignment tool for aligning color space sequences against an indexed set of reference sequences. Typically used for aligning SOLiD(TM) single end and mate-pair reads.

Novoalign requires a computer with an Intel/AMD x86-64 CPU running 64-bit Linux with a 2.6 kernel, or Mac OS X.

RAM requirements depend on the size of the genome you are aligning against. As a general rule, allow about 3 times the size of the reference genome. A minimum of 8 GB of RAM is recommended for alignments against the human genome.
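The rule of thumb above can be turned into a quick estimate. The sketch below assumes the reference file size is a reasonable stand-in for genome size (it also counts header lines and newlines, so it slightly overestimates). The helper name estimate_ram_bytes is hypothetical and not part of the Novoalign package.

```shell
#!/bin/sh
# Rough RAM estimate: about 3 times the size of the reference file.
# estimate_ram_bytes is an illustrative helper, not a Novoalign tool.
estimate_ram_bytes() {
    # File size in bytes; includes FASTA headers and newlines.
    ref_bytes=$(wc -c < "$1")
    echo $((ref_bytes * 3))
}

# Example usage; runs only if the sample reference has been unpacked.
if [ -f ./sampledata/S_suis.dna ]; then
    echo "Estimated RAM: $(estimate_ram_bytes ./sampledata/S_suis.dna) bytes"
fi
```

Treat the result as a lower bound for planning rather than an exact figure.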

  1. Download the Novoalign tar file from www.novocraft.com: click the I agree button, then look for the latest release, named like “Novo Package V2.xx.xx for X86-64 Linux (Static Link)”.
  2. Untar with command:
    tar -xzf novo*tar.gz

    This will create a folder ./novocraft containing the Novoalign programs, some documentation files, and Perl scripts.

  3. Download the Sample Data tar file that is attached to this page.
  4. Untar with command:
    tar -xzf sampledata.tar.gz

    This will create a folder ./sampledata containing the files that we’ll use to run Novoalign.

Build the indexed genome

./novocraft/novoindex ssuis.nix ./sampledata/S_suis.dna
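Since the index only needs to be built once per reference, the novoindex call above can be wrapped so that reruns skip it. This is a minimal sketch: NOVO_DIR and build_index_if_missing are illustrative names, and the only Novoalign invocation used is the one shown in the text.

```shell
#!/bin/sh
# Sketch: build the index only when it does not exist yet.
# NOVO_DIR and build_index_if_missing are illustrative, not part of Novoalign.
NOVO_DIR=${NOVO_DIR:-./novocraft}

build_index_if_missing() {
    index=$1
    reference=$2
    if [ -f "$index" ]; then
        echo "Index $index already exists; skipping novoindex."
        return 0
    fi
    # Same invocation as in the text: novoindex <index> <reference>
    "$NOVO_DIR/novoindex" "$index" "$reference"
}

# Runs only when the package has been unpacked in the current directory.
if [ -x "$NOVO_DIR/novoindex" ]; then
    build_index_if_missing ssuis.nix ./sampledata/S_suis.dna
fi
```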

Run Novoalign for Single End Reads

./novocraft/novoalign -d ssuis.nix -f ./sampledata/s_1_sequence.txt

Run Novoalign for Paired End Reads

./novocraft/novoalign -d ssuis.nix -f ./sampledata/s_1_0000.1.fastq ./sampledata/s_1_0000.2.fastq
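Before a paired-end run it can be worth confirming that both read files hold the same number of reads. The sketch below assumes plain 4-line FASTQ records (no wrapped sequence lines). The helpers count_fastq_reads and check_pairs are hypothetical and not part of the Novoalign package.

```shell
#!/bin/sh
# Sketch: confirm both paired-end files hold the same number of reads.
# Assumes 4 lines per FASTQ record; helper names are illustrative only.
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

check_pairs() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "Read count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "Both files contain $n1 reads."
}

# Runs only if the sample data has been unpacked.
if [ -f ./sampledata/s_1_0000.1.fastq ] && [ -f ./sampledata/s_1_0000.2.fastq ]; then
    check_pairs ./sampledata/s_1_0000.1.fastq ./sampledata/s_1_0000.2.fastq
fi
```

This only checks counts; it does not verify that read names in the two files match up.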

Build the indexed genome for Bi-Seq alignment

./novocraft/novoindex -b ssuis.nbx ./sampledata/S_suis.dna

Run Novoalign for Single End Bi-Seq Reads

./novocraft/novoalign -d ssuis.nbx -f ./sampledata/sim_biseq.1.fastq

Run Novoalign for Paired End Bi-Seq Reads

./novocraft/novoalign -d ssuis.nbx -f ./sampledata/sim_biseq.1.fastq ./sampledata/sim_biseq.2.fastq

Build the indexed genome for SOLiD colorspace alignment

./novocraft/novoindex -c  ssuis.ncx ./sampledata/S_suis.dna

Run Novoalign for Single End SOLiD Reads

./novocraft/novoalignCS -d ssuis.ncx -f reads.csfastq

Run Novoalign for mate-pair SOLiD Reads

./novocraft/novoalignCS -d ssuis.ncx -f file_F3.csfastq file_R3.csfastq

Specify the library insert size and standard deviation when working with mate-pair libraries

./novocraft/novoalignCS -d ssuis.ncx -f file_F3.csfastq file_R3.csfastq -i 3000 200

Note that novoalignCS accepts reads in .csfasta and .csfastq formats.
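A wrapper script might check the file extension before handing reads to novoalignCS. The following is only a sketch based on the two extensions named above; is_colorspace_reads is a hypothetical helper, and it inspects the file name only, not the file contents.

```shell
#!/bin/sh
# Sketch: accept only file names ending in .csfasta or .csfastq.
# is_colorspace_reads is an illustrative helper, not a Novoalign tool.
is_colorspace_reads() {
    case "$1" in
        *.csfasta|*.csfastq) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage with the file name from the text.
if is_colorspace_reads reads.csfastq; then
    echo "reads.csfastq looks like a color space read file."
fi
```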

That’s it!

