
NovoAlignCS

NovoalignCS is our aligner for ABI SOLiD colour space reads. Operation is similar to standard Novoalign, and most Novoalign command line options work in a similar fashion. The major difference is that the current version of NovoalignCS does not support adapter trimming, miRNA mode, or bisulphite mode.

 

Novoindex

You need to build a colour space index for colour space reads. This index uses a hash table with colour space seeds rather than nucleotide seeds.

To construct a colour space index just add option -c to the Novoindex command, as in

novoindex -c genome.ncx *.fa

 

NovoalignCS

NovoalignCS command line options are generally the same as those of Novoalign. Commonly used options are:

Option

Description

-d dbname

Full pathname of the indexed reference sequence built with novoindex -c.

-f seqfile1 [seqfile2]

NovoalignCS accepts ABI SOLiD *.csfasta files with *_QV.qual quality files, or .csfastq files.

-t 99

Sets the threshold or highest alignment score acceptable for the best alignment. A default threshold is calculated from read length and genome size such that an alignment to a non-repeat should have a quality higher than 30.

-s 1

If a read fails to align, shorten it by 1 base and try again. This is useful for aligning short RNA reads.

Suggested parameters for short RNA against Human are:

novoalignCS -d …. -s 1 -l 14 -t 40 -f …..

-p 99,99 99,99

Sets thresholds for the polyclonal filter. This filter is designed to remove reads that may come from polyclonal clusters or beads. Please refer to the paper:
Filtering SOLiD Output, Sasson and Todd P. Michael.

The first pair of values (n,t) sets the number of bases and the quality threshold for the first 20 bases of each read. If n or more of these bases have a phred quality below t, the read is flagged as polyclonal and is not aligned; its alignment status is reported as ‘QC’. The second pair applies to the entire read rather than just the first 20bp, and its first value is the fraction of bases below the given base quality. Setting -p -1 disables the filter. The default is -p 7,10 0.3,10, which flags a read when 7 of the first 20 bases are below Q10 or 30% of all bases are below Q10.
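The filter rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not NovoalignCS's own code: the function name is hypothetical, and the text does not say whether the whole-read test fires at exactly the given fraction, so the sketch assumes "at or above".

```python
def is_polyclonal(quals, n=7, t=10, fraction=0.3, t_all=10, window=20):
    """Apply the two-part rule using the documented defaults (-p 7,10 0.3,10).

    quals is a list of integer phred qualities, one per colour.
    """
    # Part 1: n or more low quality bases within the first `window` bases.
    head_low = sum(1 for q in quals[:window] if q < t)
    if head_low >= n:
        return True
    # Part 2: fraction of low quality bases over the whole read.
    # Assumption: the boundary case (exactly `fraction`) counts as polyclonal.
    all_low = sum(1 for q in quals if q < t_all)
    return len(quals) > 0 and all_low / len(quals) >= fraction
```

For example, a 50 colour read with 7 qualities below 10 among its first 20 positions would be flagged, while the same read with only 6 such positions and no other low qualities would not.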

-o format [readgroup]

Specifies the report format: Native, SAM or Pairwise. Default is Native.

eg.

novoalignCS -o SAM

-i 99,99

Sets approximate fragment length and standard deviation. Defaults to mate pair mode with a mean fragment length of 2500bp and a standard deviation of 500.

-i PE 99,99 Sets paired end mode, mean fragment length and standard deviation.

-k

Enables quality calibration. This is worth trying!

-K file

Colour Error counts are written to the named file after all reads are processed.

This file is useful for charting colour errors by base position in the read.

 

File Formats

CSFASTA

NovoalignCS supports ABI SOLiD csfasta and qual input files with no user preprocessing required.

  1. The polyclonal filter (-p option) is used to detect and stop alignment of reads with excessive low quality bases.
  2. In paired end mode the csfasta header is used to identify pairs and match reads, allowing mixed single and paired reads.
  3. If a csfasta file is specified as input NovoalignCS will look in the same folder for a quality file by replacing the .csfasta file extension with _QV.qual. If a quality file is not found a quality of 20 (1% colour error rate) is assumed for all bases.
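The quality file lookup in item 3 can be pictured with a short Python sketch. The helper names are hypothetical and the code only mirrors the naming rule and default stated above; it is not taken from NovoalignCS.

```python
from pathlib import Path

# Quality assumed for every base when no quality file is found.
DEFAULT_COLOUR_QUALITY = 20


def quality_file_for(csfasta_path):
    """Replace the .csfasta extension with _QV.qual, in the same folder."""
    path = Path(csfasta_path)
    if path.suffix != ".csfasta":
        raise ValueError(f"not a csfasta file: {csfasta_path}")
    return path.with_name(path.stem + "_QV.qual")


def phred_to_error_probability(quality):
    """Convert a phred quality to an error probability."""
    return 10 ** (-quality / 10)
```

With this rule, run_F3.csfasta is paired with run_F3_QV.qual, and the default quality of 20 corresponds to an error probability of 0.01, which is the 1% colour error rate quoted above.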

>2_14_26_F3
T011213122200221123032111221021210131332222101
>2_14_192_F3
T110021221100310030120022032222111321022112223

*_QV.qual

>2_14_26_F3
24 24 22 27 23 10 13 13 20 19 19 18 24 20 22 12 14 5 20 17 14 20 18 17 19 11 21 19 13 13 12 25 9 19 19 6 5 12 20 13 11 8 12 7 14
>2_14_192_F3
14 19 21 13 24 17 18 18 25 21 8 12 21 8 7 11 14 7 19 23 11 24 7 11 29 12 28 17 7 19 7 11 5 11 5 14 13 9 24 8 7 20 0 8 9

Color Space FASTQ

There are two variations of colour space fastq files being used by other aligners.

  1. BWA uses a csfastq format that includes a quality value for the primer base. This is typically coded as a ‘!’ and is not used in alignment scoring.

  2. BFAST has a fastq format that is similar to BWA format except that it does not have a quality for the primer base and hence the quality line is one letter shorter than the read line. NovoalignCS does not support paired end reads in a single BFAST fastq file; it requires two files for paired end.

Novoalign supports both formats and can automatically distinguish the two types based on the length of the quality string on the first line. You can also specify the format on the command line using -F BFASTQ or -F CSFASTQ.

Colour space qualities are phred quality plus ascii ‘!’.
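A rough Python sketch of the length-based format detection and the quality decoding described above. The function names are hypothetical; the returned labels reuse the -F option values from the text, and how NovoalignCS performs the check internally is not documented here.

```python
def detect_csfastq_flavour(read, quality):
    """Classify a record by comparing read and quality line lengths."""
    if len(quality) == len(read):
        return "CSFASTQ"  # BWA type: includes a quality for the primer base
    if len(quality) == len(read) - 1:
        return "BFASTQ"   # BFAST type: no quality for the primer base
    raise ValueError("quality length does not match either csfastq type")


def colour_qualities(read, quality):
    """Return one phred quality per colour, skipping any primer base quality."""
    if detect_csfastq_flavour(read, quality) == "CSFASTQ":
        quality = quality[1:]  # the primer base quality is not used in scoring
    # Qualities are stored as phred + ASCII '!' (code 33).
    return [ord(char) - ord("!") for char in quality]
```

Both types decode to the same list of colour qualities, for example `colour_qualities("T0123", "!5555")` and `colour_qualities("T0123", "5555")`.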

BWA Type CSFASTQ

 

@SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
T32322133300002330031001022230020232002203222030231
+SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
!21(()+#'+#40*.##**)$#$*$###$###############(+####'
@SRR015241.2 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_269_F3 length=50
T01212120333223322020022322232232232222022232033230
+SRR015241.2 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_269_F3 length=50
!,*+*()+*(#'+)###$#+$##'####################'+#####

BFAST Type CSFASTQ

@SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
T32322133300002330031001022230020232002203222030231
+SRR015241.1 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_34_F3 length=50
21(()+#'+#40*.##**)$#$*$###$###############(+####'
@SRR015241.2 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_269_F3 length=50
T01212120333223322020022322232232232222022232033230
+SRR015241.2 CLARA_20071207_2_CelmonAmp7797_16bit_26_88_269_F3 length=50
,*+*()+*(#'+)###$#+$##'####################'+#####

Two files can be specified for paired end mode. In this case Novoalign parses the header records looking for a header in standard ABI format (eg. >2_14_26_F3). If found then headers from the two files are assumed to be in order and matched for purpose of identifying paired reads. Reads that exist in only one file will be aligned in single end mode.
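The header matching can be sketched in Python as an ordered merge of the two header lists. This is an assumption-laden illustration rather than Novoalign's algorithm: the function names are hypothetical, headers are assumed to look like >panel_x_y_tag, and "in order" is assumed to mean ascending numeric (panel, x, y) order in both files.

```python
import re

# Assumed ABI header layout: >panel_x_y_tag, for example >2_14_26_F3.
HEADER = re.compile(r"^>?(\d+)_(\d+)_(\d+)_([A-Za-z0-9-]+)$")


def bead_key(header):
    """Return the numeric (panel, x, y) key shared by both reads of a pair."""
    match = HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"not a standard ABI header: {header}")
    return tuple(int(part) for part in match.groups()[:3])


def pair_headers(first, second):
    """Merge two ordered header lists; None marks a missing mate."""
    pairs = []
    i = j = 0
    while i < len(first) and j < len(second):
        key_a, key_b = bead_key(first[i]), bead_key(second[j])
        if key_a == key_b:
            pairs.append((first[i], second[j]))
            i += 1
            j += 1
        elif key_a < key_b:
            pairs.append((first[i], None))  # read only in the first file
            i += 1
        else:
            pairs.append((None, second[j]))  # read only in the second file
            j += 1
    pairs.extend((header, None) for header in first[i:])
    pairs.extend((None, header) for header in second[j:])
    return pairs
```

Entries with a None mate correspond to the reads that would be aligned in single end mode.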

 

Report Formats

SAM Format

The SAM report follows the SAM specification, including the colour space specific tags:

CS:Z: Original Colour space read
CQ:Z: Colour qualities
CM:i: Number of colour mismatches
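As an illustration of how these tags might be read back from a SAM record, here is a small Python sketch. The helper names are hypothetical and the code relies only on the general SAM layout (tab-separated fields, optional TAG:TYPE:VALUE fields starting at column 12).

```python
def parse_sam_tags(fields):
    """Turn optional SAM fields such as 'CM:i:2' into a dict."""
    tags = {}
    for field in fields:
        # Split at most twice, because Z values may themselves contain ':'.
        name, type_code, value = field.split(":", 2)
        tags[name] = int(value) if type_code == "i" else value
    return tags


def colour_space_summary(sam_line):
    """Extract the colour space tags from one tab-separated alignment line."""
    columns = sam_line.rstrip("\n").split("\t")
    tags = parse_sam_tags(columns[11:])  # the first 11 columns are mandatory
    return {
        "read": tags.get("CS"),
        "qualities": tags.get("CQ"),
        "mismatches": tags.get("CM"),
    }
```

A tag that is absent from the record is returned as None.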

Example:

@HD VN:1.0 SO:unsorted
@PG ID:novoalignCS VN:V1.00.11 CL:novoalignCS -d ../ecoli.ncx -f /export/home/zayed/service_projects/solid_ecoli_test//Rosalind_20080729_2_Chris5_F3.csfasta /export/home/zayed/service_projects/solid_ecoli_test/Rosalind_20080729_2_Chris5_R3.csfasta -r R -oSAM
@SQ SN:NC_004431 AS:ecoli.nix LN:5231428
469_29_17_F3 16 NC_004431 3712099 150 1S49M * 0 0 TTCGGTACCAGCAATAGACAGCGTTGCACGATCGGCGTAGTTAACGGCGG #27;JJ$">L9""HP6)9STI=8JYIHG6=MTRWQROHWULTSTLJ[]UI PG:Z:novoalignCS AS:i:21 UQ:i:21 NM:i:0 MD:Z:49 CS:Z:T20330310301231330323231131013321122333132121310320 CQ:Z:[email protected]?=.?6>[email protected]?4>:9<2-,=->+46?5&&2?+.'3:%0565'1# CM:i:2
469_29_25_F3 16 NC_004431 404723 150 50M * 0 0 TGGATGCTCTTAGCCGTTTGTTGATGCTTAACGCTCCACAAGGAACGATC :NABN#!%>[email protected]"9OU""BTPTSQZMJYYTUVW[\KMZ`VWPLR PG:Z:novoalignCS AS:i:41 UQ:i:41 NM:i:0 MD:Z:50 CS:Z:T12323102020111022331030231321111000303230200313201 CQ:Z:!7<[email protected]?B<?/=?>>>:=9<>?<6>7:;(<>88#[email protected])9<<1.'587-69 CM:i:4

 

Native Format

The colour space read and quality values are inserted in the report line before the nucleotide sequence and qualities. All other fields are as per the Novoalign documentation.

Example:

@7_418_678 L T22010302130021000210112203201031000 78&:6<47=1>71>5=&%<)776)&::&,5(15/* CCCCAAAGTCGCTCACCATCCCAGGGCAGGCCAAG .AM%!!!!G!!5GVWH!!"J\XQ^XWTY[YGEX U 59 150 >Chr6 159937789 R . 159937674 F

@7_418_678 R T30322221320200131003000212300010020201000200000020 967,356)63<442?:41.<;<$+>9)72;45-'086$6(.5&-)540&. AATCTCTGCTTCCCATGGGCCCTCAGCCCCAAACCTTGGGGAGTGGGTGG XVKGQTGGRXYQOZbWNHS`!!4Q`JHRVXR*[email protected]@F>!!!!F"!!;3 U 79 150 >Chr6 159937674 F . 159937789 R
