Loading...
 

SNP and MicroIndel detection


Novoalign detects mismatches and short indels in short reads from next-generation sequencing platforms. The recipes below contain command line usage to generate list of single nucleotide polymorphisms SNPs and Indels for further analysis.

Generate sorted alignments in BAM format

Initially we need to align our file(s) to the reference genome:

Illumina:

novoalign -d hg18.nix -f reads1.fastq  -o SAM 2> stats.txt > novo.sam
samtools view -bS novo.sam | samtools sort - reads1

SOLiD

novoalignCS -d hg18.ncx -f reads1.csfastq -o SAM 2> stats.txt > novocs.sam
samtools view -bS novocs.sam | samtools sort - reads1
For paired-end or mate-pair reads adjust the parameters accordingly.
If we have multiple alignment files these all need to be sorted individually and merged with either samtools(external link) or Picard(external link).

Remove PCR duplicates

It is highly recommended to remove PCR duplicates before proceeding with the SNP and Indel calls. We use samtools as an example to remove PCR duplicates.

#remove PCR duplicates from single-end reads
samtools rmdup -s reads1.bam reads1.sorted.bam
#For Paired reads
samtools rmdup reads1.bam reads1.pe.sorted.bam

SNP and Microindel calling

In this example we use the samtools(external link) pileup SNP caller and accept reads with a minimum mapping quality of 20:

#The hg18.fasta reference genome sequence is required. samtools.pl  is also required
samtools view -u -q 20 reads1.bam  | samtools pileup -vc -f hg18.fasta - | samtools.pl varFilter -D 1000 -G 50 > variations.list

Filtering and final list generation

A simple awk script can be used to filter our list of variations where the indel SNP quality is greater than 50 or 60 respectively:

awk '($3=="*"&&$6>=50)||($3!="*"&&$6>30)' variations.list > variations.filtered.list

Exporting to other formats

The samtools distribution contains a helper script called sam2vcf.pl that may be used to convert to the Variant Call Format (VCF(external link)):

sam2vcf.pl  variations.filtered.list -r hg18.fasta > variations.vcf

Created by zayed. Last Modification: Tuesday 30 of April, 2013 14:32:05 MYT by colin.
Show HelpHelp