Novoalign detects mismatches and short indels in short reads from next-generation sequencing platforms. The recipes below show command-line usage for generating a list of single nucleotide polymorphisms (SNPs) and indels for further analysis.
Generate sorted alignments in BAM format
Initially we need to align our file(s) to the reference genome:
Illumina:
novoalign -d hg18.nix -f reads1.fastq -o SAM 2> log.txt > novo.sam
samtools view -uS novo.sam | novosort -m 8G --markduplicates -i -o reads1.bam -
SOLiD:
novoalignCS -d hg18.ncx -f reads1.csfastq -o SAM 2> stats.txt > novocs.sam
samtools view -uS novocs.sam | novosort --markduplicates -m 4G -i -o reads1.bam -
For paired-end or mate-pair reads adjust the parameters accordingly.
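As a rough illustration of a paired-end run, the sketch below wraps the Illumina command in a small shell function. The function name align_paired, the NOVOALIGN override variable, and the file names are hypothetical, and the fragment length mean and standard deviation (250,50) are placeholders that should be replaced with values for your own library.

```shell
# Sketch: align one paired-end library and write SAM to stdout.
# Arguments (hypothetical names): $1 = index, $2 = read-1 FASTQ, $3 = read-2 FASTQ.
# 250,50 is a placeholder fragment length mean and standard deviation.
# NOVOALIGN can be set to point at a different novoalign binary.
align_paired() {
    "${NOVOALIGN:-novoalign}" -d "$1" -f "$2" "$3" -i PE 250,50 -o SAM
}
```

It could then be called as: align_paired hg18.nix reads1_1.fastq reads1_2.fastq 2> log.txt > novo.sam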
Novosort can also be used to merge multiple BAM files.
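A merge might look like the following sketch, which passes several input BAM files to a single novosort call. The function name merge_bams, the NOVOSORT override variable, and the file names are assumptions made for illustration; the memory setting simply mirrors the 8G used above.

```shell
# Sketch: merge several BAM files into one sorted, indexed BAM.
# Usage (hypothetical names): merge_bams merged.bam lane1.bam lane2.bam [...]
merge_bams() {
    out=$1
    shift
    # Merging only makes sense with two or more inputs.
    if [ "$#" -lt 2 ]; then
        echo "merge_bams: need at least two input BAM files" >&2
        return 1
    fi
    # -i builds the index, -o names the merged output.
    "${NOVOSORT:-novosort}" -m 8G -i -o "$out" "$@"
}
```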
Mark PCR duplicates
It is highly recommended to mark or remove PCR duplicates before calling SNPs and indels. Here we use novosort to mark PCR duplicates; it takes the unsorted BAM file as input.
# mark PCR duplicates
novosort --markduplicates --keeptags reads1.bam -i -o reads1.sorted.bam
SNP and Microindel calling
In this example we use samtools mpileup together with bcftools call as the SNP caller and accept reads with a minimum mapping quality of 20:
samtools mpileup -q 20 -d 1000 -A -ugf genome.fa dedup_alignments.bam | bcftools call -vmO z -o variants.vcf.gz
tabix -p vcf variants.vcf.gz
You may also use other variant callers such as GATK HaplotypeCaller or FreeBayes. Novoalign’s SAM/BAM output follows the standard format and is compatible with all tools that adhere to it.
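To get a quick summary of the resulting call set, the sketch below counts SNP and indel records in a VCF. It assumes the file came from the samtools mpileup and bcftools call route, where indel records carry an INDEL flag in the INFO column; output from other callers may need a different test. The function name count_variants is hypothetical.

```shell
# Sketch: count SNP and indel records in an uncompressed VCF read from stdin.
# A record counts as an indel when the INFO column (column 8) contains the
# INDEL flag; every other non-header record counts as a SNP.
count_variants() {
    awk -F '\t' '
        /^#/ { next }                          # skip header lines
        {
            n = split($8, info, ";")
            is_indel = 0
            for (i = 1; i <= n; i++)
                if (info[i] == "INDEL") is_indel = 1
            if (is_indel) indels++; else snps++
        }
        END { printf "SNPs\t%d\nindels\t%d\n", snps, indels }
    '
}
```

For the compressed file produced above, it could be run as: gzip -dc variants.vcf.gz | count_variants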
Suggested reading
Once you have a sorted, deduplicated BAM file, you can process it in various ways.
- HTSLIB Workflow
- EdgeBio blog commentary
- Illumina technical sequencing note lists Novoalign as a suggested third-party alignment program.