We used simulated data to identify suitable novoalign parameters for Ion Torrent reads. The results are presented alongside those of other aligners. This is by no means a detailed aligner comparison, merely an overview of what can be accomplished using novoalign with Ion Torrent data.
The major modifications to novoalign parameters for Ion Torrent reads affect the following:
- Gap open/extension penalties (“-g/-x”)
- Hard clipping (“-H”)
- Insert size distribution and orientation for paired-ends (“-i”)
- Read truncation (“-n”)
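As a rough illustration of how these options fit together, the Python sketch below assembles a paired-end novoalign command line. The penalty values (gap open 20, gap extension 6) follow the Novoalign-x6g20 run label used in the results, and the insert size (250, standard deviation 30) follows the dwgsim simulation settings. The index name, FASTQ file names, the truncation length of 250 and the helper name build_novoalign_cmd are assumptions made for illustration only; this is not the exact command used in the study.

```python
import shlex

def build_novoalign_cmd(index, fq1, fq2, gap_open=20, gap_extend=6,
                        insert_mean=250, insert_sd=30, truncate=250):
    """Return an argv list for a paired-end novoalign run (illustrative only)."""
    return [
        "novoalign",
        "-d", index,            # indexed reference (hypothetical file name)
        "-f", fq1, fq2,         # paired read files (hypothetical file names)
        "-g", str(gap_open),    # gap open penalty
        "-x", str(gap_extend),  # gap extension penalty
        "-H",                   # hard clipping
        "-i", "PE", f"{insert_mean},{insert_sd}",  # paired-end insert size: mean,sd
        "-n", str(truncate),    # read truncation length (assumed value)
        "-r", "Random",         # report one random alignment for multi-mapped reads
        "-o", "SAM",            # SAM output
    ]

cmd = build_novoalign_cmd("drosM.nix", "reads_1.fastq", "reads_2.fastq")
print(shlex.join(cmd))
```

Building the command as a list rather than a single string makes it easy to vary -g and -x across runs, which is how the three Novoalign settings compared here differ.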
The dwgsim program version 0.1.10 was used to simulate 200,000 180bp paired-end reads (i.e. 100K pairs) from the Drosophila melanogaster genome sequence. The full dwgsim command line is shown below:
dwgsim -e 0.02 -d 250 -s 30 -N 100000 -c 2 -1 180 -2 180 -f TACGTACGTCTGAGCATCGATCGATGTACAGC drosM.fa 100Kreads
Appropriate insert size median and standard deviations for the reads were supplied to all aligners.
Bowtie2 was run with the --very-sensitive option to obtain the most sensitive result from this aligner. TMAP was run according to the specifications published in the Ion Torrent manual.
To evaluate the effect of real quality values, we extracted quality values from a 318-99 PGM chip run and used a Perl program to substitute them for the simulated quality values.
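The Perl program itself is not shown here. As a minimal Python sketch of one way such a substitution could work, the function below pairs each simulated read with a real quality string and trims or pads it to the read length. The function name replace_qualities, the record layout and the padding rule are assumptions for illustration, not a description of the original program.

```python
import itertools

def replace_qualities(sim_records, real_quals):
    """Yield FASTQ records whose quality strings come from real reads.

    sim_records: iterable of (name, sequence, quality) tuples from the simulator.
    real_quals:  non-empty list of non-empty quality strings from real reads.
    """
    pool = itertools.cycle(real_quals)  # reuse real strings if there are fewer of them
    for name, seq, _sim_qual in sim_records:
        qual = next(pool)
        if len(qual) < len(seq):
            # Assumption: pad short real strings by repeating their last value.
            qual = qual + qual[-1] * (len(seq) - len(qual))
        yield name, seq, qual[:len(seq)]  # trim so sequence and quality lengths match

sim = [("read1", "ACGTAC", "IIIIII"), ("read2", "TTGACA", "IIIIII")]
real = ["5:<>", "887766554"]
swapped = list(replace_qualities(sim, real))
```

The read names are left untouched so that the simulated true positions remain available for evaluation.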
Each aligner produced output in SAM format. Counts of correctly and incorrectly mapped reads were calculated using the dwgsim_eval program. All novoalign alignments were run with “-r Random” to report a random alignment for any read that mapped to multiple locations.
Figure 1. Receiver operating characteristic (ROC) curve for 200,000 180bp Ion Torrent PGM reads simulated against the Drosophila melanogaster genome.
ROC curves have previously been shown to be useful for comparing aligner sensitivity and specificity. The three ROC plots for Novoalign in Figure 1 clearly show the effect of altered gap open/extension penalties on the number of correctly and incorrectly mapped read pairs. Furthermore, the Novoalign ROC curves show a higher percentage of correctly placed alignments than those of TMAP and Bowtie2. In terms of the cumulative proportion of incorrectly mapped alignments, Bowtie2 reported the highest (29.9%), followed by TMAP (22.9%), Novoalign-x4g15 (14.5%), Novoalign-x6g15 (14.1%) and Novoalign-x6g20 (13.2%).
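For readers unfamiliar with how an aligner ROC curve of this kind is usually built, the Python sketch below accumulates correct and incorrect alignment counts as the mapping quality threshold is lowered. It assumes each mapped read has already been labelled correct or incorrect (for example from dwgsim_eval-style output); the function name roc_points and the toy data are hypothetical, and this is not the dwgsim_eval implementation.

```python
from collections import Counter

def roc_points(alignments):
    """Cumulative (incorrect, correct) counts as the MAPQ threshold is lowered.

    alignments: iterable of (mapq, is_correct) pairs, one per mapped read.
    Returns a list of (threshold, n_incorrect, n_correct) from the highest
    mapping quality down to the lowest.
    """
    correct, incorrect = Counter(), Counter()
    for mapq, is_correct in alignments:
        (correct if is_correct else incorrect)[mapq] += 1
    points, n_ok, n_bad = [], 0, 0
    for q in sorted(set(correct) | set(incorrect), reverse=True):
        n_ok += correct[q]    # reads with MAPQ >= q that are placed correctly
        n_bad += incorrect[q]  # reads with MAPQ >= q that are placed incorrectly
        points.append((q, n_bad, n_ok))
    return points

toy = [(60, True), (60, True), (60, False), (30, True), (3, False), (3, True)]
points = roc_points(toy)
```

Plotting n_incorrect against n_correct for each threshold gives one curve per aligner or parameter set; a curve that reaches more correct alignments for the same number of incorrect ones indicates better accuracy.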
Figure 2. Receiver operating characteristic (ROC) curve for 200,000 180bp Ion Torrent PGM reads simulated against the Drosophila melanogaster genome with real Ion Torrent 318-99 quality values.
We also replaced the simulated quality values with real quality values from the 318-99 Ion Torrent chip. With real qualities we observed the same trend in aligner performance in terms of correctly and incorrectly mapped reads.
Bowtie2 performed better than we expected on Ion Torrent data, given that the tool is best known for its use with other sequencing platforms. Bowtie2 was also the fastest aligner overall on the modest computing resources available to us. Alignment run times are not reported, as our focus was on comparing alignment accuracy.
All alignment evaluation output files from dwgsim_eval are provided in the downloads section.
The results show that, for a set of Ion Torrent reads simulated against the Drosophila genome, lowering novoalign’s gap open and extension penalties produces a higher number of correctly placed read alignments than TMAP or Bowtie2 with their most sensitive settings selected.
Note. Version 2 of Novoalign is limited to reads less than 250bp long and requires the -n option to align reads over 150bp. Novoalign version 3.00 raises this limit to 950bp and greatly improves sensitivity for reads over 100bp long.
We would like to extend this study to a full 30x coverage of the D. melanogaster genome.
- TMAP. Nils Homer. Unpublished. TMAP technical documentation (requires free Ion Community login credentials).
- Langmead B and Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.
- Dwgsim. Nils Homer. Unpublished.