Quality Calibration questions

Hey all,

1. According to http://novocraft.com/wiki/tiki-index.php?page=Quality+Calibration we can use novoutil IUPAC to generate a reference sequence with appropriate IUPAC codes, but I don't see IUPAC as an option for novoutil. I'm hesitant to download the linked reference sequence from UCSC because I am currently aligning to a reference sequence generated elsewhere, and I'm not sure they would be the same. Is there anywhere I can find the novoutil IUPAC script? The GATK base quality recalibration documentation states that it is critical to have a table of known variants, so I'm a bit concerned about this.

2. Is there any data indicating how many reads need to be aligned for quality calibration to work effectively? Because of the smoothing employed here, I'm guessing fewer reads are needed than with GATK, which recommends at least 100 million bases aligned and preferably at least 1 billion. I'm working on a project where I may have fewer than 2 million reads per sample (2x101 bp PE reads, HiSeq 2000), and I have no idea whether 400 million aligned bases will be enough for the quality calibration here.

Thank you for your time.


Hi Heisman,

1. The IUPAC function was first released in V2.07.16 only last week, and unfortunately it has a problem that stops it writing the new reference. I'm posting a new release later today with a fix. It also fixes the case where there are multiple SNPs at one locus; the first release only processed the first SNP at each locus. For quality calibration it helps to have the SNPs encoded as IUPAC codes (or, in GATK, supplied as a VCF file). Without this, the SNPs are counted as mismatches, which lowers the overall quality of bases and can in turn lower the quality of your SNP calls from mpileup or UG. The effect is probably most noticeable if you're running low coverage and/or using mixed samples.

2. For Novoalign quality calibration we do some smoothing across quality values and base position, so it starts to have an impact after a few thousand reads, especially on the more frequently called quality values. Also, Novoalign doesn't use the prior base as a covariate, so that's one less factor contributing to the aligned-bases requirement. With Novoalign calibration it's also possible to save the mismatch counts from one run and then rerun the alignments using that calibration. This can eliminate the startup effects that you get from just using -k on a single run.
For run 1, use options -k -K qcal.csv to save the stats, and for the 2nd run use -k qcal.csv so that all reads get the benefit of the calibration.
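
A minimal sketch of that two-run workflow, written as a small Python helper that only builds and prints the two command lines. The -k and -K options are the ones described above; the -d, -f and -o SAM arguments, the file names, and the helper name two_pass_commands are assumptions for illustration and should be checked against the Novoalign manual for your version.

```python
import shlex


def two_pass_commands(index, reads, qcal="qcal.csv"):
    """Build argv lists for the two calibration runs (hypothetical helper)."""
    # Assumed common arguments: index file, read files, SAM output.
    base = ["novoalign", "-d", index, "-f", *reads, "-o", "SAM"]
    # Run 1: calibrate on the fly (-k) and save the mismatch counts (-K file).
    run1 = base + ["-k", "-K", qcal]
    # Run 2: calibrate from the saved counts so every read benefits.
    run2 = base + ["-k", qcal]
    return run1, run2


if __name__ == "__main__":
    # Placeholder file names; nothing is executed, the commands are only printed.
    for argv in two_pass_commands("ref.nix", ["reads_1.fq", "reads_2.fq"]):
        print(shlex.join(argv))
```

The alignments from run 1 would normally be discarded, since only the saved counts file is needed for run 2.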

Kind Regards, Colin


Hey Colin,

Thank you very much. I have another question that I should have thought of earlier. Would it be possible for your development team to add an option that lets a user set a quality score threshold, so that any call under that quality score would be ignored in the quality calibration? Perhaps those bases would be included in the alignment process but left unchanged when outputting the new qualities? I ask because I think it would be unrealistic for bases with quality scores of, say, 4 to get moved up to quality scores above 10 or something like that.


At the moment Q=2 isn't increased in quality unless you set --Q2Off. Initially we didn't do this, and when we realised Q=2 bases were being raised to 10 or so we stopped it. This caused several complaints from users who had found that the Q=2 bases were actually quite good and that the change had had a negative effect on their alignment yield and SNP calling.
Our experience suggests that there's no need for a quality threshold.


Oh, interesting, I did not realize that. Thanks for the response.
