1. According to http://novocraft.com/wiki/tiki-index.php?page=Quality+Calibration we can use novoutil IUPAC to generate a reference sequence with the appropriate IUPAC codes, but I don't see IUPAC as an option for novoutil. I'm hesitant to download the linked reference sequence from UCSC because I am currently aligning to a reference sequence generated elsewhere, and I'm not sure the two would be the same. Is there anywhere I can find the novoutil IUPAC script? For GATK base quality recalibration they state it is critical to have a table of known variants, so I'm a bit concerned about this.
2. Is there any data indicating how many reads need to be aligned for quality calibration to work effectively? Because of the smoothing employed here, I'm guessing fewer reads are needed than with GATK, but GATK recommends at least 100 million aligned bases and preferably at least 1 billion. I'm working on a project where I may have fewer than 2 million reads per sample (2x101 bp PE reads, HiSeq 2000), and I have no idea whether roughly 400 million aligned bases will be enough for quality calibration here.
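For reference, here is the back-of-the-envelope arithmetic behind the 400 million figure as a small Python sketch. It assumes that "2 million reads" means 2 million read pairs and that every base aligns; the function name and the `aligned_fraction` parameter are illustrative only and not part of novoalign or GATK.

```python
# Rough estimate of aligned bases for a paired-end run.
# Assumption: "reads" is counted as read pairs (2 mates of 101 bp each).

GATK_MIN_BASES = 100_000_000          # GATK's stated minimum
GATK_PREFERRED_BASES = 1_000_000_000  # GATK's stated preferred amount


def aligned_bases(read_pairs, read_length=101, mates_per_pair=2,
                  aligned_fraction=1.0):
    """Return the approximate number of aligned bases."""
    return round(read_pairs * mates_per_pair * read_length * aligned_fraction)


total = aligned_bases(2_000_000)
print(total)                          # 404000000
print(total >= GATK_MIN_BASES)        # True
print(total >= GATK_PREFERRED_BASES)  # False
```

So under these assumptions the data sits above the GATK minimum but well below the preferred amount; if the 2 million figure counts individual mates instead, the total would be about half of this.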
Thank you for your time.