S. suis 36bp single-end reads
Data
One lane of 36bp S. suis reads was downloaded from the Sanger Institute in *_prb.txt format and converted to Sanger FASTQ format for calibration testing.
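The conversion step can be sketched as follows. This is a minimal illustration, not the script used for this test. It assumes the common *_prb.txt layout (one read per line, cycles separated by tabs, four integer scores per cycle in A, C, G, T order on the Solexa scale) and uses the standard Solexa-to-Phred mapping. The function names and the read name are hypothetical.

```python
import math

BASES = "ACGT"

def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled quality to a Phred-scaled quality."""
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)

def prb_line_to_fastq(name, prb_line):
    """Turn one *_prb.txt line into a four-line Sanger FASTQ record.

    Assumed layout: cycles separated by tabs, four whitespace-separated
    integer scores per cycle in A, C, G, T order.
    """
    seq = []
    qual = []
    for cycle in prb_line.strip().split("\t"):
        scores = [int(s) for s in cycle.split()]
        # The called base is the one with the highest score in this cycle.
        best = max(range(4), key=lambda i: scores[i])
        phred = int(round(solexa_to_phred(scores[best])))
        phred = max(0, min(93, phred))  # keep within the Sanger printable range
        seq.append(BASES[best])
        qual.append(chr(phred + 33))    # Sanger FASTQ uses an ASCII offset of 33
    return "@{}\n{}\n+\n{}\n".format(name, "".join(seq), "".join(qual))
```

For example, `prb_line_to_fastq("r1", "40 -40 -40 -40\t-40 -40 30 -40")` yields a record with sequence `AG`.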
Novoalign
Novoalign Command:
- novoalign -h120 -t120 -c12 -u6 -d ssuis.nix -f ..all reads from lane 8.. -k
Metric | Uncalibrated | Calibrated
Read Sequences | 2726374 | 2726374
Aligned | 2366033 | 2374394
Unique Alignment | 2302712 | 2310998
Gapped Alignment | 17698 | 17953
Quality Filter | 83461 | 75385
Homopolymer Filter | 1667 | 2007
Elapsed Time | 270,826s | 280,615s
Calibration had minimal effect on the number of alignments found: aligned reads rose from 2366033 to 2374394, a gain of 8361 reads or about 0.3% of the lane.
With a small genome and 36bp reads, alignment was expected to be accurate.
Alignment Scores
The effects can also be seen in histograms (first 50,000 reads) of the number of alignments as a function of the alignment score:
The main effect of calibration was an increase in alignments with a score of zero. This is attributed to an increase in the calibrated qualities over the “as called” qualities.
Quality Profile
The quality profiles for the two reads are shown below. Called quality is calculated from the expected number of mismatches, given the base qualities, divided by the total calls. Actual base quality is the total number of mismatches divided by the total calls. Both are scaled using the formula -10log10(Perr).
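The two definitions can be illustrated with a short sketch. It assumes the inputs are the Phred qualities of the aligned base calls and a count of how many of those calls mismatched the reference. The function names are hypothetical and not part of Novoalign.

```python
import math

def phred(p_err):
    """Scale an error probability with -10*log10(Perr)."""
    return -10.0 * math.log10(p_err)

def quality_profile(base_quals, mismatches):
    """Return (called quality, actual quality) for a set of base calls.

    base_quals: Phred qualities of the aligned base calls (assumed input).
    mismatches: how many of those calls mismatched the reference.
    """
    total = len(base_quals)
    # Expected mismatches: sum of the error probabilities implied by each quality.
    expected = sum(10.0 ** (-q / 10.0) for q in base_quals)
    called = phred(expected / total)
    # With zero observed mismatches the actual quality is unbounded.
    actual = phred(mismatches / total) if mismatches > 0 else float("inf")
    return called, actual
```

With 1000 calls at Q20 and 10 observed mismatches, both values come out at about 20, meaning the called quality matches the observed error rate.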
“As called” and calibrated qualities are very close.
Q40 Calls
The following charts look at Q40 base calls and the calibrated alignment penalties across the length of the reads.
[Charts: Q40 base calls and calibrated alignment penalty by base position, one panel per base: A, C, G, T.]
Q40 calls held their quality across the read.
Quality along the Reads
These charts show the calibrated penalty as a function of the uncalibrated or “as called” quality for several different positions in the read. The calibrated penalty is just -10log10(mismatches/total calls), adjusted by a prior when the mismatch count is low. When the penalty is below the uncalibrated value, there were more mismatches than expected for that quality. When it is above the line, there were fewer mismatches than expected and the “as called” quality is underestimated.
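The penalty calculation can be sketched as below. The exact form of the prior is not stated here, so this example assumes a simple one: a few pseudo-observations at the “as called” error rate are blended into the counts. The function name and the `prior_weight` parameter are hypothetical.

```python
import math

def calibrated_penalty(mismatches, total_calls, called_quality, prior_weight=10.0):
    """Return -10*log10(mismatches/total calls), smoothed by an assumed prior.

    The prior contributes `prior_weight` pseudo-calls at the "as called"
    error rate, so it dominates only when few real calls were observed.
    """
    p_called = 10.0 ** (-called_quality / 10.0)
    p_err = (mismatches + prior_weight * p_called) / (total_calls + prior_weight)
    return -10.0 * math.log10(p_err)
```

With no observations the penalty falls back to the “as called” quality. With 100 mismatches in 100000 calls at Q20 it comes out close to 30, which lies above the uncalibrated value and so indicates an underestimated “as called” quality.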
[Charts: calibrated penalty against “as called” quality at base positions 0, 20 and 40, one panel per call: A, C, G, T.]
The difference between the calibrated qualities and the “as called” qualities exceeds 10 in quite a few instances. Because we use a 10log10 scale, a difference of 10 in alignment score corresponds to a factor of 10 difference in the probability of the alignment, P(Read|Ai).
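The scale relationship is simple arithmetic and can be checked with a one-line helper. The function name is hypothetical.

```python
def likelihood_ratio(score_difference):
    """Factor by which P(Read|Ai) changes for a given score difference
    on the 10*log10 scale."""
    return 10.0 ** (score_difference / 10.0)
```

A difference of 10 gives a factor of 10, a difference of 20 gives a factor of 100, and a difference of 3 gives roughly a factor of 2.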