phiX Lane 4 from same run as Bisulphite Treated Data with Quality Problem in Read1
Data
45 bp paired-end reads from phiX control lane 4, from the same run and slide as the Bisulphite example. For the following calibration tests we used the first 250,000 reads from the lane.
Novoalign V2.04.03
Novoalign Command:
- novoalign -h120 -t120 -c2 -u6 -d phix.nix -f ..first 250K reads from lane 4.. -k
| | Uncalibrated | Calibrated |
|---|---|---|
| # Paired Reads | 250000 | 250000 |
| # Pairs Aligned | 242090 | 242306 |
| # Read Sequences | 500000 | 500000 |
| # Aligned | 484861 | 486258 |
| # Unique Alignment | 480484 | 481276 |
| # Gapped Alignment | 1576 | 2238 |
| # Quality Filter | 203 | 70 |
| # Homopolymer Filter | 40 | 33 |
| # Elapsed Time | 81,221s | 73,049s |
Calibration had minimal effect on the number of alignments found. The most noticeable change was an increase in gapped alignments, attributed to higher calibrated qualities making a gap a more likely call.
With a small genome and 45 bp reads, alignment was expected to be accurate.
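To put the differences between the two runs in proportion, the counts from the reports above can be compared directly. The following is a minimal sketch; the dictionary keys and helper names are illustrative and are not part of Novoalign's output format.

```python
# Summary counts copied from the two Novoalign reports above.
uncalibrated = {"reads": 500000, "aligned": 484861, "unique": 480484, "gapped": 1576}
calibrated = {"reads": 500000, "aligned": 486258, "unique": 481276, "gapped": 2238}


def aligned_fraction(stats):
    """Fraction of read sequences that were aligned."""
    return stats["aligned"] / stats["reads"]


def relative_change(before, after):
    """Change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


for name, stats in (("uncalibrated", uncalibrated), ("calibrated", calibrated)):
    print(f"{name}: {100 * aligned_fraction(stats):.2f}% of reads aligned")

gapped_change = relative_change(uncalibrated["gapped"], calibrated["gapped"])
print(f"gapped alignments changed by {100 * gapped_change:+.1f}%")
```

The aligned fraction moves by well under one percentage point, while the gapped alignment count changes by a much larger relative amount, which matches the description above.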
Alignment Scores
The effects can also be seen in histograms of the number of alignments as a function of the alignment score:
[Histograms of number of alignments by alignment score: Uncalibrated | Calibrated]
The main effect of calibration was to reduce alignment scores.
Quality Profile
The quality profiles for the two reads are shown below. Called quality is calculated from the expected number of mismatches, given the base qualities, divided by the total number of calls. Actual base quality is the total number of observed mismatches divided by the total number of calls. Both are converted to a quality score using the formula -10 × log10(Perr).
Actual qualities are consistently higher than the called qualities.
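The two quantities can be sketched in a few lines of Python. This is an illustration of the definitions given above, not Novoalign's implementation; the function names are hypothetical, the logarithm is assumed to be base 10 (the usual Phred convention), and returning infinity when there are no mismatches is a choice made here for convenience.

```python
import math


def phred_to_error(q):
    """Convert a Phred-scaled quality to an error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p_err):
    """Convert an error probability to a Phred-scaled quality: -10 * log10(Perr)."""
    return -10 * math.log10(p_err)


def called_quality(base_qualities):
    """Quality implied by the base caller: expected mismatches over total calls."""
    expected_mismatches = sum(phred_to_error(q) for q in base_qualities)
    return error_to_phred(expected_mismatches / len(base_qualities))


def actual_quality(mismatches, total_calls):
    """Quality observed from alignments: mismatches over total calls."""
    if mismatches == 0:
        return math.inf  # assumption: no observed errors maps to infinite quality
    return error_to_phred(mismatches / total_calls)
```

For example, `called_quality([30] * 1000)` is 30 (up to floating-point rounding), and `actual_quality(5, 10000)` is about 33, the situation described above where the actual quality is higher than the called quality.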
Q40 Calls
The following charts look at Q40 base calls and the calibrated alignment penalties across the length of the reads.
Read 1 and Read 2 are very similar, and there is no evidence of the quality problems seen in the Lane 3 Bisulphite data. This indicates that this phiX lane is not suitable for calibrating Lane 3.
Quality along the Reads
[Charts for Read 1 and Read 2, one row per combination of Base (0, 20, 40) and Call (A, C, G, T)]
With the exception of Gs being miscalled as Ts in the first base of the reads, the charts show that the “called” qualities underestimate the true quality as reflected by the mismatch rates.
It’s also clear that this phiX lane is not suitable for calibrating the Lane 3 Bisulphite reads.