The phiX Lane

phiX Lane 4 from the Same Run as the Bisulphite Treated Data with a Quality Problem in Read 1

Data

45 bp paired-end reads from phiX control lane 4, on the same run and slide as the Bisulphite example. For the following calibration tests we used the first 250,000 reads from the lane.

Novoalign V2.04.03

Novoalign Command:

: novoalign -h120 -t120 -c2 -u6 -d phix.nix -f ..first 250K reads from lane 4.. -k

Uncalibrated

#       Paired Reads:   250000
#      Pairs Aligned:   242090
#     Read Sequences:   500000
#            Aligned:   484861
#   Unique Alignment:   480484
#   Gapped Alignment:     1576
#     Quality Filter:      203
# Homopolymer Filter:       40
#       Elapsed Time: 81,221s

Calibrated

#       Paired Reads:   250000
#      Pairs Aligned:   242306
#     Read Sequences:   500000
#            Aligned:   486258
#   Unique Alignment:   481276
#   Gapped Alignment:     2238
#     Quality Filter:       70
# Homopolymer Filter:       33
#       Elapsed Time: 73,049s

Calibration had minimal effect on the number of alignments found. The most noticeable change was an increase in gapped alignments, which we attribute to higher calibrated qualities making a gap the more likely call.

With a small genome and 45 bp reads, alignment was expected to be accurate.
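The size of these changes can be checked with a little arithmetic on the counts reported in the two run summaries. The sketch below is only illustrative; the dictionary layout and the helper name `pct` are our own and are not part of Novoalign.

```python
# Counts copied from the Uncalibrated and Calibrated run summaries.
uncal = {"reads": 500000, "aligned": 484861, "gapped": 1576}
cal = {"reads": 500000, "aligned": 486258, "gapped": 2238}


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Fraction of read sequences aligned in each run.
aligned_uncal = pct(uncal["aligned"], uncal["reads"])
aligned_cal = pct(cal["aligned"], cal["reads"])

# Relative change in the number of gapped alignments.
gapped_change = pct(cal["gapped"] - uncal["gapped"], uncal["gapped"])

print(f"aligned: {aligned_uncal:.2f}% -> {aligned_cal:.2f}%")
print(f"gapped alignments: {gapped_change:+.1f}%")
```

The aligned fraction moves from about 96.97% to 97.25%, while gapped alignments rise by about 42%.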

Alignment Scores

The effects can also be seen in histograms of the number of alignments as a function of the alignment score:

[Histograms of alignment score: Uncalibrated, Calibrated]

The main effect of calibration was to reduce alignment scores.
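A histogram of this kind can be built by binning the alignment scores. The following is a minimal sketch that assumes the scores have already been extracted from the alignment reports into a list of integers; the function name `score_histogram` and the bin width are hypothetical choices, and the extraction step is not shown because it depends on the report format used.

```python
from collections import Counter


def score_histogram(scores, bin_width=10):
    """Count alignments per score bin; keys are the lower edge of each bin."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


# Toy scores for illustration only, not taken from the phiX lane.
example = score_histogram([0, 3, 12, 19, 45])
print(example)
```

Comparing the two histograms bin by bin would show whether calibration shifts alignments towards lower scores.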

Quality Profile

The quality profiles for the two reads are shown below. Called quality is calculated from the expected number of mismatches, given the base qualities, divided by the total number of calls. Actual quality is the observed number of mismatches divided by the total number of calls. Both are scaled using the formula -10 log10(Perr).

Actual qualities are consistently higher than the called qualities.
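The two definitions can be sketched as follows for a single read position. This is our own illustration under stated assumptions, not Novoalign code: the input is assumed to be a list of (called quality, mismatch flag) pairs, and the names `phred` and `quality_profile` are hypothetical.

```python
import math


def phred(p_err):
    """Scale an error probability as -10 log10(Perr)."""
    return -10.0 * math.log10(p_err)


def quality_profile(calls):
    """calls: iterable of (called_quality, is_mismatch) for one read position.

    Returns (called, actual) qualities for that position.
    """
    calls = list(calls)
    total = len(calls)
    # Expected mismatches implied by the base qualities: sum of 10^(-Q/10).
    expected = sum(10.0 ** (-q / 10.0) for q, _ in calls)
    # Mismatches actually observed against the reference.
    observed = sum(1 for _, mismatch in calls if mismatch)
    called = phred(expected / total)
    # Assumption: with no observed mismatches the quality is unbounded.
    actual = phred(observed / total) if observed else float("inf")
    return called, actual


# 1000 calls at Q20 with a single mismatch: called Q20, actual about Q30.
called, actual = quality_profile([(20, False)] * 999 + [(20, True)])
print(round(called, 1), round(actual, 1))
```

In this toy case the actual quality exceeds the called quality, which is the pattern reported for the phiX lane.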

Q40 Calls

The following charts look at Q40 base calls and the calibrated alignment penalties across the length of the reads.

Base Called	Read 1	Read 2
A
C
G
T

The drop in quality of “C” calls towards the 3′ ends of the reads was the result of a drop in the number of calls with a base quality of 40 while the number of mismatches stayed steady.

Read 1 and Read 2 are very similar, and there is no evidence of the quality problems seen in the Lane 3 Bisulphite data. This indicates that this phiX lane is not suitable for calibrating Lane 3.

Quality along the Reads

[Charts for Read 1 and Read 2: one per base call (A, C, G, T) at Base 0, 20 and 40]

With the exception of G’s being miscalled as T’s at the first base of reads, the charts show that “called” qualities underestimate the true quality as reflected by the mismatch rates.

Note: the calculation of calibrated penalties uses a prior probability that keeps calibrated qualities close to called qualities when there are not enough samples. This is the cause of some of the dips around Q25 in the above charts.
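The effect of such a prior can be illustrated with a simple pseudo-count scheme. This is only a sketch of the general idea and is not the formula Novoalign uses, which is not given here; the function name `calibrated_quality` and the `prior_weight` parameter are hypothetical.

```python
import math


def calibrated_quality(called_q, mismatches, calls, prior_weight=100.0):
    """Shrink the observed error rate towards the called error rate.

    prior_weight is a hypothetical pseudo-count: the prior behaves like that
    many extra calls made at exactly the called error rate.
    """
    p_called = 10.0 ** (-called_q / 10.0)
    p_post = (mismatches + prior_weight * p_called) / (calls + prior_weight)
    return -10.0 * math.log10(p_post)


# Few samples: the result stays close to the called quality of 25.
print(round(calibrated_quality(25, 0, 10), 1))
# Many samples: the observed mismatch rate dominates.
print(round(calibrated_quality(25, 10, 1_000_000), 1))
```

With few calls at a given quality the estimate barely moves from the called value, which is consistent with the dips in sparsely sampled quality ranges.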

It’s also clear that this phiX lane is not suitable for calibrating the Lane 3 Bisulphite reads.
