Streptococcus suis Quality Calibration

S. suis 36 bp single-end reads.

Data

One lane of 36 bp S. suis reads was downloaded from the Sanger Institute in *_prb.txt format and converted to Sanger FASTQ format for calibration testing.

Novoalign

Novoalign Command:

novoalign -h120 -t120 -c12 -u6 -d ssuis.nix -f ..all reads from lane 8.. -k

Statistic            Uncalibrated   Calibrated
Read Sequences            2726374      2726374
Aligned                   2366033      2374394
Unique Alignment          2302712      2310998
Gapped Alignment            17698        17953
Quality Filter              83461        75385
Homopolymer Filter           1667         2007
Elapsed Time             270,826s     280,615s

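Calibration had only a small effect on the number of alignments found: aligned reads rose by 8,361 (from 86.8% to 87.1% of all reads), and reads rejected by the quality filter fell by 8,076.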

With a small genome and 36 bp reads, alignment was expected to be accurate.
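The changes quoted above can be recomputed directly from the two summaries. The short Python sketch below hard-codes the counts from the table (nothing else is assumed; elapsed time is left out) and reports each change both as a raw count and as a percentage of all reads.

```python
# Counts copied from the Novoalign summaries above.
UNCALIBRATED = {
    "Read Sequences": 2726374,
    "Aligned": 2366033,
    "Unique Alignment": 2302712,
    "Gapped Alignment": 17698,
    "Quality Filter": 83461,
    "Homopolymer Filter": 1667,
}
CALIBRATED = {
    "Read Sequences": 2726374,
    "Aligned": 2374394,
    "Unique Alignment": 2310998,
    "Gapped Alignment": 17953,
    "Quality Filter": 75385,
    "Homopolymer Filter": 2007,
}


def compare(before, after):
    """Return {statistic: (change in count, change as % of all reads)}."""
    total = before["Read Sequences"]
    result = {}
    for key in before:
        delta = after[key] - before[key]
        result[key] = (delta, 100.0 * delta / total)
    return result


for key, (delta, pct) in compare(UNCALIBRATED, CALIBRATED).items():
    print(f"{key:>20}: {delta:+8d} ({pct:+.2f}% of reads)")
```

Every change stays well under half a percent of the reads, which is what "a small effect" means here in numbers.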

Alignment Scores

The effect of calibration can also be seen in histograms of the number of alignments as a function of alignment score (first 50,000 reads).

The main effect of calibration was an increase in alignments with a score of zero. This is attributed to the calibrated qualities being higher than the “as called” qualities.
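A histogram of this kind, and the per-score difference between two runs, can be produced with a few lines of Python. This is a sketch only: it assumes the alignment scores have already been extracted from the Novoalign reports into a list of integers, one per read in file order. Parsing the report format is not shown, and the function names and the toy scores in the usage example are hypothetical.

```python
from collections import Counter
from itertools import islice


def score_histogram(scores, max_reads=50_000):
    """Number of alignments at each alignment score, over the first max_reads scores."""
    counts = Counter(islice(scores, max_reads))
    return dict(sorted(counts.items()))


def score_shift(uncalibrated, calibrated):
    """Change in alignment count per score (calibrated minus uncalibrated)."""
    scores = sorted(set(uncalibrated) | set(calibrated))
    return {s: calibrated.get(s, 0) - uncalibrated.get(s, 0) for s in scores}


# Toy scores (hypothetical), not data from this run.
before = score_histogram([0, 0, 12, 30, 12])
after = score_histogram([0, 0, 0, 15, 12])
print(score_shift(before, after))  # → {0: 1, 12: -1, 15: 1, 30: -1}
```

An increase in score-zero alignments, as reported for the calibrated run, shows up as a positive entry for key 0.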

Quality Profile

The quality profiles for the two reads are shown below. Called quality is calculated from the expected number of mismatches, given the base qualities, divided by the total number of calls. Actual base quality is the total number of mismatches divided by the total number of calls. Both are scaled using the formula -10log10(Perr).
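The two quantities defined above can be sketched in Python for any group of calls, for example all calls at one read position. The input format (a list of (base quality, is-mismatch) pairs) and the function names are assumptions made for illustration. A group with no observed mismatches is reported here as infinite quality, which is the low-count situation where a prior becomes necessary.

```python
import math


def phred(p_err):
    """Scale an error probability as -10*log10(p_err)."""
    return -10.0 * math.log10(p_err)


def quality_profile(calls):
    """Return (called quality, actual quality) for a group of base calls.

    calls: iterable of (base_quality, is_mismatch) pairs (assumed format).
    Called quality uses the expected number of mismatches implied by the
    base qualities; actual quality uses the observed number of mismatches.
    """
    total = 0
    expected = 0.0
    observed = 0
    for q, is_mismatch in calls:
        total += 1
        expected += 10.0 ** (-q / 10.0)  # error probability implied by Q
        observed += 1 if is_mismatch else 0
    if total == 0:
        raise ValueError("no calls in this group")
    called_q = phred(expected / total)
    actual_q = phred(observed / total) if observed else math.inf
    return called_q, actual_q


# 100 calls at Q20 with 10 mismatches: called Q20, actual Q10.
print(quality_profile([(20, i < 10) for i in range(100)]))
```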

“As called” and calibrated qualities are very close.

Q40 Calls

The following charts show Q40 base calls and the calibrated alignment penalties along the length of the reads.

[Charts: one panel per called base (A, C, G, T)]

Q40 calls held their quality across the whole read.

Quality along the Reads

These charts show the calibrated penalty as a function of the uncalibrated or “as called” quality at several positions in the read. The calibrated penalty is simply -10log10(mismatches/total calls), adjusted by a prior when the mismatch count is low. A penalty below the uncalibrated value means there were more mismatches than expected for that quality. A penalty above it means there were fewer mismatches than expected, i.e. the “as called” quality is underestimated.
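The text does not say which prior is used, so the sketch below substitutes a common choice and should not be read as Novoalign's implementation: the observed mismatch rate is shrunk toward the error rate implied by the “as called” quality, as if prior_weight extra calls at that rate had been observed (a beta-binomial style pseudo-count). The parameter name prior_weight and its default of 100 are hypothetical.

```python
import math


def calibrated_penalty(mismatches, total, called_q, prior_weight=100.0):
    """-10*log10 of a mismatch rate shrunk toward the as-called error rate.

    ASSUMPTION: the prior is modelled as `prior_weight` pseudo-calls at the
    error rate implied by `called_q`; Novoalign's actual prior may differ.
    """
    p_called = 10.0 ** (-called_q / 10.0)
    p = (mismatches + prior_weight * p_called) / (total + prior_weight)
    return -10.0 * math.log10(p)


def quality_shift(mismatches, total, called_q, prior_weight=100.0):
    """Calibrated penalty minus as-called quality.

    Negative: more mismatches than the called quality predicts.
    Positive: fewer mismatches, i.e. the called quality is underestimated.
    """
    return calibrated_penalty(mismatches, total, called_q, prior_weight) - called_q


# With no data the prior alone returns the called quality (about 30).
print(calibrated_penalty(0, 0, 30))
# With many calls the observed rate dominates: 1,000 mismatches in
# 1,000,000 calls gives about 30 even though the calls were labelled Q40.
print(calibrated_penalty(1_000, 1_000_000, 40))
```

The pseudo-count form keeps the estimate finite when a cell has zero mismatches and lets it converge to the plain -10log10(mismatches/total calls) as counts grow.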

[Charts: calibrated penalty vs. “as called” quality at read positions 0, 20 and 40, one panel per called base (A, C, G, T)]

The difference between the calibrated and “as called” qualities exceeds 10 in quite a few instances. Because we use a -10log₁₀ scale, a difference of 10 in alignment score is a factor of 10 difference in the probability of the alignment, P(Read|Ai).
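The factor-of-10 statement follows from the definition of the score. Assuming, as the paragraph above implies, that the alignment score for read R and alignment A_i is S_i = -10 log10 P(R | A_i), the ratio of two alignment probabilities depends only on the score difference:

```latex
S_i = -10\log_{10} P(R \mid A_i)
\;\Longrightarrow\;
\frac{P(R \mid A_1)}{P(R \mid A_2)} = 10^{(S_2 - S_1)/10},
\qquad
S_2 - S_1 = 10 \;\Rightarrow\; \frac{P(R \mid A_1)}{P(R \mid A_2)} = 10 .
```

If, as a further assumption, the score is a sum of per-base penalties, then a base whose penalty moves by 10 changes P(R | A_i) by a factor of 10 for every alignment that places a mismatch on that base.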
