Note. The PBAT protocol avoids the problems discussed here by adding sequencing adapters after bisulphite treatment. See "Amplification-free whole-genome bisulfite sequencing by post-bisulfite adaptor tagging", Fumihito Miura et al.
How you do your bisulphite treatment is going to be critical to the success of your project. We are not experts in this, but we’ve seen a few Bi-seq datasets that showed some major problems, so it’s best we share what we know.
The basic problem is that bisulphite treatment can cause strand breaks during depurination. As ligation of Illumina adapters is done before denaturation and bisulphite treatment, the strand breaks leave fragments with only one (or no) Illumina adapter. Such fragments cannot be amplified by PCR, so any broken fragment will not get sequenced.
Breaks are more likely to occur in sequence with more cytosines, as the breaks happen at cytosines when they are depurinated. In severe cases you can end up with very uneven cover of the genome and unbalanced alignments between the Watson and Crick strands, making it almost impossible to call methylation status. Below is an example where read cover is heavily biased to the Watson or Crick strand depending on the cytosine content.
We can also look at this as a histogram of read density, that is, how many reads cover each base. This histogram is from a subset of reads from Zemach et al., "Genome-wide evolutionary analysis of eukaryotic DNA methylation". Quite a large proportion of the genome has zero read cover on either the CT or the GA strand.
If we simulate reads from the same genome then we see a near binomial distribution of read depth, as we might expect. The Q30 lines on the histogram use only reads with alignment quality > 30 and illustrate the effect of repeats and of the reduced sequence complexity that results from bisulphite treatment.
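To make the simulated expectation concrete, here is a minimal sketch of how such a depth histogram could be produced. It is not the simulation used for the plots above: the function name, genome length, read count and read length are all made-up values for illustration, and reads are simply placed uniformly at random with no alignment step, so repeats and mapping quality are not modelled.

```python
import random
from collections import Counter


def simulate_depth_histogram(genome_length, n_reads, read_length, seed=0):
    """Place reads uniformly at random and count bases at each depth.

    Hypothetical helper for illustration only. Returns a Counter that
    maps read depth -> number of bases covered at that depth.
    """
    rng = random.Random(seed)
    # Difference array: +1 where a read starts, -1 just past its end.
    diff = [0] * (genome_length + 1)
    for _ in range(n_reads):
        start = rng.randrange(genome_length - read_length + 1)
        diff[start] += 1
        diff[start + read_length] -= 1

    # A running sum of the difference array gives the depth at each base.
    histogram = Counter()
    depth = 0
    for position in range(genome_length):
        depth += diff[position]
        histogram[depth] += 1
    return histogram


# Assumed toy parameters: 100 kb genome, 10,000 reads of 50 bp (mean depth 5).
hist = simulate_depth_histogram(genome_length=100_000, n_reads=10_000, read_length=50)
for depth in sorted(hist):
    print(depth, hist[depth])
```

With uniform placement, very few bases end up at zero depth, which is the contrast with the real data described above, where a large proportion of the genome has no cover on one strand.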
As mentioned earlier, we suspect the reason for the uneven cover is the bisulphite treatment step. As described in Wikipedia, longer bisulphite treatment results in more strand breaks, and it is the depurination step that produces them. Therefore, the longer the treatment and the more cytosines, the more likely you are to get breaks. Bisulphite treatment is done after denaturation of the DNA, so the two strands have been separated from each other and breakage in one strand does not affect breakage of the complementary strand.
Since the library already contains adapters at the time of bisulfite treatment, any strand breakage will result in the inability to amplify that library member. Strong bisulfite treatment in an attempt to get full conversion is the likely culprit for the gaps in cover.
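The reasoning above can be sketched as a toy model. Assume, purely for illustration, that each cytosine in a single-stranded library fragment independently causes a break with some fixed probability. That probability is an invented parameter, not a measured one, and the function names and the example sequence are hypothetical. A fragment is then only amplifiable if none of its cytosines breaks, and because the strands are separated, the two strands of the same locus survive independently.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def survival_probability(seq, break_prob_per_c):
    """Probability that a single-stranded fragment stays intact.

    Toy assumption: every cytosine breaks the strand independently with
    probability break_prob_per_c, so survival is (1 - p) ** (number of Cs).
    """
    n_cytosines = seq.upper().count("C")
    return (1.0 - break_prob_per_c) ** n_cytosines


# Made-up cytosine-rich fragment; its complement is guanine-rich, so cytosine-poor.
watson = "CCCTCCACCCTCACCCATCCCTCCACCCAC"
crick = reverse_complement(watson)

for p in (0.01, 0.05, 0.10):  # hypothetical per-cytosine break probabilities
    print(p, survival_probability(watson, p), survival_probability(crick, p))
```

Under this model, the cytosine-rich strand is lost far more often than its complement as the per-cytosine break probability rises, which is one way to picture the strand-biased cover described earlier.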
From Wikipedia:
Degradation of DNA during bisulfite treatment
A major challenge in bisulfite sequencing is the degradation of DNA that takes place concurrently with the conversion. The conditions necessary for complete conversion, such as long incubation times, elevated temperature, and high bisulfite concentration, can lead to the degradation of about 90% of the incubated DNA.[22] Given that the starting amount of DNA is often limited, such extensive degradation can be problematic. The degradation occurs as depurinations resulting in random strand breaks.[23] Therefore the longer the desired PCR amplicon, the more limited the number of intact template molecules will likely be. This could lead to the failure of the PCR amplification, or the loss of quantitatively accurate information on methylation levels resulting from the limited sampling of template molecules. Thus, it is important to assess the amount of DNA degradation resulting from the reaction conditions employed, and consider how this will affect the desired amplicon. Techniques can also be used to minimize DNA degradation, such as cycling the incubation temperature.[23]