Novomethyl output Interpretation

To whom it may concern,

This is the first time I have used novoalign and novomethyl for 454 bisulfite sequencing data.

This may be a stupid question, but I was wondering what the best way is to interpret the novomethyl output. I used the consensus reporting method. I just really want to know whether CpGs are methylated in the amplicon or not. As this is all a very new area for me, is it also possible to get the meaning of the Context column? What does CHH mean? If I don't have u or m at the end, does that just mean the quality is poor and it can't determine the status of this particular call?

Which columns should I be using to interpret this data? Your help is much appreciated.


Here is an example of the report format:

RNAME | BASE POSITION | REFERENCE SEQUENCE | CONSENSUS BASE | QUALITY OF CALL | CONTEXT | WATSON-CRICK STRAND | QUALITY OF METHYLATION STATUS | % CYTOSINES NOT CONVERTED TO T | NUMBER OF UNCONVERTED CYTOSINES | TOTAL # CYTOSINES | NUMBER OF READS ALIGNING ON WATSON STRAND | CT BASES | CT QUALITIES
chr7 27204732 C A/C 150 CpGu + 150 11 3 28 37 TTTTATCTTATCCTTAATTTTTATATTTTTTATTATA ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr8 70983214 C C/G 78 CHHu + 150 7 1 14 16 TTCTTTTTTTTGTTGT ~~~~~~~~~~~~~~~~ 0
chr8 70982104 C C/G 78 CpGu + 150 12 1 8 11 TTTTGGGCTTT ~~~~~~~~~~~ 0
chr8 70983591 T C/G 67 CHHu + 150 3 1 37 39 TTTTTTTTTTTCGTTTTTGTTTTTTTTTTTTTTTTTTTT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr7 27205812 C C 61 CpGu + 69 15 9 59 59 TTTTTTTTTCCCCTTTCCTTTTTTCTTTTTCTTCTTTTTTTTTTTTTTTTTTTTTTTTT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr7 27205784 C C 59 CpGu + 94 13 8 62 62 TTTTTTTTTCCCTTTTTCTTTTTTTTTTTTTTTTTTTTTCTTTTTTT+1CTTCCTTTCTTTTTTT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr7 27205816 C C 59 CpGu + 85 14 8 59 59 TTCTTTTTTCCCTTTTTCTTTTTTTTTTTTCTTCTTTTTTTTTTTTTTTTTTCTTTTTT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr7 27206025 C C 59 CpGu + 79 14 8 57 57 TCTTTTTTTTTTTTTTCCTCTTTTTTTTTTTTTTTTCTTTTTTTTTTTTTTTCCTTC ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr8 70983520 C C 58 CpGu + 67 15 8 53 53 TTTTTTCTTTTTTTTTTTTTTTTTTTTTTTCTTTTTTTTTTCTCCTCTCCTTT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr8 70983532 C C 58 CpGu + 67 15 8 53 53 TTTTTTTTCTCTTTTTCTTTTTTTTTTTTTTTTTCTTCTTTTCCCTTTTTTTT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr7 27206044 C C 57 CpGu + 60 16 9 56 56 TTTTTTTTTTTTCCTTTTTTTTTTTTTTTTCTTTTTCTTTTTTCTCTTCTCCTTTT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr7 27205810 C C 56 CpGu + 104 12 7 60 60 TTTTTTTTTCCCTTTTTCTTTTTTT$TTTTTTTCTTTTTTTTTTCTTTTTTTTTCTTTTTT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr7 27206033 C C 56 CpGu + 95 12 7 57 57 TTCTCTTTTTTTTTTTTTTTTTTTTTTTTTTCTTTTCTTTTTTTTTCTTTTTCTTTC ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0
chr7 27204400 C C 56 CpGu + 68 15 7 48 48 TT+1TTT+1TCTT+1CTTTTTTTCTTTTTT$T+1TCTT+1CTTTTTTTTTTCTCTTTCTTTCTT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 0


Hi,

First, in your example I can't see any data for the GA strand alignments. After Bi-seq alignment we split the BAM file into two parts and then use samtools mpileup as input to novomethyl, as explained in our wiki on Novomethyl. Note that samtools mpileup will merge the CT and GA alignments back together if the SAM files have an @RG record with an SM tag, so you need to make sure there is no SM tag.
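As a rough way to check for the SM tag, here is a minimal Python sketch. It assumes you have already obtained the header as text (for example from `samtools view -H`); the function name and the sample header are made up for illustration.

```python
def rg_lines_with_sm(header_text):
    """Return the @RG header lines that carry an SM tag."""
    flagged = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs.
        tags = [field.split(":", 1)[0] for field in line.split("\t")[1:]]
        if "SM" in tags:
            flagged.append(line)
    return flagged


# Hypothetical header: the first read group has an SM tag, the second does not.
header = "@HD\tVN:1.0\tSO:coordinate\n@RG\tID:ct\tSM:sample1\n@RG\tID:ga\n"
for line in rg_lines_with_sm(header):
    print("SM tag found:", line)
```

Any line it reports is a read group that could cause mpileup to merge the CT and GA alignments.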

Back to your question: the context column and the methylation status quality may be all you need. The context column shows whether the base is methylated or not. You'll get CpGu, CHGu or CHHu for unmethylated and CpGm, CHGm or CHHm for methylated (H stands for any base other than G, so CHH is a cytosine followed by two non-G bases).

Other columns are:
1. RNAME Reference sequence name from pileup
2. POS 1-based Coordinate
3. REF Reference Sequence Base. May include IUB ambiguous codes if present in the mpileup.
4. CONSENSUS Consensus base, may show heterozygous base such as A/T (Novomethyl can detect SNPs)
5. QUAL Quality of the consensus call
6. CONTEXT If it is a cytosine we show the context as CpG, CHG or CHH, together with a methylation status indicator 'u' or 'm' if the methylation quality is greater than 3.
7. STRAND Watson-Crick location of the cytosine is shown as +/-
8. MQUAL The quality of the methylation status given that we have a cytosine at this locus
9. MPCNT The % of cytosines that were not converted to T's.
10. NU The number of unconverted cytosines.
11. NC The total number of cytosines. If a site is called as heterozygous C/T then the lesser of half the total C&T bases or the total T bases are assumed to result from the heterozygous T and are not included in the NC count.
The following 6 fields are taken directly from the samtools mpileup file; we refer you to the samtools documentation for further information.
12. NCT The number of reads aligning on the Watson strand.
13. CTBASES The bases and read mapping qualities from the Watson strand.
14. CTQUALITIES The base qualities for the Watson strand.
15. NGA The next 3 fields are as above, for the Crick strand.
16. GABASES
17. GAQUALITIES
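
The column list above can be turned into a small parser. This is only a sketch: it assumes the fields are whitespace-separated and that every column up to the last one present is populated, as in the rows posted in this thread. The names `FIELDS` and `parse_record` are made up for illustration. If your file is tab-separated with empty fields, split on tabs instead.

```python
FIELDS = ["RNAME", "POS", "REF", "CONSENSUS", "QUAL", "CONTEXT", "STRAND",
          "MQUAL", "MPCNT", "NU", "NC", "NCT", "CTBASES", "CTQUALITIES",
          "NGA", "GABASES", "GAQUALITIES"]
INT_FIELDS = ("POS", "NU", "NC", "NCT", "NGA")
FLOAT_FIELDS = ("QUAL", "MQUAL", "MPCNT")


def parse_record(line):
    """Map one report line onto the column names listed above."""
    record = dict(zip(FIELDS, line.split()))
    # Trailing columns that are absent from the line are simply left out.
    for name in INT_FIELDS:
        if name in record:
            record[name] = int(record[name])
    for name in FLOAT_FIELDS:
        if name in record:
            record[name] = float(record[name])
    return record


# One of the rows from the original post.
row = "chr8 70982104 C C/G 78 CpGu + 150 12 1 8 11 TTTTGGGCTTT ~~~~~~~~~~~ 0"
rec = parse_record(row)
print(rec["RNAME"], rec["POS"], rec["CONTEXT"], rec["MQUAL"], rec["NGA"])
```

In that row NGA is 0, which is the missing GA strand data mentioned earlier.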

First I'd make sure the CT/GA split is occurring and that you have reads aligned to both strands.

Then, to pick up methylated cytosines, you can look for entries with a context of CpGm, CHGm or CHHm and possibly filter further by the MQUAL column, perhaps only counting entries with a quality of, say, > 20.
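
A minimal Python sketch of that filter, assuming whitespace-separated lines with CONTEXT in column 6 and MQUAL in column 8. The function name is hypothetical and the two example rows are made up, since the rows in the original post are all unmethylated.

```python
METHYLATED = {"CpGm", "CHGm", "CHHm"}


def methylated_sites(lines, min_mqual=20):
    """Yield (chrom, pos, context, mqual) for confidently methylated cytosines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # blank or truncated line
        context = fields[5]
        # Header lines never match a methylated context, so they are skipped.
        if context in METHYLATED and float(fields[7]) > min_mqual:
            yield fields[0], int(fields[1]), context, float(fields[7])


# Made-up rows for illustration only.
rows = [
    "chr7 100 C C 60 CpGm + 85 92 46 50 50 CCCC ~~~~ 0",
    "chr7 200 C C 60 CHHm + 12 60 3 5 5 CCCTT ~~~~~ 0",
]
for site in methylated_sites(rows):
    print(site)
```

With the default threshold only the first made-up row passes, because the second has an MQUAL of 12.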

Some users create wiggle tracks where they look at the % of CpG sites methylated over a 1 kbp region. To do this you'd need a script to do the calculations.
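
One possible shape for such a script, as a hedged Python sketch rather than anything shipped with novomethyl. It assumes whitespace-separated lines, counts only CpG sites that have a 'u' or 'm' call and pass an MQUAL threshold, and bins them into fixed 1 kbp windows. The function name, the threshold and the second example row are assumptions.

```python
from collections import defaultdict


def cpg_methylation_by_window(lines, window=1000, min_mqual=20):
    """Return (chrom, start, end, percent of called CpG sites methylated)."""
    counts = defaultdict(lambda: [0, 0])  # window -> [methylated, called]
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[5] not in ("CpGm", "CpGu"):
            continue  # not a CpG site with a methylation call
        if float(fields[7]) <= min_mqual:
            continue
        # POS is 1-based; subtract 1 so windows start at multiples of `window`.
        key = (fields[0], (int(fields[1]) - 1) // window)
        counts[key][1] += 1
        if fields[5] == "CpGm":
            counts[key][0] += 1
    result = []
    for (chrom, index), (meth, called) in sorted(counts.items()):
        start = index * window  # 0-based window start
        result.append((chrom, start, start + window, 100.0 * meth / called))
    return result


rows = [
    # Shortened version of a row from the original post.
    "chr7 27205812 C C 61 CpGu + 69 15 9 59 59 TTT ~~~ 0",
    # Made-up methylated site in the same window.
    "chr7 27205900 C C 60 CpGm + 80 90 9 10 10 CCC ~~~ 0",
]
for chrom, start, end, pct in cpg_methylation_by_window(rows):
    print(chrom, start, end, pct)
```

Each tuple could then be written out as one line of a bedGraph or wiggle track.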

Kind Regards, Colin



Hi slp,

Could you please post your question as a new topic rather than adding to an earlier question.

Thanks, Colin

