March 9, 2016 at 3:15 pm #1842
I am working on bisulfite data and would like to find differentially methylated regions using bsseq. I was wondering whether there is a tool I could use to extract the information from the Novomethyl output and build an input matrix of methylation and coverage (each column is a sample, each row is a position).
Basically, I need the number of methylated reads and the coverage for each position, but I am not sure how to calculate them. I think I would need to combine the information in columns 9, 10, 11, 12, and 15 (MPCNT, NU, NC, NCT, NGA) somehow.
Is the coverage equal to the sum of NCT and NGA? How do I calculate the number of methylated reads? Should I just use NU? Is MPCNT = NU/NC?
Could you give me some suggestions?
Thanks in advance,
Guang
March 9, 2016 at 3:19 pm #1843
By the way, do I just need to use the rows whose CONTEXT ends with m?
March 9, 2016 at 4:10 pm #1844
March 10, 2016 at 8:23 am #1845
First, we don’t have any tools to do this.
I’ll explain some of the fields.
CONTEXT will end in m or u to show methylation status. If there’s no m or u then there was insufficient data to determine a methylation status. This is based on a Posterior Probability of whether the base is fully methylated or not methylated using a 6 base calling model.
For differential methylation you could just look for differences in m/u state of the cytosines where MQUAL is over some limit.
Using NU & NC counts may be more accurate.
10. NU The number of unconverted cytosines.
11. NC The total number of cytosines. If a site is called as heterozygous C/T then the lesser of half the total C&T bases or the total T bases are assumed to result from the heterozygous T and are not included in the NC count.
If you were doing differential methylation over a region you could just sum NU & NC for the region and compare these.
You could also look at MPCNT over a region and compare this.
Avoid any sites that are called as heterozygous.
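As a rough illustration of the region comparison, here is a minimal Python sketch. It assumes the sites have already been parsed into records; the `Site` class, its field names, and the `het` flag are hypothetical and are not Novomethyl's own file layout. Only NU (unconverted cytosines) and NC (total cytosines) come from the field descriptions above.

```python
from dataclasses import dataclass


@dataclass
class Site:
    pos: int           # position of the cytosine (hypothetical field name)
    nu: int            # NU: number of unconverted cytosines
    nc: int            # NC: total number of cytosines
    het: bool = False  # assumed flag: True if the site was called heterozygous


def region_methylation(sites, start, end):
    """Sum NU and NC over positions start..end (inclusive), skipping heterozygous sites."""
    nu_sum = 0
    nc_sum = 0
    for s in sites:
        if s.het or not (start <= s.pos <= end):
            continue
        nu_sum += s.nu
        nc_sum += s.nc
    # Fraction of unconverted cytosines in the region; None if there is no coverage.
    frac = nu_sum / nc_sum if nc_sum else None
    return nu_sum, nc_sum, frac


# Made-up counts for two samples over the same region.
sample_a = [Site(100, 8, 10), Site(105, 3, 10), Site(110, 5, 5, het=True)]
sample_b = [Site(100, 2, 10), Site(105, 1, 10)]

print(region_methylation(sample_a, 100, 110))
print(region_methylation(sample_b, 100, 110))
```

The per-region NU and NC sums for each sample can then be compared directly or passed to whatever statistical test you prefer.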
Kind Regards, Colin
March 11, 2016 at 9:55 pm #1846
Thanks so much for your explanation! It helps a lot.
I am extracting the methylated positions from the Cmethyl text file. When I count the number of methylated positions, the result doesn’t match the summary in the log file. The number of entries I extract is always larger than the number in the log (I checked each category, CHHm, CpGm, and CHGm; the difference is only several hundred). I was wondering whether I need to filter out low-coverage or low-quality positions to get the same numbers.
Thanks a lot for your help!
Guang
March 14, 2016 at 1:04 am #1849
Novomethyl filters out ones with an mquality <= 5
Kind Regards, Colin
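For anyone trying to reproduce the log summary, a minimal Python sketch of that filter might look like the following. It assumes each row has already been reduced to a `(context, mqual)` pair; that pairing and the function name are hypothetical, and only the cutoff of 5 comes from the reply above.

```python
from collections import Counter

MQUAL_CUTOFF = 5  # from the reply above: sites with mquality <= 5 are filtered out


def count_methylated(rows, cutoff=MQUAL_CUTOFF):
    """Count sites per CONTEXT that are called methylated and pass the quality cutoff."""
    counts = Counter()
    for context, mqual in rows:
        # Keep only contexts ending in "m" with a quality strictly above the cutoff.
        if context.endswith("m") and mqual > cutoff:
            counts[context] += 1
    return counts


# Made-up rows: (CONTEXT, MQUAL)
rows = [("CpGm", 30), ("CpGm", 5), ("CHHm", 12), ("CHGu", 40), ("CHGm", 3), ("CpG", 0)]
print(count_methylated(rows))
```

Counting without the quality check would include the `("CpGm", 5)` and `("CHGm", 3)` rows, which is the kind of small excess described in the question.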