Topic: Extract information from NovoMethyl Cmethyl output

Tagged: Novomethyl

This topic contains 5 replies, has 2 voices, and was last updated by Colin Hercus 10 years, 2 months ago.

Viewing 6 posts - 1 through 6 (of 6 total)

Author

Posts
March 9, 2016 at 3:15 pm #1842

gsmzxc
Participant

Hi,

I am working on bisulfite data and would like to find differentially methylated regions using bsseq. Is there a tool I could use to extract the information from the Novomethyl output and build an input matrix of methylation and coverage (one column per sample, one row per position)?

Basically, I need the number of methylated reads and the coverage for each position, but I am not sure how to calculate them. I think I would need to combine the information in columns 9, 10, 11, 12 and 15 (MPCNT, NU, NC, NCT, NGA) somehow.
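As a rough sketch, the five fields could be pulled out of each data line as below. This assumes the Cmethyl file is tab-delimited and that the column numbers quoted above are 1-based; neither is confirmed in this thread, and any header or comment lines would need to be skipped first.

```python
# Sketch only: assumes a tab-delimited Cmethyl file and 1-based column
# numbers as quoted in the post (not confirmed by Novocraft documentation).
FIELD_COLUMNS = {"MPCNT": 9, "NU": 10, "NC": 11, "NCT": 12, "NGA": 15}


def parse_fields(line: str) -> dict:
    """Return the selected fields of one data line, still as strings."""
    cols = line.rstrip("\n").split("\t")
    # Convert each 1-based column number to a 0-based list index.
    return {name: cols[idx - 1] for name, idx in FIELD_COLUMNS.items()}
```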

Is the coverage equal to the sum of NCT and NGA? How do I calculate the number of methylated reads? Should I just use NU? Is MPCNT = NU/NC?

Could you give me some suggestion?

Thanks in advance,
Guang

March 9, 2016 at 3:19 pm #1843

gsmzxc
Participant

By the way, do I only need to use the rows whose CONTEXT ends with m?

March 9, 2016 at 4:10 pm #1844

gsmzxc
Participant

March 10, 2016 at 8:23 am #1845

Colin Hercus
Keymaster

Hi Guang,

First, we don’t have any tools to do this.

I’ll explain some of the fields.

CONTEXT will end in m or u to show the methylation status. If there's no m or u, there was insufficient data to determine a methylation status. The call is based on the posterior probability of whether the base is fully methylated or not methylated, using a 6-base calling model.
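A minimal sketch of reading the call from a CONTEXT value such as CpGm, CHHu or a bare CHG, following the m/u rule described above:

```python
from typing import Optional


def methylation_status(context: str) -> Optional[str]:
    """Return 'm' or 'u' from a CONTEXT value, or None when no call was made."""
    if context.endswith("m"):
        return "m"
    if context.endswith("u"):
        return "u"
    # No trailing m/u: insufficient data to determine a status.
    return None
```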

For differential methylation you could just look for differences in m/u state of the cytosines where MQUAL is over some limit.
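A sketch of that comparison between two samples, assuming each sample has already been parsed into a dictionary keyed by (chromosome, position) with (CONTEXT, MQUAL) values. The key layout and the cutoff of 20 are hypothetical placeholders; pick your own limit.

```python
# Hypothetical MQUAL cutoff; choose a limit that suits your data.
MQUAL_LIMIT = 20


def differing_sites(sample_a: dict, sample_b: dict, limit: float = MQUAL_LIMIT) -> list:
    """Each sample maps (chrom, pos) -> (CONTEXT, MQUAL).

    Returns the sorted sites where both samples have an m/u call with
    MQUAL above `limit` and the two calls disagree.
    """
    diffs = []
    for site in sample_a.keys() & sample_b.keys():
        ctx_a, q_a = sample_a[site]
        ctx_b, q_b = sample_b[site]
        if q_a <= limit or q_b <= limit:
            continue  # at least one call is not confident enough
        call_a, call_b = ctx_a[-1:], ctx_b[-1:]
        if call_a in ("m", "u") and call_b in ("m", "u") and call_a != call_b:
            diffs.append(site)
    return sorted(diffs)
```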

Using NU & NC counts may be more accurate.
10. NU The number of unconverted cytosines.
11. NC The total number of cytosines. If a site is called as heterozygous C/T then the lesser of half the total C&T bases or the total T bases are assumed to result from the heterozygous T and are not included in the NC count.
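For the bsseq question, one possible mapping (an assumption, not an official recipe) is NU as the methylated count and NC as the coverage. The sketch below assumes each sample is a dict keyed by (chromosome, position) with (NU, NC) values, and builds the two matrices with one row per site and one column per sample; sites absent from a sample get 0 in both, i.e. no coverage.

```python
def build_matrices(samples: dict):
    """samples: sample_name -> {(chrom, pos): (NU, NC)}.

    Returns (sites, names, M, Cov): M holds NU (assumed methylated count)
    and Cov holds NC (assumed coverage), one row per site, one column per sample.
    """
    names = sorted(samples)
    sites = sorted(set().union(*(s.keys() for s in samples.values())))
    # Missing sites are filled with (0, 0), meaning no coverage in that sample.
    M = [[samples[n].get(site, (0, 0))[0] for n in names] for site in sites]
    Cov = [[samples[n].get(site, (0, 0))[1] for n in names] for site in sites]
    return sites, names, M, Cov
```

The two matrices, together with the chromosome and position of each row, could then be written out as tab-delimited files and loaded in R to construct a BSseq object.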

If you were looking for differential methylation over a region, you could just sum NU & NC for the region and compare these.
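Summing over a region might look like this for one sample, assuming the same hypothetical (chromosome, position) -> (NU, NC) dictionary and inclusive region coordinates. Running it per sample gives pooled NU/NC fractions that can then be compared.

```python
def region_level(sites: dict, chrom: str, start: int, end: int):
    """sites: (chrom, pos) -> (NU, NC) for one sample.

    Sums NU and NC over start <= pos <= end on `chrom` and returns
    (sum_nu, sum_nc, fraction), with fraction None if there is no coverage.
    """
    nu = nc = 0
    for (c, pos), (site_nu, site_nc) in sites.items():
        if c == chrom and start <= pos <= end:
            nu += site_nu
            nc += site_nc
    frac = nu / nc if nc else None
    return nu, nc, frac
```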

You could also look at MPCNT over a region and compare this.
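The MPCNT alternative, sketched as an unweighted mean over the region (the dictionary layout and inclusive coordinates are again assumptions):

```python
from statistics import mean


def region_mean_mpcnt(sites: dict, chrom: str, start: int, end: int):
    """sites: (chrom, pos) -> MPCNT for one sample.

    Returns the unweighted mean MPCNT over start <= pos <= end, or None if
    no sites fall in the region.
    """
    values = [v for (c, pos), v in sites.items() if c == chrom and start <= pos <= end]
    return mean(values) if values else None
```

Unlike the pooled NU/NC fraction, this gives every site equal weight regardless of how many reads cover it.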

Avoid any sites that are called as heterozygous.

Kind Regards, Colin

March 11, 2016 at 9:55 pm #1846

gsmzxc
Participant

Hi Colin,

Thanks so much for your explanation! It helps a lot.

I am extracting the methylated positions from the Cmethyl text file. When I count the methylated locations, the number doesn't match the summary in the log file: my count is always larger (I checked each category, CHHm, CpGm and CHGm, and the difference is only several hundred). Do I need to filter out low-coverage or low-quality sites to get the same numbers?
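One way to make that comparison is to tally the methylated calls per context with a minimum-quality filter. The sketch assumes CONTEXT and MQUAL have already been parsed into (string, number) pairs; the cutoff is a parameter, and sites at or below it are dropped.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_methylated(records: Iterable[Tuple[str, float]], min_qual: float) -> Counter:
    """Count CONTEXT values ending in 'm' (e.g. CpGm), skipping MQUAL <= min_qual."""
    return Counter(ctx for ctx, mqual in records if ctx.endswith("m") and mqual > min_qual)
```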

Thanks a lot for your help!
Guang

March 14, 2016 at 1:04 am #1849

Colin Hercus
Keymaster

Hi Guang,

Novomethyl filters out sites with an mquality <= 5.

Kind Regards, Colin