DNA CpG Methylation Motifs (CpGMMs)

Gene expression regulation is gated by promoter methylation states, which modulate transcription factor regulation. The known DNA methylation/unmethylation mechanisms are not sequence-specific, yet different cells with the same genome have different methylomes. Thus, additional processes that bring specificity to the methylation/unmethylation mechanisms are required. Searching for such processes, we demonstrated that CpG methylation states are influenced by the sequence context surrounding the CpGs. We used this property to develop a CpG methylation motif discovery algorithm. The newly discovered motifs reveal "methylation/unmethylation factors" that could recruit the "methylation/unmethylation machinery" to the loci specified by the motifs. Our methylation motif discovery algorithm complements differentially methylated region algorithms. Since our algorithm searches for commonly methylated regions inside the same sample, it requires only a single sample to operate. The motifs found discriminate between hypomethylated and hypermethylated regions. The hypomethylation-associated motifs have high CG content; their targets appear in conserved regions near transcription start sites, tend to co-occur within transcription factor binding sites, are involved in breaking the H3K4me3/H3K27me3 bivalent balance, and shift enhancers from repressive H3K27me3 to active H3K27ac during ES cell differentiation. The new methylation motifs characterize the pluripotent state shared between ES and iPS cells. Additionally, we found a collection of motifs associated with the somatic memory that iPS cells inherit from the initial fibroblast cells, thus revealing the existence of epigenetic somatic memory on a fine methylation scale.

We term the DNA methylation motifs centered on each CpG CpGMMs. This page helps to navigate through the different DNA methylation motifs found by our algorithm in the context of human pluripotent cells.

All found CpGMMs

Here we show the logos of all the motifs found for each cell line. Each CpGMM appears (in a table per cell line) with its motif ID, its motif logo and with the scanning distributions of the CpGMM binding energies, where the distributions are split into two histograms based on the methylation ratio of the CpG target:

Each CpGMM is qualified with two quality scores:

In the last column of the tables, each CpGMM is annotated with the list of gene targets of the motif and with the statistically significant gene ontology (GO) terms associated with those targets. In the GO annotation, [MF] stands for Molecular Function, [BP] for Biological Process, and [CC] for Cellular Component.
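The scanning distributions mentioned above, split by the methylation ratio of the CpG target, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the motif is represented as a list of per-position energy tables (lower energy meaning a stronger match), the motif length is even with the CpG in the two central positions, and the 0.5 threshold separating the two groups is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical motif representation: one {base: energy} table per position.
# Lower energy is assumed to mean a stronger match.
Motif = List[Dict[str, float]]


def binding_energy(motif: Motif, window: str) -> float:
    """Sum the per-position energies of a window as long as the motif."""
    if len(window) != len(motif):
        raise ValueError("window and motif must have the same length")
    return sum(column[base] for column, base in zip(motif, window))


def split_energies(
    motif: Motif,
    sequence: str,
    methylation: Dict[int, float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the source
) -> Tuple[List[float], List[float]]:
    """Score every CpG-centred window and split the scores by methylation ratio.

    `methylation` maps the 0-based index of the C of each CpG to its
    methylation ratio in [0, 1]. Returns (low_methylation, high_methylation).
    """
    half = len(motif) // 2  # the C of the CpG sits at window index half - 1
    low: List[float] = []
    high: List[float] = []
    for pos, ratio in methylation.items():
        start = pos - (half - 1)
        end = start + len(motif)
        if start < 0 or end > len(sequence):
            continue  # window would run off the sequence
        if sequence[pos:pos + 2] != "CG":
            continue  # not a CpG at this position
        energy = binding_energy(motif, sequence[start:end])
        (low if ratio < threshold else high).append(energy)
    return low, high
```

The two returned lists are the raw values from which two histograms, one per methylation group, could be drawn.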

Fibroblast CpGMMs

iPS cells CpGMMs

ES cells CpGMMs

ADS are adipose-derived stem cells.

Somatic memory specific CpGMMs

The following lists provide the associations found by our algorithm for the somatic-memory-specific CpGMMs in the positive (+) and negative (-) strands (S).

Co-occurrence of CpGMM targets and TFBSs

We hypothesized that some DNA sequence-specific binding proteins can use the CpGMMs to recruit the “methylation/unmethylation machinery” to specific loci. The best candidate recruiters are the transcription factors. Thus, we searched for transcription factors that share binding specificity with the CpGMMs. Such a search is not straightforward, since the CpGMMs have a central CpG anchor but the transcription factor binding motifs (TFBMs) do not necessarily have one. Even if we focus on TFBMs with a strong CpG signal, it is not always clear to what extent that signal lies in the middle of the TFBM. Therefore, to compare CpGMMs and TFBMs, we designed a technique based on detecting co-occurrences between the targets of both types of motifs: if the target loci of a CpGMM co-occur with transcription factor binding sites (TFBSs), we consider the CpG methylation motif and the corresponding TFBM to be associated. The following lists provide the associations found by our algorithm between transcription factors and CpGMMs binding in the positive (+) and negative (-) strands (S).
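The co-occurrence idea can be sketched as a simple overlap count. The page does not state the statistic actually used, so the following Python sketch is only an assumption: it reports the fraction of CpGMM target positions that fall inside at least one TFBS interval on the same strand. The function name, the half-open interval convention, and the input format are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def cooccurrence_fraction(
    cpg_targets: List[int],
    tfbs_intervals: List[Tuple[int, int]],
) -> float:
    """Fraction of CpGMM target positions inside at least one TFBS.

    Intervals are half-open [start, end) and are assumed to lie on the same
    chromosome and strand as the targets.
    """
    if not cpg_targets:
        return 0.0

    # Merge overlapping or touching intervals so that one binary search
    # per target is enough.
    merged: List[List[int]] = []
    for start, end in sorted(tfbs_intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    starts = [interval[0] for interval in merged]
    hits = 0
    for pos in cpg_targets:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits / len(cpg_targets)
```

In practice the function would be called once per strand and per CpGMM/TFBM pair, and the resulting fraction compared against a background expectation before calling two motifs associated.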

Pluripotent specific CpGMMs

Methylation-prone CpGMMs specific to pluripotent cells

Methylation-resistant CpGMMs specific to pluripotent cells