MotifHyades: expectation maximization for de novo DNA motif pair discovery on paired sequences
KC Wong - Bioinformatics, 2017 - academic.oup.com
Motivation
In higher eukaryotes, protein–DNA binding interactions are central to gene regulation. In particular, DNA motifs such as transcription factor binding sites are key components of gene transcription. To harness recently available chromatin interaction data, computational methods are needed to systematically identify coupled DNA motif pairs enriched on long-range chromatin-interacting sequence pairs (e.g. promoter–enhancer pairs).
Results
To fill this gap, a novel probabilistic model, MotifHyades, is proposed and developed for de novo DNA motif pair discovery on paired sequences. In particular, two expectation maximization algorithms are derived for efficient model training with linear computational complexity. Under diverse scenarios, MotifHyades is shown to be faster and more accurate than the existing ad hoc computational pipeline. In addition, MotifHyades is applied to the human K562 cell line, where it discovers thousands of DNA motif pairs with a higher gold-standard motif matching ratio, higher DNase accessibility and higher evolutionary conservation than the previously reported ones. Lastly, it is run on five other human cell lines (i.e. GM12878, HeLa-S3, HUVEC, IMR90 and NHEK), revealing thousands of additional novel DNA motif pairs, which are characterized across a broad spectrum of genomic features on long-range promoter–enhancer pairs.
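The abstract does not spell out the model, so the following is only a minimal sketch of how expectation maximization can couple two motifs across paired sequences; it is not MotifHyades' actual model or either of its two EM algorithms. It assumes (hypothetically) that each sequence pair (x, y) is drawn from one of k motif-pair components, that component c places exactly one site of pwm_a[c] somewhere in x and one site of pwm_b[c] somewhere in y, and that positions and background nucleotides are uniform. The shared latent component is what ties the two motifs together, and each iteration costs O(k · w · L) for total sequence length L, i.e. linear in the data. All function names here are illustrative.

```python
import math
import random

ALPHABET = "ACGT"
IDX = {base: i for i, base in enumerate(ALPHABET)}
BACKGROUND = 0.25  # assumption: uniform i.i.d. background model


def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def log_site_scores(seq, pwm):
    """Log-odds (motif vs. background) of a site starting at each position of seq."""
    w = len(pwm)
    if len(seq) < w:
        raise ValueError("sequence shorter than motif width")
    return [
        sum(math.log(pwm[j][IDX[seq[i + j]]] / BACKGROUND) for j in range(w))
        for i in range(len(seq) - w + 1)
    ]


def normalize_rows(counts):
    """Turn a w x 4 count matrix into a position weight matrix (rows sum to 1)."""
    return [[v / sum(row) for v in row] for row in counts]


def random_pwm(w, rng):
    return normalize_rows([[rng.random() + 0.5 for _ in ALPHABET] for _ in range(w)])


def em_motif_pairs(pairs, k, w, iters=50, pseudo=0.1, seed=0):
    """Toy EM for a mixture of k motif-pair components on (x, y) sequence pairs."""
    rng = random.Random(seed)
    pwm_a = [random_pwm(w, rng) for _ in range(k)]
    pwm_b = [random_pwm(w, rng) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # Expected nucleotide counts per component, seeded with pseudocounts.
        cnt_a = [[[pseudo] * 4 for _ in range(w)] for _ in range(k)]
        cnt_b = [[[pseudo] * 4 for _ in range(w)] for _ in range(k)]
        resp_sum = [0.0] * k
        for x, y in pairs:
            sx = [log_site_scores(x, pwm_a[c]) for c in range(k)]
            sy = [log_site_scores(y, pwm_b[c]) for c in range(k)]
            # E-step: log P(c, pair) up to a constant shared by all components.
            logp = [math.log(weights[c]) + logsumexp(sx[c]) + logsumexp(sy[c])
                    for c in range(k)]
            total = logsumexp(logp)
            for c in range(k):
                r = math.exp(logp[c] - total)  # responsibility of component c
                resp_sum[c] += r
                for seq, scores, cnt in ((x, sx[c], cnt_a[c]), (y, sy[c], cnt_b[c])):
                    norm = logsumexp(scores)
                    for i, s in enumerate(scores):
                        p = r * math.exp(s - norm)  # P(c, site at i | pair)
                        for j in range(w):
                            cnt[j][IDX[seq[i + j]]] += p
        # M-step: re-estimate mixture weights and both PWMs of every component.
        n = len(pairs)
        weights = [(resp_sum[c] + 1e-3) / (n + k * 1e-3) for c in range(k)]
        pwm_a = [normalize_rows(cnt_a[c]) for c in range(k)]
        pwm_b = [normalize_rows(cnt_b[c]) for c in range(k)]
    return weights, pwm_a, pwm_b


def consensus(pwm):
    """Most likely nucleotide at each motif position."""
    return "".join(ALPHABET[max(range(4), key=row.__getitem__)] for row in pwm)
```

For example, planting one hypothetical motif in every x and another in every y of a set of random pairs and calling `em_motif_pairs(pairs, k=1, w=6)` should tend to recover both consensus strings, although, like any EM procedure, it can settle on a shifted motif or another local optimum, so several random restarts are usually advisable.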
Availability and implementation
The matrix-algebra-optimized versions of MotifHyades and the discovered DNA motif pairs are available at http://bioinfo.cs.cityu.edu.hk/MotifHyades.
Supplementary information
Supplementary data are available at Bioinformatics online.
Oxford University Press