@ML4GLand

Machine Learning for Genomics

A collection of tools for investigating how DNA encodes function with machine learning

Welcome to the land of machine learning for genomics!

ML4GLand is a community that develops and maintains tools (primarily in Python) for sequence-based machine learning in genomics.

Why?

Deep learning has become a popular tool for investigating gene regulation, including the DNA and RNA binding specificity of proteins, chromatin state and architecture, and transcriptional activity. However, executing a typical workflow for building and interpreting deep learning models remains a challenge. Training nuances specific to genomics data, along with complex preprocessing and interpretation methods, create an especially steep learning curve, and heterogeneity in the code released with most publications hinders reproducibility and extensibility. A tool that exposes existing data, models and methods to computational scientists, and that can also serve as a platform for development, will greatly improve our ability to use sequence-based machine learning to interrogate gene regulatory mechanisms.

We aim to build a framework for developing sequence-to-function deep learning models

Previous work has shown the utility of such frameworks. DeepChem and scverse are excellent examples. Our mission is to put together a similar ecosystem for sequence-based genomics.

Core packages

  • SeqPro -- a Python package for processing DNA/RNA sequences for machine learning.
  • SeqData -- a Python package for preparing machine learning-ready genomic sequence datasets.
  • SeqExplainer -- a Python package for interpreting sequence-to-function machine learning models.
  • EUGENe -- a Python package for streamlining and customizing end-to-end deep-learning sequence analyses in regulatory genomics.
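
To make "processing DNA/RNA sequences for machine learning" concrete, here is a minimal sketch of two common preprocessing steps: one-hot encoding and reverse complementing. It is written in plain Python and is not the SeqPro API. The function names `one_hot_encode` and `reverse_complement` and the `ACGT` channel order are assumptions made for illustration only.

```python
# Illustrative sketch only: plain Python, not the SeqPro API.
# The function names and the "ACGT" channel order are assumptions.
ALPHABET = "ACGT"


def one_hot_encode(seq, alphabet=ALPHABET):
    """Return a len(seq) x len(alphabet) matrix of 0/1 rows.

    Characters outside the alphabet (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def reverse_complement(seq):
    """Reverse-complement a DNA string, leaving N unchanged."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


if __name__ == "__main__":
    for row in one_hot_encode("ACGTN"):
        print(row)
    print(reverse_complement("AACG"))
```

Sequence-to-function models typically take arrays of this shape as input; refer to each package's own documentation for its actual functions and signatures.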

Ecosystem packages

  • SeqDatasets -- a repository for downloading datasets and loading them with SeqData.
  • MotifData -- a Python package for handling motifs.

Usage repositories

  • tutorials -- a repository of tutorials for ML4GLand tools.
  • use cases -- a repository of use cases that showcase ML4GLand tools and potential ecosystem packages.

Repositories

Showing 10 of 13 repositories
  • chrombpnet Public

    Templates for running official ChromBPNet repository pipelines using SLURM scheduler

    Jupyter Notebook 0 0 0 0 Updated Mar 4, 2025
  • tutorials Public

    A set of tutorials for how to use all the tools in ML4GLand

    Jupyter Notebook 3 0 1 0 Updated Feb 6, 2025
  • SeqData Public

    Annotated sequence data

    Jupyter Notebook 11 MIT 1 5 2 Updated Feb 2, 2025
  • SeqPro Public

    Genomic sequence preprocessing toolkit

    Python 11 MIT 1 1 0 Updated Jan 9, 2025
  • EUGENe Public

    Elucidating the Utility of Genomic Elements with Neural Nets

    Jupyter Notebook 66 MIT 5 8 1 Updated Dec 2, 2024
  • SeqDatasets Public

    Datasets for benchmarking, testing and developing in EUGENe

    Python 1 MIT 0 0 0 Updated Dec 2, 2024
  • SeqExplainer Public

    Interpreting sequence-to-function machine learning models

    Jupyter Notebook 4 MIT 1 0 0 Updated Jan 25, 2024
  • use_cases Public

    Repository documenting applications of the ML4GLand suite on published datasets

    Jupyter Notebook 1 0 0 0 Updated Jan 3, 2024
  • .github Public
    0 0 0 0 Updated Nov 9, 2023
  • EUGENe_paper Public

    Code for generating the figures and results presented in the manuscript EUGENe: A Python toolkit for sequence activity prediction and analysis

    Jupyter Notebook 4 CC0-1.0 0 0 0 Updated Sep 18, 2023
