chgl

🧊

43 followers · 31 following

Stars

🪠 Data Integration

120 repositories

thriving-dev / kafka-streams-cassandra-state-store

'Drop-in' Kafka Streams State Store implementation that persists data to Apache Cassandra / ScyllaDB

Java · 24 stars · 5 forks · Updated Feb 12, 2025

fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Python · 2,040 stars · 93 forks · Updated Sep 21, 2024

timeplus-io / proton

High-performance, low-footprint SQL database written in C++. Process millions of rows per second from Kafka, Pulsar, or ClickHouse, and seamlessly write results back. Supports powerful features lik…

C++ · 1,653 stars · 74 forks · Updated Feb 14, 2025

canimus / cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.

Python · 181 stars · 21 forks · Updated Feb 11, 2025

trinodb / trino-gateway

Java · 179 stars · 80 forks · Updated Feb 14, 2025

apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

Java · 2,638 stars · 1,026 forks · Updated Feb 17, 2025

StarRocks / starrocks-kubernetes-operator

Kubernetes Operator for StarRocks

Go · 145 stars · 71 forks · Updated Feb 14, 2025

delta-io / delta-sharing

An open protocol for secure data sharing

Scala · 804 stars · 182 forks · Updated Jan 30, 2025

datahub-project / datahub

The Metadata Platform for your Data and AI Stack

Java · 10,294 stars · 3,045 forks · Updated Feb 17, 2025

numaproj / numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

Go · 1,790 stars · 124 forks · Updated Feb 17, 2025

apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

Java · 8,282 stars · 1,902 forks · Updated Feb 17, 2025

apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

C++ · 14,999 stars · 3,621 forks · Updated Feb 17, 2025

pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

Python · 13,638 stars · 304 forks · Updated Feb 17, 2025

unitycatalog / unitycatalog

Open, Multi-modal Catalog for Data & AI

Python · 2,658 stars · 436 forks · Updated Feb 13, 2025

ydataai / ydata-synthetic

Synthetic data generators for tabular and time-series data

Jupyter Notebook · 1,498 stars · 246 forks · Updated Feb 14, 2025

drizzle-team / drizzle-orm

Headless TypeScript ORM with a head. Runs on Node, Bun and Deno. Lives on the Edge and yes, it's a JavaScript ORM too 😅

TypeScript · 26,318 stars · 766 forks · Updated Feb 17, 2025

apache / opendal

Apache OpenDAL: One Layer, All Storage.

Rust · 3,776 stars · 524 forks · Updated Feb 17, 2025

apache / polaris

Apache Polaris, the interoperable, open source catalog for Apache Iceberg

Java · 1,339 stars · 179 forks · Updated Feb 16, 2025

airtai / faststream

FastStream is a powerful and easy-to-use Python framework for building asynchronous services interacting with event streams such as Apache Kafka, RabbitMQ, NATS and Redis.

Python · 3,462 stars · 187 forks · Updated Feb 10, 2025

holistics / dbml

Database Markup Language (DBML), designed to define and document database structures

JavaScript · 3,016 stars · 179 forks · Updated Feb 17, 2025

CrunchyData / pg_parquet

Copy to/from Parquet in S3 or Azure Blob Storage from within PostgreSQL

Rust · 427 stars · 15 forks · Updated Jan 31, 2025

sfu-db / connector-x

Fastest library to load data from DB to DataFrames in Rust and Python

Rust · 2,118 stars · 165 forks · Updated Feb 14, 2025

beekeeper-studio / beekeeper-studio

Modern and easy to use SQL client for MySQL, Postgres, SQLite, SQL Server, and more. Linux, MacOS, and Windows.

TypeScript · 17,421 stars · 1,133 forks · Updated Feb 15, 2025

michelin / kstreamplify

Swiftly build and enhance your Kafka Streams applications.

Java · 108 stars · 21 forks · Updated Feb 15, 2025

DataExpert-io / data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

Jupyter Notebook · 26,595 stars · 5,441 forks · Updated Jan 6, 2025

readysettech / readyset

Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, ReadySet caches t…

Rust · 4,816 stars · 136 forks · Updated Feb 17, 2025

flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

Go · 6,008 stars · 690 forks · Updated Feb 17, 2025

turbot / steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.

Go · 7,180 stars · 285 forks · Updated Feb 11, 2025

apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

Java · 917 stars · 310 forks · Updated Feb 10, 2025

BemiHQ / BemiDB

Postgres read replica optimized for analytics

Go · 1,273 stars · 27 forks · Updated Feb 14, 2025
