Stars

🪠 Data Integration

115 repositories

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

Go 401 50 Updated Nov 27, 2024

Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.

TypeScript 4,125 295 Updated Nov 27, 2024

An orchestration platform for the development, production, and observation of data assets.

Python 11,938 1,493 Updated Nov 27, 2024

Build data pipelines, the easy way 🛠️

TypeScript 4,083 259 Updated Jun 6, 2023

Upserts, Deletes And Incremental Processing on Big Data.

Java 5,463 2,430 Updated Nov 27, 2024

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.

Python 276 77 Updated Oct 17, 2024

Apache Iceberg

Java 6,514 2,252 Updated Nov 27, 2024

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Java 1,044 130 Updated Nov 27, 2024

This is an engine that converts data of one structure to another, based on a configuration file which describes how. There is an accompanying syntax to make writing mappings easier and more robust.

Java 213 66 Updated Nov 21, 2024

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co…

TypeScript 5,631 1,056 Updated Nov 27, 2024

Conversion utility to translate legacy data formats into FHIR

Liquid 413 180 Updated Nov 26, 2024

🔥 🔥 🔥 Open Source Airtable Alternative

TypeScript 49,935 3,427 Updated Nov 27, 2024

A simple mock implementation of the AWS S3 API startable as Docker image, TestContainer, JUnit 4 rule, JUnit Jupiter extension or TestNG listener

Kotlin 844 181 Updated Nov 21, 2024

What's in your data? Extract schema, statistics and entities from datasets

Python 1,434 163 Updated Nov 13, 2024

Apache Camel Karavan, a Low-code Data Integration Platform

TypeScript 454 157 Updated Nov 27, 2024

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Python 10,013 1,634 Updated Nov 27, 2024

Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.

Rust 1,302 121 Updated Nov 26, 2024

Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.

Rust 15,276 445 Updated Nov 27, 2024

Dataframes powered by a multithreaded, vectorized query engine, written in Rust

Rust 30,562 1,979 Updated Nov 27, 2024

Fancy stream processing made operationally mundane

Go 8,147 840 Updated Nov 26, 2024

Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data.

Java 267 41 Updated Aug 29, 2023

CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.

Java 4,127 567 Updated Nov 27, 2024

Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, manag…

TypeScript 3,836 352 Updated Nov 27, 2024

An Open Standard for lineage metadata collection

Java 1,777 309 Updated Nov 27, 2024

lakeFS - Data version control for your data lake | Git for data

Go 4,461 359 Updated Nov 26, 2024

KNet is a comprehensive .NET suite for Apache Kafka™ providing all features: Producer, Consumer, Admin, Streams, Connect, backends (ZooKeeper and Kafka)

C# 40 6 Updated Nov 25, 2024

Open-Source Web UI for Apache Kafka Management

Java 9,878 1,195 Updated Jul 26, 2024

Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.

Rust 3,901 489 Updated Nov 26, 2024

Open source SQL Query Assistant service for Databases/Warehouses

JavaScript 1,182 372 Updated Nov 27, 2024

Apache Doris is an easy-to-use, high performance and unified analytics database.

Java 12,791 3,294 Updated Nov 27, 2024