🪠 Data Integration
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
An orchestration platform for the development, production, and observation of data assets.
Upserts, Deletes And Incremental Processing on Big Data.
PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
This is an engine that converts data of one structure to another, based on a configuration file which describes how. There is an accompanying syntax to make writing mappings easier and more robust.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team co…
Conversion utility to translate legacy data formats into FHIR
A simple mock implementation of the AWS S3 API startable as Docker image, TestContainer, JUnit 4 rule, JUnit Jupiter extension or TestNG listener
What's in your data? Extract schema, statistics and entities from datasets
Apache Camel Karavan: a low-code data integration platform
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Neon: Serverless Postgres. We separated storage and compute to offer autoscaling, code-like database branching, and scale to zero.
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
Fancy stream processing made operationally mundane
Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data.
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Lucene.
Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, manag…
An Open Standard for lineage metadata collection
lakeFS - Data version control for your data lake | Git for data
KNet is a comprehensive .NET suite for Apache Kafka™ providing all features: Producer, Consumer, Admin, Streams, Connect, backends (ZooKeeper and Kafka)
Open-Source Web UI for Apache Kafka Management
Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
Open source SQL Query Assistant service for Databases/Warehouses
Apache Doris is an easy-to-use, high performance and unified analytics database.