Starred repositories
Techniques and numbers for estimating a system's performance from first principles
A curated collection of free Machine Learning related eBooks
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
ScyllaDB cluster setup guide using podman
Grafana panel to integrate with any kind of HTTP/REST API
OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.
QuestDB is a high performance, open-source, time-series database
🔥 LeetCode solutions in any programming language | Solutions to LeetCode, 剑指 Offer (2nd edition), and 程序员面试金典 (Cracking the Coding Interview, 6th edition) implemented in multiple programming languages
Awesome LeetCode resources to learn Data Structures and Algorithms and prepare for Coding Interviews.
Examples of how to use the command-line tools in Avro Tools to read and write Avro files
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and mo…
A distributed block-based data storage and compute engine
What's in your data? Extract schema, statistics and entities from datasets
One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
An orchestration platform for the development, production, and observation of data assets.
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
A collection of Kotlin Multiplatform cryptographic hashing functions.
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collec…
VictoriaMetrics: fast, cost-effective monitoring solution and time series database
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
Visual analysis and diagnostic tools to facilitate machine learning model selection.
A light-weight, flexible, and expressive statistical data testing library
Visualizer for neural network, deep learning and machine learning models
A curated list of awesome open source tools and commercial products for monitoring data quality, monitoring model performance, and profiling data 🚀