Best Open Source Linux Data Management Systems 2026

The Julia Programming Language

High-level, high-performance dynamic language for technical computing

Julia is a fast, open source, high-performance dynamic language for technical computing. It can be used for data visualization and plotting, deep learning, machine learning, scientific computing, parallel computing, and much more. With its high-level syntax, Julia is approachable for programmers of every level and background. Julia has more than 2,800 community-registered packages, including various mathematical libraries, data manipulation tools, and packages for general-purpose computing. Libraries from Python, R, C/Fortran, C++, and Java can also be used.

Downloads: 16 This Week

Last Update: 7 days ago

dlib

Toolkit for making machine learning and data analysis applications

Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real-world problems. It is used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high-performance computing environments. Dlib's open source licensing allows you to use it in any application, free of charge. It has good unit test coverage: the ratio of unit test lines of code to library lines of code is about 1 to 4. The library is tested regularly on MS Windows, Linux, and Mac OS X systems. No other packages are required to use the library; only APIs that are provided by an out-of-the-box OS are needed. There is no installation or configure step needed before you can use the library. All operating-system-specific code is isolated inside the OS abstraction layers, which are kept as small as possible.

Downloads: 12 This Week

Last Update: 2025-05-28

Recommenders

Best practices on recommendation systems

The Recommenders repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The module reco_utils contains functions to simplify common tasks used when developing and evaluating recommender systems. Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks. Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.
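As a rough illustration of the kind of common tasks mentioned above (splitting training/test data and evaluating model outputs), here is a minimal standard-library sketch. It does not use reco_utils; the function names random_split and precision_at_k and the toy interaction data are hypothetical and exist only for this example.

```python
import random


def random_split(interactions, test_ratio=0.25, seed=42):
    """Shuffle (user, item, rating) tuples and split them into train and test lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that appear in the relevant set."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Toy interactions, invented for illustration only.
interactions = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u1", "i3", 4.0),
    ("u2", "i1", 4.0), ("u2", "i4", 2.0), ("u2", "i5", 5.0),
    ("u3", "i2", 1.0), ("u3", "i3", 5.0),
]

train, test = random_split(interactions, test_ratio=0.25)
print(len(train), len(test))
print(precision_at_k(["i1", "i2", "i3"], {"i1", "i3"}, k=3))
```

A real project would more likely rely on the repository's own utilities, which also cover dataset loading in the formats expected by different algorithms.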

Downloads: 4 This Week

Last Update: 2024-12-23

AutoGluon

AutoGluon: AutoML for Image, Text, and Tabular Data

AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code. Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge. Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing. Easily improve/tune your bespoke models and data pipelines, or customize AutoGluon for your use-case. AutoGluon is modularized into sub-modules specialized for tabular, text, or image data. You can reduce the number of dependencies required by solely installing a specific sub-module via: python3 -m pip install <submodule>.

Downloads: 3 This Week

Last Update: 2025-12-19

FinMind

Open data: more than 50 kinds of financial data

In the era of big data, data is the foundation of everything. We collect more than 50 kinds of Taiwan stock-related information and provide download, online analysis, and backtesting. Regardless of the program you use, you can download data through the API provided by FinMind, or you can download data directly from the website. Once the data is available, statistical analysis, regression analysis, time series analysis, machine learning, and deep learning can be performed. For individual stocks, FinMind provides visual analysis at the technical, fundamental, and chip levels. Back-test analysis is performed for different strategies to report the performance, profit and loss, and stock selection targets of each strategy's investment portfolio.
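To make the idea of back-testing a strategy for profit and loss concrete, here is a minimal standard-library sketch. It is not FinMind's API or its backtesting engine; the moving-average rule, the function names, and the price series are all assumptions invented for illustration.

```python
def moving_average(prices, window):
    """Simple moving average; None until enough observations are available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out


def backtest(prices, window=3):
    """Hold one share over the next day whenever today's close is above its moving average.

    Returns the total profit and loss in price units.
    """
    ma = moving_average(prices, window)
    pnl = 0.0
    for t in range(len(prices) - 1):
        if ma[t] is not None and prices[t] > ma[t]:
            pnl += prices[t + 1] - prices[t]
    return pnl


# Made-up closing prices, not real market data.
prices = [100.0, 101.0, 103.0, 102.0, 104.0, 107.0, 109.0]

print(backtest(prices, window=3))
print(prices[-1] - prices[0])  # buy-and-hold profit and loss for comparison
```

Comparing the strategy's result with buy-and-hold on the same series is one simple way to judge whether a rule adds anything; a realistic back-test would also account for transaction costs and position sizing.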

Downloads: 2 This Week

Last Update: 2026-01-04

Metaflow

A framework for real-life data science

Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.

Downloads: 1 This Week

Last Update: 1 day ago

Amazon SageMaker Examples

Jupyter notebooks that demonstrate how to build models using SageMaker

Welcome to Amazon SageMaker. This project highlights example Jupyter notebooks for a variety of machine learning use cases that you can run in SageMaker. If you’re new to SageMaker, we recommend starting with the more feature-rich SageMaker Studio. It uses the familiar JupyterLab interface and has seamless integration with a variety of deep learning and data science environments and scalable compute resources for training, inference, and other ML operations. Studio offers teams and companies easy onboarding for their team members, freeing them from complex systems admin and security processes. Administrators control data access and resource provisioning for their users. Notebook Instances are another option. They have the familiar Jupyter and JupyterLab interfaces that work well for single users, or small teams where users are also administrators. Advanced users also use SageMaker solely with the AWS CLI and Python scripts using boto3 and/or the SageMaker Python SDK.

Downloads: 0 This Week

Last Update: 2021-09-14

Deep Learning course

Slides and Jupyter notebooks for the Deep Learning lectures

Slides and Jupyter notebooks for the Deep Learning lectures at Master Year 2 Data Science from Institut Polytechnique de Paris. This course is being taught as part of Master Year 2 Data Science IP-Paris. Note: press "P" to display the presenter's notes, which include some comments and additional references. This lecture is built and maintained by Olivier Grisel and Charles Ollion.

Downloads: 0 This Week

Last Update: 2022-08-17

Deep Learning with PyTorch

Latest techniques in deep learning and representation learning

This course concerns the latest techniques in deep learning and representation learning, focusing on supervised and unsupervised deep learning, embedding methods, metric learning, convolutional and recurrent nets, with applications to computer vision, natural language understanding, and speech recognition. The prerequisites include DS-GA 1001 Intro to Data Science or a graduate-level machine learning course. To be able to follow the exercises, you are going to need a laptop with Miniconda (a minimal version of Anaconda) and several Python packages installed. The following instructions work as is for Mac or Ubuntu Linux users; Windows users would need to install and work in the Git BASH terminal. JupyterLab has a built-in selectable dark theme, so you only need to install something if you want to use the classic notebook interface.

Downloads: 0 This Week

Last Update: 2021-10-12

DeepLearningProject

An in-depth machine learning tutorial

This tutorial tries to do what most machine learning tutorials available online do not. It is not a 30-minute tutorial that teaches you how to "Train your own neural network" or "Learn deep learning in under 30 minutes". It's a full pipeline which you would need to do if you actually work with machine learning - introducing you to all the parts, and all the implementation decisions and details that need to be made. The dataset is not one of the standard sets like MNIST or CIFAR; you will make your very own dataset. Then you will go through a couple of conventional machine learning algorithms, before finally getting to deep learning! In the fall of 2016, I was a Teaching Fellow (Harvard's version of TA) for the graduate class on "Advanced Topics in Data Science (CS209/109)" at Harvard University. I was in charge of designing the class project given to the students, and this tutorial has been built on top of the project I designed for the class.

Downloads: 0 This Week

Last Update: 2022-08-03

Seldon Server

Machine learning platform and recommendation engine on Kubernetes

Seldon Server is a machine learning platform and recommendation engine built on Kubernetes. Seldon reduces time-to-value so models can get to work faster. Scale with confidence and minimize risk through interpretable results and transparent model performance. Seldon Core focuses purely on deploying a wide range of ML models on Kubernetes, allowing complex runtime serving graphs to be managed in production. Seldon Core is a progression of the goals of the Seldon-Server project, but with a more restricted focus on solving the final step in a machine learning project, which is serving models in production. Seldon Server is a machine learning platform that helps your data science team deploy models into production. It provides an open-source data science stack that runs within a Kubernetes cluster. You can use Seldon to deploy machine learning and deep learning models into production on-premises or in the cloud (e.g. GCP, AWS, Azure).

Downloads: 0 This Week

Last Update: 2022-04-05

Synapse Machine Learning

Simple and distributed Machine Learning

SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit, and OpenCV. These tools enable powerful and highly-scalable predictive and analytical models for a variety of data sources. SynapseML also brings new networking capabilities to the Spark Ecosystem. With the HTTP on Spark project, users can embed any web service into their SparkML models. For production-grade deployment, the Spark Serving project enables high throughput, sub-millisecond latency web services, backed by your Spark cluster.

Downloads: 0 This Week

Last Update: 2025-10-30
