1. ML Foundations

ML Concepts:
- Supervised Learning: Learning from labeled data with input-output pairs to predict outcomes for new data.
- Unsupervised Learning: Finding patterns and structures in unlabeled data without predefined outputs.
- Reinforcement Learning: Learning through trial and error to maximize cumulative reward, suitable for dynamic environments.

ML Algorithms:
- Regression: Predicting continuous values, e.g., linear regression, polynomial regression.
- Classification: Assigning labels to data points, e.g., logistic regression, decision trees, support vector machines.
- Clustering: Grouping similar data points together, e.g., k-means clustering, hierarchical clustering.
- Dimensionality Reduction: Reducing the number of features in data while retaining key information, e.g., principal component analysis (PCA).

ML Workflows:
- Data Collection and Preparation: Gathering and preprocessing data, including cleaning, normalization, and feature extraction.
- Model Selection and Training: Choosing appropriate algorithms and training models using labeled data.
- Evaluation: Assessing model performance using metrics like accuracy, precision, recall, and F1-score.
- Deployment: Integrating trained models into production environments, ensuring scalability, efficiency, and reliability.
- Monitoring and Maintenance: Continuously monitoring model performance, retraining models with new data, and maintaining model versions.

@Sandip Das
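The supervised-learning workflow above (train on labeled input-output pairs, then evaluate on held-out data) can be sketched in a few lines with scikit-learn; the iris dataset and the logistic-regression choice are illustrative assumptions, not part of the roadmap itself.

```python
# Minimal supervised-learning sketch: split labeled data, train a
# classifier, and evaluate it on data the model has not seen.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)  # a classification algorithm
model.fit(X_train, y_train)                # learning from labeled data

preds = model.predict(X_test)              # predicting outcomes for new data
print("accuracy:", accuracy_score(y_test, preds))
print("macro F1:", f1_score(y_test, preds, average="macro"))
```

The same split/train/evaluate skeleton applies regardless of which algorithm from the list above you plug in.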
2. Tools and Platforms

Version Control:
- Git: Distributed version control system for tracking changes in source code during software development.
- GitHub/GitLab: Platforms built on Git for hosting repositories, managing collaborative development, and facilitating version control workflows.

CI/CD Tools:
- Jenkins: Open-source automation server used for building, testing, and deploying software.
- GitHub Actions: Integrated CI/CD service by GitHub for automating workflows directly from repositories.
- GitLab CI: Built-in CI/CD service in GitLab for automating the testing and deployment of applications.

Containerization & Orchestration:
- Docker: Platform for developing, shipping, and running applications in containers, ensuring consistency across different computing environments.
- Kubernetes: Open-source container orchestration platform for automating the deployment, scaling, and management of containerized applications.
3. Data Engineering

Data Collection:
- APIs: Interfaces for accessing data from external services or systems.
- Databases: Structured storage systems like MySQL, PostgreSQL, or NoSQL databases like MongoDB, Cassandra.
- Web Scraping: Automated extraction of data from websites, often using frameworks like BeautifulSoup or Scrapy.

Data Storage:
- SQL Databases: Relational databases for structured data storage, supporting ACID transactions.
- NoSQL Databases: Non-relational databases for flexible and scalable storage, suitable for semi-structured or unstructured data (e.g., MongoDB, Cassandra).
- Data Lakes: Centralized repositories for storing structured, semi-structured, and unstructured data at scale, typically using distributed file systems like the Hadoop Distributed File System (HDFS) or cloud-based object storage like Amazon S3.

Data Processing and Transformation:

ETL Pipelines:
- Apache Airflow: Open-source tool for orchestrating complex data workflows, including Extract, Transform, Load (ETL) processes.
- AWS Glue: Fully managed ETL service by AWS, simplifying the process of preparing and loading data for analytics.

Data Cleaning and Preprocessing:
- Pandas: Python library for data manipulation and analysis, offering data structures and operations for manipulating numerical tables and time series data.
- NumPy: Fundamental package for numerical computing in Python, providing support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.
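The cleaning and preprocessing steps above can be sketched with Pandas and NumPy; the toy DataFrame and the median/mean imputation choices are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy raw data with typical issues: missing values and mixed scales (illustrative).
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 51.0],
    "income": [40_000.0, 55_000.0, np.nan, 120_000.0],
    "city": ["NY", "NY", "SF", None],
})

# Cleaning: impute missing numeric values, fill missing categories.
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].mean())
df["city"] = df["city"].fillna("unknown")

# Normalization: min-max scale numeric columns into [0, 1].
for col in ["age", "income"]:
    df[col] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())

print(df)
```

In a real pipeline these steps would typically run inside an orchestrated workflow (e.g., an Airflow task) rather than a standalone script.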
4. Model Development

Model Training:
- TensorFlow: Open-source framework developed by Google for building and training machine learning models, especially neural networks.
- PyTorch: Open-source machine learning library developed by Facebook's AI Research lab, known for its dynamic computation graph and ease of use.
- Scikit-Learn: Python library for machine learning built on NumPy, SciPy, and matplotlib, providing simple and efficient tools for data mining and data analysis.

Validation:
- MLflow: Open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking, reproducibility, and deployment.
- Weights & Biases: Platform for experiment tracking, visualization, and collaboration, helping teams track their machine learning experiments in real time.

Hyperparameter Tuning: Optimizing parameters that control model training to improve performance.

Techniques:
- Grid Search: Exhaustive search over a manually specified subset of the hyperparameter space.
- Random Search: Sampling hyperparameters randomly from a predefined distribution.
- Bayesian Optimization: Sequential model-based optimization technique that uses Bayesian inference to direct the search for optimal hyperparameters.

Tools:
- Optuna: Open-source hyperparameter optimization framework, particularly efficient for black-box optimization tasks.
- Hyperopt: Python library for optimizing over awkward search spaces using randomized search.
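Grid search, the first tuning technique above, can be sketched with scikit-learn's GridSearchCV; the SVC model and the grid values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A manually specified subset of the hyperparameter space (illustrative values).
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}

# Exhaustive search: every combination is cross-validated (here, 5-fold).
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV score:", round(search.best_score_, 3))
```

Random search and Bayesian optimization follow the same fit-and-score loop but choose which combinations to try differently, which matters once the grid becomes too large to enumerate.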
5. Model Deployment

APIs:
- Flask: A lightweight and flexible microframework for Python. Ideal for building web applications and APIs with minimal boilerplate and easy integration with machine learning models.
- FastAPI: A modern, fast (asynchronous) web framework for building APIs with Python 3.6+. Known for its automatic interactive API documentation, type safety, and high performance.
- Django REST Framework (DRF): Provides a comprehensive toolkit for building Web APIs in Django. Offers serialization, authentication, and other essential features for creating robust APIs quickly.
- Express.js with Node.js: Popular for building APIs in JavaScript, leveraging non-blocking I/O and middleware support, making it suitable for high-throughput API handling.
- Ruby on Rails: Integrates Action Controller for building RESTful APIs alongside web applications. Offers convention over configuration and rapid development capabilities.
- Spring Boot (Java): Uses Spring MVC for building enterprise-grade RESTful APIs in Java, with features like dependency injection, transaction management, and security.
- ASP.NET Core (C#): A cross-platform, high-performance framework for building APIs in C#. Integrates seamlessly with the .NET ecosystem, offering strong typing and performance optimizations.
- Go with Gin: A web framework written in Go (Golang) known for its minimalistic and lightweight design. Provides robust features for building scalable APIs with a focus on performance.

Model Serving Platforms: These platforms specialize in managing and serving machine learning models. For example:
- TensorFlow Serving: Specifically designed for serving TensorFlow models with high-performance inferencing.
- TorchServe: An open-source model serving library for PyTorch models, facilitating scalable inference.
- Seldon: Provides tools to deploy and manage machine learning models on Kubernetes.
- MLflow Models: MLflow supports model packaging and serving through its model registry and REST API.

Containerizing Models: Containerization is crucial for deploying machine learning models consistently across different environments. This phase involves:
- Dockerizing: Creating Docker images for machine learning models encapsulates the model, its dependencies, and any required pre-processing or post-processing logic into a portable package.
- Deploying on Kubernetes: Kubernetes provides orchestration and scaling capabilities for containerized applications. Key tools and practices include:
  1. Helm charts: Kubernetes package manager that simplifies the deployment and management of Kubernetes applications.
  2. Kustomize: Allows customization of Kubernetes YAML configurations without direct editing, enabling easier management of multiple environments (e.g., dev, staging, prod).
  3. ArgoCD: A declarative, GitOps continuous delivery tool for Kubernetes. It automates deployment, monitoring, and lifecycle management of applications in Kubernetes clusters, ensuring consistency and scalability.
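Exposing a model behind an API, as described above, can be sketched with Flask; the `predict` function here is a hypothetical stand-in for a real trained model (which you would normally load from a serialized artifact), and the route name is an assumption.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a trained model; in practice you would load an artifact,
# e.g. model = joblib.load("model.pkl"), and call model.predict(...).
def predict(features):
    return sum(features)  # hypothetical scoring logic for illustration

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    payload = request.get_json()            # expects {"features": [...]}
    result = predict(payload["features"])
    return jsonify({"prediction": result})  # JSON response for the client
```

Run it locally with `flask --app <module> run` and POST JSON like `{"features": [1, 2, 3]}` to `/predict`; for production, this app would typically be Dockerized and deployed behind a WSGI server, as the containerization notes above describe.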
6. Monitoring and Maintenance

Monitoring Models:
- Performance Metrics: Metrics such as accuracy, precision, recall, F1 score, and AUC-ROC (Area Under the Receiver Operating Characteristic Curve) are crucial for assessing how well the model performs over time. These metrics help in understanding whether the model is meeting expected performance levels and in detecting any degradation.
- Drift Detection: Techniques for detecting data drift (changes in input data distribution) and model drift (changes in model predictions over time). Drift detection ensures that models remain accurate and reliable in production environments where data characteristics may change.

Logging and Alerting:
- Prometheus: A monitoring and alerting toolkit designed for monitoring metrics and alerting on various system aspects.
- Grafana: A visualization tool that works with Prometheus and other data sources to create dashboards for monitoring and analyzing metrics.
- ELK Stack (Elasticsearch, Logstash, Kibana): Elasticsearch for indexing and searching data, Logstash for collecting, processing, and forwarding logs, and Kibana for visualization and exploration of log data.
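Data drift detection can be sketched numerically: the Population Stability Index (PSI) below compares a production sample's feature distribution against a training-time baseline. The bin count, sample sizes, and the common "PSI above ~0.2 suggests drift" rule of thumb are assumptions for illustration.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples (a simple drift signal)."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor tiny proportions to avoid log(0) in empty bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)   # training-time feature distribution
same = rng.normal(0, 1, 10_000)       # production sample, no drift
shifted = rng.normal(1.0, 1, 10_000)  # production sample with a mean shift

print("no drift PSI:", psi(baseline, same))     # small value
print("drifted PSI:", psi(baseline, shifted))   # large value
```

A monitoring job would compute such a statistic per feature on a schedule and raise an alert (e.g., via Prometheus) when it crosses a threshold.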
7. Scaling and Automation

Automated Pipelines:

CI/CD for ML:
- Jenkins: Automation server used for building, testing, and deploying machine learning models with pipelines tailored for ML workflows.
- GitHub Actions: Provides CI/CD capabilities integrated with GitHub repositories, enabling automated workflows for ML model training and deployment.
- GitLab CI for ML: GitLab's CI/CD pipelines designed for ML projects, facilitating automated testing, packaging, and deployment of models.

Automated Retraining:
- Scheduling Retraining Jobs: Automates the scheduling of retraining jobs based on predefined triggers or conditions, ensuring models are updated with fresh data.
- Managing Model Versions: Tracks and manages multiple versions of models, facilitating comparison, rollback, and governance of model updates.

Scaling Infrastructure: Involves expanding computational resources dynamically to accommodate increasing demands for training and deploying machine learning models efficiently across distributed environments.

Cloud Services:
- AWS SageMaker: Managed service for building, training, and deploying ML models at scale on AWS, integrating with other AWS services for end-to-end ML workflows.
- Google AI Platform: Offers tools and services for developing, training, and deploying ML models on Google Cloud Platform, supporting scalable and cost-effective ML operations.
- Azure ML: Provides a comprehensive set of services to build, train, and deploy ML models on Azure, with capabilities for scalable infrastructure and integrated development tools.

Distributed Training:
- Horovod: Framework for distributed training of deep learning models across multiple GPUs and machines, optimizing performance and reducing training time.
- Kubernetes: Container orchestration platform that supports distributed training by managing and scaling containers across clusters efficiently.
- Dask: Python library for parallel computing that scales out to multiple nodes in a cluster, suitable for distributed data processing and training ML models.
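Frameworks like Horovod and Dask need a cluster, but the underlying data-parallel idea they build on (split the data into shards, process shards concurrently, combine partial results) can be sketched with the standard library alone; the shard count and the toy `train_on_shard` function are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def train_on_shard(shard):
    # Stand-in for training or processing one partition of the data;
    # a real distributed job would fit a model or compute gradients here.
    return sum(x * x for x in shard)

data = list(range(1_000))
shards = [data[i::4] for i in range(4)]  # split the data across 4 workers

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(train_on_shard, shards))

total = sum(partials)  # combine the partial results
print(total)
```

Horovod applies this same split/compute/combine pattern to gradients across GPUs (via all-reduce), and Dask applies it to dataframe and array operations across cluster nodes.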
8. Governance and Compliance

Model Governance:
- Model Versioning: Tracking and managing different versions of machine learning models to facilitate comparison, rollback, and auditability of model changes over time.
- Audit Trails: Maintaining comprehensive audit trails that capture data lineage and record model changes, ensuring transparency, accountability, and compliance with regulatory requirements.

Security and Compliance:
- Data Privacy: Addressing regulations like GDPR (General Data Protection Regulation) and HIPAA (Health Insurance Portability and Accountability Act) to ensure data handling complies with privacy standards.
- Model Security: Implementing measures to secure model endpoints against unauthorized access and adversarial attacks, ensuring the integrity and confidentiality of deployed models.
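The versioning and audit-trail ideas above can be sketched in a few lines: each registered model version gets a content hash and a timestamped log entry. The in-memory list, the field names, and the hash truncation are assumptions for illustration; production systems would use a durable model registry (such as MLflow's) instead.

```python
import datetime
import hashlib
import json

audit_log = []  # in practice: an append-only store, not a Python list

def register_model(name, params, model_bytes):
    """Record an immutable audit entry for a new model version (sketch)."""
    entry = {
        "name": name,
        # Content hash uniquely identifies this version's weights.
        "version_hash": hashlib.sha256(model_bytes).hexdigest()[:12],
        "params": params,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    audit_log.append(entry)
    return entry

v1 = register_model("churn-clf", {"C": 1.0}, b"weights-v1")
v2 = register_model("churn-clf", {"C": 0.5}, b"weights-v2")
print(json.dumps(audit_log, indent=2))
```

Because each entry records who-trained-what-with-which-parameters-and-when, the log supports the comparison, rollback, and auditability goals described above.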
Kubeflow
Kubeflow is an open-source machine learning (ML) toolkit designed to make deploying ML workflows on Kubernetes simple, portable,
and scalable. It provides a set of integrated components that facilitate end-to-end ML workflows, including:
1. Jupyter Notebooks: For interactive development and experimentation with ML models.
2. TensorFlow Extended (TFX): For deploying production-ready ML pipelines.
3. Katib: For hyperparameter tuning and optimization.
4. Kubeflow Pipelines: For building and deploying portable and scalable end-to-end ML workflows.
Kubeflow leverages Kubernetes' strengths in orchestration and scalability, making it easier for teams to manage ML workloads across
different environments from development to production. It's particularly useful in MLOps (Machine Learning Operations) for
maintaining consistency, reproducibility, and scalability in machine learning projects.