DevOps and Microservices
Syllabus:
UNIT I
INTRODUCTION
Software Engineering - traditional and Agile process models - DevOps: Definition - Practices
- DevOps life cycle process - Need for DevOps - Barriers
UNIT II
DEVOPS PLATFORM AND SERVICES
9+6
Cloud as a platform - IaaS, PaaS, SaaS - Virtualization - Containers - Supporting Multiple Data
Centers - Operation Services - Hardware provisioning - Software provisioning - IT services -
SLA - Capacity planning - Security - Service Transition - Service Operation Concepts.
UNIT III
BUILDING, TESTING AND DEPLOYMENT
9+6
Microservices architecture - Coordination model - Building and testing - Deployment pipeline -
Development and Pre-commit Testing - Build and Integration Testing - Continuous integration -
Monitoring - Security - Resources to Be Protected - Identity Management
UNIT IV
DEVOPS AUTOMATION TOOLS
9+6
Infrastructure Automation - Configuration Management - Deployment Automation - Performance
Management - Log Management - Monitoring.
UNIT V
MLOPS
9+6
MLOps - Definition - Challenges - Developing Models - Deploying to production - Model
Governance - Real-world examples
Unit I: Introduction to Software Engineering and DevOps
Software Engineering Fundamentals
• Definition: Software engineering is an engineering branch focused on developing software
products using defined scientific principles, techniques, and procedures to create effective and
reliable software.
• Need for Software Engineering:
◦ To manage large, complex software projects.
◦ For scalability of software.
◦ To manage costs effectively.
◦ To handle the dynamic nature of software requirements.
◦ For better quality management and ensuring reliable products.
• Characteristics of a Good Software Engineer: Familiarity with software engineering principles,
domain knowledge, programming abilities, strong communication skills (oral, written,
interpersonal), high motivation, sound computer science fundamentals, intelligence, ability to
work in a team, and discipline.
• Importance: Reduces complexity by breaking down large problems, minimizes software cost
by decreasing unneeded elements, decreases development time through planning, helps
handle big projects with planning and management, ensures reliable software through testing
and maintenance, and improves effectiveness by adhering to standards.
Software Development Life Cycle (SDLC) Models
• Definition: A process for building or improving information systems, and the
models/methodologies used to create, maintain, and replace them. Structured SDLC offers
advantages like easing system building, reducing failures, and meeting user needs.
• Traditional Methods:
◦ Waterfall Model (Linear Cycle): Sequential process with distinct phases (requirements,
design, implementation, testing, deployment, maintenance). Suited for projects with clearly
defined requirements. Review at the end of each phase before moving to the next.
◦ Spiral Model (Evolutionary Design): System developed through a series of prototypes, with
each iteration adding improvements. Suitable for experimental systems. Consists of Planning,
Risk Analysis, Engineering, and Evaluation phases, continuously iterating through them.
◦ V-Model: Extension of the Waterfall model, following a downward linear process from
requirements to coding, then inclining upwards for testing phases (forming a 'V'). Widely used,
e.g., in defense forces.
• Agile Model:
◦ Definition: Iterative development approach breaking tasks into smaller iterations (1-4
weeks). Project scope and requirements defined early, but plans for iterations are flexible. Each
iteration involves planning, requirements analysis, design, coding, and testing, delivering a
working product to the client.
◦ Phases: Requirements gathering, Design, Construction/Iteration, Testing/Quality Assurance,
Deployment, Feedback.
◦ Agile Methods (Examples): Scrum, Crystal, DSDM (Dynamic Systems Development Method),
FDD (Feature Driven Development), Lean Software Development, eXtreme Programming (XP).
  * Scrum: Team-based task management with three roles: Scrum Master, Product Owner, and
Scrum Team.
  * eXtreme Programming (XP): Used when customer demands or requirements are constantly
changing or uncertain.
  * Crystal: Involves Chartering, Cyclic Delivery (the team updates the release plan and
delivers an integrated product), and Wrap-up (deployment, post-deployment activities).
  * DSDM: Rapid application development focusing on active user involvement and team
decision-making. Techniques: Time Boxing, MoSCoW Rules, Prototyping. Seven stages.
  * FDD: Focuses on "Designing and Building" features in small steps.
  * Lean Software Development: Applies "just-in-time production" principles to increase speed
and reduce cost (Eliminating Waste, Amplifying Learning, Defer Commitment, Early Delivery,
Empowering the Team, Building Integrity, Optimize the Whole).
◦ When to Use Agile: Frequent changes required, highly qualified/experienced team available,
customer ready for constant meetings, small project size.
◦ Advantages: Frequent delivery, face-to-face communication, efficient design, fulfills business
requirements, acceptable changes anytime, reduced development time.
◦ Disadvantages: Shortage of formal documents leads to confusion, maintenance difficulties
due to lack of documentation post-project.
DevOps
• Definition: A culture focused on continuous project delivery by emphasizing people, processes,
and automation. It encourages cross-functional teams to work together effectively, integrating
development and operations.
• DevOps Life Cycle Process: Seven phases ensuring continuity and automation.
1. Continuous Development: Planning and coding. Tools: Git for code maintenance.
2. Continuous Integration: Developers frequently commit code changes. Each commit triggers
automated build, unit testing, integration testing, code review, and packaging. Tools: Jenkins.
3. Continuous Testing: Automated testing for bugs using tools like TestNG, JUnit, Selenium.
Docker Containers can simulate test environments.
4. Continuous Monitoring: Real-time monitoring of application operational factors, recording
information to identify trends and problems. Resolves system errors, maintains security and
availability. Observability covers full-stack monitoring. Tools: Dynatrace, Splunk, AppDynamics.
5. Continuous Feedback: Continuous improvement of application development by analyzing
operational results, feeding back into the next version.
6. Continuous Deployment: Code is deployed to production servers, ensuring it is applied consistently on
all servers. Configuration management and containerization tools (Chef, Puppet, Ansible,
SaltStack, Vagrant, Docker) are crucial for consistency and scalability.
7. Continuous Operations: Automation of the release process, accelerating time to market.
Continuity is key to efficiency.
• Need for DevOps: Shorter development cycles, increased innovation, better collaboration and
communication between teams, reduced deployment failures and faster recovery, improved
resource management through automation, and focus on people.
• Barriers to DevOps Implementation: Culture (how to cultivate it), Test Automation (often
forgotten), Legacy Systems (integration difficulty), Application Complexity, No Plan (lack of
strategy/goals), Managing Environments (lack of standardization/automation), Skillset (finding
collaborative professionals), Budget (lack of dedicated budget), Tools (fragmented adoption).
• DevOps Principles:
1. Collaboration: Development and operations teams own features from ideation to
deployment for higher quality.
2. Automation: Automate manual work to reduce errors and respond quickly to customer
needs.
3. Continuous Improvement: Ongoing research and development to improve features, add
new ones, and minimize waste, making processes cost-effective and time-efficient.
4. Customer-centric action: Use feedback loops (e.g., Beta programs) from end-users to
continuously refine software and fix bugs before general availability.
5. Create with the end in mind: Define project goals with the customer's real-world problems
in mind, ensuring the software is useful and leads to better business outcomes.
• DevOps Methods:
◦ Adopting agile practices (small iterations/sprints).
◦ Applying a growth mindset (using Telemetry for complete observability).
◦ Focusing on Mean Time to Mitigate (MTTM) and Mean Time to Remediate (MTTR) rather
than preventing failures.
◦ Thinking in terms of competencies, not roles (developers responsible for application health).
• DevOps Practices (Best Practices):
◦ Continuous Integration (CI): Collaborative development using a central repository (Git).
Automated building, testing, reporting, and releasing. Benefits: developer productivity, faster bug
detection, quicker updates.
◦ Continuous Delivery (CD): Extension of CI, involving more automation for testing and
deployment. Software is deploy-ready after automated tests and pre-production stages.
Benefits: automated releases, improved productivity, faster defect finding, swift delivery.
◦ Microservices: Architecture where applications are built as small, independent services.
Each performs a specific task and can be deployed independently. Benefits: easy bug fixes,
lightweight, flexibility.
◦ Infrastructure as Code (IaC): Operations teams provision infrastructure using code (JSON,
YAML). Ensures standardized patterns and includes configuration management. Benefits:
dynamic resource allocation, consistent environments (see the sketch after this list).
◦ Monitoring and Logging: Real-time application monitoring and logging to detect issues
quickly. Proactive alerting for Ops teams. Observability for full-stack monitoring. Benefits: faster
issue detection and resolution (MTTD, MTTR).
◦ Communication and Collaboration: Promotes transparency and effective communication
between teams, reducing wait times and fostering brainstorming.
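As a minimal illustration of the IaC practice above, the sketch below describes infrastructure as data that is version-controlled and fed to a provisioning tool. All resource and field names are illustrative, not the schema of any specific provider or IaC tool.

```python
import json

# Hypothetical declarative description of a small web tier; the schema is
# illustrative only.
infrastructure = {
    "resources": [
        {
            "type": "virtual_machine",
            "name": f"web-{i}",
            "size": "medium",
            "image": "ubuntu-22.04",
            "tags": {"env": "staging", "team": "payments"},
        }
        for i in range(3)
    ]
}

# The generated JSON, not a human operator, becomes the source of truth:
# it is committed to version control and applied by a provisioning tool.
print(json.dumps(infrastructure, indent=2))
```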
Unit II: DevOps Platform and Services (Cloud, Virtualization, Containers, Multi-Data Centers,
Operations)
Cloud Service Models
• Infrastructure as a Service (IaaS): Provides virtualized computing resources over the internet
(e.g., virtual machines, storage, networks). Users manage OS, applications, and data.
◦ Characteristics: Resources as service, highly scalable, dynamic, GUI/API access,
automated admin tasks.
◦ Advantages: Avoids physical server costs, scalability, diverse resource options, handles
immense users, cost savings, flexible.
◦ Disadvantages: Security issues, service/network delays.
• Platform as a Service (PaaS): Provides a platform for developers to build, test, run, and
manage applications without managing underlying infrastructure.
◦ Characteristics: Accessible to multiple users, integrates with web services/databases, builds
on virtualization for scalability, supports multiple languages/frameworks, auto-scale ability.
◦ Advantages: Developers focus on design, flexible, portable, affordable, manages app
development phases efficiently.
◦ Disadvantages: Data security risks, data mismatch with local storage.
• Software as a Service (SaaS): Applications hosted by a cloud service provider and accessible
to users over the internet via a web browser.
◦ Characteristics: Centralized management, hosted remotely, internet accessible, automatic
updates, pay-per-use.
◦ Advantages: Lower cost of ownership (no hardware/licenses), easily accessible, no initial
setup cost, low maintenance, less installation time.
◦ Disadvantages: Low performance, limited customization, security/data concerns.
Virtualization
• Definition: Creation of a virtual version of something (server, desktop, storage, OS, network
resources) from a single physical instance, shared among multiple users/organizations.
• Concept: Creates a virtual machine (Guest Machine) over an existing OS and hardware (Host
Machine), logically separated from underlying hardware.
• Types:
◦ Hardware Virtualization: The VMM (Virtual Machine Monitor) runs directly on the hardware and controls the processor, memory, etc. (for
server platforms).
◦ Operating System Virtualization: VMM on Host OS (for testing applications on different OS
platforms).
◦ Server Virtualization: VMM directly on server (dividing a physical server into multiple virtual
servers, load balancing).
◦ Storage Virtualization: Grouping physical storage from multiple network devices into a single
logical device (for backup/recovery).
• Role in Cloud Computing: Enables sharing of infrastructure, providing standard application
versions to users, and managing updates more cost-effectively.
Containers
• Definition: Executable software units packaging application code, libraries, and dependencies
to run consistently anywhere. Leverage OS virtualization features (namespaces, cgroups) to
isolate processes and control resources.
• Containers vs. VMs: Containers virtualize the OS, while VMs virtualize underlying hardware
and include a guest OS. Containers are lighter and faster because they share the host OS
kernel.
• Benefits: Lightweight (smaller size, quick spin-up, supports horizontal scaling), Portable and
platform-independent (run anywhere without reconfiguration), Supports modern development
(DevOps, serverless, microservices), Improves utilization (granular scaling of components).
• Use Cases: Microservices, DevOps, Hybrid/Multi-cloud environments, Application
modernizing/migration.
• Containerization: Designing and packaging software to take advantage of containers, resulting
in a container image.
• Container Orchestration (Kubernetes): Manages large volumes of containers throughout their
lifecycle (provisioning, redundancy, health monitoring, resource allocation, scaling, load
balancing, moving between hosts). Kubernetes is open-source and standard in the industry,
enabling declarative state management via YAML files.
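A brief sketch of programmatic container management using the Docker SDK for Python (assumes a local Docker daemon is running and the `docker` package is installed; the image and container name are illustrative):

```python
import docker  # Docker SDK for Python: pip install docker

client = docker.from_env()  # connect to the local Docker daemon

# Run an nginx container detached, mapping container port 80 to host port 8080.
container = client.containers.run(
    "nginx:latest", detach=True, ports={"80/tcp": 8080}, name="demo-web"
)
print(container.status)  # e.g. "created"; containers share the host kernel,
                         # so start-up is near-instant compared to booting a VM

container.stop()
container.remove()
```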
Supporting Multiple Data Centers (MDC)
• Purpose: Distribute load and address disaster recovery for large organizations. Allows
transparent transfer of user session details.
• Components: Protected applications, WebGate agents, Access Manager servers, identity
stores, databases.
• Master-Clone Deployment: First data center is Master, others are Clones. Clones mirror
Master. Session adoption policies determine request routing if Master is down. Data replication
via Automated Policy Sync (APS) or manually.
• Cookies for MDC:
◦ OAM_ID: SSO cookie, holds attributes for MDC behavior across data centers (clusterid,
sessionid, latest_visited_clusterid). Host-scoped.
◦ OAMAuthn: WebGate cookie. On successful authentication/authorization, browser gets valid
WebGate cookie with clusterid:sessionid. Remote session details retrieved if authorization
spans data centers.
◦ OAM_GITO (Global Inactivity Time Out): Domain cookie for time out tracking across
WebGate agents. Contains Data Center Identifier, Session Identifier, User Identifier, Last Access
Time, Token Creation Time.
• Session Adoption During Authorization: If a request is routed to a different data center, Access
Manager checks for a valid remote session and triggers session adoption based on policy (e.g.,
retrieving session data via OAP, invalidating previous session).
• Session Indexing: New session created in local data center during authorization if conditions
met (MDC enabled, no matching session locally, valid remote session).
• Supported Topologies:
◦ Active-Active Mode: Master and Clone data centers are exact replicas and active
simultaneously, catering to different user sets (e.g., by geography). Load balancer routes traffic.
If Master is overloaded/down, Clone handles requests, potentially requiring re-authentication if
session sync fails.
◦ Active-Passive Mode: Primary data center is operable, clone is not but can be brought up in
reasonable time if primary fails. No full MDC setup needed, but policy data syncs.
◦ Active-Hot Standby Mode: One data center is in hot standby, traffic not routed there unless
active data center fails. Syncs data and is ready for failover.
• MDC Deployments: Each data center has full Access Manager installation, WebLogic Server
domains do not span data centers. Global load balancers maintain user-to-data center affinity.
◦ Session Adoption Scenarios: Different configurations for re-authentication, session
invalidation, and data retrieval (e.g., without re-authentication, invalidation, or data retrieval; with
invalidation and data retrieval; with on-demand data retrieval).
◦ Authentication and Authorization by Different DCs: Involves OAM_ID, OAMAuthn, and GITO
cookies for seamless operations.
◦ Logout and Session Invalidation: Ensures all server-side sessions and authentication
cookies are cleared across data centers.
◦ Stretch Cluster Deployments: Single OAM cluster stretched across geographically close
data centers (< 5ms latency). Treated as a single cluster, policy database in one DC. Less
reliable than traditional MDC due to single point of failure and higher cross-DC chatter.
Traditional MDC is recommended.
• Load Balancing between Access Management Components: Use local/global load balancers
(e.g., Oracle HTTP Servers) to front-end clusters and simplify configuration with virtual host
names. Constraints on OAP connections and uniform distribution.
• Time Outs and Session Syncs:
◦ Maximum Session Constraints: Honored by Credential Collector user affinity.
◦ Multi-Data Center Policy Configurations for Idle Timeout: Uses OAM_ID and OAM_GITO
cookies for calculation. Configuration depends on whether OAM_GITO is set.
◦ Expiring Multi-Data Center Sessions: Managed by the data center with user affinity.
◦ Session Synchronization and Failover: Remote session attributes migrated and synced to
the servicing data center based on policies. WebGate failover across data centers supported.
• Replicating a Multi-Data Center Environment:
◦ Initial Setup: Manual replication using WLST.
◦ Regular Sync: WLST commands or Automated Policy Synchronization (APS) via REST
APIs. APS designed for ongoing sync, not initial replication.
• MDC Recommendations:
◦ Use a common domain for WebGates and OAM Server Credential Collectors for GITO
cookie sharing.
◦ Adjust WebGate cookie validity lower than session idle timeout if no common domain.
◦ OAM_GITO cookie not applicable with DCC (Detached Credential Collector).
◦ Use external TCP-based load balancers for OAP endpoints.
◦ Maximum sessions may not be honored if user authentication/session creation spans
member DCs bypassing affinity.
◦ WebGate cookie cannot be refreshed during authorization; set validity low.
Cloud Computing Operations
• Definition: Delivering superior cloud services, performing business operations over the Internet
using web/mobile applications.
• Managing Cloud Operations: Employ right tools/resources, timely and cost-effective actions,
appropriate resource selection, standardization/automation of repetitive tasks, efficient
processes, and maintaining quality of service.
Hardware Provisioning
• Recommendations for Spark:
◦ Storage Systems: Run Spark on same nodes as HDFS or different nodes in same LAN. For
low-latency stores (HBase), different nodes may be preferable.
◦ Local Disks: 4-8 disks per node, no RAID, mount with noatime. Configure [Link] as
comma-separated list.
◦ Memory: Allocate max 75% for Spark, rest for OS/buffer cache. Determine needs via
Spark's monitoring UI. Consider multiple worker JVMs for >200GB RAM.
◦ Network: 10 Gigabit or higher network for network-bound applications.
◦ CPU Cores: At least 8-16 cores per machine.
Software Provisioning
• Process: Includes deployment, environment provisioning, promotion between environments,
testing, QA, and reporting.
• Importance of Test Environments: Essential for coding and testing. Ensures software runs
properly on fresh environments. Easier provisioning increases developer comfort and efficiency.
• Tracking Metrics: Answer questions like bug occurrence during promotion, deployment
frequency, and deployment failure rates to estimate quality.
• Team Efficiency: Optimally designed test environments improve team performance and
psychological safety.
• Lower Costs: High-quality test environments lead to higher-quality software, fewer bugs, less
debugging time, and more optimal settings/configurations.
• Test Environment Management: Building complex solutions or using specialized tools (like
Plutora) to manage processes, feedback, and monitoring.
SLA (Service Level Agreements)
• Definition: Documented understanding between service provider and recipient, defining service
expectations, metrics, responsibilities, and consequences for unmet obligations. Can be internal
or external. SLAs are not immutable; a framework for revisions should be built in.
• Who Provides: Service vendor, customizable, can have multiple levels for different price points.
Clients should review.
• Importance: Ensures parties are on the same page, provides clear metrics, offers recourse for
unmet obligations, improves provider-business partnerships.
• Benefits: Improved customer experience (safety net), improved employee experience (clear
expectations), established/trusted information source, increased productivity/performance.
• Types:
◦ Customer SLAs: Between provider and external customer.
◦ Internal SLAs: Within an organization (between teams/departments).
◦ Multilevel SLAs: Multiple providers/users, contract divided into levels.
• What to Include: Agreement summary, party objectives, review schedules, points of contact,
compensation for unmet goals, cancellation conditions.
• Indemnification Clause: One party (indemnitor) takes responsibility for liabilities if contract
breached (often one-sided).
• Metrics to Consider: Easy to follow, accurately collected, within provider's control, reasonable
baselines. Examples: Availability of service, Technical quality, Error rates, Security, Business
results (KPIs).
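A small worked example of the availability metric (all figures are illustrative): availability is uptime divided by total time in the measurement window, and the SLA target implies an allowed-downtime budget.

```python
# Illustrative availability calculation for a 30-day SLA window.
minutes_in_month = 30 * 24 * 60          # 43,200 minutes
downtime_minutes = 90                    # observed outage time

availability = 1 - downtime_minutes / minutes_in_month
print(f"Measured availability: {availability:.4%}")  # ~99.79%

# Allowed downtime implied by a 99.9% ("three nines") target:
sla_target = 0.999
allowed = minutes_in_month * (1 - sla_target)
print(f"Allowed downtime at 99.9%: {allowed:.0f} minutes/month")  # ~43 minutes
```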
• Service Level Management (with ServiceNow): Platform for managing SLAs, OLAs, key
definitions, and real-time analytics. Visual dashboards, automation, integration.
Capacity Planning
• Definition: Determining production capacity needed to meet changing demand. Balances
available resources to satisfy customer demand or project needs.
• Importance: Crucial for resource, time, team, and work management. Helps address future
issues, improve team performance, streamline tasks.
• Strategies:
◦ Lag Strategy: Sufficient resources to fulfill demand, not planned estimates. For smaller
firms.
◦ Lead Strategy: Enough resources to satisfy demand estimates. Extra capacity for rising
demand.
◦ Match Strategy: Combines lead and lag, monitors actual demand, estimates, and market to
adjust capacity.
• Types:
◦ Workforce Capacity Planning: Ensures enough workers/hours.
◦ Product Capacity Planning: Ensures enough products/resources to fulfill deliverables.
◦ Tool Capacity Planning: Ensures necessary tools (machinery, vehicles).
• When Required: Anytime supply needs to meet demand (week, month, quarter, year). Helps
managers handle complex environments (autonomous teams, changing priorities, unpredictable
tasks, remote workers).
• Help in Sprint Planning: Makes sprint planning tighter, more efficient, and achievable by
preparing for possibilities like missed deadlines or bottlenecks. Provides contingency plans.
• In Operations Management: System for balancing demand-supply of goods. Evaluates
production aspects (machinery, staff, work centers) to meet current and future customer
demand. Helps understand production speed, explore options for efficiency (e.g., extra vs. fewer
people), and anticipate future business scenarios.
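A minimal sizing calculation of the kind capacity planning rests on (the demand figure, per-server throughput, and 20% headroom are illustrative assumptions):

```python
import math

peak_requests_per_sec = 12_000   # forecast peak demand
per_server_capacity = 800        # measured throughput of one server
headroom = 0.20                  # keep 20% spare for spikes and failover

required = peak_requests_per_sec / (per_server_capacity * (1 - headroom))
print(f"Servers needed: {math.ceil(required)}")  # 19

# A lead strategy provisions this ahead of forecast demand; a lag strategy
# waits for actual demand; a match strategy adjusts in small increments.
```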
Service Operation Concepts
• Overview: Ensures services are provided efficiently and effectively as per SLAs. Involves
monitoring, incident resolution, request fulfillment, and operational tasks.
• Key Points: Five processes, four functions. Deals with day-to-day activities and infrastructure.
Where design and transition plans are executed and measured. Where actual value is seen by
customers.
• Processes:
1. Event Management: Monitor Configuration Items (CIs), filter/categorize events, decide on
actions.
2. Incident Management: Restore service to previous state as early as possible.
3. Request Fulfillment: Handle requests (password changes, new users).
4. Access Management: Grant rights to authorized users.
5. Problem Management: Find root cause, prevent recurrence.
• Functions:
1. Service Desk: First and single point of contact, coordinates end-user and IT, owns logged
requests, ensures closure. Types: Central, Local/Distributed, Virtual, Specialized.
2. IT Operation Management: Manages day-to-day operational activities.
3. Technical Management: Technical experts managing IT infrastructure.
4. Application Management: Manages applications and software throughout service lifecycle.
• Service Concept: How a service provider realizes value and desired outcomes, and how
services are perceived by stakeholders. Holistic combination of four dimensions:
1. Service Operation: How service is delivered.
2. User Experience: User's direct experience.
3. Service Outcome: Benefits/results for user.
4. Value: Perceived benefits vs. cost.
• Importance of Service Concept: Process-oriented nature of services (chain of activities).
Weakest link impacts value. Overarching directive for all services, aligning service offerings
toward common objectives. Directs design of multiple services. Forms foundation for service
portfolio management.
Unit III: Building, Testing, and Deployment
Microservices Architecture
• Definition: Collection of small, autonomous, self-contained, loosely coupled services. Each
implements a single business capability within a bounded context.
• Characteristics: Small, independent, loosely coupled, separate codebase per service,
independently deployable, responsible for own data persistence, communicate via well-defined
APIs, supports polyglot programming.
• Components:
◦ Management/Orchestration: Places services, identifies failures, rebalances (e.g.,
Kubernetes).
◦ API Gateway: Entry point for clients, forwards calls to backend services. Benefits: decouples
clients from services, handles non-web-friendly protocols, performs cross-cutting functions
(authentication, logging, load balancing, throttling, caching, transformation, validation).
• Benefits: Agility (faster bug fixes, feature releases, easy rollbacks), Small/focused teams
(greater agility), Small code base (minimizes dependencies, easier new features), Mix of
technologies, Fault isolation (individual service failure doesn't disrupt whole app if designed
correctly), Scalability (independent scaling of subsystems), Data isolation (easier schema
updates).
• Challenges: Complexity (more moving parts, system as whole more complex), Development
and Testing (different approach, difficult refactoring/testing across service boundaries), Lack of
Governance (too many languages/frameworks, hard to maintain), Network Congestion/Latency
(interservice communication, long dependency chains), Data Integrity (consistency challenge
with private data stores), Management (requires mature DevOps, correlated logging
challenging), Versioning (updates must not break dependencies, backward/forward
compatibility), Skill Set (requires distributed systems experience).
• Best Practices: Model services around business domain, decentralize everything (avoid
sharing code/data schemas), private data storage per service, communicate via well-designed
APIs (avoid leaking implementation details), avoid coupling, offload cross-cutting concerns to
gateway, keep domain knowledge out of gateway, loose coupling/high cohesion, isolate failures
(resiliency strategies).
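One widely used resiliency strategy for isolating failures is the circuit breaker, sketched minimally below (thresholds and timings are illustrative; production services typically rely on a hardened library rather than hand-rolled code):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after repeated failures, stop calling the
    downstream service and fail fast for a cool-down period."""

    def __init__(self, max_failures=3, reset_seconds=30):
        self.max_failures = max_failures
        self.reset_seconds = reset_seconds
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_seconds:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # trip the breaker
            raise
        self.failures = 0                  # success closes the circuit
        return result

# Usage (hypothetical downstream call):
#   breaker = CircuitBreaker()
#   breaker.call(inventory_client.get_stock, item_id)
```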
Coordination Models
• Purpose: Approaches to improve how people work together. Categorized as Centralized,
Role-based, and Team.
• Centralized Coordination Models: For global problems, alignment, and coordinating the entire
system.
◦ Product strategy, Standards, Tenets, Goal frameworks, Metrics, Product council.
• Role-Based Coordination Models: Limited scope, used when appropriate.
◦ Program manager, Integrator, Controller, Standard definer.
• Team Coordination Models: Used most regularly for particular situations.
◦ Service provider, Consultant, Self-service, Independent executor, Liaison, Embedded,
Single-threaded-owner, Rotation, Centralized liaison, Merged group (DevOps pattern), Task
force, Away team, Tiger team, Objective expert, Work demander, Cousin team, Community of
practice.
Deployment Pipeline
• Definition: Automated processes to quickly and accurately move new code from version
control to production. Reduces manual tasks, human error, speeds up deployments.
• Main Stages:
1. Version Control: Developer commits code to source control (e.g., GitHub). Triggers
automated compilation, unit testing, analysis, installer creation. Binaries stored in artifact
repository.
2. Acceptance Tests: Compiled code verified against predefined acceptance criteria
(custom-written tests).
3. Independent Deployment: Code automatically deployed to a development environment
(mirroring production) for functionality tests and bug squashing.
4. Production Deployment: Code made live for users. DevOps/operations aim for zero
downtime. Uses Blue/Green deployments or Canary releases for quick updates and rollbacks.
• Benefits: Faster product updates/features, less human error, developers focus on innovation,
faster troubleshooting/rollbacks, better response to user needs with smaller, frequent releases.
• How to Build: Unique to company/user needs. Essential components: Build Automation
(Continuous Integration - CI), Test Automation (custom-written tests), Deploy Automation
(Continuous Deployment/Delivery - CD). Primary goal: eliminate manual steps.
• CI/CD Pipelines: Continuously compiling, validating, and deploying new code updates as
written. Key for efficient deployment pipeline. Code flows through stages (build, test, staging,
production). Each stage acts as a gate.
• Deployment Pipeline Tools: Source control, build/compilation, containerization, configuration
management, monitoring tools. Examples: Jenkins, Azure DevOps, CodeShip, PagerDuty.
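To make the "each stage acts as a gate" idea concrete, here is a toy pipeline runner (the stage bodies are placeholders; a real pipeline delegates each stage to a CI/CD server such as Jenkins or Azure DevOps):

```python
import sys

def build():   print("compiling and packaging..."); return True
def test():    print("running automated tests..."); return True
def staging(): print("deploying to staging..."); return True
def deploy():  print("deploying to production..."); return True

# Stages run in order; a failing stage "closes the gate" and stops the run,
# so code reaches production only after passing every earlier stage.
for stage in (build, test, staging, deploy):
    if not stage():
        print(f"stage '{stage.__name__}' failed - pipeline stopped")
        sys.exit(1)
print("pipeline complete: release candidate promoted to production")
```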
Building and Testing
• Infrastructure Requirements: Easy integration of code, easy deployment to various
environments (testing, staging, production), easy/full testing without affecting production, close
supervision of deployed system, availability of older versions, rollback capability.
• Deployment Pipeline Steps: Pre-commit tests (local), Commit (to versioning system),
Integration tests (triggered by commit), Staging tests (quasi-production), Production (close
supervision), Normal production.
• Definitions:
◦ Continuous Integration (CI): Automatic triggers up to integration tests.
◦ Continuous Delivery (CD): Automated triggers as far as the staging system.
◦ Continuous Deployment: Next-to-last step (production deployment) is automated.
• Moving System Through Pipeline:
◦ Traceability: Determine exactly how system reached production (source code, commands,
tools). Infrastructure as Code: scripts, config, tests under version control.
◦ Environment: Code, environment, configuration, external systems, data. Each pipeline step
has a different environment:
  * Pre-commit: Local (laptop/desktop), stubbed external systems, limited data,
debug-level config.
  * Build & Integration Testing: CI server, test database, integration-testing config.
  * UAT/Staging/Performance Testing: Close to production, automated acceptance/stress
tests, subset of production data, test-environment config (no production DB access).
  * Production: Live database, sufficient resources, production config.
◦ Configuration Differences: Logging (detailed in dev), Credentials (confidential for
production). Minimize config changes.
• Crosscutting Aspects:
◦ Test Harnesses: Software and data to test program units, automate tests, generate reports
(identify failures).
◦ Negative Tests: Test system behavior when assumptions don't hold (wrong order,
connectivity issues). Expect graceful degradation/failure, meaningful error messages.
◦ Regression Testing: Uncover new bugs/regressions after changes. Ensure fixed bugs aren't
reintroduced. Automated creation from staging failures.
◦ Traceability of Errors: Lineage/provenance, easily obtain source of alert and triggering user
requests.
• Development and Pre-commit Testing:
◦ Version Control & Branching: Identify versions, share revisions, track changes. CVS/SVN
(centralized), Git (distributed). Branches for independent work (e.g., bug fixes). Problems: too
many branches, merge difficulties. Alternative: all developers work on trunk directly, frequent
small commits.
◦ Feature Toggles: An "if" statement around immature code that disables the feature until it
is ready. Dangers: stale or reused toggle names; integrate the feature and remove the toggle
promptly. Manage with specialized tools (see the sketch after this list).
◦ Configuration Parameters: Externally settable variables changing system behavior. Values
can be same, different by environment, or confidential (production credentials). Techniques for
confidentiality: meta-rights to pipeline, network config in virtual environment.
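A minimal feature-toggle sketch (the toggle name, functions, and in-memory config source are illustrative; dedicated toggle-management tools add remote control and auditing):

```python
# Toggles normally come from a config file or a toggle service;
# a plain dict stands in here.
FEATURE_TOGGLES = {"new_checkout_flow": False}

def is_enabled(name):
    return FEATURE_TOGGLES.get(name, False)

def legacy_checkout(cart):
    return f"legacy checkout of {len(cart)} items"

def new_checkout(cart):
    return f"new checkout of {len(cart)} items"

def checkout(cart):
    if is_enabled("new_checkout_flow"):
        return new_checkout(cart)   # immature path stays dark until ready
    return legacy_checkout(cart)    # stable path keeps serving production

print(checkout(["book", "pen"]))    # -> legacy checkout of 2 items
```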
• Testing During Development and Pre-commit Tests:
◦ Test-driven Development (TDD): Develop automated test before writing code.
◦ Unit Tests: Code-level tests for individual classes/methods. Pre-commit tests: automatically
run before commit, include unit and smoke tests to find bugs early.
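In the TDD style described above, the test is written first and fails until the code satisfies it. A minimal sketch with Python's built-in unittest (the function under test is illustrative); a fast suite like this is exactly what a pre-commit check would run:

```python
import unittest

def apply_discount(price, percent):
    """Code under test: written only after the tests below existed."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200.0, 10), 180.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)

if __name__ == "__main__":
    unittest.main()
```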
• Build and Integration Testing:
◦ Build: Create executable artifact from source code/config (compiling, packaging).
◦ Build Scripts: Input to CI server, single command invocation for repeatability/traceability.
◦ Packaging: Methods depend on production environment: runtime-specific (Java archives),
OS packages, VM images, lightweight containers. Heavily baked (immutable) vs. lightly baked
images.
◦ Continuous Integration and Build Status: CI server performs build/tests.
◦ Integration Testing: Test built executable artifact. Environment includes connections to
external services (surrogate database). Mechanisms to distinguish production vs. test requests.
• UAT/Staging/Performance Testing:
◦ Staging: Mirrors production. Tests:
  * User Acceptance Tests (UATs): prospective users exercise the system through its UI.
  * Automated Acceptance Tests: automated versions of the UATs.
  * Smoke Tests: a subset of the automated acceptance tests covering core functions.
  * Nonfunctional Tests: performance, security, capacity, availability.
• Production:
◦ Early Release Testing:
  * Beta Release: selected users access the prerelease version.
  * Canary Testing: deploy the new version to a few servers and gradually ramp up
traffic based on performance metrics.
  * A/B Testing: determine which version performs better against business KPIs.
◦ Error Detection: Monitoring for poor behavior (response timing, queue lengths). Logs for
tracking source. Regression tests from diagnosed errors.
◦ Live Testing: Perturb running system (Netflix Simian Army).
◦ Incidents: Post-deployment errors (dev/config errors, version dependencies, dependent
system changes, incorrect parameters).
Integration Testing
• Definition: Combining individual units/modules and testing if they work as intended when
integrated. Tests interfaces between modules.
• Why Needed: Unit testing alone is insufficient (different developer logic, changing
requirements, missed issues like data formatting/hardware interfaces).
• Advantages: Integrated modules work properly, early testing, detects interface errors, helps
with APIs/third-party tools, large system coverage, increased test coverage/reliability.
• How Done: After unit testing, combine and test. Main goal: test interfaces. Steps: plan, choose
approach, design test cases, deploy, track defects, repeat.
• Stubs and Drivers: Dummy programs for modules under development. Stubs are "called
programs" (simulate lower-level modules), Drivers are "calling programs" (simulate higher-level
modules).
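A stub in practice, using Python's unittest.mock (module and function names are illustrative): the higher-level module is tested while the unfinished lower-level module is simulated.

```python
from unittest.mock import Mock

# The lower-level payment module is still under development, so stub it.
payment_gateway = Mock()
payment_gateway.charge.return_value = {"status": "ok", "txn_id": "T123"}

def place_order(gateway, amount):
    """Higher-level module under test: calls down into the gateway."""
    result = gateway.charge(amount)
    return result["status"] == "ok"

assert place_order(payment_gateway, 49.99)              # interface behaves
payment_gateway.charge.assert_called_once_with(49.99)   # interaction verified
```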
• Types of Integration Testing:
◦ Big Bang: All modules integrated at once and tested. Advantage: suitable for smaller
systems. Disadvantages: tough fault localization, delayed testing, critical issues not prioritized,
difficult root cause finding.
◦ Incremental: Connects logically related modules, then adds more.
  * Top-Down: Starts with the top-level modules and moves down. Uses stubs for
lower-level modules. Advantages: easier fault localization, consistent product, stubs
faster to write than drivers, critical modules tested early, major design flaws detected
early. Disadvantages: requires several stubs, poor early-release support, basic
functionality tested late.
  * Bottom-Up: Starts with the lowest-level units and moves up. Uses drivers for
higher-level modules. Advantages: development and testing proceed together (efficient
product), test conditions are easy to create. Disadvantages: requires several drivers,
data flow tested late, drivers complicate test data management, poor early-release
support, key interface defects detected late.
  * Sandwich (Mixed): Hybrid of top-down and bottom-up. The system is split into three
layers (the target middle layer, the layer above, and the layer below): top-down testing
from the top to the middle, bottom-up from the bottom to the middle, and big bang for
the middle layer. Advantages: parallel testing, useful for large projects. Disadvantages:
high cost, unsuitable for small highly interdependent systems, different skill sets needed.
Continuous Integration (CI)
• Definition: Developers merge changes to main branch many times daily. Each merge triggers
automated build/test (ideally <10 min). Blocks progress if build fails, team repairs quickly.
• Why Needed: Predictable/reliable SDLC, iterative feature building, faster bug fixes. Long
feedback loops are risky. Automates integration steps, avoids repetitive work/human error.
• Extension with Continuous Delivery: CI often extended to CD, where code changes are
automatically prepared for release. Can be fully automated (Continuous Deployment) or
partially. Always results in a release artifact.
• CI/CD Pipeline: Code enters, flows through stages (build, test, staging, production). Each
stage is a logical unit, can have subunits. Each stage acts as a gate.
• Prerequisites: Automated builds, automated testing, frequent commits to single source code
repository, transparency of process and real-time access to CI status.
• Typical Development Workflow (e.g., with GitHub/Semaphore): Developer creates branch,
commits changes, pushes to GitHub. Semaphore builds/tests. Alerts if errors. Deploys to
staging if successful. Merges to master after peer review. Final build/test on master, deploys to
production.
• Benefits: Improved developer productivity, delivers working software more often, finds bugs
earlier/fixes faster (various automated checks: correctness, behavior, style, security, test
coverage), instant feedback.
• Tools: CI tool (e.g., Semaphore, Jenkins), build/testing tools (code style/complexity analyzers,
build/task automation, testing frameworks, browser testing engines, security/performance tools).
Choose widely adopted, well-documented, actively maintained tools.
• Best Practices: Treat master build as release-ready (no commenting out failing tests, no
check-ins on broken builds). Keep build fast (<10 min). Parallelize tests. Frequent commits to
master (10x/day). Use feature flags for WIP. Wait for tests before PR. Test in production clone
(Docker image). Use CI for code maintenance (scheduled library updates). Track metrics (build
time, master red frequency).
Identity Management
• Definition: Software managing user access to information to increase security and efficiency.
Controls access by IP address or other factors.
• Purpose: IT admin/managers control access to applications, databases, IT assets. Some
control internal systems, others customer access to public content. Includes IAM, user
provisioning, SSO, password management, MFA, RBA.
• Key Benefits: Improve data security, control app permissions, improve end-user productivity
(less password entry), speed up onboarding, reduce compliance risk, centralize info storage,
customize access, monitor access/privilege changes, monitor threats.
• Why Use: Data security, organizational hierarchy structure, improved user experience.
◦ Permissions and Management: Create user accounts, grant customized access, track
histories, view privileges. Simplifies onboarding/offboarding.
◦ Data Security: Meets security requirements, ensures compliance, prevents unauthorized
access/misuse. Identifies user behavior anomalies.
◦ Productivity and Efficiency: SSO for centralized access, password managers for simplified
logins, enables complex passwords.
• Kinds of IAM Software:
◦ Customer Identity and Access Management (CIAM): For multiple customer accounts, create
identities, set permissions, monitor activity.
◦ Privileged Access Management (PAM): Controls employee/third-party access to sensitive
info/systems.
◦ Multi-Factor Authentication (MFA): Adds multiple verification levels (text, security questions,
biometrics).
◦ Password Manager Software: Business/personal, remembers passwords, simplifies logins,
adds security features (required updates, complex passwords, MFA).
◦ Risk-Based Authentication (RBA): MFA powered by machine learning, monitors user
behavior, prompts extra verification for unusual activity.
◦ Single Sign-On (SSO): Users log in once, gain access to multiple applications/tools.
• Features: Authentication options (user experience, local/partner/mobile support, BYOD, target
system support), Multi-Factor Authentication (required access/system support), Identity
Directories (on-premise, app as profile master, cloud directory), Provisioning (self-service,
smart/automated, bidirectional sync, attribute transformation, role management, policy
management, access termination, approval workflows), Governance (threat identification/alerts,
compliance audits), Administration (console, easy setup, bulk changes, self-service password),
Platform (customization, reliability, performance/scalability, security, logging/reporting,
Federation/SAML, cross-browser support, APIs).
• Trends:
◦ Zero Trust Security: Access restricted until thorough verification, moves away from
perimeter-based security. For disparate networks/globalized workforce.
◦ GDPR Compliance: Addresses increasing data privacy regulations. Many tools simplify
compliance.
• Potential Issues:
◦ Security: Concerns with cloud-based products, but IAM centralizes/fortifies credentials.
◦ Cloud Application Integration: Connecting many applications, user credentials, access
privileges. Pre-integrated apps vs. customization.
◦ Mobile Compatibility: Some tools web-only, others support Android/iOS.
Unit IV: DevOps Automation Tools
Infrastructure Automation
• Definition: Reducing human interaction with IT systems by creating repeatable
scripts/functions.
• Purpose: Control IT elements (servers, storage, network, OS) to improve efficiency of IT
operations/staff, aiming for hands-off operation in hybrid IT.
• Why Important: Key for IT orchestration, efficiency, digital transformation. Manages complexity
in growing IT orgs. Increases agility, enhances productivity, reduces security attack surfaces by
standardizing processes and policies.
• Benefits for Specific Tasks:
◦ Provisioning: Reduces VM, storage, networking provisioning time (weeks/months to
minutes/hours).
◦ Cost Reduction: Highlights all cost components, prevents runaway workloads, turns IT into
asset.
◦ Capacity Planning: Prevents under/over-provisioning, eliminates inconsistencies, identifies
areas of incorrect resource allocation.
◦ VM Sprawl: Identifies unused workloads, automates decommissioning, saves money.
• How it Works: Delivers predictability/repeatability for IT workload configuration. Helps meet
SLAs, reduces complexity, frees IT resources. Streamlines ongoing operations (network
management, IMAC (install/move/add/change), user access, storage/data admin, app deployment, troubleshooting). Offers
self-service, multi-cloud provisioning with consistency. Centralizes policies via templates (IaC).
• Processes that Can Be Automated: Self-Service Cloud, Kubernetes Automation, Multi-Cloud
Automation, Network Automation, DevOps for Infrastructure.
• Benefits: Cost reductions (eliminating manual processes), improved efficiency/accuracy
(central policy), increased agility/visibility, faster time to value, consistent dynamic adjustment,
consistent security/compliance, future-proof infrastructure.
Configuration Management
• Definition: Maintaining IT systems (hardware, software) in a desired state, ensuring consistent
performance over time. Helps identify systems needing patches/updates. Often used with ITIL.
• Why Important: Prevents undocumented changes, performance issues, inconsistencies,
compliance issues, downtime, instability, failure. Manually too complex for large systems.
• CM System Functionality: Classify/manage system groups, centralized config mods,
automated updates/patching, identify underperforming/non-compliant configs, prioritize
remediation, apply remediation.
• Microservices and CM: Increased need for consistent CM process due to many code
segments connected by APIs. Metadata encodes specs (resource allocation, secrets,
endpoints).
• "Single Version of the Truth": Provides visibility into config modifications, audit trails, tracks
every change.
• How it Works: Gather config data (apps, network, secrets), load into central repository.
Establish baseline config (known good state, often production). Adopt version control (Git).
Auditing/accounting ensure changes reviewed/accepted.
• Benefits: Avoids problems from improper configs, prevents expensive remediation. Ensures
dev/test/prod environments are same. Recreates environments for error diagnosis or
scaling/migration. Automates tasks, rapid provisioning.
• Risks of Not Using CM: Downtime, errors, unplanned expenses, wasted time, quality issues,
missed deadlines. Inability to understand system components/impact of changes. Difficult root
cause analysis, remediation, SLA maintenance.
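The desired-state idea at the heart of configuration management can be sketched as a drift check (the configuration keys and values are illustrative; real CM tools such as Ansible or Puppet run this kind of comparison against live systems and then remediate):

```python
# Desired state: the baseline configuration kept under version control.
desired = {"nginx_version": "1.24", "max_connections": 1024, "tls": True}

# Actual state: what an inventory scan of one server reports.
actual = {"nginx_version": "1.22", "max_connections": 1024, "tls": False}

# Detect drift from the baseline; a real CM tool would also apply fixes.
drift = {k: (actual.get(k), want) for k, want in desired.items()
         if actual.get(k) != want}
for key, (have, want) in drift.items():
    print(f"DRIFT {key}: have {have!r}, want {want!r} -> remediate")
```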
Deployment Automation
• Definition: Deploying software to testing/production environments with "push of a button."
Reduces risk, provides fast feedback on quality.
• Inputs: Packages from CI, scripts (config, deploy, test), environment-specific config.
Recommend storing scripts/config in version control. Packages from artifact repository.
• Script Tasks: Prepare target environment, deploy packages, database migrations, perform
configuration, deployment test (smoke test).
• Implementation Best Practices: Same deployment process for every environment (including
production), allow anyone with credentials to deploy any version on demand, use same
packages for every environment (separate config), recreate state of any environment from
version control.
• Ideal Tool: Autonomous deployments, records builds in environments, records output for audit.
Many CI tools have these.
• Common Pitfalls:
◦ Complexity: Automating complex manual process yields complex automated process.
Re-architect for deployability (simple deployment script, complexity in app code/infra platform).
Steps should be idempotent and order independent.
◦ Dependencies: Orchestration required for tightly coupled services. Goal: independently
deployable services (backward compatibility, API versioning, resiliency patterns like circuit
breakers, parallel change for DB upgrades).
◦ Not Designed for Automation: Manual interaction with consoles. Use APIs (most platforms
offer).
◦ Poor Collaboration: Devs and IT Ops not in sync (different deployment methods,
inconsistent environments). Must be created by both working together.
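A sketch of the idempotence point above (paths are illustrative and relative, so the script is safe to try): each step checks whether its work is already done, so the whole deployment can be re-run after a partial failure without side effects.

```python
import os

def ensure_release_dir(path="releases/v42"):
    """Idempotent: creating a directory that already exists is a no-op."""
    os.makedirs(path, exist_ok=True)
    return path

def ensure_current_symlink(target="releases/v42", link="current"):
    """Idempotent: only repoint the 'current' link if it is wrong."""
    if os.path.islink(link) and os.readlink(link) == target:
        return                      # already correct - nothing to do
    if os.path.islink(link) or os.path.exists(link):
        os.remove(link)
    os.symlink(target, link)

# Running both steps twice leaves the system in the same state as once.
ensure_current_symlink(ensure_release_dir())
```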
• Ways to Improve: Document existing process, incrementally simplify/automate (packaging, VM
images, middleware automation, file copying, server restarts, config file generation, automated
deployment tests, scripting DB migrations). Remove manual steps, implement
idempotence/order independence, leverage infrastructure.
• Ways to Measure: Count manual steps, measure % automation, determine time spent on
delays.
Performance Management
• Definition: Processes and systems for developing employees to perform optimally, reach
potential, boost success, and achieve organizational goals. Continuous conversation between
employees, managers, HR.
• Goals: Establish clear expectations, set SMART goals (align with org goals), provide ongoing
feedback, evaluate results. Linked to career decisions.
• Importance: Future-proofing workforce skills, increased employee engagement (transparency,
L&D, clear career path), higher employee retention, culture of feedback/trust, improved
organizational performance (revenue growth, customer satisfaction).
• Stages of Performance Management Cycle:
1. Planning: Establish expectations, set SMART goals, ensure flexibility. Employee actively
involved. HR provides tools/training to managers.
2. Monitoring: Regularly monitor performance against goals, provide feedback. Use software
for real-time tracking, but not a substitute for face-to-face.
3. Developing: Analyze monitoring data to boost performance. Suggest training, coaching,
extra projects.
4. Rating & Rewarding: Rate performance regularly (appraisal). Quantify value, make
changes. 360-degree feedback. Recognize/reward superior performance (praise, raise,
promotion). Address sub-par performance (move, dismissal).
• Best Practices: Evaluate current state, choose right approach (behavioral vs. results-oriented,
role-dependent), meet/train managers (constructive feedback, continuous dialogue), help set
SMART goals (personalized KPIs), apply continuous performance management (ongoing
dialogue, frequent check-ins), set up formal system (rewards for top performers), help workers
create development plans, employ technology (HR tech, software for data/insights).
• Examples: HSBC (HR mobile app for achievements, feedback, L&D), Deloitte (frequent
check-ins, career coaches, data for trends), IKEA (train-the-trainer coaching program for
managers, KPI increase).
Log Management
• Definition: Continuous process of centrally collecting, parsing, storing, analyzing, and
disposing of log data/events. Provides actionable insights for troubleshooting, performance,
security.
• Log File: Computer-generated data file recording activities, operations, usage patterns
(applications, servers, OS, devices). Critical for identifying root causes of problems.
• Common Log Types:
◦ Application logs: Events within applications.
◦ System logs: Events within OS (driver errors, sign-in/out).
◦ Security logs: Security events (unsuccessful logins, failed auth, password changes, file
deletion).
• How to Collect/Organize: Centralized log aggregation and standardization (automate,
transform diverse formats). Event correlation (machine learning/rules-based to connect events).
Log search and analysis (intuitive search, intelligent analysis). Log reporting and visualization
(automated reports, customizable dashboards).
• Benefits of Log Monitoring:
◦ Improved operational efficiency: Overcome manual inspection challenges, faster,
cost-effective.
◦ Efficient resource utilization: Pinpoint performance issues early.
◦ Proactive troubleshooting: Better insight, real-time alerts (minimize MTTD/MTTR).
◦ Better end-user experience: Applications free from challenges, monitor requests at multiple
levels, timely root cause identification.
◦ Greater security: Key source for detecting breaches, predicting threats. Monitor network
access, system, authentication logs for suspicious activity.
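A toy log-analysis pass of the kind a log-management pipeline automates (the log format, lines, and alert threshold are illustrative):

```python
import re
from collections import Counter

# A few illustrative application log lines (normally read from files/streams).
logs = """\
2024-05-01T10:00:01 INFO  user=42 action=login ok
2024-05-01T10:00:03 ERROR user=17 action=login failed_auth
2024-05-01T10:00:07 ERROR user=17 action=login failed_auth
2024-05-01T10:00:09 INFO  user=99 action=checkout ok
""".splitlines()

levels = Counter(re.search(r"\b(INFO|WARN|ERROR)\b", line).group(1)
                 for line in logs)
error_rate = levels["ERROR"] / len(logs)
print(levels, f"error rate = {error_rate:.0%}")

# Correlation/alerting: repeated auth failures can signal a security event.
if error_rate > 0.25:
    print("ALERT: elevated error rate - notify the on-call engineer")
```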
Monitoring (DevOps Monitoring)
• Definition: Overseeing the entire development process (planning, development, integration,
testing, deployment, operations). Real-time view of application, service, infrastructure status in
production.
• DevOps Monitoring vs. Observability: Monitoring tells you if something is wrong. Observability
tells you why it is wrong. Observability covers the full stack.
• Importance:
◦ Frequent code changes demand visibility: Complex production environments
(microservices) require greater visibility to detect/respond to degraded customer experience.
◦ Automated collaboration: Overcomes lack of tool integration, coordinates teams. Automate
via workflows (Jira updates, Slack notifications).
◦ Experimentation: Constant experimentation (personalization, conversion funnels) makes it
challenging for monitoring systems to communicate cause of degraded experience. Define
SLOs/SLIs.
◦ Change management: Most outages caused by changes. Determine risks, automate
approval flows based on risk.
• Dependent System Monitoring: Common in distributed systems. Need to monitor and manage
performance/availability of dependent systems (e.g., AWS services). Instrumentation/strategies
for tracing errors and handling failures.
• Key Capabilities:
◦ Shift-left testing: Perform testing/monitoring earlier in life cycle to increase quality, shorten
cycles, reduce errors.
◦ Alert and incident management: Embrace incidents, high-quality monitors. Best practices:
collaboration culture, high-quality alerts (minimize MTTD/MTTI), monitor dependent services,
allocate dashboard/training time, plan "war games," close incident review actions, build security
detectors, "measure and monitor everything" mindset.
• DevOps Monitoring Tools:
◦ Single pane of glass: Comprehensive view of apps, services, infrastructure (production &
staging).
◦ Application performance monitoring (APM): For app-specific performance indicators (page
load, latencies). Tools: SignalFx, New Relic.
◦ Implement different types of monitors: Errors, transactions, synthetic, heartbeats, alarms,
infrastructure, capacity, security. Application-specific.
◦ Alert and incident management system: Integrates with other tools (log management, crash
reporting). Sends alerts to preferred channels, groups alerts.
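A small illustration of the SLO/SLI idea mentioned above (all numbers are illustrative): the SLI is computed from monitoring counters, compared to the SLO target, and an alert fires when the objective is violated.

```python
# SLI: fraction of successful requests, from monitoring counters.
total_requests = 1_000_000
failed_requests = 1_800
sli = 1 - failed_requests / total_requests      # 0.9982

slo = 0.999                                     # target: 99.9% success
error_budget = 1 - slo                          # 0.1% of requests may fail
budget_used = (failed_requests / total_requests) / error_budget

print(f"SLI = {sli:.4%}, error budget consumed = {budget_used:.0%}")
if sli < slo:
    print("ALERT: SLO violated - trigger the incident management workflow")
```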
Unit V: MLOps
What is MLOps?
• Definition: Machine Learning Operations. Core function of ML engineering, focused on
streamlining taking ML models to production, and then maintaining and monitoring them.
Collaborative function (data scientists, DevOps engineers, IT).
• Use: Creation and quality of ML/AI solutions. Adopting MLOps increases pace of model
development/production by implementing CI/CD practices with proper monitoring, validation,
governance.
• Why Needed: Productionizing ML is difficult due to complex lifecycle (data ingest, prep,
training, tuning, deployment, monitoring, explainability). Requires collaboration/hand-offs across
teams (Data Engineering, Data Science, ML Engineering). Requires stringent operational rigor.
Encompasses experimentation, iteration, continuous improvement.
• Benefits:
◦ Efficiency: Faster model development, higher quality models, faster deployment/production.
◦ Scalability: Oversee thousands of models for CI/CD. Reproducibility of ML pipelines, tighter
collaboration, reduced conflict with DevOps/IT, accelerated release velocity.
◦ Risk Reduction: Regulatory scrutiny, drift-check, greater transparency, faster response,
compliance.
• Components: Exploratory data analysis (EDA), Data Prep and Feature Engineering, Model
training and tuning, Model review and governance, Model inference and serving, Model
monitoring, Automated model retraining.
• Best Practices (by stage):
◦ EDA: Iteratively explore, share, prep data; reproducible/editable/shareable datasets, tables,
visualizations.
◦ Data Prep/Feature Engineering: Iteratively transform, aggregate, de-duplicate data; create
visible/shareable features (feature store).
◦ Model Training/Tuning: Use open source libs (scikit-learn, hyperopt) or AutoML for
automatic trial runs/code creation.
◦ Model Review/Governance: Track model lineage and versions, manage artifacts and stage
transitions (e.g., with MLflow; see the sketch after this list).
◦ Model Inference/Serving: Manage refresh frequency, inference times in testing/QA. Use
CI/CD tools (repos, orchestrators) for pre-production pipeline.
◦ Model Deployment/Monitoring: Automate permissions/cluster creation to productionize
registered models. Enable REST API endpoints.
◦ Automated Model Retraining: Create alerts/automation for corrective action on model drift.
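A minimal MLflow tracking sketch for the model review/governance practice above (assumes the `mlflow` and `scikit-learn` packages are installed; the dataset and parameters are illustrative):

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():                      # one run = one model version
    model = LogisticRegression(max_iter=200).fit(X_train, y_train)
    mlflow.log_param("max_iter", 200)         # lineage: how it was trained
    mlflow.log_metric("accuracy", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")  # artifact for review/transition
```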
• MLOps vs. DevOps: MLOps is ML-specific engineering practices borrowing from DevOps
principles in software engineering. Both aim for higher software quality, faster patching/releases,
higher customer satisfaction.
• MLOps Platform: Collaborative environment for data scientists/software engineers. Facilitates
iterative data exploration, real-time co-working (experiment tracking, feature engineering, model
management), controlled model transitioning/deployment/monitoring. Automates
operational/synchronization aspects of ML lifecycle.
MLOps Challenges and Potential Solutions
• CRISP-ML(Q) Process Model: Business and Data Understanding, Data Engineering, Model
Engineering, Quality Assurance for ML Systems, Deployment, Monitoring and Maintenance.
• Stage 1: Defining Business Requirements:
◦ Challenges: Unrealistic expectations (AI as magic), Misleading success metrics (poor
analysis).
◦ Solutions: Technical leads explain feasibility/limitations, deep analysis for realistic metrics
(high-level for customer, low-level for developers).
• Stage 2: Data Preparation:
◦ Challenges: Data discrepancies (multiple sources, format mismatches), Lack of data
versioning (evolving data, different processing/updates).
◦ Solutions: Centralize data storage, define universal mappings, and create new data versions (storing metadata for space optimization; see the sketch below).
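As one lightweight way to implement the data-versioning solution above, the sketch below fingerprints a dataset file and appends a metadata record per version instead of storing a full copy. The file paths are illustrative assumptions.
```python
# Minimal sketch: record dataset versions as metadata (hash + timestamp)
# rather than duplicating the data itself. Paths are illustrative.
import hashlib
import json
import time

def register_data_version(data_path: str, registry_path: str = "data_versions.json") -> str:
    """Hash the dataset file and append a metadata record for this version."""
    with open(data_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    record = {"path": data_path, "sha256": digest, "created": time.time()}
    try:
        with open(registry_path) as f:
            versions = json.load(f)
    except FileNotFoundError:
        versions = []                     # first version: start a new registry
    versions.append(record)
    with open(registry_path, "w") as f:
        json.dump(versions, f, indent=2)
    return digest                         # use as a version identifier
```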
• Stage 3: Running Experiments:
◦ Challenges: Inefficient tools/infrastructure (chaotic, resource-heavy, reliance on notebooks),
Lack of model versioning (hyperparameter changes, data influencing controlled elements),
Budget constraints.
◦ Solutions: Seek virtual hardware subscriptions (e.g., AWS, IBM Bluemix), run experiments from scripts rather than notebooks, record every model version, and perform cost-benefit analysis for resources.
• Stage 4: Validating Solution:
◦ Challenges: Overlooking meta performance (memory/time consumption,
hardware/production limits), Lack of communication (not involving all stakeholders), Overlooking
biases (model trained on biased samples).
◦ Solutions: Consider meta metrics (memory, time, hardware; see the sketch below), involve business stakeholders (link model performance to business KPIs), and validate on multiple data combinations (less biased samples, retrain).
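To make the meta-metrics check concrete, here is a minimal sketch that measures inference latency and peak memory against assumed budgets; the `model` and `X` objects and the budget values are illustrative.
```python
# Minimal sketch: check "meta performance" (latency, memory) before sign-off.
# Budgets are assumptions; set them from the real hardware/production limits.
import time
import tracemalloc

def meta_performance(model, X, budget_ms: float = 50.0, budget_mb: float = 100.0) -> dict:
    """Measure one inference pass and compare against latency/memory budgets."""
    tracemalloc.start()
    start = time.perf_counter()
    model.predict(X)
    latency_ms = (time.perf_counter() - start) * 1000
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    peak_mb = peak_bytes / 1e6
    return {
        "latency_ms": latency_ms,
        "peak_mb": peak_mb,
        "within_budget": latency_ms <= budget_ms and peak_mb <= budget_mb,
    }
```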
• Stage 5: Deploying Solution:
◦ Challenges: Surprising the IT department (no communication, short-notice demands), Lack of iterative deployment, Suboptimal company framework (e.g., a Python ML solution in a Java framework), Long chain of approvals.
◦ Solutions: Involve IT early (insights, common elements), iterative deployment (sprints for
modules), invest in separate ML stack or leverage virtual environments (Docker, Kubernetes),
restrict code references to verified codebases (TensorFlow, scikit-learn) to shorten approval
time.
• Stage 6: Monitoring Solution:
◦ Challenges: Manual monitoring (demanding, wasteful, no instant alerts), Change of data
trends (abrupt external factors).
◦ Solutions: Automate monitoring and alerts (see the sketch below), study recent monitoring data (set up a retraining process), keep data up to date and fresh (automated crawlers).
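As one simple form of the automated monitoring suggested above, the sketch below compares recent feature means against the training baseline and raises an alert on large shifts. The z-score test and threshold are illustrative choices among many possible drift checks.
```python
# Minimal sketch: automated drift check with alerts. The z-score test and
# threshold are illustrative; production systems often use PSI or KS tests.
import numpy as np

def drift_alerts(baseline: np.ndarray, recent: np.ndarray, z_threshold: float = 3.0) -> np.ndarray:
    """Flag features whose recent mean drifts beyond z_threshold baseline std-devs."""
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0) + 1e-9           # avoid division by zero
    z = np.abs(recent.mean(axis=0) - mu) / sigma
    for feature, score in enumerate(z):
        if score > z_threshold:
            # stand-in for a real notification channel (email, pager, dashboard)
            print(f"ALERT: feature {feature} drifted (z={score:.1f})")
    return z
```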
• Stage 7: Retraining Model:
◦ Challenges: Lack of scripts (manual, reduces efficiency), Deciding triggering threshold
(when to kickstart retraining), Deciding degree of automation (some solutions need manual
intervention).
◦ Solutions: Create an ML pipeline script (with conditional calls for sub-modules; see the sketch below), weigh business stakeholders' views for retraining triggers (e.g., KPI drops), observe performance deviation to determine the automation level (minimal deviation: automation; significant change: EDA).
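The retraining trigger can be captured in a few lines. A minimal sketch, assuming a KPI threshold agreed with business stakeholders and a `retrain_pipeline` callable standing in for the ML pipeline script:
```python
# Minimal sketch: conditional retraining trigger. The KPI threshold and the
# retrain_pipeline callable are illustrative assumptions.
KPI_THRESHOLD = 0.90  # agreed with business stakeholders

def maybe_retrain(current_kpi: float, retrain_pipeline) -> bool:
    """Kick off the retraining pipeline only when the KPI drops below threshold."""
    if current_kpi < KPI_THRESHOLD:
        retrain_pipeline()   # conditional call into the ML pipeline script
        return True
    return False             # minimal deviation: keep monitoring, no action
```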
Developing Models
• Importance: Understanding model development is crucial for MLOps. It dictates constraints for
subsequent usage, monitoring, and maintenance.
• ML Project Life Cycle: Scoping (define project), Data collection (acquire, define, baseline,
label, organize), Train model (select, train, error analysis, final check/audit), Deployment (first
time, then iterative).
• Machine Learning Model (Theory): A projection of reality, i.e., a partial and approximate representation. Mathematically, a formula that yields a result (a probability, an estimated value) from inputs. Based on statistical theory, algorithms build models from training data (a synthetic representation). Predictions hold when the future resembles the past.
• Generalization Capacity: The ability to predict accurately for cases that do not exactly match the training data (e.g., house prices, health diagnoses).
• Machine Learning Model (Practice): The set of parameters needed to rebuild and apply the formula. Stateless and deterministic. Includes the transformations from input data to the end formula (e.g., zip codes mapped to derived inputs). Outputs can be richer (probability, confidence interval, recommendations). Transformations are part of the model but not always monolithically bundled (sketched below).
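To make the "parameters plus transformations" view concrete, here is a minimal sketch of a stateless, deterministic model object; the toy linear house-price formula and the zip-code-to-region mapping are illustrative assumptions.
```python
# Minimal sketch: a model in practice is parameters + the transformations
# needed to apply the formula. The formula and mapping are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)          # frozen: stateless and deterministic
class HousePriceModel:
    weights: dict                # learned parameters
    bias: float
    zip_to_region: dict          # transformation from raw zip code to derived input

    def predict(self, record: dict) -> float:
        region = self.zip_to_region.get(record["zip"], 0)  # derived feature
        return (self.weights["area"] * record["area"]
                + self.weights["region"] * region
                + self.bias)

model = HousePriceModel(weights={"area": 1200.0, "region": 5000.0},
                        bias=10000.0, zip_to_region={"94016": 3})
print(model.predict({"zip": "94016", "area": 75}))  # same input, same output
```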
Machine Learning in Production (MLOps)
• Introduction: ML models are great, but value comes from production deployment. Requires
thinking through ML project lifecycle.
• ML Infrastructure Requirements: ML model code is a small fraction (5-10%) of a real-world ML
system. Vast and complex surrounding infrastructure needed (data collection, data verification,
feature extraction, monitoring). Leads to "POC to production gap."
• The Machine Learning Project Lifecycle (Major Steps): Scoping, Data Collection, Train Model,
Deployment, Monitoring. Highly iterative, especially error analysis looping back to model/data
collection.
• Deployment Challenges:
◦ ML/Statistical Issues:
* Concept Drift & Data Drift: Data changes after deployment (e.g., lighting changes, COVID-19 impacting credit card fraud patterns). Data drift: the input distribution X changes. Concept drift: the mapping from X to Y changes.
◦ Software Engineering Issues: Real-time vs. batch predictions, which affects the software implementation (see the sketch after this list).
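To illustrate the real-time vs. batch distinction, here is a minimal sketch: a Flask endpoint serving one prediction per request versus a function scoring a whole dataset at once. The stub model and route name are illustrative assumptions.
```python
# Minimal sketch: real-time vs. batch serving. The stub model and the
# /predict route are illustrative assumptions.
from flask import Flask, jsonify, request

class StubModel:
    def predict(self, rows):
        return [0.5 for _ in rows]   # placeholder scores

model = StubModel()
app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict_realtime():
    """Real-time: one request in, one prediction out; latency matters."""
    row = request.get_json()
    return jsonify({"prediction": model.predict([row])[0]})

def predict_batch(rows):
    """Batch: score an entire dataset in one pass; throughput matters."""
    return model.predict(rows)
```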
• Deployment Patterns:
◦ New Product/Capability: Start small, gradually ramp up traffic.
◦ Automating/Assisting a Human Task (shadow and canary modes are sketched after this list):
* Shadow Mode Deployment: The ML algorithm shadows the human, running in parallel. Its output is not used for decisions, only for gathering performance data compared to human judgment. Verifies performance before the model makes real decisions.
* Canary Deployment: Roll out to a small fraction of traffic (e.g., 5%) and let the algorithm make real decisions. Monitor, then gradually ramp up traffic. Spots problems early with minimal consequences.
* Blue Green Deployment: Old software (blue) runs alongside the new version (green); a router switches traffic from old to new. Advantage: easy rollback. Typically a 100% traffic switch.
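A minimal sketch of the shadow-mode and canary patterns; the model objects and the logging target are illustrative assumptions, and the 5% canary fraction follows the text above.
```python
# Minimal sketch: shadow mode vs. canary routing. Models and the log are
# illustrative; the 5% canary fraction follows the text.
import random

CANARY_FRACTION = 0.05

def serve_shadow(features, primary_model, shadow_model, shadow_log: list):
    """Shadow mode: run the new model in parallel, but act only on the old one."""
    shadow_log.append(shadow_model.predict(features))  # gathered for comparison only
    return primary_model.predict(features)             # real decision from primary

def serve_canary(features, old_model, new_model):
    """Canary: ~5% of requests get real decisions from the new model."""
    if random.random() < CANARY_FRACTION:
        return new_model.predict(features)             # canary traffic
    return old_model.predict(features)                 # majority on old model
```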
• Degrees of Automation: Spectrum from human-only to full automation.
◦ No automation (human only).
◦ AI assistance (AI highlights, human decides, UI critical).
◦ Partial automation (AI decides if confident, otherwise defers to a human; see the sketch after this list). Good for applications where full automation isn't feasible.
◦ Full automation (AI makes every decision). Common in consumer internet apps.
◦ Many applications start from human-in-the-loop and gradually move right along this spectrum.
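Partial automation reduces to a confidence gate, as in the minimal sketch below; the threshold is an illustrative assumption to be tuned per application.
```python
# Minimal sketch: partial automation via a confidence gate. The threshold
# is an illustrative assumption.
CONFIDENCE_THRESHOLD = 0.95

def decide(p_positive: float):
    """Let the AI decide when it is confident either way; otherwise defer to a human."""
    if p_positive >= CONFIDENCE_THRESHOLD:
        return ("auto", True)        # confident positive: AI decides
    if p_positive <= 1 - CONFIDENCE_THRESHOLD:
        return ("auto", False)       # confident negative: AI decides
    return ("human", None)           # uncertain: route to human review
```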
Monitoring (MLOps)
• Purpose: Ensure ML system meets performance expectations.
• Method: Dashboards tracking performance over time. Brainstorm potential problems, then
brainstorm metrics to detect them. Start with many metrics, gradually remove less useful ones.
• Types of Metrics:
◦ Software Metrics: Memory, compute, latency, throughput, server load (monitor software
health).
◦ Statistical Health/Performance Metrics: Input data (average values, missing values, outliers), output data (mean prediction, standard deviation, distribution), accuracy (precision, recall, F1, AUC), fairness metrics. Output metrics (e.g., click-through rate) reflect overall system health (see the sketch after this list).
◦ Application-Specific: Input/output metrics configured for specific application.
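A minimal sketch of computing a few of the statistical health metrics above for a monitoring dashboard; the specific statistics are illustrative choices.
```python
# Minimal sketch: input/output health metrics for a monitoring dashboard.
# The specific statistics are illustrative choices.
import numpy as np
import pandas as pd

def health_metrics(inputs: pd.DataFrame, predictions: np.ndarray) -> dict:
    """Compute simple input and output health statistics for dashboards/alarms."""
    return {
        "missing_fraction": float(inputs.isna().mean().mean()),   # input health
        "input_means": inputs.mean(numeric_only=True).to_dict(),
        "mean_prediction": float(np.mean(predictions)),           # output health
        "std_prediction": float(np.std(predictions)),
    }
```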
• Iterative Deployment: ML modeling is iterative, so is deployment. First deployment with
monitoring dashboards is just the start. Real user data leads to performance analysis, updates,
and continued monitoring.
• Alarms and Thresholds: Set thresholds for metrics to trigger alarms/notifications (e.g., server load, missing values). Adapt metrics and thresholds over time. If accuracy problems appear, update the model.
• Pipeline Monitoring: Many AI systems are pipelines of multiple steps. Monitor cascading
effects (e.g., clickstream data changes impacting user profile, then recommendations). Alert for
significant changes (e.g., unknown tags).
MLOps and Model Governance
• Relationship: Tightly interwoven. MLOps is extension of DevOps for ML. Model Governance is
processes/frameworks for ML deployment, ensuring responsibility/compliance.
• Model Governance - New Challenge: Often overlooked until deployment. Many automated ML
pipelines fail compliance. Survey: 56% consider governance a big challenge.
• Model Governance Will Not Be Optional: Legal regulations (e.g., EU AI regulation draft - April
2021) classify AI systems by risk.
◦ Unacceptable Risk: Forbidden (social scoring).
◦ High Risk: Strict requirements (robustness, security, accuracy, docs/logging, risk
assessment, high-quality data, non-discrimination, traceability, transparency, human monitoring,
conformity testing, CE marking). Examples: credit scoring, education systems.
◦ Limited Risk: Transparency obligation (chatbot users informed).
◦ Minimal Risk: No regulation (spam filters).
◦ EU law applies to any company offering AI services in EU (similar to GDPR). Conformity
assessment before deployment for high-risk.
◦ Challenges in EU Draft: Blurry AI definition, unclear how to apply requirements.
◦ Beyond Legal: Also relevant for low-risk ML with high business risks (e.g., spam filter
malfunction impacts market position). Ensures quality.
• Integration of Model Governance and MLOps (depends on):
◦ Strength of regulations (business domain, ML model risk category, business risk).
◦ Number of ML models.
• Variant 1: Many Models and Strict Regulation: Most complex. Model governance and MLOps
equally important, closely integrated into every MLOps lifecycle step. Examples: healthcare,
financial sectors, high-risk systems in critical infrastructure or human decision-making.
◦ Framework for Model Governance:
* Development: Reproducibility (metadata, documentation, registry, evaluation/validation). Documentation (business context, algorithm, parameters, features, reproduction instructions, examples). Tools: Data Sheets, Model Cards. Validation (performance indicators, business KPIs, explainability).
* Deployment & Operations: Observation, visibility, and control (logging, continuous monitoring/evaluation, cost transparency, versioning of models/data, tracking of metadata/artifacts).
* Monitoring and Alerting: Continuous monitoring/evaluation, automated alerts for deviations.
* Model Service Catalog: Internal marketplace of ML models with good UX, connected to storage, displaying metadata.
* Security: ML security (cyber-security attacks), adherence to IT standards (DNS, proxies, load balancing), management of endpoints, authentication (SSO, RBAC), key/secret management, security audits. Protection against ML-specific attacks (Adversarial ML Threat Matrix).
* Conformity and Auditability: Collect relevant information (logging, documentation, audit results, conformity testing, CE mark). Compliance with security standards. Very complex; needs experts.
• Variant 2: Many Models and Little Regulation: MLOps focus stronger, model governance part
of MLOps, not standalone. Uses Google's MLOps framework (ML Development, Training
Operationalization, Continuous Training, Model Deployment, Prediction Serving, Continuous
Monitoring, Data and Model Management).
◦ Training Pipeline: Formalization of training pipeline (data- & model-engineering). Triggered
by performance drop/distribution shifts. Data/feature repositories for
standardization/consistency. Model evaluated, validated, saved in registry. Metadata/artifacts
saved.
◦ Model Governance as Part of Model Management: Recording, auditing, validation, approval, and monitoring. Uses ML metadata, an artifact repository, and a model registry (see the sketch below). Components: Storage/Versioning, Evaluation/Explainability (shadow deployment), Testing, Release, Report.
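Where MLflow serves as the registry, recording a model and moving it through a controlled stage transition can look like the sketch below. The model name, stage, and run ID are assumptions, and the exact registry API varies across MLflow versions.
```python
# Minimal sketch: governance-relevant registry steps with MLflow. Names,
# stage, and run_id are assumptions; the API varies across MLflow versions.
import mlflow
from mlflow.tracking import MlflowClient

run_id = "..."  # assumed: the ID of an already-tracked training run
result = mlflow.register_model(f"runs:/{run_id}/model", "churn-model")  # record + version

client = MlflowClient()
client.transition_model_version_stage(   # audited approval step
    name="churn-model",
    version=result.version,
    stage="Staging",                     # e.g., shadow deployment before release
)
```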
• Variant 3: Few Models and Little Regulation: Limited scope of regulations, low number of
models. Model governance optional but recommended for quality assurance. Development
phase components relevant.
• Variant 4: Few Models and Strict Regulation: Strict regulation, low number of models. MLOps
less important, but close integration of model governance with MLOps remains (governance
covers complete MLOps life cycle).
• Summary - Main Components of Model Governance (Common to all variants):
◦ Comprehensive model documentation/reports (metrics, visualization).
◦ Versioning of all models (transparency, explainability, reproducibility).
◦ Auditing of ML systems (automated approval or CE certification).
◦ Comprehensive data documentation (quality, protection).
◦ Management of ML metadata.
◦ Validation of ML models.
◦ Continuous monitoring and logging of model metrics.
• Conclusion: MLOps provides infrastructure for model governance. Both are needed for
successful ML model deployment.
--------------------------------------------------------------------------------
Quiz: DevOps and Microservices
Instructions: Answer each question in 2-3 sentences.
1. Software Engineering: Why is Software Engineering considered essential for managing the
dynamic nature of software?
2. SDLC Models: Briefly explain the key difference in approach between the Waterfall Model
and the Spiral Model in software development.
3. DevOps Definition: What is DevOps, and how does it emphasize people, processes, and
automation?
4. DevOps Life Cycle: Describe the role of Continuous Integration (CI) as the "heart" of the
DevOps lifecycle.
5. Cloud Service Models: Distinguish between IaaS and PaaS in terms of what they provide to
users.
6. Containers: How do containers achieve being lightweight and portable compared to virtual
machines (VMs)?
7. Multi-Data Centers: Explain the primary purpose of deploying applications across Multi-Data
Centers (MDC) for large organizations.
8. Deployment Pipeline: What is a deployment pipeline, and what is its main benefit in modern
software development?
9. Integration Testing: Why is integration testing necessary even after individual units have been
thoroughly tested?
10. MLOps Definition: Define MLOps and identify the primary benefits it offers to machine
learning projects.
--------------------------------------------------------------------------------
Answer Key
1. Software Engineering: Software Engineering is crucial for managing the dynamic nature of
software because it provides a scientific and engineering approach to adapt to continually
growing and changing user requirements and environments. This structured procedure makes it
simpler to update and scale existing software, rather than having to recreate it from scratch,
ensuring that quality is maintained even with frequent upgrades.
2. SDLC Models: The Waterfall Model follows a strictly linear, sequential approach where each
phase must be completed before the next begins, making it suitable for projects with stable
requirements. In contrast, the Spiral Model is evolutionary and iterative, involving repeated
cycles of planning, risk analysis, engineering, and evaluation, which makes it more flexible and
suitable for experimental projects or those with evolving requirements.
3. DevOps Definition: DevOps is a culture within an organization aimed at continuously
delivering projects to end-users. It achieves this by prioritizing cross-functional collaboration
(people), refining established procedures (processes), and automating various stages of the
software delivery pipeline (automation), fostering a more integrated and efficient workflow.
4. DevOps Life Cycle: Continuous Integration (CI) is considered the "heart" of the DevOps
lifecycle because it involves developers frequently committing code changes to a central
repository. Each commit automatically triggers builds, unit tests, integration tests, code reviews,
and packaging, allowing for early detection of problems and ensuring that the code is
continuously integrated and ready for further stages.
5. Cloud Service Models: IaaS (Infrastructure as a Service) provides virtualized computing
infrastructure over the internet, such as virtual machines, storage, and networks, giving users
control over operating systems and applications. PaaS (Platform as a Service), on the other
hand, provides a complete platform for developing, running, and managing applications,
abstracting away the underlying infrastructure management from developers.
6. Containers: Containers are lightweight and portable because, unlike VMs, they virtualize the
operating system and do not need to include a full guest OS in every instance. Instead, they
share the host OS kernel, making their files smaller and allowing them to spin up quickly and
run consistently across various computing environments.
7. Multi-Data Centers: The primary purpose of deploying applications across Multi-Data Centers
(MDC) for large organizations is to distribute workload efficiently and ensure business continuity
and disaster recovery. This setup provides load balancing, allowing traffic to be routed based on
various criteria, and offers failover capabilities in case one data center or node becomes
unavailable, ensuring high availability and resilience.
8. Deployment Pipeline: A deployment pipeline is an automated system of processes in
software development designed to move new code additions and updates from version control
to production quickly and accurately. Its main benefit is the elimination of tedious manual tasks,
which reduces human error, accelerates release cycles, and allows development teams to focus
more on innovation and improving the end product.
9. Integration Testing: Integration testing is necessary even after individual units are tested
because it verifies if combined units work as intended, specifically focusing on the interfaces
between modules. Individual unit testing cannot detect issues that arise from different
programming logics, changing user requirements that affect module interactions, or problems
with data formatting, hardware, or third-party service interfaces when modules are put together.
10. MLOps Definition: MLOps (Machine Learning Operations) is a set of engineering practices
focused on streamlining the process of taking machine learning models into production and
subsequently maintaining and monitoring them. Its primary benefits include increased efficiency
(faster development, deployment), vast scalability (managing thousands of models), and
significant risk reduction (regulatory compliance, transparency, faster response to model drift).
--------------------------------------------------------------------------------
Essay Questions
1. DevOps Implementation Challenges: Discuss the key barriers organizations face when
implementing DevOps, and propose strategies to overcome each of these challenges, drawing
on principles and practices mentioned in the source material.
2. Traditional vs. Agile vs. DevOps: Compare and contrast the Waterfall, Agile, and DevOps
approaches to software development. Analyze how DevOps attempts to address the limitations
and build upon the strengths of both traditional and Agile methodologies, particularly in the
context of continuous delivery.
3. Cloud Service Models in DevOps: Explain how the three primary cloud service models (IaaS,
PaaS, and SaaS) can be utilized within a DevOps framework. Provide specific examples of how
each model contributes to the goals of automation, scalability, and efficiency in a DevOps
pipeline.
4. The Interplay of Testing in the Deployment Pipeline: Describe the various types of testing
(e.g., unit, integration, acceptance, non-functional, early release) that occur throughout a
comprehensive software deployment pipeline. Explain how each type of testing contributes to
building confidence in the system's correctness as it moves towards production.
5. MLOps: A Specialized Extension: Elaborate on how MLOps is a specialized extension of
DevOps for machine learning projects. Discuss the unique challenges faced in deploying and
managing ML models in production that necessitate MLOps practices, and explain how MLOps
principles address these specific challenges, including concept/data drift and model
governance.
--------------------------------------------------------------------------------
Glossary of Key Terms
• A/B Testing: A method of early release testing where two versions of an application are
deployed to different user segments to determine which performs better in terms of specific
business-level key performance indicators.
• Access Management: A service operation process that deals with granting appropriate rights
to authorized users to utilize a service.
• Agile Model: A software development approach based on iterative development, breaking
tasks into smaller iterations (typically 1-4 weeks) and emphasizing flexibility and continuous
feedback.
• API Gateway: In a microservices architecture, this serves as the entry point for clients,
forwarding calls to appropriate backend services and handling cross-cutting functions like
authentication and load balancing.
• Application Logs: Computer-generated data files that record events occurring within an
application, used by developers to understand and measure application behavior.
• Automated Acceptance Tests: The automated version of User Acceptance Tests (UATs) that
control an application through its UI to mirror human user actions, enabling higher repetition
rates.
• Automated Policy Sync (APS): A set of REST APIs used to automatically replicate data from a
Master data center to Clone data centers in a Multi-Data Center environment.
• Big Bang Integration Testing: An integration testing approach where all modules are developed
and tested individually before being integrated and tested together at once.
• Blue Green Deployment: A deployment pattern where an old version of software ("blue") runs
alongside a new version ("green"), and traffic is suddenly switched from blue to green to enable
easy rollback.
• Canary Testing: A method of early release testing (cloud equivalent of beta testing) where a
new version of software is deployed to a small fraction of servers first, and its performance is
monitored before wider rollout.
• Capacity Planning: The process of determining how much production capacity is required to
meet changing demand for products, involving balancing available resources to satisfy needs.
• CI/CD Pipeline: A system that continuously compiles, validates, and deploys new code
updates as they are written, flowing through stages like build, test, staging, and production.
• Cloud Computing Operations: The delivery of superior cloud services and the performance of
business operations over the Internet, emphasizing efficiency, cost-effectiveness, and quality.
• Configuration Management (CM): The process of maintaining IT systems (hardware and
software) in a desired state, ensuring consistent performance and preventing undocumented
changes.
• Continuous Delivery (CD): An extension of Continuous Integration where software is always in
a deploy-ready state, with automated tests and pre-production stages passed, requiring only a
final approval for production deployment.
• Continuous Deployment: A software delivery practice where every change that passes
automated tests is automatically released to production without human intervention.
• Continuous Development: The initial phase of the DevOps lifecycle involving the planning and
coding of software, where the project vision is decided and development of code begins.
• Continuous Feedback: A phase in the DevOps lifecycle where application development is
consistently improved by analyzing the results from the operations of the software, often through
feedback loops from end-users.
• Continuous Integration (CI): A software development practice where developers frequently
merge their code changes to a central branch, triggering automated builds and tests, allowing
for early detection of integration problems.
• Continuous Monitoring: A phase in the DevOps lifecycle that involves real-time observation of
application and infrastructure performance and behavior, collecting information to identify trends
and problems.
• Continuous Operations: The final phase in the DevOps lifecycle, focusing on the continuity and
full automation of the release process to accelerate overall time to market.
• Continuous Testing: A phase in the DevOps lifecycle where developed software is
continuously tested for bugs, often using automation testing tools, to ensure functionality before
deployment.
• Concept Drift: A phenomenon in machine learning where the relationship between input
variables (x) and the target variable (y) changes over time after a model has been deployed.
• Containers: Executable units of software that package application code along with its libraries
and dependencies in common ways, leveraging operating system virtualization to be portable
and run consistently anywhere.
• Containerization: The process of designing and packaging software, along with its relevant
environment variables, configuration files, libraries, and software dependencies, to run efficiently
within containers.
• CRISP-ML(Q): A process model for the ML life cycle, outlining phases such as Business and
Data Understanding, Data Engineering, Model Engineering, Quality Assurance for ML Systems,
Deployment, Monitoring and Maintenance.
• Crystal: An Agile development methodology focusing on three concepts: chartering, cyclic
delivery, and wrap-up, adaptable to project size and criticality.
• Customer SLAs: The most common type of Service Level Agreement, representing a contract
between a service provider and an external customer.
• Data Drift: A phenomenon in machine learning where the statistical properties of the input data
(x) change over time, leading to a decline in model performance.
• Deployment Automation: The process of automating the deployment of software to testing and
production environments, aiming to reduce manual risk and provide fast feedback on software
quality.
• Deployment Pipeline: A system of automated processes designed to quickly and accurately
move new code additions and updates from version control to production in software
development.
• DevOps: A cultural and professional movement that stresses communication, collaboration,
integration, and automation to improve the flow of work between software development and IT
operations teams.
• DevOps Monitoring: Overseeing the entire development process from planning to operations,
providing a complete and real-time view of application, services, and infrastructure status in the
production environment.
• Dynamic Software Development Method (DSDM): A rapid application development strategy
within Agile, emphasizing active user involvement and team decision-making, using techniques
like Time Boxing and MoSCoW Rules.
• eXtreme Programming (XP): An Agile methodology used when customers have constantly
changing demands or requirements, or are unsure about the system's performance,
emphasizing frequent releases and customer collaboration.
• Feature Driven Development (FDD): An Agile method that focuses on "Designing and
Building" features, breaking down work into small, separately obtainable steps per function.
• Feature Toggles: Also known as feature flags or switches, these are "if" statements around
immature code that allow new, unfinished features to be disabled in the source code, preventing
their deployment into production until ready.
• Hardware Provisioning: The process of configuring and making physical computing hardware
(like servers, storage, memory, and network) available for use by software applications.
• IaaS (Infrastructure as a Service): A cloud computing service model that provides virtualized
computing resources over the internet, such as virtual machines, virtual storage, and networks,
managed by the user.
• Identity Management: Software that manages user access to information to increase security
and efficiency, controlling who has permission to access applications, databases, and other IT
assets.
• Indemnification Clause: A provision in an SLA where one party (the indemnitor) agrees to take
full responsibility for any liabilities, damages, or losses suffered by the other party in case of a
contract breach.
• Infrastructure as Code (IaC): A DevOps practice where the provisioning and management of
infrastructure (networks, virtual machines, load balancers, etc.) are performed using code,
typically in JSON or YAML files, to ensure standardization and automation.
• Integration Testing: A level of software testing where individual units or modules are combined
and tested together to verify if they work as intended when integrated, focusing on the interfaces
between them.
• Internal SLAs: Service Level Agreements designed to establish and adhere to service
standards within a specific company or organization, often functioning between different teams
or departments.
• Kubernetes: An open-source container orchestration platform that automates the deployment,
scaling, and management of containerized applications.
• Lean Software Development: A software development methodology based on the "just-in-time
production" principle, aimed at increasing development speed and reducing costs by eliminating
waste and amplifying learning.
• Live Testing: A form of testing performed on a running system in production, where the system
is deliberately perturbed to observe its behavior and performance under various conditions.
• Log Management: A continuous process of centrally collecting, parsing, storing, analyzing, and
disposing of massive amounts of log data and events to provide actionable insights for
troubleshooting, performance enhancement, or security monitoring.
• Machine Learning Model: In practice, a set of parameters necessary to rebuild and apply a
mathematical formula that yields a result (e.g., probability, estimated value) when fed certain
inputs.
• Master-Clone Deployment: A Multi-Data Center topology where one data center is designated
as the Master and one or more other data centers act as Clones, mirroring the Master for load
distribution and disaster recovery.
• Microservices Architecture: A design approach to build a single application as a set of small,
autonomous, and loosely coupled services, each implementing a specific business capability.
• MLOps: Machine Learning Operations, a core function of Machine Learning engineering
focused on streamlining the process of taking machine learning models to production and then
maintaining and monitoring them.
• Model Governance: A set of processes and frameworks that help in the responsible and
compliant deployment of ML models, ensuring adherence to legal requirements and internal
guidelines.
• Multilevel SLAs: Service Level Agreements that divide a contract into multiple levels for
occasions involving more than one service provider or end user, potentially catering to different
service levels and price ranges.
• OAM_GITO (Global Inactivity Time Out) Cookie: A domain cookie used in Multi-Data Center
environments to facilitate time out tracking across WebGate agents, containing session and
access time details.
• OAM_ID Cookie: The Single Sign-On (SSO) cookie for Access Manager in Multi-Data Center
deployments, holding attributes necessary for enabling MDC behavior and session adoption
across data centers.
• Observability: The ability to understand the internal state of a system based on its external
outputs, covering the full stack of application monitoring from alerting to tracing every request.
• Operating System Virtualization: A type of virtualization where the virtual machine software
(VMM) is installed on the host operating system rather than directly on the hardware, commonly
used for testing applications across different OS platforms.
• PaaS (Platform as a Service): A cloud computing service model that provides a platform for
programmers to develop, test, run, and manage applications without the complexities of building
and maintaining the underlying infrastructure.
• Performance Management: In the DevOps context, the set of processes and systems for monitoring and optimizing application and infrastructure performance, tracking metrics such as response time, throughput, and resource utilization to detect and resolve issues.
• Production Deployment: The final stage of the deployment pipeline where validated code is
made live for end-users, typically handled by DevOps or operations teams with the goal of zero
downtime.
• Regression Testing: A type of software testing that seeks to uncover new software bugs or
regressions in existing functional and non-functional areas of a system after changes have been
made.
• SaaS (Software as a Service): A cloud computing service model where applications are hosted
by a cloud service provider and made available to users over the internet, typically accessed via
a web browser.
• Sandwich Integration Testing: A hybrid integration testing approach that combines elements of
both top-down and bottom-up integration testing, often used for large projects with multiple
sub-projects.
• Scrum: An Agile development process focused on managing tasks in team-based
development conditions, involving roles such as Scrum Master, Product Owner, and Scrum
Team.
• SDLC (Software Development Life Cycle): A process that outlines a standard procedure for
designing and developing software, encompassing various models and methodologies from
conception to deployment and maintenance.
• Service Concept: A central model that outlines how a service provider can realize the value
and desired outcomes of its services, describing how the organization wishes its services to be
perceived by stakeholders.
• Service Desk: The first and single point of contact in Service Operation, playing a vital role in
customer satisfaction and coordinating activities between end-users and the IT service provider
team.
• Service Level Agreement (SLA): A documented understanding between a service provider and
a customer that defines service expectations, responsibilities, and consequences for unmet
obligations.
• Service Operation: A stage in IT service management that ensures services are being
provided efficiently and effectively as per Service Level Agreements, including monitoring,
incident resolution, and request fulfillment.
• Shadow Mode Deployment: A deployment pattern in ML where a new machine learning
algorithm runs in parallel with a human or existing system, but its output is not used for any
real-world decisions, primarily for gathering performance data.
• Software Engineering: An engineering branch related to the evolution of software products
using well-defined scientific principles, techniques, and procedures to create effective and
reliable software.
• Software Provisioning: The comprehensive process of preparing and deploying software,
including environment setup, promotion between environments, testing, quality assurance, and
reporting.
• Spiral Model: An evolutionary software development model that emphasizes iterative
development through a series of prototypes, constantly passing through phases of planning, risk
analysis, engineering, and evaluation.
• Stubs and Drivers: Dummy programs used in integration testing to simulate the functionality of
modules that are not yet developed or integrated. Stubs simulate called programs, while drivers
simulate calling programs.
• Stretch Cluster Deployments: A Multi-Data Center configuration where a single OAM cluster is
extended across multiple geographically close data centers, treated as one cluster, but with
certain limitations regarding latency and reliability.
• Telemetry: The in-depth monitoring of applications and infrastructure used by DevOps teams
to gain complete observability and respond quickly to changes or issues.
• Test Harness: A collection of software and test data configured to test a program unit by
running it under varying conditions and monitoring its behavior and output, essential for test
automation.
• Test-driven Development (TDD): A software development philosophy where an automated test
for a piece of functionality is developed before the actual code, and the code is then written with
the goal of passing that test.
• Top-Down Integration Testing: An incremental integration testing approach that starts by
testing the top-most modules and gradually moves down to lower modules, using stubs for
modules not yet developed.
• Traceability: In software deployment, the ability to determine exactly how a system came to be
in production, by tracking source code, commands, and tools used throughout the pipeline.
• UAT (User Acceptance Tests): Tests where prospective users interact with a current revision of
the system through its user interface to verify that it meets business requirements and user
expectations.
• Unit Tests: Code-level tests that check individual classes and methods, often run automatically
as pre-commit tests before code is integrated.
• Virtualization: The creation of a virtual (rather than actual) version of something, such as a
server, desktop, storage device, or operating system, allowing a single physical instance to be
shared among multiple users.
• Waterfall Model: A traditional, linear-sequential software development model where
development flows downward through distinct phases, suitable for projects with clearly defined
and stable requirements.
• WebGate Agent: A component in Access Manager used to protect applications and configured
against Access Manager clusters in data centers.
• Zero Trust Security: A modern security model that restricts access until a user has been
thoroughly verified, moving away from perimeter-based security and assuming no inherent trust.