Developing A Google SRE Culture

This document is an introduction to a workbook for developing a Google SRE (Site Reliability Engineering) culture. The workbook contains key points and reflection exercises for modules on SRE topics. It recommends reviewing the exercises after completing video lessons for each module. The first module introduces SRE concepts like balancing development velocity with reliability and how SRE can benefit teams.

Uploaded by

yazid yazidcoders

Developing a Google SRE Culture

Learner Workbook

About this workbook

Welcome to the beginning of your SRE journey! This workbook contains
key points and reflection exercises for each module. The reflection
exercises can help you in future conversations with your leadership
teams during your journey to SRE adoption.

We recommend that you review each exercise after completing the
module video lessons.

Module 1 | Developing a Google SRE Culture

Module One
Welcome to Developing a Google SRE Culture

1. Key Points
2. Reflection Activity

1. Key Points

● Customers’ experiences with your service tell you how reliable it is.
● In many IT organizations, development and operations teams have
conflicting priorities.
● Site Reliability Engineering (SRE) is the practice of balancing the
velocity of development features with the risk to reliability.
● SRE can benefit IT teams, regardless of whether they are using cloud
or on-premises technology, for both large projects and daily work.

2. Reflection Activity

Have you ever had a concern about your service’s reliability? If so, what
caused this concern? Were there internal or external factors? How did you
address it?
Write down your thoughts below, and keep your experience in mind as you
learn about Google’s SRE practices.

Module 2 | Developing a Google SRE Culture

Module Two
DevOps, SRE, and Why They Exist

1. Key Points
2. Reflection Activity

1. Key Points

● DevOps emerged to help close gaps and break down silos between
development and operations teams.
● DevOps is a philosophy, not a development methodology or
technology.
● SRE is a practical way to implement DevOps philosophy.
● Developers focus on feature velocity and innovation; operators focus
on reliability and consistency.
● SRE consists of both technical and cultural practices.
● SRE practices align to DevOps pillars:


2. Reflection Activity

In this module, you heard the story of an online retailer whose developers
suffered from burnout due to the demands of increased feature deployment
while addressing reliability issues on the side.
Have you ever noticed this type of behavior with your development teams?
If so, what do you think caused it?

Module 3 | Developing a Google SRE Culture

Module Three
SLOs with Consequences

1. Glossary
2. Key Points
3. Reflection Activity
4. Postmortem Template

1. Glossary

● Blameless postmortem: Detailed documentation of an incident or
outage, its root cause, its impact, actions taken to resolve it, and
follow-up actions to prevent its recurrence.
● Reliability: The number of “good” interactions divided by the number
of total interactions. This leaves you with a numerical fraction of real
users who experience a service that is available and working.
● Error budget: The amount of unreliability you are willing to tolerate.
● Service level indicator (SLI): A quantifiable measure of the reliability
of your service from your users' perspective.
● Service level objective (SLO): Sets the target for an SLI over a period
of time.
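
The glossary definitions of reliability and error budget can be written out as arithmetic. The following is a minimal sketch, not part of the original workbook: the function names, the request counts, and the 99.9% target are all hypothetical, and "remaining budget" is computed here as one minus the share of the allowed bad interactions already used.

```python
def reliability(good: int, total: int) -> float:
    """Fraction of interactions that were good (the measured SLI value)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return good / total


def error_budget(slo_target: float) -> float:
    """Unreliability tolerated by an SLO target, as a fraction of interactions."""
    # A target of exactly 1.0 (100%) would leave no budget at all.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def budget_remaining(good: int, total: int, slo_target: float) -> float:
    """Share of the error budget still unspent (negative when overspent)."""
    allowed_bad = error_budget(slo_target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 of them bad, 99.9% SLO.
    good, total, slo = 999_600, 1_000_000, 0.999
    print(f"SLI: {reliability(good, total):.4%}")
    print(f"Error budget: {error_budget(slo):.3%} of requests")
    print(f"Budget remaining: {budget_remaining(good, total, slo):.0%}")
```

Rejecting a target of 1.0 is a deliberate choice in this sketch: with a 100% target there is no budget to divide by, so any single failure overspends it.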

2. Key Points

● The mission of SRE is to protect, provide for, and progress software
and systems with consistent focus on availability, latency,
performance, and capacity.
● Understanding SRE practices and norms will help you build a
common language to use when speaking with your IT teams and
support your organization’s adoption of SRE both in the short and
long term.
● Experienced SREs are comfortable with failure.
● Failures are documented in postmortems, which focus on systems
and processes versus people.
● 100% reliability is the wrong target because it slows the release of
new features, which is what drives your business.


● SLOs and error budgets create shared responsibility and ownership
between developers and SREs.
● Fostering psychologically safe environments is necessary for
learning and innovation in organizations.
● Organizations developing an SRE culture should focus on creating a
unified vision, determining what collaboration looks like, and
sharing knowledge among teams.
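
One way to picture an SLO "with consequences" is a release gate that both developers and SREs agree to in advance. The sketch below is illustrative only: the `SLO` class, the `release_decision` function, the "freeze" rule, and the example service name are assumptions made for this example, not a policy stated in the workbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    name: str          # hypothetical label for the objective
    target: float      # e.g. 0.999 means 99.9% of interactions must be good
    window_days: int   # period over which the SLI is evaluated


def release_decision(slo: SLO, good: int, total: int) -> str:
    """Return "ship" while error budget remains in the window, else "freeze".

    The freeze rule is an assumed example policy, agreed on by both teams.
    """
    if total == 0:
        return "ship"  # no traffic observed, so no budget has been spent
    allowed_bad = (1.0 - slo.target) * total
    actual_bad = total - good
    return "ship" if actual_bad <= allowed_bad else "freeze"


if __name__ == "__main__":
    slo = SLO(name="checkout-availability", target=0.999, window_days=28)
    print(release_decision(slo, good=999_500, total=1_000_000))
    print(release_decision(slo, good=998_000, total=1_000_000))
```

Because the decision depends only on the measured numbers and the agreed target, neither team has to argue case by case about whether a release is "safe enough".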

3. Reflection Activity

1. Think about your IT teams. List some scenarios where working in a
psychologically safe environment would benefit them.


2. Do you think blamelessness is achievable in your organization? How can
you support and encourage blamelessness and psychological safety within
your teams?
Write down as many ideas as you can. Share these with your leadership
team when you start your SRE implementation conversations.


4. Postmortem Template

Below is a basic postmortem template. Share this with your IT teams as
you start to implement the SRE role and postmortem practice.

Part 1. What happened?

Title:

Date:

Authors:

Status: In Writing/In Review/Reviewed/Published

Summary: --- What was the incident? Its duration? Its cause? ---

Impact: --- Latency? Data loss? Availability?... Include revenue impact if
known ---

Root causes:

Trigger: --- Action that initiated the incident ---

Resolution: --- Actions taken to mitigate or prevent the incident’s impact in
the short term. Actions taken (fixes deployed) to address the root causes ---

Detection: --- How was the incident detected? ---


Lessons Learned
Some guiding questions:
● Was the incident detected quickly, or did it take a long time for a human to
notice?

● Did teams coordinate well among each other, or were there communication
problems?

● Were the escalation paths clear, or did engineers not know where to go for help?

What went well?

What didn’t go so well?

Where did we get lucky?

[There is often some aspect of an incident that ensures that it wasn’t as bad as it
could have been. Often, this aspect wasn’t by design. Call this out explicitly so you
can build new safeguards and not rely on luck next time.]


Part 2. What can we do differently next time?

● Work together to document what you’ve learned from these issues
and come up with Action Items.

● Note: Do not focus solely on bug fixes. Also include procedural
changes required to mitigate the impact of similar incidents.

Owners | Action Items | Priority | Bug/Tickets
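
Teams that keep postmortems in a tool rather than a document may find it useful to treat the template as a structured record. The sketch below mirrors the sections of the template above. The class names, field names, and the `ready_for_review` check are assumptions made for illustration and are not part of the workbook.

```python
from dataclasses import dataclass, field, fields
from typing import List

# Status values copied from the template's "Status" line.
STATUSES = ("In Writing", "In Review", "Reviewed", "Published")


@dataclass
class ActionItem:
    owner: str
    action: str
    priority: str
    ticket: str = ""   # bug or ticket reference, if one exists


@dataclass
class Postmortem:
    # Part 1. What happened?
    title: str = ""
    date: str = ""
    authors: str = ""
    status: str = "In Writing"
    summary: str = ""
    impact: str = ""
    root_causes: str = ""
    trigger: str = ""
    resolution: str = ""
    detection: str = ""
    # Lessons learned
    went_well: str = ""
    went_poorly: str = ""
    got_lucky: str = ""
    # Part 2. What can we do differently next time?
    action_items: List[ActionItem] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready_for_review(self) -> bool:
        """True when the status is valid and every section has content."""
        return self.status in STATUSES and not self.missing()


if __name__ == "__main__":
    draft = Postmortem(title="Checkout latency incident", date="2024-01-01")
    print("Still to fill in:", ", ".join(draft.missing()))
```

Note that the record has no field for naming the person who caused the incident, which keeps the structure consistent with the blameless focus on systems and processes.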

Module 4 | Developing a Google SRE Culture

Module Four
Make Tomorrow Better than Today

1. Glossary
2. Key Points
3. Reflection Activity

1. Glossary

● Continuous integration: Building, integrating, and testing code
within the development environment.
● Continuous delivery: Deploying to production frequently, or at the
rate the business chooses.
● Canarying: Deploying a change in service to a group of users who
don’t know they are receiving the change, evaluating the impact to
that group, and then deciding how to proceed.
● Toil: Work directly tied to a service that is manual, repetitive,
automatable, tactical, or without enduring value, or that scales
linearly as the service grows.
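
The three steps in the canarying definition (pick a group, evaluate the impact, decide how to proceed) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the hash-based user bucketing, the function names, and the 0.1 percentage-point error-rate threshold are hypothetical choices, not something the workbook prescribes.

```python
import hashlib


def in_canary(user_id: str, percent: float) -> bool:
    """Deterministically place roughly `percent`% of users in the canary group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    return bucket < percent * 100


def canary_verdict(control_errors: int, control_total: int,
                   canary_errors: int, canary_total: int,
                   max_extra_error_rate: float = 0.001) -> str:
    """Compare error rates of the two groups and decide how to proceed."""
    if control_total == 0 or canary_total == 0:
        return "wait"  # not enough traffic to judge either group
    control_rate = control_errors / control_total
    canary_rate = canary_errors / canary_total
    if canary_rate - control_rate > max_extra_error_rate:
        return "roll back"
    return "proceed"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    canary_users = [u for u in users if in_canary(u, percent=5)]
    print(f"{len(canary_users)} of {len(users)} users receive the change")
    print(canary_verdict(control_errors=10, control_total=10_000,
                         canary_errors=50, canary_total=1_000))
```

Hashing the user ID keeps each user in the same group across requests, so the evaluation compares consistent populations without the users needing to know which version they receive.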

2. Key Points

● Change is best when small and frequent.
● Design thinking methodology has five phases: empathize, define,
ideate, prototype, and test.
● Prototyping culture encourages teams to try more ideas, leading to
faster failures and more successes.
● Excessive toil is toxic to the SRE role.
● By eliminating toil, SREs can focus the majority of their time on work
that will either reduce future toil or add service features.
● Resistance to change is usually a fear of loss.
● Present change as an opportunity, not a threat.
● People react to change in many ways, and IT leaders need to
understand how to communicate with and support each group.


3. Reflection Activity

1. Think about work your IT teams do that could be considered toil. How
much of that toil is bad? How much is good? Write down your thoughts
about the type of toil that you would consider automating, and the toil that
you would consider keeping.

2. How might you present adoption of SRE culture and practices as an
opportunity to your IT teams and other leadership? Brainstorm some ideas
below.

Module 5 | Developing a Google SRE Culture

Module Five
Regulate Workload

1. Glossary
2. Key Points
3. Reflection Activity

1. Glossary

● Affinity bias: Tendency to gravitate toward those who are similar to
you, such as with race, gender, socioeconomic background, or
education level.
● Confirmation bias: Tendency to find information, input, or data that
supports your preconceived notions.
● Selective attention bias: Tendency to pay attention to things, ideas,
and input from people whom you tend to gravitate toward.
● Labeling bias: Tendency to form opinions based on how people look,
dress, or appear externally.

2. Key Points

● Measure reliability with good service level indicators (SLIs).
● A good SLI correlates with user experience with your service; that is, a
good SLI tells you when users are happy or unhappy.
● Measure toil by identifying it, selecting an appropriate unit of
measure, and tracking the measurements continuously.
● Monitoring allows you to gain visibility into a system, which is a core
requirement for judging service health and diagnosing your service
when things go wrong.
● Goal-setting, transparency, and data-driven decision making are key
components of SRE measurement culture.
● To make truly data-driven decisions, you need to remove any
unconscious biases.
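
The toil-measurement steps in the key points (identify it, pick a unit, track it continuously) can be sketched as a simple log. The example below assumes minutes as the unit and an ISO week label as the tracking period. The `ToilEntry` class, the task names, and the capacity figure are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ToilEntry:
    week: str      # tracking period, e.g. "2024-W07"
    task: str      # the identified piece of toil
    minutes: int   # chosen unit of measure


def toil_share_by_week(entries: Iterable[ToilEntry],
                       capacity_minutes: int) -> Dict[str, float]:
    """Fraction of weekly team capacity spent on toil, keyed by week."""
    if capacity_minutes <= 0:
        raise ValueError("capacity_minutes must be positive")
    totals: Dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[entry.week] += entry.minutes
    return {week: minutes / capacity_minutes
            for week, minutes in sorted(totals.items())}


if __name__ == "__main__":
    log = [
        ToilEntry("2024-W07", "manual certificate rotation", 300),
        ToilEntry("2024-W07", "restarting stuck batch jobs", 180),
        ToilEntry("2024-W08", "manual certificate rotation", 120),
    ]
    # Hypothetical capacity: one engineer at 40 hours per week.
    for week, share in toil_share_by_week(log, capacity_minutes=2400).items():
        print(f"{week}: {share:.0%} of capacity spent on toil")
```

Recording the numbers per period, rather than as a one-off estimate, is what makes the trend visible and supports the data-driven decisions the key points describe.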


3. Reflection Activity

1. Think about how your IT teams work. What are some things you know
they are already measuring? What are some things you think they should
measure that they don’t already measure?

2. How do you currently set and measure goals in your organization? Is
there anything you think you could improve about the process?

Module 6 | Developing a Google SRE Culture

Module Six
Apply SRE in Your Organization

1. Key Points
2. Reflection Activity

1. Key Points

● Kitchen Sink/“Everything SRE” team: We recommend this approach
for organizations that have few applications and user journeys and
where the scope is small enough that only one team is necessary, but
a dedicated SRE team is needed in order to implement its practices.
● Infrastructure team: This type of team focuses on maintaining shared
services and components related to infrastructure, versus an SRE
team dedicated to working on services related to products, like
customer-facing code.
● Tools team: This type of SRE team tends to focus on building
software to help their developer counterparts measure, maintain, and
improve system reliability or other aspects of SRE work, such as
capacity planning.
● Product/Application team: This type of SRE team works to improve
the reliability of a critical application or business area. We
recommend this implementation for organizations that already have a
Kitchen Sink, Infrastructure, or Tools-focused SRE team and have a
key user-facing application with high reliability needs.
● Embedded team: This team has SREs embedded with their developer
counterparts, usually one per developer team in scope. The work
relationship between the embedded SREs and developers tends to be
project- or time-bounded and usually very hands-on, where they
perform work like changing code and configuration of the services in
scope.
● Consulting team: This implementation is very similar to the
embedded implementation, except that SREs are usually less hands-on.
We recommend staffing one or two part-time consultants before you staff
your first SRE team.


● Organizations with high SRE maturity have well-documented and
user-centric SLOs, error budgets, blameless postmortem culture, and
a low tolerance for toil.
● Engineers with operations experience and systems administrators
with scripting experience are good first SREs to hire.
● Upskill current team members with necessary SRE skills such as
operations and software engineering, monitoring systems, production
automation, system architecture, troubleshooting, culture of trust, and
incident management.
● Contact your Account Executive or Account Director to learn how the
Google Cloud Professional Services team can support your
organization’s adoption of SRE.


2. Reflection Activity

1. What do you think is your organization’s maturity level for adopting SRE?
Where does it fit into the SRE journey? Write down your ideas.


2. Think about your IT team composition. Are there already employees with
the skillset for SRE? How might you quickly upskill and train these
employees to move into the SRE role?

Resources | Developing a Google SRE Culture

Resources

● Site Reliability Engineering

Members of the SRE team explain how their engagement with the entire software
lifecycle has enabled Google to build, deploy, monitor, and maintain some of the
largest software systems in the world.

● The Site Reliability Workbook

The Site Reliability Workbook is the hands-on companion to the bestselling Site
Reliability Engineering book and uses concrete examples to show how to put SRE
principles and practices to work. This book contains practical examples from
Google’s experiences and case studies from Google’s Cloud Platform customers.
Evernote, The Home Depot, The New York Times, and other companies outline
hard-won experiences of what worked for them and what didn’t.

● Google Cloud Consulting Services

When you choose a Google Cloud consultant, you’ll be working hand in hand with
experts who will educate your team on best practices and guiding principles for a
successful implementation. Our deep technical expertise and services help you
unlock business value from the cloud across a range of solutions—including
infrastructure, application modernization, data management and analytics, machine
learning, and security.

● Site Reliability Engineering: Measuring and Managing Reliability (Coursera)

This course teaches the theory of service level objectives (SLOs), a principled way of
describing and measuring the desired reliability of a service. Upon completion,
learners should be able to apply these principles to develop the first SLOs for
services they are familiar with in their own organizations.
Learners will also learn how to use service level indicators (SLIs) to quantify
reliability and error budgets to drive business decisions around engineering for
greater reliability. The learner will understand the components of a meaningful SLI
and walk through the process of developing SLIs and SLOs for an example service.

● DORA DevOps Quick Check

Measure your team's software delivery performance and compare it to the rest of
the industry by responding to five multiple-choice questions. The quick check takes
less than a minute to complete, and we don't store your answers or personal
information. Immediately compare your team's performance to others.
