Commit

name all runbooks
paulfantom committed Jul 2, 2021
1 parent 9bffc45 commit 665f827
Showing 22 changed files with 52 additions and 41 deletions.
2 changes: 2 additions & 0 deletions config.yaml
@@ -6,4 +6,6 @@ params:
BookRepo: "https://github.com/paulfantom/runbooks"
BookEditPath: "edit/main"
BookSearch: true
BookSection: "runbooks"
#BookMenuBundle: "/menu"

42 changes: 15 additions & 27 deletions content/_index.md
@@ -3,39 +3,27 @@ title: Introduction
type: docs
---

# Acerbo datus maxime
# Welcome!

{{< columns >}}
## Astris ipse furtiva
Welcome to the site hosting runbooks for alerts shipped with the
[kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) project.

Est in vagis et Pittheus tu arge accipiter regia iram vocatur nurus. Omnes ut
olivae sensit **arma sorori** deducit, inesset **crudus**, ego vetuere aliis,
modo arsit? Utinam rapta fiducia valuere litora _adicit cursu_, ad facies
## Reason

<--->
Kube-prometheus was always meant to provide a complete monitoring solution for Kubernetes environments. The project
already includes many components to fulfill this goal, and one crucial part is the alerting rules.
However, what good are those alerting rules if one doesn't know what to do when an alert fires?

## Suis quot vota
## Goal

Ea _furtique_ risere fratres edidit terrae magis. Colla tam mihi tenebat:
miseram excita suadent es pecudes iam. Concilio _quam_ velatus posset ait quod
nunc! Fragosis suae dextra geruntur functus vulgata.
{{< /columns >}}
We aim to ship a meaningful runbook for every alert in the kube-prometheus project and provide enough insight to help
kube-prometheus users during incidents.

## How to contribute?

## Tempora nisi nunc
If you find any issues with current runbooks, please use the `Edit this page` link at the bottom of the runbook page.

Lorem **markdownum** emicat gestu. Cannis sol pressit ducta. **Est** Idaei,
tremens ausim se tutaeque, illi ulnis hausit, sed, lumina cutem. Quae avis
sequens!
To add a new runbook, please follow the [add runbook](/docs/add-runbook) guide.

var panel = ram_design;
if (backup + system) {
file.readPoint = network_native;
sidebar_engine_device(cell_tftp_raster,
dual_login_paper.adf_vci.application_reader_design(
graphicsNvramCdma, lpi_footer_snmp, integer_model));
}

## Locis suis novi cum suoque decidit eadem

Idmoniae ripis, at aves, ali missa adest, ut _et autem_, et ab?
If you find any other issues, please [open an issue on GitHub](https://github.com/paulfantom/runbooks/issues/new)
or ask questions in the [prometheus-operator Slack channel](https://kubernetes.slack.com/archives/CFFDS2Z7F).
File renamed without changes.
1 change: 0 additions & 1 deletion content/docs/kubernetes/KubeSchedulerDown.md

This file was deleted.

File renamed without changes.
@@ -1,7 +1,7 @@
---
title: Alertmanager
title: alertmanager
bookCollapseSection: true
weight: 10
bookFlatSection: true
weight: 10
---

File renamed without changes.
7 changes: 7 additions & 0 deletions content/runbooks/general/_index.md
@@ -0,0 +1,7 @@
---
title: general
bookCollapseSection: true
bookFlatSection: true
weight: 1
---

@@ -1,5 +1,5 @@
---
title: Kubernetes
title: kube-state-metrics
bookCollapseSection: true
bookFlatSection: true
weight: 10
@@ -1,4 +1,7 @@
# Impact
# KubeAPIErrorBudgetBurn

## Impact

The overall availability of your Kubernetes cluster isn't guaranteed anymore.
The APIServer may be returning **too many errors** and/or **responses may take too long** to guarantee proper reconciliation.

@@ -88,4 +91,4 @@ sum(rate(apiserver_request_total{job="apiserver",verb=~"POST|PUT|PATCH|DELETE"}[
```

---
Learn more about Multiple Burn Rate Alerts in the [SRE Workbook Chapter 5](https://sre.google/workbook/alerting-on-slos/#recommended_time_windows_and_burn_rates_f).
@@ -1,3 +1,5 @@
# KubePersistentVolumeFillingUp

There can be various reasons why a volume is filling up. This runbook does not cover application-specific reasons, only mitigations for volumes that are legitimately filling.

## Volume resizing
@@ -46,4 +48,4 @@ When the data is ephemeral and volume expansion is not available, it may be best

WARNING/DANGER: This will permanently delete the data on the volume. Performing these steps is your responsibility.

TODO
3 changes: 3 additions & 0 deletions content/runbooks/kubernetes/KubeSchedulerDown.md
@@ -0,0 +1,3 @@
# KubeSchedulerDown

Runbook available at https://coreos.com/tectonic/docs/latest/troubleshooting/controller-recovery.html#recovering-a-scheduler
@@ -1,3 +1,5 @@
# KubeletTooManyPods

The Kubelet in a Kubernetes cluster is the agent that ensures Pods are running on that host.

Kubelets have a configuration that limits how many Pods they can run. The default limit is 110 Pods per Kubelet, but it is configurable (and this alert takes that configuration into account with the `kube_node_status_capacity_pods` metric). The alert fires when a Kubelet reaches 95% of its capacity. This alert warns that the cluster is close to running out of capacity to run Pods. Either the cluster's node count must be increased, or the number of Pods must be reduced.
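
As a rough illustration of the threshold arithmetic described above, the Python sketch below computes Pod utilization for a single node and compares it with the 95% mark. The function names, the sample numbers, and the strict greater-than comparison are assumptions made for illustration; the actual alert expression lives in the kube-prometheus rules and is not reproduced here.

```python
def pod_utilization(running_pods: int, pod_capacity: int) -> float:
    """Return the fraction of a node's Pod capacity that is in use."""
    if pod_capacity <= 0:
        raise ValueError("pod_capacity must be positive")
    return running_pods / pod_capacity


def too_many_pods(running_pods: int, pod_capacity: int = 110,
                  threshold: float = 0.95) -> bool:
    """Report whether utilization exceeds the threshold.

    The default capacity of 110 and the 95% threshold come from the text;
    comparing with ">" rather than ">=" is an assumption of this sketch.
    """
    return pod_utilization(running_pods, pod_capacity) > threshold


if __name__ == "__main__":
    # Hypothetical node counts, chosen only to show both sides of the threshold.
    for pods in (100, 104, 105, 110):
        print(pods, too_many_pods(pods))
```

With the default capacity of 110, the threshold sits at 104.5 Pods, so a node is flagged from 105 running Pods onward.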
@@ -1,5 +1,5 @@
---
title: Prometheus
title: kubernetes
bookCollapseSection: true
bookFlatSection: true
weight: 10
@@ -1,5 +1,5 @@
---
title: Node
title: node
bookCollapseSection: true
bookFlatSection: true
weight: 10
@@ -1,3 +1,5 @@
# PrometheusRuleFailures

Your best starting point is the rules page of the Prometheus UI (:9090/rules). It will show the error.

You can also evaluate the rule expression yourself, using the UI, or maybe using PromLens to help debug expression issues.
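
The same information shown on the rules page can also be read programmatically. The sketch below is a minimal example, assuming a JSON payload shaped like the response of the Prometheus HTTP API endpoint `/api/v1/rules`, where each rule carries `health` and `lastError` fields. The helper name `failing_rules` and the sample payload are hypothetical; fetching the payload from a live server is left out so the block stays self-contained.

```python
from typing import Any


def failing_rules(rules_response: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (group, rule, last_error) for every rule whose health is "err"."""
    failures = []
    groups = rules_response.get("data", {}).get("groups", [])
    for group in groups:
        for rule in group.get("rules", []):
            if rule.get("health") == "err":
                failures.append((
                    group.get("name", ""),
                    rule.get("name", ""),
                    rule.get("lastError", ""),
                ))
    return failures


if __name__ == "__main__":
    # Hypothetical payload for illustration; not taken from a real cluster.
    sample = {
        "status": "success",
        "data": {
            "groups": [
                {
                    "name": "example.rules",
                    "rules": [
                        {"name": "HealthyRule", "health": "ok"},
                        {"name": "BrokenRule", "health": "err",
                         "lastError": "example evaluation error"},
                    ],
                }
            ]
        },
    }
    for group, rule, error in failing_rules(sample):
        print(f"{group}/{rule}: {error}")
```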
@@ -1,5 +1,5 @@
---
title: General
title: prometheus
bookCollapseSection: true
bookFlatSection: true
weight: 10
7 changes: 5 additions & 2 deletions layouts/404.html
@@ -27,8 +27,11 @@ <h4>If you were directed here by a link from an alert, we sadly don't have a run
<p>We would like to have runbooks for all alerts shipped with kube-prometheus, but we sadly don't have time to write them all.</p>
<p>If you would like to help us, please consider opening a pull request. Thank you!</p>
<h3>
<a href="{{ .Site.Params.BookRepo }}/issues/new?template=new-runbook.md">Add runbook</a>
</h3>
<a href="/docs/add-runbook">Add runbook</a>
</h3>
<h3>
<a href="{{ .Site.Params.BookRepo }}/issues/new">Open an issue</a>
</h3>
<h3>
<a href="{{ .Site.Home.RelPermalink }}">Back to main page</a>
</h3>