cloud.gov post mortems

Note: If you're experiencing a problem with cloud.gov or want to discuss an ongoing incident, check the cloud.gov StatusPage, then visit #cloud-gov-support in Slack or email [email protected]. This repository and wiki are only for post mortems, written after the incident has been closed.

We hold a post mortem as soon as possible after a cloud.gov service disruption or other incident. We use a broad definition of incident; ITIL says "Failure of a configuration item that has not yet affected service is also an incident — for example, failure of one disk from a mirror set. The ITIL incident management process ensures that normal service operation is restored as quickly as possible and the business impact is minimized."

We keep our post mortems in the wiki attached to this repo.

For more information on post mortems, check out:

John Allspaw's introduction
This great presentation by Dan Milstein

How we run post mortems - the short version

Before the post mortem, we put together a timeline of the incident on the wiki, beginning when the incident was announced in Slack and ending when we declared the incident over. Everybody is welcome to add their observations to the timeline, including which actions were taken and when, the effects observed, and their understanding of the events.
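As a rough illustration of the timeline described above, here is a minimal Python sketch that collects observations from several people, keeps those between the Slack announcement and the declared end of the incident, and orders them chronologically. The names `TimelineEntry`, `build_timeline`, and `render` are hypothetical and are not part of any cloud.gov tooling; the wiki timeline itself is written by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimelineEntry:
    """One observation on the incident timeline (hypothetical structure)."""
    at: datetime   # when the action or observation happened (UTC)
    author: str    # who is adding the entry
    note: str      # action taken, effect observed, or understanding of events


def build_timeline(entries, announced_at, closed_at):
    """Keep entries between announcement and closure, oldest first."""
    in_window = [e for e in entries if announced_at <= e.at <= closed_at]
    return sorted(in_window, key=lambda e: e.at)


def render(timeline):
    """Format the timeline as plain-text lines, one entry per line."""
    return "\n".join(
        f"{e.at:%Y-%m-%d %H:%M} UTC  {e.author}: {e.note}" for e in timeline
    )


if __name__ == "__main__":
    # Illustrative data only; not taken from a real incident.
    announced = datetime(2020, 1, 1, 10, 0, tzinfo=timezone.utc)
    closed = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    observations = [
        TimelineEntry(datetime(2020, 1, 1, 11, 30, tzinfo=timezone.utc),
                      "responder-b", "Restarted the affected service"),
        TimelineEntry(datetime(2020, 1, 1, 10, 5, tzinfo=timezone.utc),
                      "responder-a", "Saw elevated error rates"),
    ]
    print(render(build_timeline(observations, announced, closed)))
```

Sorting by timestamp rather than by submission order lets people add their observations in any order and still produces a single chronological view to review in the meeting.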

The facilitator starts by reading the retrospective prime directive.
We review the timeline and add anything we have missed.
We analyze the factors that contributed to the incident.
We propose, discuss, and prioritize remediation steps that reduce the likelihood of future incidents, improve detection and response times for future incidents, and improve our incident handling processes and training. We also decide how to validate and test these remediation steps.

We add the work that comes out of the post mortem to our backlog, then schedule a meeting two months after the incident to review our progress.
