{"id":32342,"date":"2021-01-16T12:00:03","date_gmt":"2021-01-16T12:00:03","guid":{"rendered":"https:\/\/www.testpreptraining.com\/tutorial\/?page_id=32342"},"modified":"2021-01-16T12:00:04","modified_gmt":"2021-01-16T12:00:04","slug":"troubleshooting-postmortem-analysis-culture","status":"publish","type":"page","link":"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/","title":{"rendered":"Troubleshooting\/postmortem analysis culture"},"content":{"rendered":"\n<p><strong><a href=\"https:\/\/www.testpreptraining.ai\/tutorial\/google-certified-professional-cloud-architect\/\" target=\"_blank\" rel=\"noreferrer noopener\">Go back to GCP Tutorials<\/a><\/strong><\/p>\n\n\n\n<p>In this tutorial, we will learn about troubleshooting and postmortem analysis culture.<\/p>\n\n\n\n<p>For platform providers that offer a wide range of services to a wide range of users, fully public postmortems make sense. But even if the impact of your outage isn\u2019t as broad, if you are practicing SRE, it can still make sense to share postmortems with customers that have been directly impacted.<\/p>\n\n\n\n<p>This is the position we take on the Google Cloud Platform (GCP) Customer Reliability Engineering (CRE) team. To help customers run reliably on GCP, we teach them how to engineer increased reliability for their service by implementing SRE best practices in our work together. We identify and quantify architectural and operational risks to each customer\u2019s service, and work with them to mitigate those risks and sustain system reliability at their SLO.<\/p>\n\n\n\n<h6 class=\"wp-block-heading\"><em>Specifically, the CRE team works with each customer to help them meet the availability target expressed by their SLOs. 
To do this, the principal steps are to:<\/em><\/h6>\n\n\n\n<ul class=\"wp-block-list\"><li>Firstly, define a comprehensive set of business-relevant SLOs<\/li><li>Secondly, get the customer to measure compliance with those SLOs in their monitoring platform (how much of the service error budget has been consumed)<\/li><li>Thirdly, share that live SLO information with Google support and product SRE teams (which we term shared monitoring)<\/li><li>Lastly, jointly monitor and react to SLO breaches with the customer (shared operational fate)<\/li><\/ul>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><a href=\"https:\/\/www.testpreptraining.ai\/google-cloud-certified-professional-cloud-architect-free-practice-test\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"117\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2021\/01\/Google-Certified-Professional-Cloud-Architect-prac-tests-750x117.png\" alt=\"gcp cloud architect practice tests\" class=\"wp-image-31460\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2021\/01\/Google-Certified-Professional-Cloud-Architect-prac-tests-750x117.png 750w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2021\/01\/Google-Certified-Professional-Cloud-Architect-prac-tests.png 961w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/a><\/figure><\/div>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Foundations of an external postmortem<\/strong><\/h4>\n\n\n\n<p>Analyzing outages, and subsequently writing about them in a postmortem, benefits from having a two-way flow of monitoring data between the platform operator and the service owner. 
This provides an objective measure of the external impact of the incident: when did it start, how long did it last, and how severe was it?<\/p>\n\n\n\n<p>Based on the monitoring data from the service owner and their own monitoring, the platform team can write their postmortem following the standard practices and our postmortem template. This results in an internally reviewed document that has the canonical view of the incident timeline, the scope and magnitude of impact, and a set of prioritized actions to reduce the probability of the situation occurring, reduce the expected impact, improve detection, and\/or recover from the incident more quickly.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Selecting an audience for your external postmortem<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\"><li>Firstly, if your customers have defined SLOs, they know how badly this affected them. Generally, the greater the error budget that has been consumed by the incident, the more interested they are in the details, and the more important it will be to share the postmortem with them. They&#8217;re also more likely to be able to give relevant feedback on the postmortem about the scope, timing and impact of the incident.<\/li><li>Secondly, if your customer\u2019s SLOs weren\u2019t violated but this problem still affected their customers, that\u2019s an action item for the customer\u2019s own postmortem: what changes need to be made to either the SLO or its measurements?<\/li><li>Thirdly, if your customer doesn\u2019t have SLOs that represent the end-user experience, it\u2019s difficult to make an objective call about this. Unless there are obvious reasons why the incident disproportionately affected a particular customer, 
you should probably default to a more generic incident report.<\/li><li>Lastly, if the outage has impacted most of your customers, then you should consider whether the externalized postmortem might be the basis for writing a public postmortem or incident report.<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Deciding how much to share, and why<\/strong><\/h4>\n\n\n\n<p>Another question when writing external postmortems is how deep to get into the weeds of the outage. At one end of the spectrum you might share your entire internal postmortem with a minimum of redaction; at the other you might write a short incident summary. This is a tricky issue that we\u2019ve debated internally.<br>The two factors we believe to be most important in determining whether to expose the full detail of a postmortem to a customer, rather than just a summary, are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Firstly, how important are the details to understanding how to defend against a future re-occurrence of the event?<\/li><li>Secondly, how badly did the event damage their service, i.e., how much error budget did it consume?<\/li><\/ul>\n\n\n\n<h5 class=\"wp-block-heading\"><strong>Postmortems should never include these three things:<\/strong><\/h5>\n\n\n\n<ul class=\"wp-block-list\"><li>Firstly, names of humans. Rather than \u201cJohn Smith accidentally kicked over a server\u201d, say \u201ca network engineer accidentally kicked over a server.\u201d Internally, we try to refer to people by role rather than by name. This helps us keep a blameless postmortem culture.<\/li><li>Secondly, names of internal systems. The names of your internal systems are not clarifying for your users and create a burden on them to discover how these things fit together. For example, even though we\u2019ve discussed Chubby externally, 
we still refer to it in postmortems we make external as \u201cour globally distributed lock system.\u201d<\/li><li>Lastly, customer-specific information. The internal version of your postmortem will likely say things like \u201cat XX:XX, Acme Corp filed a support ticket alerting us to a problem.\u201d It\u2019s not your place to share this kind of detail externally, as it may create an undue burden for the reporting company (in this case, Acme Corp.). Rather, simply say \u201cat XX:XX, a customer filed\u2026\u201d. If you\u2019re going to reference more than one customer, then just label them Customer A, Customer B, etc.<\/li><\/ul>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><a href=\"https:\/\/www.testpreptraining.ai\/google-cloud-certified-professional-cloud-architect-practice-exam\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"117\" src=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2021\/01\/Google-Certified-Professional-Cloud-Architect-online-course-750x117.png\" alt=\"Troubleshooting\/post mortem analysis culture GCP cloud architect  online course\" class=\"wp-image-31461\" srcset=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2021\/01\/Google-Certified-Professional-Cloud-Architect-online-course-750x117.png 750w, https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2021\/01\/Google-Certified-Professional-Cloud-Architect-online-course.png 961w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/a><\/figure><\/div>\n\n\n\n<p><strong>Reference:<\/strong> <a href=\"https:\/\/cloud.google.com\/blog\/products\/gcp\/fearless-shared-postmortems-cre-life-lessons\" target=\"_blank\" rel=\"noreferrer noopener\">Google Documentation<\/a><\/p>\n\n\n\n<p><strong><a href=\"https:\/\/www.testpreptraining.ai\/tutorial\/google-certified-professional-cloud-architect\/\" target=\"_blank\" rel=\"noreferrer noopener\">Go back to 
GCP Tutorials<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Go back to GCP Tutorials In this we will learn and understand about troubleshooting\/post mortem analysis culture. For platform providers that offer a wide range of services to a wide range of users, fully public postmortems such as these make sense. But even if the impact of your outage isn\u2019t as broad. However, if you&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"footnotes":""},"categories":[],"tags":[],"class_list":["post-32342","page","type-page","status-publish","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Troubleshooting\/postmortem analysis culture - Testprep Training Tutorials<\/title>\n<meta name=\"description\" content=\"Enhance your knowledge about Troubleshooting\/postmortem analysis culture using the Google Certified Professional Cloud Architect Course Now!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Troubleshooting\/postmortem analysis culture - Testprep Training Tutorials\" \/>\n<meta property=\"og:description\" content=\"Enhance your knowledge about Troubleshooting\/postmortem analysis culture using the Google Certified Professional Cloud Architect Course Now!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/\" \/>\n<meta property=\"og:site_name\" content=\"Testprep Training Tutorials\" \/>\n<meta 
property=\"article:modified_time\" content=\"2021-01-16T12:00:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2021\/01\/Google-Certified-Professional-Cloud-Architect-prac-tests-750x117.png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/\",\"name\":\"Troubleshooting\/postmortem analysis culture - Testprep Training Tutorials\",\"isPartOf\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\"},\"datePublished\":\"2021-01-16T12:00:03+00:00\",\"dateModified\":\"2021-01-16T12:00:04+00:00\",\"description\":\"Enhance your knowledge about Troubleshooting\/postmortem analysis culture using the Google Certified Professional Cloud Architect Course Now!\",\"breadcrumb\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Troubleshooting\/postmortem analysis 
culture\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#website\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"name\":\"Testprep Training Tutorials\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#organization\",\"name\":\"Testprep Training\",\"url\":\"https:\/\/www.testpreptraining.ai\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"contentUrl\":\"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png\",\"width\":583,\"height\":153,\"caption\":\"Testprep Training\"},\"image\":{\"@id\":\"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Troubleshooting\/postmortem analysis culture - Testprep Training Tutorials","description":"Enhance your knowledge about Troubleshooting\/postmortem analysis culture using the Google Certified Professional Cloud Architect Course Now!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/","og_locale":"en_US","og_type":"article","og_title":"Troubleshooting\/postmortem analysis culture - Testprep Training Tutorials","og_description":"Enhance your knowledge about Troubleshooting\/postmortem analysis culture using the Google Certified Professional Cloud Architect Course Now!","og_url":"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/","og_site_name":"Testprep Training Tutorials","article_modified_time":"2021-01-16T12:00:04+00:00","og_image":[{"url":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-content\/uploads\/2021\/01\/Google-Certified-Professional-Cloud-Architect-prac-tests-750x117.png"}],"twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/","url":"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/","name":"Troubleshooting\/postmortem analysis culture - Testprep Training Tutorials","isPartOf":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website"},"datePublished":"2021-01-16T12:00:03+00:00","dateModified":"2021-01-16T12:00:04+00:00","description":"Enhance your knowledge about Troubleshooting\/postmortem analysis culture using the Google Certified Professional Cloud Architect Course Now!","breadcrumb":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/troubleshooting-postmortem-analysis-culture\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.testpreptraining.ai\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Troubleshooting\/postmortem analysis culture"}]},{"@type":"WebSite","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#website","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","name":"Testprep Training Tutorials","description":"","publisher":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.testpreptraining.ai\/tutorial\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#organization","name":"Testprep 
Training","url":"https:\/\/www.testpreptraining.ai\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","contentUrl":"https:\/\/www.testpreptraining.com\/tutorial\/wp-content\/uploads\/2020\/07\/tpt-logo-6.png","width":583,"height":153,"caption":"Testprep Training"},"image":{"@id":"https:\/\/www.testpreptraining.ai\/tutorial\/#\/schema\/logo\/image\/"}}]}},"_links":{"self":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/32342","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/comments?post=32342"}],"version-history":[{"count":6,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/32342\/revisions"}],"predecessor-version":[{"id":32423,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/pages\/32342\/revisions\/32423"}],"wp:attachment":[{"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/media?parent=32342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/categories?post=32342"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.testpreptraining.ai\/tutorial\/wp-json\/wp\/v2\/tags?post=32342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}