Summary
The video discusses a recent incident in which bad code pushed into production caused major outages across popular internet services. Google Cloud took responsibility and apologized for the disruption, which underscored how much of the internet depends on large infrastructure providers like Google Cloud. The video explores the financial implications of such outages and their possible effect on Google Cloud's market share relative to Azure and AWS. The incident was traced to an API management service whose binary crashed because it lacked error handling, which the video uses to discuss AI, human error, and proper development practices. A policy change triggered the chaos, and a rollback procedure with a 'big red button' was used to restore normal service.
Chapters
Introduction to Bad Code Incident
Impact of Bad Code on Internet Services
Apology from Google Cloud
Power of Big Infrastructure
Financial Impact of Major Outages
Service Level Agreements with Cloud Providers
Market Share Impact on Google Cloud
Root Cause Analysis of the Incident
AI in Technology and Human Error
Policy Change Leading to API Loop
Rollback Procedure and Recovery
Introduction to PostHog AI Product
Introduction to Bad Code Incident
Discussion of a recent incident in which bad code pushed into production caused major outages across the internet, affecting Snapchat, Spotify, Discord, and Cloudflare's Workers KV service.
Impact of Bad Code on Internet Services
Exploration of the repercussions of the bad code incident on various internet services and websites, leading to significant error rates and downtime.
Apology from Google Cloud
Google Cloud takes responsibility for the bad code incident and apologizes for the disruption to users' favorite apps and services.
Power of Big Infrastructure
Highlighting the importance and power of big infrastructure like Google Cloud in today's technological landscape.
Financial Impact of Major Outages
Discussion on the financial implications of major outages like the recent incident, which can result in significant losses for companies.
Service Level Agreements with Cloud Providers
Explanation of service level agreements with cloud providers and the criteria for financial compensation in case of violations.
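The arithmetic behind an uptime commitment can be sketched in a few lines of Python. This is an illustration only: the credit tiers and the function names below are hypothetical and do not reflect any real provider's SLA terms.

```python
from datetime import timedelta

# Hypothetical credit tiers: (minimum uptime %, credit as % of the monthly bill).
# Illustrative numbers only, not taken from any real provider's SLA.
CREDIT_TIERS = [
    (99.9, 0),
    (99.0, 10),
    (95.0, 25),
    (0.0, 50),
]


def uptime_percent(period: timedelta, downtime: timedelta) -> float:
    """Percentage of the billing period during which the service was up."""
    return 100.0 * (1 - downtime / period)


def allowed_downtime(period: timedelta, target_percent: float) -> timedelta:
    """Downtime budget that still meets the target uptime percentage."""
    return period * (1 - target_percent / 100.0)


def service_credit(uptime: float) -> int:
    """Credit percentage owed for the measured uptime, per the tiers above."""
    for min_uptime, credit in CREDIT_TIERS:
        if uptime >= min_uptime:
            return credit
    return CREDIT_TIERS[-1][1]


if __name__ == "__main__":
    month = timedelta(days=30)
    print("Budget at 99.9%:", allowed_downtime(month, 99.9))
    measured = uptime_percent(month, timedelta(hours=3))
    print(f"Uptime {measured:.3f}% -> credit {service_credit(measured)}%")
```

Under these assumed tiers, a three-hour outage in a 30-day month drops uptime below 99.9% and into the first credit tier, which shows why a single multi-hour incident can breach a commitment.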
Market Share Impact on Google Cloud
Assessment of the impact of the outage on Google Cloud's market share in comparison to Azure and AWS.
Root Cause Analysis of the Incident
Investigation into how the bad code incident occurred, involving an API management service issue and a binary crash due to lack of proper error handling.
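A crash caused by missing error handling can be illustrated with a small Python sketch. Everything here is hypothetical (the `Policy` record, its `quota_limit` field, and the choice to fall back to a default decision); it is not the actual service's code, only a picture of how one unchecked blank field can take down a process while a guarded version keeps serving.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Hypothetical policy record; quota_limit may arrive blank (None)."""
    name: str
    quota_limit: Optional[int]


def check_quota_unsafe(policy: Policy, used: int) -> bool:
    # No guard: comparing an int with None raises TypeError and,
    # if nothing catches it, the whole process goes down.
    return used < policy.quota_limit


def check_quota_safe(policy: Policy, used: int, default_allow: bool = True) -> bool:
    # Guarded version: a malformed record is reported and a default
    # decision is returned instead of crashing. Falling back to
    # "allow" is an assumption made for this sketch.
    if policy.quota_limit is None:
        print(f"warning: policy {policy.name!r} has a blank quota_limit")
        return default_allow
    return used < policy.quota_limit


if __name__ == "__main__":
    bad = Policy(name="example-policy", quota_limit=None)
    print(check_quota_safe(bad, used=5))
    try:
        check_quota_unsafe(bad, used=5)
    except TypeError as exc:
        print("unsafe version crashed:", exc)
```

Whether a guarded check should allow or deny by default is a design decision that depends on the service; the sketch only shows that the decision has to be made explicitly.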
AI in Technology and Human Error
Exploring the role of AI in technology and addressing human errors in code development, emphasizing the need for error handling mechanisms.
Policy Change Leading to API Loop
Details of a policy change on May 29, 2025, that triggered an API loop because a feature did not execute properly, causing chaos and panic.
Rollback Procedure and Recovery
Implementation of a rollback procedure with a 'big red button' to address the incident and restore normalcy after the chaos caused by the bad code.
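A 'big red button' is commonly built as a kill switch: a flag checked at request time that routes traffic away from a new code path without a redeploy. The sketch below is a minimal in-memory version under that assumption; the class, the feature name `new_policy_check`, and both handlers are hypothetical and are not taken from the incident report.

```python
class KillSwitch:
    """Minimal in-memory kill switch; a real one would read shared config."""

    def __init__(self) -> None:
        self._disabled: set[str] = set()

    def disable(self, feature: str) -> None:
        # "Pressing the button": turn the feature off everywhere this
        # switch is consulted.
        self._disabled.add(feature)

    def enable(self, feature: str) -> None:
        self._disabled.discard(feature)

    def is_enabled(self, feature: str) -> bool:
        return feature not in self._disabled


def new_policy_path(request: str) -> str:
    return f"new:{request}"


def legacy_path(request: str) -> str:
    return f"legacy:{request}"


def handle_request(switch: KillSwitch, request: str) -> str:
    # The flag is checked on every request, so flipping it takes effect
    # immediately without redeploying the binary.
    if switch.is_enabled("new_policy_check"):
        return new_policy_path(request)
    return legacy_path(request)


if __name__ == "__main__":
    switch = KillSwitch()
    print(handle_request(switch, "req-1"))
    switch.disable("new_policy_check")
    print(handle_request(switch, "req-2"))
```

The point of the design is that the old path stays in the binary and remains reachable, so recovery depends on flipping a flag rather than shipping a fix.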
Introduction to PostHog AI Product
Introduction to Max, PostHog's AI-powered product, which is integrated into the PostHog app and supports features such as natural-language questions and feature flags.