That time Google Cloud Platform bricked the Internet…


Summary

The video discusses a recent incident in which bad code pushed to production caused major outages across popular internet services. Google Cloud took responsibility and apologized for the disruption, which shows how much depends on large infrastructure providers like Google Cloud. The video explores the financial cost of such outages and their possible effect on Google Cloud's market share relative to Azure and AWS. The incident was traced to an issue in Google's API management service, where a binary crashed because of missing error handling, prompting a discussion of AI in software development and the importance of proper development practices. A policy change set off the chaos, and engineers used a rollback procedure, a 'big red button', to restore normal service.


Introduction to Bad Code Incident

Discussion of a recent incident in which bad code pushed to production caused major outages across the internet, affecting Snapchat, Spotify, Discord, and Cloudflare's Workers KV service.

Impact of Bad Code on Internet Services

How the bad code rippled through internet services and websites that depend on Google Cloud, producing high error rates and downtime.

Apology from Google Cloud

Google Cloud takes responsibility for the incident and apologizes for disrupting people's favorite apps and services.

Power of Big Infrastructure

How much of today's internet runs on large infrastructure providers like Google Cloud, and how far a single failure there can spread.

Financial Impact of Major Outages

The financial cost of major outages like this one, which can mean significant losses for the companies affected.

Service Level Agreements with Cloud Providers

How service level agreements (SLAs) with cloud providers work, and the conditions under which customers receive financial compensation, typically as service credits, when the provider misses its uptime commitment.
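To make the compensation criteria concrete, here is a minimal Python sketch of how an SLA credit is commonly calculated: measure the monthly uptime percentage, then look up which credit tier it falls into. The tier thresholds and credit percentages below are hypothetical placeholders, not the terms of Google Cloud, Azure, or AWS; check each provider's published SLA for the real numbers.

```python
# Hypothetical credit tiers: (minimum monthly uptime %, credit as % of the bill).
# These values are illustrative placeholders, not any provider's actual SLA.
HYPOTHETICAL_TIERS = [
    (99.99, 0),   # SLA met: no credit
    (99.0, 10),
    (95.0, 25),
    (0.0, 50),
]


def monthly_uptime_percent(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Percentage of the month during which the service was available."""
    total_minutes = days_in_month * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must be between 0 and the length of the month")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def credit_percent(uptime: float) -> int:
    """Return the credit for the first tier whose threshold the uptime meets."""
    for min_uptime, credit in HYPOTHETICAL_TIERS:
        if uptime >= min_uptime:
            return credit
    return HYPOTHETICAL_TIERS[-1][1]  # unreachable for uptime >= 0
```

For a hypothetical 7-hour outage in a 30-day month, `monthly_uptime_percent(420)` is about 99.03%, which lands in the 10% tier of this made-up table. Note that the credit is a share of what the customer paid the provider, which is usually far smaller than the revenue the customer lost during the outage.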

Market Share Impact on Google Cloud

Whether the outage could hurt Google Cloud's market share, which already trails Azure and AWS.

Root Cause Analysis of the Incident

How the incident happened: a problem in Service Control, Google's API management service, caused its binary to crash because the new code lacked proper error handling.
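To illustrate what a lack of proper error handling can look like, here is a minimal Python sketch, assuming a hypothetical `QuotaPolicy` record whose `limit` field may arrive blank (`None`). The unsafe check uses the field directly and raises an exception, the Python analogue of a null-pointer crash; the safe check validates the field first. The record shape and function names are invented for illustration and are not Google's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaPolicy:
    """Hypothetical policy record; a blank field arrives as None."""
    name: str
    limit: Optional[int]


def check_quota_unsafe(policy: QuotaPolicy, usage: int) -> bool:
    # No validation: if policy.limit is None, `usage <= None` raises TypeError,
    # which would take down the process if nothing catches it.
    return usage <= policy.limit


def check_quota_safe(policy: QuotaPolicy, usage: int) -> bool:
    # Validate the field before using it. A malformed policy is treated as
    # "no limit" (fail open) instead of crashing the process.
    if policy.limit is None:
        print(f"warning: policy {policy.name!r} has no limit; skipping check")
        return True
    return usage <= policy.limit
```

`check_quota_safe(QuotaPolicy("bad", None), 5)` logs a warning and returns `True`, while `check_quota_unsafe` raises `TypeError` on the same input. Failing open (allowing the request) versus failing closed (rejecting it) is a design choice; for a gatekeeping service that sits in front of many APIs, failing open keeps a bad policy from becoming a global outage.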

AI in Technology and Human Error

The role of AI in software development, human error in writing code, and why robust error handling matters.

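Policy Change Leading to a Crash Loop

Detailing how a new quota-policy feature, added on May 29th, 2025, was never exercised before the incident and was not protected by a feature flag. When a policy change containing blank fields reached it, the binary hit a null pointer and went into a crash loop, causing chaos and panic.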
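One common safeguard for a new code path like this is a feature flag, which keeps the path switched off except where it has been deliberately enabled, so a bad input breaks a small canary first instead of every region at once. Below is a minimal Python sketch under that assumption; the flag store, the flag name `new_quota_checks`, the region names, and both quota functions are hypothetical.

```python
# Hypothetical flag store: the new quota-check path is switched on only in
# explicitly listed regions. Every other region keeps running the old path.
FLAGS: dict[str, set[str]] = {
    "new_quota_checks": {"canary-region"},
}


def flag_enabled(flag: str, region: str) -> bool:
    # Unknown flags default to off, so new code never runs by accident.
    return region in FLAGS.get(flag, set())


def old_quota_check(usage: int) -> bool:
    return usage <= 1000  # hypothetical existing behavior


def new_quota_check(usage: int) -> bool:
    return usage <= 500  # hypothetical new policy logic


def handle_request(region: str, usage: int) -> str:
    if flag_enabled("new_quota_checks", region):
        allowed = new_quota_check(usage)
    else:
        allowed = old_quota_check(usage)
    return "allowed" if allowed else "denied"
```

With this setup, `handle_request("canary-region", 700)` takes the new path and returns `"denied"`, while `handle_request("us-central1", 700)` still takes the old path and returns `"allowed"`. Widening the rollout means adding regions to the set one at a time, watching error rates after each step.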

Rollback Procedure and Recovery

Engineers used a rollback procedure, a 'big red button' that disabled the faulty code path, to contain the incident and restore normal service.
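A 'big red button' is usually a kill switch: a single control that, once pressed, makes every server skip a risky code path without waiting for a new build to be deployed. Here is a minimal single-process Python sketch of that idea; the `KillSwitch` class, `RED_BUTTON`, and the stand-in `new_policy_check` are hypothetical, and a real system would distribute the switch state to all servers rather than keep it in memory.

```python
import threading


class KillSwitch:
    """Hypothetical 'big red button': once pressed, the risky path is skipped."""

    def __init__(self) -> None:
        # threading.Event can be set from one thread and read safely from others.
        self._pressed = threading.Event()

    def press(self) -> None:
        self._pressed.set()

    def is_pressed(self) -> bool:
        return self._pressed.is_set()


RED_BUTTON = KillSwitch()


def new_policy_check(policy: dict) -> bool:
    # Stand-in for the faulty feature: a blank "limit" (None) makes the
    # comparison raise TypeError.
    return policy["limit"] > 0


def serve(policy: dict) -> str:
    if RED_BUTTON.is_pressed():
        # Skip the new check entirely and allow the request (fail open).
        return "allowed (new check disabled)"
    return "allowed" if new_policy_check(policy) else "denied"
```

Before the button is pressed, `serve({"limit": None})` raises `TypeError`; after `RED_BUTTON.press()`, the same call returns `"allowed (new check disabled)"`. The point of the design is speed: flipping a switch that every server already checks is much faster than writing, testing, and rolling out a code fix.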

Introduction to PostHog's AI Product

Introduction to Max, PostHog's AI-powered product built into the PostHog app, which lets users ask questions in natural language and work with features such as feature flags.
