Google Cloud Suspended Railway Without Warning

Two years ago, Google Cloud deleted the entire cloud infrastructure of UniSuper, an Australian pension fund managing $124 billion for 615,000 people. No cyberattack. No unpaid bill. A misconfiguration during setup silently set an expiry date on what was supposed to be a permanent subscription. A year later, the account expired. Google's system deleted everything, both geographic backup regions included.

It took two weeks to restore. The only reason any data survived was that UniSuper kept backups with a completely separate provider, outside Google's infrastructure.

Google called it a "one-of-a-kind occurrence that has never before occurred with any of Google Cloud's clients globally."

On May 19, 2026, Google Cloud did it again.

What Happened This Time

Railway is a developer platform that thousands of companies use to deploy and run their backends, databases, and APIs. It's the kind of infrastructure that, when it goes down, everything built on top of it goes down with it.

At 22:20 UTC on May 19, Google Cloud placed Railway's production account into a "restricted" status. No warning. No advance notice. An automated action swept through Railway's account and, according to Railway's own incident report, hit many other Google Cloud accounts at the same time.

The cascade went like this:

22:20 UTC: Google Cloud restricts Railway's account. VMs offline. CloudSQL instance gone. API down.
22:22 UTC: Railway files a P0 emergency ticket with Google.
22:29 UTC: Google restores account access, seven minutes after the ticket.

Seven minutes to restore access. Eight hours to actually recover.

Because restoring account access isn't the same as restoring services. Once the account was restricted, Railway's network routes started expiring. Workloads hosted on AWS and on Railway's own hardware, which were completely unaffected by the Google restriction, became unreachable because the networking layer that connected them ran through Google Cloud. The entire platform came down even though most of the actual compute was fine.
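
One way to picture this kind of cascade is a route table whose entries are leases that a control plane must keep renewing. The sketch below is a toy model only, not Railway's actual networking design: `RouteTable`, the five-minute TTL, and the backend names are all hypothetical. It shows how healthy backends can become unreachable simply because nothing is left to refresh their routes.

```python
from dataclasses import dataclass, field


@dataclass
class RouteTable:
    """Toy route table: each route must be refreshed before its lease runs out."""
    ttl_seconds: int
    expires_at: dict = field(default_factory=dict)  # backend name -> expiry time (seconds)

    def refresh(self, backend: str, now: int) -> None:
        # Called by the control plane while it is healthy.
        self.expires_at[backend] = now + self.ttl_seconds

    def reachable(self, backend: str, now: int) -> bool:
        # A backend is reachable only while its route lease is still valid.
        return self.expires_at.get(backend, 0) > now


routes = RouteTable(ttl_seconds=300)       # hypothetical 5-minute lease
backends = ["aws-worker", "metal-worker"]  # hypothetical workloads outside the restricted account

# The control plane is healthy at t=0 and refreshes every route once.
for backend in backends:
    routes.refresh(backend, now=0)

# After t=0 the control plane is gone, so no lease is ever renewed.
for t in (60, 299, 300, 600):
    status = {backend: routes.reachable(backend, now=t) for backend in backends}
    print(t, status)
```

In this model the backends themselves never fail. They drop out at t=300 because the component that renews their routes is no longer running, which is the shape of failure the paragraph above describes.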

By the time full service was restored at 06:14 UTC on May 20, Railway had been down for roughly eight hours. Thousands of developers and their users experienced 503 errors, login failures, and a complete inability to access their dashboards.
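
The gap between the trigger and the recovery is easy to check from the timestamps quoted above. A minimal sketch using only those four times (the variable and function names are illustrative):

```python
from datetime import datetime, timezone

# Timestamps as given in the timeline above (all UTC).
restricted = datetime(2026, 5, 19, 22, 20, tzinfo=timezone.utc)
ticket_filed = datetime(2026, 5, 19, 22, 22, tzinfo=timezone.utc)
access_restored = datetime(2026, 5, 19, 22, 29, tzinfo=timezone.utc)
fully_recovered = datetime(2026, 5, 20, 6, 14, tzinfo=timezone.utc)


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed between two datetimes."""
    return int((end - start).total_seconds() // 60)


ticket_to_fix = minutes_between(ticket_filed, access_restored)
restriction_to_fix = minutes_between(restricted, access_restored)
total_outage = minutes_between(restricted, fully_recovered)

print(f"Ticket to access restored: {ticket_to_fix} min")
print(f"Restriction to access restored: {restriction_to_fix} min")
print(f"Restriction to full recovery: {total_outage // 60} h {total_outage % 60} min")
```

The "seven minutes" figure is measured from the P0 ticket. Measured from the restriction itself, access came back after nine minutes, and full recovery took 7 hours 54 minutes.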

The Same Playbook, Two Years Later

Here's what makes this worth paying attention to beyond a single incident:

The 2024 UniSuper deletion and the 2026 Railway suspension are different failure modes: one was a misconfiguration that triggered account deletion, the other an erroneous automated suspension. But the pattern underneath is identical:

Google Cloud took a drastic account-level action without human review
The action happened without warning to the customer
Redundancy within Google's infrastructure didn't help, because the action was at the account level, above regional redundancy
Recovery took far longer than the triggering action. For Railway: seven minutes to reverse the suspension, eight hours to recover services

In 2024, Google said it was a "one-of-a-kind occurrence." In 2026, they haven't said much yet. The Register contacted Google for comment on the Railway incident and received nothing.

The Railway Incident Is Actually More Interesting Than It First Appears

UniSuper lost access because of a configuration bug. That's a human error in a setup script: embarrassing, but comprehensible.

Railway lost access because of an automated system that suspended their account incorrectly, as part of a sweep that hit multiple Google Cloud accounts simultaneously. Nobody at Google reviewed Railway's account and decided it should be restricted. An algorithm decided, acted, and was wrong.

Railway's CEO Jake said directly on X: "It appears Google Cloud has blocked our account, and so some services are unavailable." The framing was careful ("blocked" rather than "deleted"), but for customers the practical effect was the same. Nothing worked.

What's unsettling about the automated nature of this is that there's no natural defense against it. You can protect against misconfiguration. You can have backup processes reviewed. You can't easily protect against your cloud provider's automated fraud or compliance detection incorrectly flagging your account and pulling the plug at 10pm on a Tuesday.

Google restored account access in seven minutes once Railway escalated. But those seven minutes cascaded into eight hours because of how tightly Railway's networking was woven through Google's infrastructure, even though most of their compute ran elsewhere.

The Lesson That Keeps Not Being Learned

After the UniSuper incident in 2024, every serious engineering team had a conversation about cloud dependency. The consensus was clear: if you have critical infrastructure, you need:

Backups outside your primary provider, not in another region of the same provider
A recovery plan that assumes your cloud account could be unavailable at any moment
Network and control plane architecture that doesn't create single points of failure through one provider
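
The first requirement in that list can be audited mechanically. A minimal sketch, assuming a hypothetical backup inventory: `BackupCopy`, the provider labels, and the dataset names are invented for illustration and are not drawn from either incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    dataset: str
    provider: str  # e.g. "gcp", "aws", "on-prem" (labels are illustrative)
    region: str


def datasets_without_offsite_copy(copies, primary_provider):
    """Return datasets whose every backup copy lives with the primary provider."""
    providers_by_dataset = {}
    for copy in copies:
        providers_by_dataset.setdefault(copy.dataset, set()).add(copy.provider)
    return sorted(
        dataset
        for dataset, providers in providers_by_dataset.items()
        if providers <= {primary_provider}
    )


inventory = [
    BackupCopy("billing-db", "gcp", "region-a"),
    BackupCopy("billing-db", "gcp", "region-b"),  # second region, same provider
    BackupCopy("user-db", "gcp", "region-a"),
    BackupCopy("user-db", "aws", "region-c"),     # copy outside the primary provider
]

at_risk = datasets_without_offsite_copy(inventory, primary_provider="gcp")
print(at_risk)  # ['billing-db']
```

Here `billing-db` is flagged even though it is replicated across two regions, because both copies would disappear together if the provider account did.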

Railway actually did most of this. They run infrastructure across Google Cloud, AWS, and their own hardware (Railway Metal). On paper, that's good redundancy. In practice, their control plane (the API and networking layer) ran through Google Cloud. So when Google suspended the account, the multi-cloud setup became irrelevant. The thing that connected everything was the thing that went down.

This is a genuinely subtle failure. Running compute on multiple clouds doesn't help if your control plane is single-provider. The dependency that matters isn't where your servers run; it's where your routing, authentication, and API infrastructure live.
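
That distinction can be tested with a small dependency walk. The sketch below is an illustration under stated assumptions, not a description of Railway's real topology: the component names, the `depends_on` edges, and the function `providers_that_take_everything_down` are all hypothetical.

```python
def providers_that_take_everything_down(components, user_facing):
    """
    components: name -> {"provider": str, "depends_on": [names]}
    Returns providers whose loss makes every user-facing component unavailable.
    """
    def is_up(name, failed_provider, seen=()):
        component = components[name]
        if component["provider"] == failed_provider:
            return False
        if name in seen:
            return True  # ignore dependency cycles in this toy model
        return all(
            is_up(dep, failed_provider, seen + (name,))
            for dep in component["depends_on"]
        )

    providers = {c["provider"] for c in components.values()}
    return sorted(
        p for p in providers
        if not any(is_up(name, p) for name in user_facing)
    )


# Hypothetical layout: compute on three providers, control plane on one.
platform = {
    "control-plane": {"provider": "gcp", "depends_on": []},
    "gcp-compute": {"provider": "gcp", "depends_on": ["control-plane"]},
    "aws-compute": {"provider": "aws", "depends_on": ["control-plane"]},
    "metal-compute": {"provider": "own-hardware", "depends_on": ["control-plane"]},
}
workloads = ["gcp-compute", "aws-compute", "metal-compute"]

print(providers_that_take_everything_down(platform, workloads))  # ['gcp']
```

Losing `aws` or `own-hardware` in this model removes one pool of compute. Losing `gcp` removes all three, because every workload depends on a control plane hosted there.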

Railway acknowledged this directly in their incident report and committed to removing GCP as a dependency for their control plane. That's the right call, and it's probably the most important architectural lesson from this incident.

Is Google Cloud Actually Getting Worse?

This is worth asking honestly.

Google Cloud is the third-largest cloud provider behind AWS and Azure, and the gap to second place is significant. UniSuper in 2024. Railway in 2026. Between those two, there were also incidents where Google Cloud took down customers during maintenance operations that accidentally swept beyond their intended scope.

Gergely Orosz at The Pragmatic Engineer, who originally covered the UniSuper incident in 2024 and republished it yesterday specifically because of the Railway incident, has been consistent in his assessment: Google Cloud lacks a clear strategy for what it wants to offer, and it shows in how it handles the edge cases that reveal infrastructure culture.

The counterargument is fair: AWS and Azure have had major incidents too. Every cloud provider eventually has an outage that makes headlines. The UniSuper and Railway incidents are qualitatively different from typical infrastructure failures (they're account-level actions, not hardware failures or software bugs), but that doesn't mean Google Cloud is uniquely unreliable.

What it does mean is that if you're making infrastructure decisions and Google Cloud's track record is part of your thinking, these incidents are relevant data points.

What This Means If You're Building on Cloud

The practical takeaways, stated plainly:

Account-level risk is different from infrastructure risk. Regional redundancy doesn't protect you if the account that owns the regions gets suspended or deleted. Treat them as separate threat models.
Your control plane is your most critical dependency. You can run compute anywhere, but if your routing, DNS, and API layer are single-provider, you have a single point of failure regardless of how many clouds your workloads span.
Automated systems at cloud providers can affect you without warning. There's no contract clause that protects you from an erroneous automated action. Build assuming it can happen.
Backups outside your primary provider aren't optional for critical systems. UniSuper learned this in 2024. The engineers who made that call are the reason the incident was recoverable. Railway is learning a version of this now.
Recovery time ≠ fix time. Google fixed the Railway account access in seven minutes. Full recovery took eight hours. Plan your RTO (recovery time objective) around the cascade, not the trigger.
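
The first takeaway, that account-level and region-level failures are separate threat models, can be made concrete with a tiny model. This is a sketch with invented names (`Replica`, `survivors`, `prod-account`, and the region labels are hypothetical), not a representation of any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    provider: str
    account: str
    region: str


def survivors(replicas, *, failed_scope, failed_value):
    """Replicas still available after everything matching one scope value fails."""
    return [r for r in replicas if getattr(r, failed_scope) != failed_value]


# Two regions for redundancy, but both owned by the same account.
replicas = [
    Replica("gcp", "prod-account", "region-a"),
    Replica("gcp", "prod-account", "region-b"),
]

after_region_loss = survivors(replicas, failed_scope="region", failed_value="region-a")
after_account_loss = survivors(replicas, failed_scope="account", failed_value="prod-account")

print(len(after_region_loss), len(after_account_loss))  # 1 0
```

Regional redundancy leaves one replica standing after a region outage and none after an account suspension, which is why the two need separate mitigation plans.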

One More Thing

Railway's response to this incident (transparent timeline, honest post-mortem, clear architectural changes committed to publicly) was genuinely good. They apologized to customers despite the cause being entirely Google's, published a detailed incident report within 24 hours, and laid out concrete plans to remove the single-provider dependency in their control plane.

The contrast with Google's communication is stark. Seven minutes to restore access, eight hours for customers to recover, and at the time of writing, no public statement from Google Cloud explaining what the automated action was, why it incorrectly swept Railway's account, or how many other accounts were affected.

That gap in communication is, arguably, its own data point.

Railway is a cloud deployment platform used by developers to run backends, databases, and full-stack applications. The incident occurred May 19-20, 2026. The UniSuper incident referenced occurred April-May 2024.
