As a counterpoint, I find our AWS support team to be a mix of 40% helpful, 40% “things we say are going over their head,” and 20% attempting to upsell and expand our dependence. It’s nice that we have humans, but I don’t think it’s a reason to choose one provider over the other.

GCP’s architecture seems clearly better to me, especially if you are looking to go global.

Every organization I’ve ever witnessed eventually ends up with some kind of struggle with AWS’ insane organizations and accounts nightmare.

GCP’s use of folders makes way more sense.

GCP having global VPCs is also potentially a huge benefit if you want your users to hit servers that are physically close to them. On AWS you have to architect your own solution with Global Accelerator, which becomes even more insane if you need to cross accounts, and you probably will eventually, because of the aforementioned insanity of AWS account/organization best practices.
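
For a sense of the DIY involved, here's a minimal boto3 sketch of the Global Accelerator plumbing; the accelerator name, port, and load balancer ARN are hypothetical placeholders:

```python
import boto3

# Global Accelerator's control plane only lives in us-west-2,
# regardless of where your endpoints are.
ga = boto3.client("globalaccelerator", region_name="us-west-2")

# The accelerator gives you static anycast IPs that steer users
# to the nearest healthy endpoint.
accelerator = ga.create_accelerator(Name="my-app", Enabled=True)
acc_arn = accelerator["Accelerator"]["AcceleratorArn"]

listener = ga.create_listener(
    AcceleratorArn=acc_arn,
    Protocol="TCP",
    PortRanges=[{"FromPort": 443, "ToPort": 443}],
)

# One endpoint group per region, repeated for every region you serve
# from. Cross-account endpoints need extra attachment/resource-sharing
# setup on top of this.
ga.create_endpoint_group(
    ListenerArn=listener["Listener"]["ListenerArn"],
    EndpointGroupRegion="eu-west-1",
    EndpointConfigurations=[
        # placeholder load balancer ARN
        {"EndpointId": "arn:aws:elasticloadbalancing:eu-west-1:..."}
    ],
)
```

On GCP the rough equivalent is a single global forwarding rule in front of backends in multiple regions, with no extra service to wire up.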

There's a very large gap between "seems" and reality. GCP is a huge PITA. It's not even stable to use: the console is constantly unresponsive and buggy, the UX is insane, and finding documentation is like being trapped in hell.

Know how you find all the permissions a single user in GCP has? You have to make 9+ API calls, then filter and merge all the results. They finally added a web tool to try to "discover" the permissions for a user... you sit there and watch it spin while it madly calls backend APIs to figure it out. Permissions for a single user can come from grants made to users, groups, orgs, projects, folders, or individual resources (and more I forget), and there's inheritance to make it more complex. It can take all day to track down every single place a permission could be set for a single user in a single hierarchical organization, or where something is blocking a permission. The complexity only increases as you add more GCP projects, folders, and orgs. But, of course, if you don't do all this, GCP will fight you every step of the way.
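
To be fair, GCP's Cloud Asset API can now do a chunk of that merging in one call, if you know it exists. A rough sketch, with the org ID and principal as placeholders; note it still doesn't expand group memberships for you:

```python
from google.cloud import asset_v1

client = asset_v1.AssetServiceClient()

# Search every IAM policy in the hierarchy (org, folders, projects,
# resources) for bindings that mention this principal.
results = client.search_all_iam_policies(
    request={
        "scope": "organizations/123456789",          # placeholder org ID
        "query": 'policy:"user:alice@example.com"',  # placeholder user
    }
)

for result in results:
    # One result per policy the user appears in; inheritance and
    # group-derived access are still yours to reason about.
    print(result.resource, [b.role for b in result.policy.bindings])
```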

Compare that to AWS, where you just click a user, and you see what's assigned to it. They engineered it specifically so it wouldn't be a pain in the ass.
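
Roughly, the whole picture for one user is a handful of calls against a single flat service; a boto3 sketch, assuming a hypothetical user named "alice":

```python
import boto3

iam = boto3.client("iam")
user = "alice"  # placeholder user name

# Managed policies attached directly to the user.
attached = iam.list_attached_user_policies(UserName=user)

# Inline policies embedded in the user.
inline = iam.list_user_policies(UserName=user)

# Groups the user belongs to (each group's policies take the same
# two calls again, but it's all the same flat IAM service).
groups = iam.list_groups_for_user(UserName=user)

print([p["PolicyName"] for p in attached["AttachedPolicies"]])
print(inline["PolicyNames"])
print([g["GroupName"] for g in groups["Groups"]])
```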

> Every organization I’ve ever witnessed eventually ends up with some kind of struggle with AWS’ insane organizations and accounts nightmare.

This was an issue in the early days, but it's well solved now with newer integrations/services. Follow their Well-Architected Framework (https://docs.aws.amazon.com/wellarchitected/latest/framework...), ask customer support for advice, and implement it. I'm not exaggerating when I say it's the best description of information systems engineering best practice in the world, and it's achievable by startups. It just takes a long time to read. If you want to become an excellent systems engineer/engineering manager/CTO/etc., this is your bible. (Note: you have to read the entire thing, especially the appendixes; you can't skim it like StackOverflow.)

I’m sure anyone willing to read through a 1,000-page document can make a well-architected GCP environment, too.

The problem is that no company I’ve ever worked for has implemented the Well-Architected Framework in their AWS environment, and none of them will ever invest the time to make their environment match that level of quality.

I think what you describe with the web tool to discover user permissions sounds a lot like the AWS VPC Reachability Analyzer, which I had to live in for quite a while. Figuring out where my traffic was getting blocked between an endless array of AWS accounts and cross-region transit gateways was a nightmare that wouldn’t exist with GCP’s global VPCs and project/folder-based permissions.
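
For anyone who hasn’t had the pleasure: Reachability Analyzer boils down to two EC2 API calls, repeated for every path you suspect. A boto3 sketch with placeholder ENI IDs:

```python
import boto3

ec2 = boto3.client("ec2")

# Define the path you *think* should work (placeholder ENI IDs).
path = ec2.create_network_insights_path(
    Source="eni-0123456789abcdef0",
    Destination="eni-0fedcba9876543210",
    Protocol="tcp",
    DestinationPort=443,
)

# Kick off the analysis; it walks every hop (security groups, NACLs,
# route tables, transit gateways) and reports where traffic dies.
# You then poll describe_network_insights_analyses for the verdict.
analysis = ec2.start_network_insights_analysis(
    NetworkInsightsPathId=path["NetworkInsightsPath"]["NetworkInsightsPathId"]
)
print(analysis["NetworkInsightsAnalysis"]["NetworkInsightsAnalysisId"])
```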

I don’t like the GCP console, but I also wouldn’t consider much of the AWS console to be top-tier software. Slow, buggy, and inconsistent are words I would use for the AWS console too. I can concede that AWS has better documentation, but I don’t think it’s a standout, either.

GCP's UI sure is dog slow. I once filled in one of those 'How would you rate...' prompts that sometimes appear, complaining about Instance Group management via the UI, and it seems they later addressed the issue.

Similar to my experience with the two. We didn't have regular meetings with our GCP account manager, but they did help us, and we had a technical support rep there we were in contact with occasionally. We rarely heard from anyone at AWS, and a friend had some horror stories about reporting security issues to AWS.

Architecturally I'd go with GCP in a heartbeat. BigQuery was also one of the biggest wins in my previous role. It completely changed our business for almost everyone, vs. Redshift, which cost us a lot of money just to learn that it sucked.

You could say I'm biased since I work at Google (though not on any of this), but for me it was definitely the other way around: I joined Google in part because of the experience of using GCP and migrating AWS workloads to it.

> Every organization I’ve ever witnessed eventually ends up with some kind of struggle with AWS’ insane organizations and accounts nightmare.

What are these struggles? The product I work on uses AWS and we have ~5 accounts (I hear there used to be more, TBF), but nowadays all the infrastructure is on one of them and the others are for niche stuff (tech support?). I could see how going overboard with many accounts could be an issue, but I don't really see issues with having everything on one account.

I like AWS, but Organizations was retrofitted onto the account model rather than being part of the original design. GCP had second-mover advantage in this area.

Automating the provisioning of new AWS accounts requires engaging with Control Tower in some way, as the author did with Account Factory for Terraform.
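
All of that tooling ultimately bottoms out in the Organizations API. A bare boto3 sketch of the primitive that Control Tower and AFT wrap, with the email and account name made up:

```python
import boto3

org = boto3.client("organizations")

# The raw primitive: one call kicks off asynchronous provisioning.
# Everything Control Tower adds (guardrails, SSO, baselines) is
# layered on top of this.
resp = org.create_account(
    Email="aws+payments-prod@example.com",  # placeholder
    AccountName="payments-prod",            # placeholder
)

# Provisioning is async, so you poll for completion.
status_id = resp["CreateAccountStatus"]["Id"]
status = org.describe_create_account_status(CreateAccountRequestId=status_id)
print(status["CreateAccountStatus"]["State"])  # IN_PROGRESS/SUCCEEDED/FAILED
```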

AWS makes the account model feel retrofitted rather than part of the original design, and five years later someone retrofitted Organizations onto that, before they crammed 90% of the products into whatever square-peg/round-hole fit they could find.

We were saved by the bell when they announced the increased per-account limit on S3 buckets (1M buckets now; I think it was 1k before).

Just before they announced that, I was working on creating org accounts specifically to contain S3 buckets, then permitting the primary app to use those accounts just for its bucket allocation.
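
The "permitting" part is just a bucket policy per bucket granting the primary app's account access; a hypothetical sketch with a made-up account ID and bucket name:

```python
import json
import boto3

s3 = boto3.client("s3")

bucket = "spillover-bucket-001"                  # placeholder bucket
app_account = "arn:aws:iam::111122223333:root"   # placeholder app account

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Listing applies to the bucket itself...
            "Effect": "Allow",
            "Principal": {"AWS": app_account},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {   # ...while object reads/writes apply to the keys inside it.
            "Effect": "Allow",
            "Principal": {"AWS": app_account},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```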

AWS themselves recommend an account per developer, IIRC.

It's as you say: some policy or limitation might require lots of accounts, and lots of accounts can be pretty challenging to manage.

5 accounts would be heaven if that could be my environment.

I have almost 40 AWS accounts on my login portal.

Two accounts per product (one for development environments, one for production), every new company acquisition brings its own accounts, and then we have accounts that exist solely to help traverse the other accounts or host other ops stuff.

Maybe you don’t see issues with everything in one account but my company would.

I don’t really think they’re following current best practices, but that’s a political issue I have no control over, and I think if you went back enough years you’d find that we were following AWS’s advice at the time.

Global VPCs are very nice but they feel like a single blast radius.

Whether or not your VPC can have subnets in multiple regions is entirely unrelated to security.

I meant failure blast radius. Having isolated regions is a core part of the AWS reliability design. AWS has had entire regions fail, but these failures have always been isolated to a single region. Global VPCs must rely on globally connected routers that can all fail in ways AWS's regional VPCs can't.

If you need global HA to the extent that you're worried about global VPC failure modes, you're going to have to spend a lot of effort to squeeze uptime to the max regardless of where you deploy.

Undersea cable failures are probably more likely than a Google core networking failure.

In AWS a lot of "global" things are actually just hosted in us-east-1.

On the other hand, when they say something is in us-west-2 they mean it, so if another region has an outage your workloads aren't impacted unless your code is reaching out to that region.

Guessing that's similar on the other clouds.

I believe you’re likely misunderstanding Google’s architecture.

The routing isn’t centralized; it’s distributed. The VPCs are a logical abstraction, not a centralized dependency.

If a region or AZ goes down in your global VPC, the others are still available.

I also think it’s not much of an advantage for AWS to be able to say its outages are confined to a region. That doesn’t help you very much if their architecture makes building global services more difficult in the first place. You’re just playing region roulette, hoping your region isn’t affected. And outages frequently impact multiple or all AZs.

Regions and Availability Zones are very different things. Regions are much more isolated.
