I'd say the core of their success is running everything in a single rack in a single datacenter at first (for months? a year?) and getting lucky. Life is simple when you don't need the costs and effort of reliability upfront.
They mention having a second half-rack in a different DC.
In any case, not everyone needs five nines, and it's usually much easier to bring down a platform with a bug in your own software than for the core infrastructure to go down at the rack level.
The point is valid: they mention adding that, so at some point they didn't have it. They're also only storing monitoring & observability data, which is never going to be mission critical for their customers.
It's probably the main reason why they were able to get away with this and why their application does not need scalability. I see they themselves are only offering two 9s of uptime.
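For a sense of scale, here's a rough sketch of what two nines vs. five nines allows in downtime per year. It assumes a 365-day year, and the function name is just something I made up for illustration:

```python
def allowed_downtime_hours(nines: int, period_hours: float = 365 * 24) -> float:
    """Hours of downtime permitted per period at an availability of `nines` nines."""
    # Two nines (99%) leaves 1% unavailability; five nines (99.999%) leaves 0.001%.
    unavailability = 10 ** -nines
    return period_hours * unavailability


for n in (2, 3, 5):
    hours = allowed_downtime_hours(n)
    print(f"{n} nines: {hours:.3f} hours/year ({hours * 60:.1f} minutes)")
```

By this arithmetic, two nines leaves room for several days of downtime a year, while five nines leaves only a few minutes.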
They mentioned having a backup AWS cluster that would spin up when something happens.