Commits are up 14x year-over-year
https://x.com/kdaigle/status/2040164759836778878
Yeah, but that's not really an excuse, is it? They offer a service, (some) people pay for that service, and those people should therefore expect it to work. If GitHub can't keep up with the growth, they could disable new account registrations or start reducing free tiers, so people either use the free tier more mindfully or have to pay for usage-based products like Actions, which would allow GitHub to scale.
I mean, it's an easy problem to solve when you're just speculating about solutions. But there's a very plausible reality where, in 5 years, guys are making YouTube video essays about the fall of GitHub caused by its "obviously stupid decision" to throttle access for people who were trying to use the service in record numbers, leaving an opening for someone else to come in and eat their lunch.
I don't envy their position of having to scale that fast on something that has to be instant and real-time. As far as I know, you can't do CDN/edge caching shenanigans with a remote git repository like Google can with a YouTube video. It's gotta always be reading/writing to the latest, single source of truth.
Sure, backseat commenting is easier, and I wouldn't wanna be in charge at GitHub right now. But on the other side, there's also a reality where we'd see video essays about GitHub's downfall because its reliability crashed so hard that businesses couldn't trust it and moved to competitors / self-hosted instances, which then meant fewer paying users to subsidize the ever-growing demand of the free users.
Yes, it's potentially a write-heavy workload that also needs to be consistent, aka the worst-case scenario.
The easy solutions like caching and read replicas don't help with writes, so you're forced down the route of sharding or similar techniques that have much more painful tradeoffs.
I'm not sure if that's why everything keeps breaking, but at that scale write-heavy workloads are never going to be easy.
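To make the "painful tradeoffs" part concrete, here's a minimal sketch assuming the most naive scheme possible: route each repo to a shard by hashing its ID modulo the shard count. This is not how GitHub actually does it (I don't know their setup), and `shard_for` / `fraction_moved` are made-up names for illustration. Every write for a given repo lands on one shard, which spreads the write load, but growing from 8 to 9 shards relocates roughly 8/9 of all repos.

```python
import hashlib


def shard_for(repo_id: str, num_shards: int) -> int:
    """Hypothetical routing: hash the repo ID and take it modulo the shard count."""
    digest = hashlib.sha256(repo_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(repo_ids: list[str], old_shards: int, new_shards: int) -> float:
    """Fraction of repos whose shard changes when the shard count changes."""
    moved = sum(
        1
        for repo_id in repo_ids
        if shard_for(repo_id, old_shards) != shard_for(repo_id, new_shards)
    )
    return moved / len(repo_ids)


if __name__ == "__main__":
    # Hypothetical repo IDs, just to measure how much data would have to migrate.
    repos = [f"repo-{i}" for i in range(10_000)]
    print(f"{fraction_moved(repos, 8, 9):.1%} of repos move going from 8 to 9 shards")
```

That's why real systems tend to reach for consistent hashing or a directory/lookup table instead of plain modulo, and those bring their own complexity (plus any operation spanning repos on different shards gets harder).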
Not a valid excuse without knowing what their historical growth rate has been, and how much of the instability is load-related.
GitHub has been publishing their growth numbers since at least 2016: https://octoverse.github.com/2016/
However, they have reported numbers along rather inconsistent dimensions. Historically they've focused on the number of repos and users, later PRs and issues, and often catch-all terms like "contributions", which include all of those plus comments etc. But the number of commits alone (which apparently is the main culprit now?) has been mentioned only sporadically, which makes it hard to get a consistent sense of historical growth.
Without any other information, though, it's reasonable to assume that the 14x increase in commits is the prime candidate for the instability, especially since commits are write traffic, which is much harder to scale than read traffic. Plus, every 3-5x increase in scale can reveal bottlenecks in your distributed systems that you never knew existed, so they probably have like 2-3 "generations" of bottlenecks to figure out!
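Back-of-envelope for the "generations" bit, assuming each generation is a clean 3x or 5x jump (a simplification; real growth isn't that tidy): the number of such jumps in a 14x increase is log(14) / log(step). `bottleneck_generations` is just a made-up helper name.

```python
import math


def bottleneck_generations(total_growth: float, step: float) -> float:
    """How many step-x jumps fit into total_growth-x growth (hypothetical helper)."""
    return math.log(total_growth) / math.log(step)


if __name__ == "__main__":
    # 14x is the reported year-over-year commit growth; 3x and 5x are the assumed step sizes.
    for step in (3, 5):
        print(f"14x growth with {step}x steps: ~{bottleneck_generations(14, step):.1f} generations")
```

That prints ~2.4 for 3x steps and ~1.6 for 5x steps, so roughly two rounds of surprise bottlenecks, more if the jumps that expose them are smaller.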