Perfect example of a base rate fallacy - https://en.wikipedia.org/wiki/Base_rate_fallacy
What percentage of GitHub activity goes to GitHub repos with less than 2 stars? I would guess it's close to the same number.
My reaction as well -- I have a few dozen public repos of 100% human-written code, most are 0 stars!
The first thing I do when I make a new repo is star it myself ;-)
Didn't know I could do that, I assumed it wouldn't be permitted.
This likely tripled the amount of stars I have.
I think this is best practice.
https://knowyourmeme.com/memes/obama-awards-obama-a-medal
Half way there
Tell me about it!
I have a few dozen org repos, of course none of them have stars, who stars their corporate repos?
> who stars their corporate repos?
workers on the management track
We need to have a talk about your pieces of flair.
My private repos also have 0 stars!
(But I don't use AI on them.)
Starring GitHub repos considered harmful.
The actual number is that 98% have fewer than 2 stars (0 or 1). About 90.25% have zero stars.
So Claude repos are statistically more likely to have stars than the average GitHub repo. Not the conclusion the headline was going for.
But the header is just "90% of Claude-linked output going to GitHub repos w <2 stars". No conclusion, just some random fact.
The problem is that this title is editorialized, and the fact is cherry-picked. Why not =0? Why not >1000? This is just a dashboard; it highlights "Interesting Observations", but star statistics are not among them.
Sounds like Claude commits are, on average, going into higher visibility repositories than humans… maybe the author would like to reconsider their approach?
Well, you can't reconsider your approach when you don't like the results.
If anything, the fact that this is what he arrived at, even when starting with the opposite position, is proof of the validity of this result.
Yes, stars can mean a lot of things.
- visibility
- popularity (technical, domain, persona)
- genuine utility
- novelty
...
There are also plenty of super high utility repos that are widely used (often indirectly), but don't have a lot of stars, or even a meagre amount.
Also there is the issue of star != star, because it's not granular.
It's similar to upvotes on general social media platforms. Everyone likes cute cats doing funny things to some degree, but only a few people appreciate something that's more niche yet far more impactful, useful, or entertaining (or that requires some effort to consume), and those who do value it very highly. Yet the same person might use the same score (a single upvote) for a cat video and a video they value much more.
You should check recent commits, because obviously there are a lot of forked 0 star repos.
I think this is useful in answering the grandparent comment's question:
stars : uniq(k)
1 : 14946505
10 : 1196622
100 : 213026
1000 : 28944
10000 : 1847
100000 : 20
each line (mostly) being equal length provides me an odd comfort
power law distribution ~1/x I think
Zipf's law?
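A quick sanity check of the ~1/x guess against the table above (a sketch; it assumes those counts are "repos with at least this many stars" at each threshold):

```python
import math

# Star-threshold counts from the table above (assumed cumulative:
# number of repos with at least `stars` stars)
table = {1: 14946505, 10: 1196622, 100: 213026,
         1000: 28944, 10000: 1847, 100000: 20}

# Fit a line to log10(count) vs log10(stars); a slope near -1
# would be consistent with a ~1/x power law
xs = [math.log10(s) for s in table]
ys = [math.log10(c) for c in table.values()]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
print(f"fitted exponent: {slope:.2f}")  # roughly -1.1
```

An exponent of about -1.1 is close enough to -1 that the ~1/x (Zipf-like) description seems reasonable for this data.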
only 80 more repos need 100000 stars and all lines would be equal! e.g.
1 : 14946505
10 : 1196622
100 : 213026
1000 : 28944
10000 : 1847
100000 : 100
you would lose 80 repos from "10000 : 1847" also in that case.
interesting that you only need ~150 stars on a project for it to be in the top 1%
Let's establish a roving band of ~150 GitHub users that go around 1% things.
Funny how everyone gravitated toward analysis of the star distribution of REPOS when the headline claim is about ADDITIONS. If you look at my comment below (I invite you to verify the stats), the distribution of additions by star count is far more weighted toward 2+ star repos on GitHub overall. The observation is meaningful; it's up to the observer to draw a conclusion. Is Claude just speeding up output, or is it generating piles of spaghetti code with no use? Considering the get-rich-quick economy that has sprung up around app development, I'm inclined to at least consider the latter.
How do you know that?
https://ghe.clickhouse.tech/
It is relevant because if the vast, vast majority of repos have 2 or fewer stars, then it's not that weird that a great deal of the repos linked also have 2 or fewer stars.
Yeah. Most of my public repos have 0 stars. Most of what I write sucks.
GitHub Stars (or any online 'star count') is not an indicator of quality.
> not an indicator of quality
I mean, it’s an indicator. Just not a definitive—or individually sufficient—one.
Stars occasionally correlate with quality but more often it's timing and naming. I have a total of 40k stars on GitHub, and I know the code is shit in most of those repos (many written back when I was 16-18 as I was just learning to code). Jumping on hype trains before they start is how you get stars.
Yeah, but knowing something sucks means you are probably reasonably competent at coding. =3
https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Even if you're not correct, I respect your positivity and constructive attitude
It's good to raise people's expectations of themselves
Self-reported studies are arguably weaker evidence, but are common in some areas for ethics reasons. In general, if errors are truly random, then they will cancel out over larger/more frequent population samples.
The study's conclusion inferred that the skills needed to be effective at some task are the same skills needed to correctly evaluate whether you are actually proficient at that task.
https://arxiv.org/abs/2505.02151
If the data suggests another explanation is more applicable, then I'd be interested in the primary papers/studies the editorialized opinion seems to have omitted. =3
No it doesn't. The people with the lowest self perception also have the lowest actual skill. Look at the chart:
https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect#...
I guess you linking to it was a self fulfilling prophecy
If you read your own reference (not the picture, but where you took it from on Wikipedia) really really carefully, you might be able to tell why it so perfectly applies to you
The person with little knowledge overestimates their capability, and the person who actually knows how complicated [the thing] is usually isn't as confident they've mastered it.
Your take on that makes absolutely no sense
You’re talking about a confidence and ability gap. I have heard of the Dunning-Kruger effect. I accept all of that.
But the claim above was that having low confidence was correlated to higher skill. Ie, skill and confidence are anti correlated. The chart does not show that. The lowest data point for confidence is the point on the left of the chart. This is also the data point corresponding to people who have the least competence. Having low confidence is not evidence that you’re secretly an expert. Confidence and competence are still positively correlated according to that chart.
The Dunning-Kruger effect is not so strong that there are scores of novices convinced they are experts in a field. But in your case, I admit the data may not tell the full story.
"Nor would a wise man, seeing that he was in a hole, go to work and blindly dig it deeper..." ( The Washington Post dated 25 October 1911 )
"Baloney Detection Kit"
https://www.youtube.com/watch?v=aNSHZG9blQQ
Best regards =3
That isn't what that shows, and the article you linked to even warns:
> In popular culture, the Dunning–Kruger effect is sometimes misunderstood as claiming that people with low intelligence are generally overconfident, instead of denoting specific overconfidence of people unskilled at particular areas.
Dunning-Kruger has also been discredited, with the suggestion that its authors may have been overconfident themselves:
The Dunning-Kruger Effect Is Probably Not Real (2020) https://www.mcgill.ca/oss/article/critical-thinking/dunning-...
Debunking the Dunning‑Kruger effect – the least skilled people know how much they don’t know, but everyone thinks they are better than average (2023) https://theconversation.com/debunking-the-dunning-kruger-eff...
Are you replying to the wrong comment? The person you're responding to seems to make the same point
Self-reported studies are arguably weaker evidence, but are common in some areas for ethics reasons. In general, if errors are truly random, then they will cancel out over larger/more frequent population samples.
The study's conclusion inferred that the skills needed to be effective at some task are the same skills needed to correctly evaluate whether you are actually proficient at that task.
Or put another way, the <5% of the population who are narcissists by their nature become evasive when their egos are perceived as threatened. Thus, they often pose a challenge in a team setting, as compulsive lying or LLM turd-polishing is orthogonal to most real-world tasks.
People are not as unique as they like to believe, and spotting problems is trivial after you meet around 3000 people. Best to avoid the nonsense, and get outside to enjoy life. Have a great day =3
No idea why we all get negative karma on this thread, as I respect a cited-source opinion even if we disagree. Do have a look for papers rather than editorialized content in the future, and note that posting LLM agent output from an account is a violation of YC usage policy. Have a great day =3
https://arxiv.org/abs/2505.02151
Doesn’t matter if the recruiter doesn’t call you back because you’re not a 1000x engineer.
Why would anyone settle for underpaid positions from an agency taking a 7% contract cut, and purging CVs from any external firm also contracting with their services?
Most people figure out this scam very early in life, but some cling to terrible jobs for unfathomable reasons. =3
> Why would anyone settle for
The answer to such questions is always that, given their circumstances, they have no realistic choice not to.
This is very obvious, and it's frustrating to continually see people pretend otherwise.
> they have no realistic choice not to
If folks expect someone to solve problems for them, then 100% of people end up unhappy. The old idea of loyalty buying a 30-year career with vertical movement died sometime in the 1990s.
Ikigai chart will help narrow down why people are unhappy:
https://stevelegler.com/2019/02/16/ikigai-a-four-circle-mode...
Even if folks are not thinking about doing a project, I still highly recommend this crash course in small business contracts:
https://www.youtube.com/watch?v=jVkLVRt6c1U
Rule #24: The lawyer's Strategic Truth is to never lie, but also to avoid voluntarily disclosing information that may help opponents.
Best of luck =3
> If folks expect someone to solve problems for them
In this type of situation, the fundamental issue is that making progress depends on many people acting in unison to increase their bargaining power, which is (a) hard to arrange even if everyone who acted this way would benefit, and (b) actually may be detrimental to some people (usually the high performers).
I agree it is nearly impossible to alter the inertia of existing firms. Most have entrenched process people that defend how things are done right up until a company enters insolvency. Fine if you sell soda or rubber tires, but a death knell for technology or media firms.
In my observations it is usually conditioned fear, personal debt-driven risk aversion, and/or failure to even ask if the department above you is really necessary. These days, it is almost always easier to go to another firm if you want a promotion. =3
+1 star for ttul
OP here. I'm not a confident SQL user, so I encourage someone to double-check this. From what I can tell, there have been 2.21x10^12 additions on GitHub in total, 6.36x10^11 of which are in repos with 2+ stars. That's about 29%. People earlier were comparing the star distribution of repos, which is not really what this is about - it is about OUTPUT, as measured by additions.
Interestingly, there are 21.37b commits on GitHub, implying about 104 additions per commit. Per the dashboard, Claude is linked to 20.81m commits and 50.44b additions - or 2,424 additions/commit. So additions per commit for Claude-linked repos are higher, and higher still for repos with 0-1 stars (2,568 additions/commit for Claude vs. 91 for all of GitHub). None of this is a smoking gun, but it aligns with the intuition that Claude is producing enormous amounts of code. TBD whether it is 'adding value'.
Would be appreciative of anyone who verifies/invalidates this. https://play.clickhouse.com/ https://ghe.clickhouse.tech/#clickhouse-demo-access
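The ratios in the comment above can be reproduced from the quoted totals (a sketch that only checks the arithmetic; the underlying ClickHouse figures still need independent verification):

```python
# Figures quoted in the comment above (ClickHouse GitHub dataset + dashboard)
total_additions = 2.21e12      # all additions on GitHub
additions_2plus = 6.36e11      # additions in repos with 2+ stars
total_commits = 21.37e9        # all commits on GitHub
claude_commits = 20.81e6       # Claude-linked commits
claude_additions = 50.44e9     # Claude-linked additions

share_2plus = additions_2plus / total_additions            # ~0.29
github_adds_per_commit = total_additions / total_commits   # ~103, i.e. "about 104"
claude_adds_per_commit = claude_additions / claude_commits # ~2424

print(round(share_2plus, 2), round(github_adds_per_commit),
      round(claude_adds_per_commit))  # 0.29 103 2424
```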
Activity isn't a good measurement for this, because AI can vibeslop thousands of lines per day of code that isn't necessarily useful for anything but increasing activity.
Off topic, but it reminds me of another principle: every geographic heatmap is just a population map. https://xkcd.com/1138/
That, or https://i.redd.it/soy72dye93o91.jpg
https://reddit.com/r/PeopleLiveInCities/
Yep, every time I see a heatmap of Australian lotto winners - very high correlation with Australia's population.
shouldn't a serious heatmap (or any comparative graph, for that matter) normalize the stat being displayed against the baseline population in that bucket?
In other words, plot the percentage or average metric and not the absolute metric.
e.g. number of lotto winners per thousand people living in that grid, percentage of starred repos as a percentage of all repos, per capita alcohol consumption, average screen-time etc.
Edit: unless of course the point of the heatmap is to show the population distribution itself. In which case the metric would be number of people per square kilometer or some such.
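A minimal sketch of that normalization (the grid data here is entirely hypothetical; cells with no population are skipped rather than divided by zero):

```python
# Hypothetical per-cell counts: (lotto_winners, population)
grid = {
    "city_a": (12, 400_000),
    "city_b": (3, 90_000),
    "outback": (0, 0),       # nobody lives here
}

# Winners per 100k residents; skip empty cells to avoid division by zero
per_capita = {
    cell: winners / pop * 100_000
    for cell, (winners, pop) in grid.items()
    if pop > 0
}
print(per_capita)  # city_b's rate is higher despite fewer absolute winners
```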
Still would just show where people live. If nobody lives there, you've got a null (or divide by zero) spot on the map ... so you just show where people live.
Yes, this is why we use per capita stats for basically everything.
There is still a sampling bias if you compare against human-written repos as a whole. I would guess people are far more likely to share their homework assignments, experiments, hackathon results, weekend toys, etc. as a public repo if they put some amount of work into them, and that only a minority of those would get any stars at all. If the whole thing was generated by AI in less than 20 minutes, I would guess they are more likely to simply throw it away when they are done with it.
Personally I think comparing github stars is always going to be a fraught metric.
Exactly, just pick one subset and make out as if a base rate is because of this one specific set. Backwards logic...
When I first got a job, I asked the company: okay, how many people are going to use the code I write?
If the answer wasn't in hundreds of requests per second, I wasn't interested in the job.
I found jobs at ad tech companies; the pay wasn't any good but the challenges were immense.
Most people write code which will hardly be run by other people or even see any customers.
One of the products my employer builds is used twice a year. People pay tens of thousands of dollars for the privilege of using it twice a year. It's tremendously valuable to be used twice a year.
Value and use are not always synonymous.
My wife uses me twice a year.