They also have a contract with Reddit to train on user data (a common go-to source for finding non-spam search results). Unsure how many other official agreements they have vs just scraping.
Good heavens, I'd think that if anything could turn an AI model into a misanthrope, it would be this.
One distinctive quality I've observed with OpenAI's models (at least with the cheapest tiers of 3, 4, and o3) is their human-like face-saving when confronted with things they've answered incorrectly.
Rather than directly admit fault, they'll regularly respond in subtle (more so o3) to not-so-subtle roundabout ways that deflect blame, even when it's an inarguable factual error about something as conceptually non-heated as API methods.
It's an annoying behavior of their models, and in complete contrast to, say, Anthropic's Claude, which in my experience will immediately and directly admit it when the user points out something it got wrong (perhaps too eagerly).
I have wondered whether this is something it's learned from training on places like Reddit, whether OpenAI deliberately taught it or instructed it via system prompts to seem more infallible, or whether models like Claude were deliberately tuned to reduce that tendency.
> It's an annoying behavior of their models, and in complete contrast to, say, Anthropic's Claude, which in my experience will immediately and directly admit it when the user points out something it got wrong
I don't know what's better here. ChatGPT did have a tendency to reply with things like "Oh, I'm sorry, you are right that x is wrong because of y. Instead of x, you should do x."
> Rather than directly admit fault, they'll regularly respond in subtle (more so o3) to not-so-subtle roundabout ways that deflect blame
Human-level AI is closer than I'd realised... at this rate it'll have a seat in the senate by 2030.
They are already passing ChatGPT-written laws:
https://hellgatenyc.com/andrew-cuomo-chatgpt-housing-plan/
I cannot think of a worse future for AI than parroting Reddit comments.
The upside of Reddit data is that you have updoots and downdoots, so you can train your AI model positively on what people would typically upvote and negatively on what they might downvote.
Now, that's the upside. The downside is you end up with an AI catering to the typical redditor. Since many claims there rest on "confident, sounds reasonable, speaks with authority, gets angry when people disagree", hallucinations happen. What we actually want is something that "produces evidence-based claims with unbiased data sources".
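The vote-based training idea above can be sketched in miniature: treat upvoted text as positive examples and downvoted text as negative ones, and fit a scorer that ranks the former above the latter. This is a toy bag-of-words logistic model in pure stdlib Python (the example comments and the whole setup are illustrative assumptions, not anyone's real pipeline); production systems use neural reward models, but the preference signal is the same.

```python
# Toy sketch, assumed setup: learn to score upvoted comments above
# downvoted ones, the core idea behind training on vote data.
# Bag-of-words features + logistic regression, stdlib only.
import math
from collections import Counter

# Hypothetical training data standing in for scraped vote pairs.
upvoted = [
    "here is a source with benchmarks and a link to the raw data",
    "i tested this locally and attached the code and results",
]
downvoted = [
    "this is obviously wrong and you are an idiot",
    "trust me everyone already knows this is true",
]

def featurize(text):
    """Lowercased word counts as a sparse feature vector."""
    return Counter(text.lower().split())

weights = {}   # word -> learned weight
bias = 0.0

def score(text):
    """Higher score = model predicts 'more upvotable'."""
    return bias + sum(weights.get(w, 0.0) * c
                      for w, c in featurize(text).items())

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Gradient ascent on log-likelihood: label 1 for upvoted, 0 for downvoted.
lr = 0.5
for _ in range(200):
    for label, texts in ((1.0, upvoted), (0.0, downvoted)):
        for t in texts:
            err = label - sigmoid(score(t))
            bias += lr * err
            for w, c in featurize(t).items():
                weights[w] = weights.get(w, 0.0) + lr * err * c
```

After training, `score()` ranks the evidence-bearing comments above the hostile ones, which is exactly the failure mode the parent describes: the model learns what *gets upvoted*, not what is *true*.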
They also have a syndication agreement with The Guardian [0]. I lost all my respect for The Guardian after seeing this.
[0]: https://www.theguardian.com/gnm-press-office/2025/feb/14/gua...
Why shouldn’t they license their content? It’s theirs and it’s a non-profit that needs revenue.
It's not that they shouldn't license their content. I'm not a fan of OpenAI and their "fair use" of things, to be honest.
But this is the opposite of fair use. They're licensing the content, which means they're paying for it in some fashion, not just scraping it and calling it fair use.
If you don't like the fair use of open information, I would expect you to be cheering this rather than losing respect for those involved.
Why? I can't see any issues with this at all, whatsoever.
Everybody is free to have their own opinions. I don't like how AI companies, and mostly OpenAI, "fair-use" the whole internet, so there's that.
Again, as others have pointed out, why is this The Guardian's fault? They convinced ClosedAI to give them money in a licensing deal to use their content as training data instead of having it scraped for free.
Your sense of injustice or whatevs you want to call it is aimed in the opposite direction.
This isn't even a fair-use defence; they're paying to use it expressly for this purpose.