Hacker News

No, you just parroted an increasingly popular talking point, the entire purpose of which seems to be to absolve AI companies of the enormous theft that put them in the position to hire experts in the first place.

rafram 6 hours ago

Well, I'd never heard anyone make it before, but sure. (I looked into Mercor a bit and know some people who've worked in data generation/labeling, which is what exposed me to that side of the operation.)

It doesn't absolve them of any theft, but it does make the assertion that they should be required to release their models to the public seem, to me, a bit farcical. There are dozens of free and open-weights models that have all trained on exactly the same web crawls and books as GPT-5 and Opus. The proprietary models are better because of proprietary data.

franga2000 6 hours ago

Cool, then they can train their proprietary models on their proprietary data only.

Even if the other models were trained on the same data, which is unlikely since they had less time and money to scrape it and fewer lawyers to let them get away with something like piracy, the proprietary models are still largely built on public data and wouldn't exist without it. At the very least, they should release the intermediate model, before training on their proprietary data. Not that that's how that works...

thom 6 hours ago

I agree that claiming they have now trained on lots of proprietary data lets them muddy the legislative waters even further than they already have. What a happy coincidence!

noitemtoshow 6 hours ago

I’d suggest you learn more about how LLM training works. Training on internet data alone will not result in an agent that answers your questions.

thom 5 hours ago

Sure as shit won't answer them without that though.

mbesto 6 hours ago

> The proprietary models are better because of proprietary data

Source? Otherwise this is pure speculation.