AI is in danger of peeing in its own water source. It's unbelievably useful at imitating and generating content, but it needs enough original content to train on and scrape.

Google got one thing wrong and nearly destroyed the internet: people need to have an incentive to contribute content online, and that incentive should not be to game the system for advertising.

This dawned on me in particular when I asked Claude for instructions on taking apart my dryer. There was literally only one webpage left on the internet with instructions for my particular dryer, and the page was more or less unusable, full of rotten links and riddled with adware. Claude did its best but filled in the missing diagrams with hallucinations.

I was imagining whether LLMs could finally deliver the micropayments solution people have always proposed for the internet: part of my monthly payment gets split among all of the sites the LLM scraped knowledge from. Paid out like Spotify pays out artists.

It might not be a lot of money, but it would certainly be more than the pitiful ad revenue you get from posting content online right now. And if I wanted to upload corrected instructions for repairing this dryer, I would have a reason to.

> Paid out like Spotify pays out artists.

So, mostly to fraudulent AI spam?

AI makes this problem worse in both directions. It makes it fantastically easy to produce "content". So if you're scraping content, or browsing content, you're going to run into increasing amounts of AI. Micropayments make this worse, because they become a means of getting paid to produce spam. The problem comes when you want the "content" to be connected to real questions like "how does my dryer work" or "what is going to happen to oil availability six months from now".

AI trainers didn't pay book authors until forced to. $3,000 ended up being a pretty high value! But it was also a one-off. Everyone writing books from now on is going to have to deal with being free grist to the machine.

> So, mostly to fraudulent AI spam?

Spotify does not pay out mostly to AI spam.

Their pay scales by listens. The AI spam doesn’t collect many listens. The spammers do it because they can automate it and make it low effort, but it’s not a cash cow for the spammers.

An interesting listen https://darknetdiaries.com/episode/171/ about money laundering and spam in streaming services

Spammers do it because it pays out.

> Paid out like Spotify pays out artists.

As others said, Spotify pays artists shit, but maybe that's the problem with the whole thing here. It should be more like how Bandcamp pays artists (80% to the artists, 20% for Bandcamp), but then the rapacious economy supporting the largest LLM providers would collapse and (wipes away a single tear) we'd all have to use simpler, cheaper, most likely local models.

> Paid out like Spotify pays out artists.

That's probably not the best comparison. Spotify only benefits the big players, or rather, those with the most bots. If you actually want to support specific artists, you'd have to use Bandcamp or similar sites.

I think most labs actively create synthetic data using existing models as part of the pretraining mix for their next model.

Would love to know exactly what the latest process is to keep slop out of training data.

I think everyone overblows the whole "AI is poisoning AI!" thing. It could be a problem, but the genuine value in Reddit or any other human social media is honestly pretty low from my estimates. It's great for seeing how humans talk but in terms of 'nutritional' value for truth or answers... I am not sold. If I were choosing what to 'feed' AI, I wouldn't even bother with textual social media (besides GitHub / GitLab / other source control).

There's way more value, if seeking out answers, in following the links to external sources, scraping books, and other sources that aren't "unwashed masses saying whatever they want".

> the genuine value in Reddit or any other human social media is honestly pretty low from my estimates. It's great for seeing how humans talk but in terms of 'nutritional' value for truth or answers...

> ...

> scraping books, and other sources that aren't "unwashed masses saying whatever they want".

The problem is there's a lot of knowledge that only exists as reddit comments, blog posts, or social Q&A.

You can put it in scare quotes all you want, doesn't stop you from sounding like Scrooge McDuck.

const isAiContent = (str) => str.includes('—');?

:)

Latest-generation LLMs use en dashes instead of em dashes to avoid detection.

No, they don’t. But obviously GP was tongue–—in-–cheek.

> in danger

It has already done so, and we can be confident in saying that.

Verified content will always be relatively expensive when compared to AI content.

Visits to Wikipedia and most sites have dropped. Rtings has gone full paywall. Ad revenue for producing Verified content will be too meager to allow for public consumption.

There are jokes about GenAI being the Great Filter; while I doubt this, I do hope this is the final push that makes us think about how we want our information commons to be nurtured.

> Verified content will always be relatively expensive when compared to AI content....

> Visits to Wikipedia and most sites have dropped. Rtings has gone full paywall. Ad revenue for producing Verified content will be too meager to allow for public consumption.

AI is a technology that's going to further entrench inequality, by warping incentives to push us further away from democratization. Unless you've got $$$ to drop on verified content, you'll be served prolefeed slop and be that much more ignorant.

At this point, it feels like most technology will be used in favor of people with power, and not in a democratizing manner.

I'd argue that this is more about the state of play than about the tech itself.

> I was imagining whether LLMs could finally deliver the micropayments solution people have always proposed for the internet: part of my monthly payment gets split among all of the sites the LLM scraped knowledge from. Paid out like Spotify pays out artists.

As a software user, I wish I could do the same for all the software I use.

Many open source projects accept donations. There's also explicitly paid-for software. What exactly do you wish for that you can't do right now?

Specifically the part where engineers get paid the same way as artists on Spotify.

So a handful will make a buttload but the vast majority won't make enough to pay rent?

Certainly that's how open source pans out.

So, not paid at all for their work, under a reverse Robin Hood model? That would be terrible for software. The way artists get paid on streaming is a genius play: it caters to the biggest artists and labels and screws over the smaller ones, which is especially true on Spotify with its freemium model.
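To make the "reverse Robin Hood" point concrete, here is a toy sketch in JavaScript comparing a pro-rata pool (all fees pooled, then paid out by each creator's share of total plays, which is how Spotify's model is commonly described) with a user-centric split (each subscriber's fee goes only to what that subscriber actually used). The listeners, creator names, and $10 fees are made up, and the platform's cut is ignored; this is an illustration under those assumptions, not Spotify's actual formula.

```javascript
// Toy model (hypothetical data): two ways to split subscription money among
// creators, whether artists or websites an LLM scraped. Platform cut ignored.
const listeners = [
  { fee: 10, plays: { smallArtist: 1 } }, // light user
  { fee: 10, plays: { bigArtist: 99 } },  // heavy user
];

// Pro-rata: pool every fee, then pay each creator by its share of ALL plays.
function proRata(users) {
  const pool = users.reduce((sum, u) => sum + u.fee, 0);
  const totals = {};
  let allPlays = 0;
  for (const u of users) {
    for (const [creator, n] of Object.entries(u.plays)) {
      totals[creator] = (totals[creator] ?? 0) + n;
      allPlays += n;
    }
  }
  const payout = {};
  for (const [creator, n] of Object.entries(totals)) {
    payout[creator] = (pool * n) / allPlays;
  }
  return payout;
}

// User-centric: split each user's own fee by that user's own plays.
function userCentric(users) {
  const payout = {};
  for (const u of users) {
    const userPlays = Object.values(u.plays).reduce((a, b) => a + b, 0);
    if (userPlays === 0) continue; // a user with no plays pays nobody in this toy model
    for (const [creator, n] of Object.entries(u.plays)) {
      payout[creator] = (payout[creator] ?? 0) + (u.fee * n) / userPlays;
    }
  }
  return payout;
}

console.log(proRata(listeners));     // { smallArtist: 0.2, bigArtist: 19.8 }
console.log(userCentric(listeners)); // { smallArtist: 10, bigArtist: 10 }
```

Under pooling, almost all of the light user's $10 ends up with the creator the heavy user binges. The same design question would apply to splitting an LLM subscription among scraped sites.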

> I was imagining whether LLMs could finally deliver the micropayments solution people have always proposed for the internet: part of my monthly payment gets split among all of the sites the LLM scraped knowledge from. Paid out like Spotify pays out artists.

This system is usually called taxes.

Which then pays for universal healthcare, free education, affordable housing, libraries, parks, and so on.

LLMs don't need to invent it; we should stop allowing them (the people and companies behind LLMs) to avoid it.