Incredible.

> A while ago, we discovered a way to scrape Spotify at scale.

They won’t, and shouldn’t, divulge the details, but I imagine that would be a fun read!

It is not hard. But please don't misuse it and ruin the fun for everyone. It is nice to be able to use the music relatively easily for hobby projects. My music server has functionality to play tracks from Spotify this way:

https://codeberg.org/raphson/music-server/src/branch/main/sp...

Where the magic actually happens: https://github.com/librespot-org/librespot

I wonder how many premium accounts Anna’s Archive had to use to scrape the whole thing. Surely Spotify has scrape protection and wouldn’t allow a single account to stream (download) millions of separate tracks.

I have a feeling they didn't use premium accounts since they downloaded at 160kbit/s, which is the highest quality that free accounts can get.

Premium gets 320kbit/s (or lossless)

I haven't looked at the code but I would be surprised if the premium account "requirement" is anything more than an if statement that can be commented out.

You are correct

Pretty sure that requirement is server-side?

What do you mean? You can still stream any song with a free account. It's just that there will be ads. Additionally, in mobile apps, there will be ridiculous artificial limitations to make sure your experience is as miserable as it could possibly be.

My understanding is that the premium requirement is there to avoid having the repo taken down.

My understanding, based on a related comment in this thread, was that premium accounts get higher quality; in that case, I figured any such checks would be server-side.

If you were referring to a separate check in the above repo's code, my mistake.

Hm, maybe. I don't remember whether they offer higher quality. If they do, it would make sense to have that check on the server side. It's been a while since I last used Spotify: they deleted my account without warning in 2022, when they left Russia.

But I was referring specifically to all third-party reverse-engineered Spotify players requiring premium accounts to function at all.

How they managed to transfer 300TB of data while remaining anonymous is also astonishing.

Rent a dedicated server, set up Mullvad WireGuard on it or whatever, and download stuff to said server through WireGuard.

Sure, you can also use Tor. The people engaged in copyright-related illegality generally don't.

But then you need to rent a server without leaving any hint of your real identity, which means going to some dodgy corners of the internet.

I certainly wouldn't attempt it.

Depends on your threat model; you'd probably have to be scraping at a pretty large scale for anyone to try pursuing you through VPN providers.

I would guess this can be hidden under normal music streaming activity? But one would need lots of proxies!

It's hard to imagine anything but physical egress for that kind of volume.

50 free accounts continually streaming music rack up 20 TB in a month, so 300 TB would take about 15 months. Or you use 750 accounts and do it in a month.
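A rough sanity check of these numbers, as a sketch under stated assumptions: the 160 kbit/s free-tier bitrate mentioned earlier in the thread, real-time playback, a 30-day month of nonstop streaming, and decimal units (the variable names are illustrative). Under those assumptions one account moves about 52 GB a month, so 50 accounts come to roughly 2.6 TB rather than 20 TB; the 20 TB figure would imply an effective rate of around 1.2 Mbit/s per account, e.g. downloading faster than real time.

```python
# Back-of-the-envelope bandwidth check for the figures in this thread.
# Assumptions (not from any official source): 160 kbit/s bitrate,
# real-time playback, a 30-day month of nonstop streaming,
# decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
import math

BITRATE_BPS = 160_000                # bits per second
SECONDS_PER_MONTH = 30 * 24 * 3600   # 2,592,000 s
TARGET_BYTES = 300 * 10**12          # 300 TB total


def bytes_per_account_month(bitrate_bps: int = BITRATE_BPS) -> int:
    """Bytes one account streams in a 30-day month at the given bitrate."""
    return bitrate_bps // 8 * SECONDS_PER_MONTH


per_account = bytes_per_account_month()
fifty_accounts = 50 * per_account
months_with_50 = TARGET_BYTES / fifty_accounts
accounts_for_one_month = math.ceil(TARGET_BYTES / per_account)

print(f"per account: {per_account / 1e9:.2f} GB/month")
print(f"50 accounts: {fifty_accounts / 1e12:.2f} TB/month")
print(f"300 TB with 50 accounts: {months_with_50:.1f} months")
print(f"accounts needed for 300 TB in one month: {accounts_for_one_month}")
```

This prints 51.84 GB per account, 2.59 TB for 50 accounts, 115.7 months, and 5788 accounts. Any speed-up over real-time playback shrinks the account count proportionally.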

I would say it's weird that they don't rate-limit accounts, but having a device play music pretty much all the time probably isn't that rare a use case.

That’s if they pretend to stream the music. If they are using throwaway free accounts I imagine they can download the DRM-stripped files much more quickly.

True, but I could see them rate limiting that much more aggressively than streaming.

You can download playlists for offline use; it'll go pretty fast. I doubt they monitor it that closely.

You can probably just buy a thousand hacked Spotify accounts for not much more than $1 apiece.

I mean, 300TB is nothing for a streaming service; it wouldn't even show on a dashboard. They probably did it over weeks, which would be invisible.

"at scale" could mean they had direct access to a server or to storage, maybe because they had an insider giving them access, or they found secrets that had leaked somewhere?

they're probably just using something like https://github.com/nor-dee/spotizerr-spotify

No way, that would take far too long.

Probably not, those tools don't actually download Spotify tracks at source quality.

There are tools that actually download directly from Spotify (they need Premium, though), but yeah, most of them just use Spotify's search and then download from other sources like YouTube without mentioning it. I won't say which tools download directly, out of fear that they'll get killed, but they exist.

Sadly since zspotify was killed I don't know of any remaining tools.

votify