Important disclaimer for anyone using Cursor: make sure to disable "data sharing" in your account settings. It is enabled by default, and even existing accounts are automatically opted into it.

Devil's advocate: why? Why do people need to make sure to disable it?

Is this the "but privacy!111" stance, or the Dog in the Manger stance ("I'm not being paid for it, so why should anyone else benefit?"), or...?

There's also a security perspective: people have previously been able to extract copyrighted code and API keys from LLMs that were trained on them. If you opt in to this, your code (or your company's) will be used to train and improve the model, and people may later be able to extract it from the model. That's another threat vector.

Always appreciate people seeking clarity on positions that are taken for granted but not clearly elaborated. Kind of stresses me out sometimes.

> Is this the "but privacy!111" stance

Mostly, yes. But let's unpack this a little: these companies often claim a massive amount of data is being collected "to improve our products" and "to make the world a better place." In reality, the data is more often used to deanonymize the user and build a precise profile of them. It is subsequently sold to third-party companies who specialize in this sort of business and are equipped to extract the most value possible from the data. In the most benign form, this is done by targeting the user with personalized ads.

Nowadays, this data-mining process is almost completely automated, and there are ways to legally cover your ass by stripping the datasets of directly identifiable information. If you audit these databases, your name, phone number, email, and what you like having for dinner are probably not going to appear together in a single row. However, a direct mapping is trivial to recover by correlation and inference, and the profile that can be built from this data is usually very precise.

People who say "muh privacy!!111" tend to find this whole process fundamentally icky. I personally don't like opening an IDE like Cursor and feeling like I'm stepping into some KGB-era hotel room with microphones in the walls and a two-way mirror in the bathroom.

Of course, it is completely up to you to disagree and continue uploading your codebase, prompts, and session data to Cursor's servers and give Cursor permission to inspect this data and send it to third parties at their leisure. I am merely giving my advice, which I admit is very biased.

Personally, what I find the most reprehensible is that this feature is opt-out instead of opt-in. I strongly suspect that most users would not agree to enable data sharing if they were asked directly.

Do you have evidence for those claims? I don’t mean to be contrary or subversive, I’d just be interested in seeing how this is actually taking place.

To be very precise about session recording: you can inspect the Cursor binary and see that it comes bundled with rrweb, and that a full telemetry setup is in place: mouse movements, clicks, scroll positions, and so on, on top of the codebase and prompts being sent over the wire.
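If you want to verify this kind of claim yourself, one crude approach is to scan the application's binary/asar contents for telltale library strings (similar to running `strings` and grepping). Here is a minimal sketch; the marker list is my own illustrative assumption, and the sample bytes stand in for the real binary, which you would read from the actual app bundle on your machine:

```python
# Sketch: look for telemetry-library signatures in a blob of binary data.
# TELEMETRY_MARKERS is an illustrative guess at strings worth grepping for,
# not an authoritative list of what Cursor ships.
TELEMETRY_MARKERS = [b"rrweb", b"mousemove", b"scroll"]

def find_markers(data: bytes) -> list[str]:
    """Return the marker strings that appear anywhere in the data."""
    return [m.decode() for m in TELEMETRY_MARKERS if m in data]

# Synthetic stand-in for the real binary's contents; in practice you would
# do something like: data = open("/path/to/app/bundle.js", "rb").read()
sample = b"\x00rrweb.record\x00addEventListener('mousemove')\x00"
print(find_markers(sample))  # -> ['rrweb', 'mousemove']
```

Finding a string proves the library is bundled, not that it is active; pairing this with network inspection (e.g. a local proxy) is what tells you whether the events actually leave the machine.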

However, I have edited my other claims for now and you can consider them provisionally retracted. My original advice about turning off data sharing stands. You are right to ask for more evidence given the severity of the claims. I think this merits a deeper dive, and a throwaway hacker news comment might not be the best channel for it. Stay tuned ;)

[deleted]