Devil's advocate: why? Why do people need to make sure to disable it?

Is this the "but privacy!111" stance, or the Dog in the Manger stance ("I'm not being paid for it, so why should anyone else benefit?"), or...?

There's also a security angle. People have previously managed to extract copyrighted code and API keys from LLMs that were trained on them. If you opt in to this, your code or your company's code will be used to train and improve the model, and people may then be able to extract it from the model. That's another threat vector.

Always appreciate people seeking clarity on positions that are never clearly elaborated and just taken for granted. Kind of stresses me out sometimes.

> Is this the "but privacy!111" stance

Mostly, yes. But let's unpack this a little: these companies often claim a massive amount of data is being collected "to improve our products" and "to make the world a better place." In reality, the data is more often used to deanonymize the user and build a precise profile of them. It is subsequently sold to third-party companies who specialize in this sort of business and are equipped to extract the most value possible from the data. In the most benign form, this is done by targeting the user with personalized ads.

Nowadays, this data-mining process is almost completely automated, and there are ways to legally cover your ass by stripping the datasets of directly identifying information. If you audit these databases, your name, phone number, email, and what you like having for dinner are probably not going to appear together in a single row. However, a direct mapping is trivial to recover by correlation and inference, and the profile that can be built from this data is usually very precise.
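To make the "trivial to recover" part concrete, here's a toy sketch of a linkage attack: two datasets that each look "anonymized" on their own get joined on quasi-identifiers. All names, fields, and records below are made up for illustration.

```python
# Hypothetical example: re-identifying "anonymized" rows by correlating
# them with a public dataset on shared quasi-identifiers (zip + birth year).

# "Anonymized" telemetry: no names, just quasi-identifiers plus behavior.
telemetry = [
    {"zip": "94107", "birth_year": 1988, "favorite_dinner": "ramen"},
    {"zip": "10001", "birth_year": 1975, "favorite_dinner": "pizza"},
]

# Public voter-roll-style data: names plus the same quasi-identifiers.
public_records = [
    {"name": "Alice Example", "zip": "94107", "birth_year": 1988},
    {"name": "Bob Example",   "zip": "10001", "birth_year": 1975},
    {"name": "Carol Example", "zip": "94107", "birth_year": 1990},
]

def link(telemetry, public_records):
    """Join rows on (zip, birth_year); a unique match re-identifies the user."""
    index = {}
    for rec in public_records:
        index.setdefault((rec["zip"], rec["birth_year"]), []).append(rec["name"])
    linked = []
    for row in telemetry:
        names = index.get((row["zip"], row["birth_year"]), [])
        if len(names) == 1:  # unique quasi-identifier combination
            linked.append({**row, "name": names[0]})
    return linked

profiles = link(telemetry, public_records)
# Each matched profile now pairs a real name with the "anonymous" behavior data.
```

With just two coarse attributes, both toy telemetry rows map uniquely back to a name. Real-world datasets have far more columns to correlate on, which is why stripping the obvious identifiers is not the same thing as anonymity.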

People who say "muh privacy!!111" tend to find this whole process fundamentally icky. I personally don't like opening an IDE like Cursor and feeling like I'm stepping into some KGB-era hotel room with microphones in the walls and a two-way mirror in the bathroom.

Of course, it is completely up to you to disagree and continue uploading your codebase, prompts, and session data to Cursor's servers, giving Cursor permission to inspect this data and send it to third parties at their leisure. I am merely giving my advice, which I admit is very biased.

Personally, what I find the most reprehensible is that this feature is opt-out instead of opt-in. I strongly suspect that most users would not agree to enable data sharing if they were asked directly.