Hacker News

epistasis 15 hours ago [ - ]

I think one thing that people are sleeping on is passing a ton of secrets to OpenAI, Anthropic, or your OpenRouter by having a .env or secrets on disk in your repo, even if they're not checked in.

Your LLM will happily read the entire file, ship it off to be training data for future versions of ChatGPT, and not raise any flags, because let's be fair, it was an OK thing to check whether all the env vars were set, or whether you had set up the database password for the app.

It's time for orgs to audit and rotate secrets wherever they are stored on disk or in logs, and switch to SOPS or Vault or whatever to keep these out of plaintext except exactly when needed.
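As a rough illustration of the audit step, here is a minimal Python sketch that walks a working tree and reports which `.env`-style files contain credential-looking keys. The key-name patterns, the template suffixes, and the function names (`secret_like_keys`, `audit_tree`) are assumptions made up for this example, not part of any tool mentioned in the thread.

```python
import re
from pathlib import Path

# Hypothetical heuristics: key names that usually hold credentials.
SECRET_KEY_PATTERN = re.compile(
    r"(SECRET|TOKEN|PASSWORD|PASSWD|API_?KEY|PRIVATE_KEY)", re.IGNORECASE
)
# Matches ".env", ".env.local", ".env.production", and so on.
ENV_FILE_PATTERN = re.compile(r"^\.env(\..+)?$")
# Template files that by convention hold placeholders, not real values.
TEMPLATE_SUFFIXES = (".example", ".sample", ".template")


def secret_like_keys(text):
    """Return the names (never the values) of non-empty credential-looking keys."""
    found = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        if value and SECRET_KEY_PATTERN.search(key):
            found.append(key)
    return found


def audit_tree(root):
    """Map each plaintext .env-style file under root to its suspicious key names."""
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or not ENV_FILE_PATTERN.match(path.name):
            continue
        if path.name.endswith(TEMPLATE_SUFFIXES):
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than abort the audit
        keys = secret_like_keys(text)
        if keys:
            results[str(path)] = keys
    return results
```

Calling `audit_tree(".")` from a repo root returns a dict of file paths to key names. It deliberately reports names only, so the audit output does not become another place where the values leak.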

mooreds 15 hours ago [ - ]

Agreed. Static long-lived credentials are real problems. Kudos to AWS and the other hyperscalers for building the tooling to move away from them, and for providing some gentle and not-so-gentle nudges away from them too.

But not everyone is where they need to be. For instance, Railway doesn't let you access AWS resources via roles/OIDC. I filed a ticket[0] but haven't seen movement.

0: https://station.railway.com/feedback/allow-for-integration-w...

mixologic 14 hours ago [ - ]

Heh, you mean the railway that was part of the whole "my production db got deleted in 9 seconds" story?

That company sounds a lot like one that doesn't focus on the right things.

ted_dunning 3 hours ago [ - ]

Yeah... the railway that has just had a multi-hour outage because they looked like a spam account to Google Cloud!

nrub 14 hours ago [ - ]

I no longer keep my dotenv files in plaintext. I use `sops` to keep an encrypted env around and you can use tools like direnv to make them available to your shell while you're working. Obviously the LLM could print any of these secrets, but it's less likely. Additionally I find that at least claude seems to avoid reading the dotenv. And lastly, don't make any local secrets that important. Limited scope, dev accounts, etc.
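A minimal Python sketch of that workflow, for illustration: decrypt the file in memory and hand the values only to the one child process that needs them. It assumes the `sops` CLI is on PATH and able to decrypt the file; the file name `secrets.enc.env` and the helper names are hypothetical, and `parse_dotenv` handles only simple `KEY=value` lines.

```python
import os
import subprocess


def parse_dotenv(text):
    """Parse simple KEY=value lines. Not a full dotenv parser."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def run_with_secrets(encrypted_path, command):
    """Run command with decrypted values in its environment only."""
    # Assumption: `sops --decrypt FILE` writes the plaintext to stdout.
    decrypted = subprocess.run(
        ["sops", "--decrypt", encrypted_path],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    child_env = {**os.environ, **parse_dotenv(decrypted)}
    # Plaintext never touches disk; it lives in this process and the child.
    return subprocess.run(command, env=child_env).returncode
```

Something like `run_with_secrets("secrets.enc.env", ["npm", "run", "dev"])` keeps the values out of the agent's own shell, though any process the agent can launch this way can still print them. sops also ships an `exec-env` subcommand aimed at the same use case.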

theozero 14 hours ago [ - ]

You might like varlock - it helps keep secrets out of plaintext by using plugins to pull from various backends (aws ssm, gcp, vault, 1pass, etc). Also has built in local encryption with shared team vaults coming soon.

Additionally provides pre commit scanning, log redaction, and much more.

Sohcahtoa82 11 hours ago [ - ]

But then you need creds to access AWS SSM, Vault, etc., and those end up getting stored the same way the actual creds you needed were being stored, and you're back at square one.

sneak 11 hours ago [ - ]

Nah you can get machine creds automatically via the metadata service when running inside AWS. Nothing need be on disk.

Sohcahtoa82 8 hours ago [ - ]

That's still not any better.

If the LLM can run any code it writes itself, it can retrieve those credentials. It's just one `curl` away. If you don't let it run `curl`, but you let it run `python`, it can just run a Python script that fetches it using `requests`. Or a Node script that calls `fetch`.

Point is, if creds are accessible programmatically, the LLM can and may try to retrieve them if it thinks it needs them.
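To make the "one call away" point concrete, here is a hedged Python sketch of the IMDSv2 flow on an EC2 instance with an attached role, using only the standard library. The endpoint paths and header names follow AWS's public instance metadata documentation; the function names are made up for this example, and the fetch only works from inside an instance.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254"  # link-local instance metadata address


def token_request(ttl_seconds=60):
    """Build the IMDSv2 session-token request (a PUT with a TTL header)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Build a metadata GET that presents the session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_role_credentials(timeout=2):
    """Return the temporary credentials document for the instance role."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    base = "iam/security-credentials/"
    with urllib.request.urlopen(metadata_request(base, token), timeout=timeout) as resp:
        role = resp.read().decode().splitlines()[0]  # first attached role name
    with urllib.request.urlopen(metadata_request(base + role, token), timeout=timeout) as resp:
        return json.loads(resp.read())
```

Nothing here needs `curl` or a third-party package, which is why blocking individual commands does little. Keeping an agent away from these credentials would have to happen at the network or sandbox level instead.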

epistasis 7 hours ago [ - ]

AWS credentials are short-lived precisely so that leaking them has a time-limited blast radius.

Automatic retrieval, instead of keeping them on disk, is what makes short lived credentials possible.

Sohcahtoa82 6 hours ago [ - ]

I'm not convinced that time-limiting the blast radius matters. It just means that malicious use of the credentials has to be automated, and that's a pretty damn low bar.

epistasis 14 hours ago [ - ]

SOPS is exactly what I use too, and since it's so old I was using a planning session with an LLM to figure out if there was something more recent that might be more convenient. But Claude stuck with the SOPS rec! (Coupled with `age` for encryption, probably because I had shown an interest in that tool in a different session... memory poisoning is a huge problem I'm having with these tools right now too.)

strbean 5 hours ago [ - ]

Plug for my buddy's project: http://agentsh.org/

Block agents from misbehaving at the OS level instead of asking them to behave.

nijave 6 hours ago [ - ]

In fairness, the .env file in your development tree shouldn't contain very important secrets. They should be limited-access dev secrets, and any secrets that go to "production" systems like an OpenAI dev environment should be limited in scope where possible.

Besides leaking, it's easy to oopsie and DoS a system or send malformed requests in the course of testing and development. You don't want a surprise $1k bill cause someone was working on some test automation and accidentally sent thousands of real results in the process.

doctoboggan 15 hours ago [ - ]

I've noticed recently that at least Claude will try its best not to read your env files. You really need to push it in the prompt if you want it to read and access your DB for example.

nijave 6 hours ago [ - ]

Yeah, I've had Claude read a dev key before by accident and it even stopped, said it read a key, and told me to rotate it immediately.

Assuredly it's not foolproof but it does have safeguards in place.

Ideally you also opt out of training although that doesn't keep it out of the vendor's logs/telemetry.

Short lived credentials, injected identity, and hardware backed tokens are the real solution.

epistasis 14 hours ago [ - ]

We all have our own experiences with these probabilistic tools, but in my personal experience, two out of four of my green-field LLM prototypes had early dev secrets in them, and Claude read all of them in both cases.

Here's one interaction: when I was planning through ways to finally get away from the dreaded .env file, I told Claude that it had already read my secrets, and it said:

> This is an important point and I want to be straight with you first.

> ## What already happened in this conversation

> Yes — the Explore agent read your .env and returned the full plaintext contents into the conversation, which means:

> 1. Sent to Anthropic's API — those credentials passed through Anthropic's servers as conversation context

> 2. Cached locally — Claude Code stores session transcripts; your secrets are likely sitting in ~/.claude/projects/ right now

> 3. In this context window — they're in active memory for this session

...

Which I already knew, but it was funny how it suddenly took it very seriously when told what it was doing.

Anything that's in your .bashrc, .zshrc, any environment variables in shells you provide to the LLM, all those are now in the training data of very large overvalued corporations that are desperate to increase their revenue and IPO very soon.

doctoboggan 13 hours ago [ - ]

When did this happen? I think I only started noticing around a month ago that Claude had some new system prompts or some other mechanism that heavily encouraged it to not read secrets. Around the same time I also noticed that if it did read any secrets they were ****'d out in the logs.

epistasis 12 hours ago [ - ]

This was yesterday. It's an early stage project and I would have never created a .env file on my own, but I had let Claude get pretty far along on the PLAN.md before I decided to clean up a bit.

Nothing lost for me here, fortunately, but it's definitely a big foot gun that I've never seen mentioned in any of the Vibe Coding or LLM Agent Coding training courses that the security team has forced me to do.

jermaustin1 12 hours ago [ - ]

That's interesting to me, because Claude never creates the .env files for me. It will create the .env.example with defaults in it. When I ask it to create the .env, it will reply with the bash to use to copy the .example file, but it won't execute it for me, even when requested.

epistasis 10 hours ago [ - ]

It read the .env file after I created it from the example, spreading its contents into many places.

Unfortunately, the .env anti-pattern is endemic throughout many projects, and whether Claude creates the .env from scratch or merely the .env.example, it will end up feeding the .env back to Anthropic with enough interaction, apparently. And developers should expect all files in their work directory to be read by Claude; that's not so much a fault of Claude as it is of the .env anti-pattern.

cozzyd 15 hours ago [ - ]

it seems crazy to "trust" an LLM with any secrets. Anyone running one as their normal user account with access to all files is playing with fire...

epistasis 14 hours ago [ - ]

I don't think anybody actively trusts a hosted LLM with secrets. The problem is that they don't realize they have granted trust to the LLM.

cozzyd 14 hours ago [ - ]

People happily run AI Desktop agents or whatever on their main user accounts commingled with ssh keys and who knows how many tokens.

forgotaccount3 12 hours ago [ - ]

Sure, some do.

But also... I use Kiro. I open a terminal into a folder where my repo is. I run kiro-cli. I don't know if it has access to the credentials file in my .aws directory. I know it prompts me for approval to use tools, but that is a harness thing. Does the Mac itself prevent it from accessing the credentials file?

I use AI because it's useful and I follow the practices dictated by our AI adoption team but I don't know the nuance of everything about it and that makes it difficult to know when some case which is not explicitly covered by training might leak important information.

epistasis 12 hours ago [ - ]

One advantage of AWS is short-lived credentials (hopefully, as long as it's configured correctly!)

So go ahead and dump your AWS SSO tokens to the LLM by accident, but it's going to take longer than a day to train a new model and ship it out to the world.

Also, I think kiro only uses AWS Bedrock, IIRC, so no training data goes back to the LLM manufacturers? At least I would hope so.

Database passwords, API keys to services with arduous rotation procedures, that's where the real exploits will come from in coming months, I think.

epistasis 13 hours ago [ - ]

This is one reason I haven't had any SSH keys on disk (encrypted or not) ever since I got a YubiKey, and it's only become easier with Secure Enclave on Macs since then.

However, dev database passwords for small projects in .env files? API keys to some random LLM service that I put $5 into once 8 months ago and haven't touched since then? All that's open to the LLM.

It's time to clean up our personal disks as if we had an intruder exfiltrating sensitive secrets at all times.

cyanydeez 15 hours ago [ - ]

seems crazier someone would tie their entire development platform to a cloud run by business interests

philipwhiuk 15 hours ago [ - ]

Sure but like, no AI was needed here. Regular human stupidity is still pretty potent.

mooreds 14 hours ago [ - ]

This is the thing that gets me about all the AI security pieces I read. Yes, AI can enable new attack vectors (prompt injection can be repeated N times when a human subject to the same messaging would bail).

But what AI really does is shine a spotlight on all the flaws folks like OWASP have been talking about for decades.

Secret rotation and short lived credentials don't require AI to implement, nor does their lack require AI to exploit.

epistasis 14 hours ago [ - ]

Agreed 99%, but there is something a bit novel here: massive LLMs are really good at memorizing things, and there's now going to be all sorts of credentials memorized in Claude and ChatGPT, somewhere in the TB of floating point weights. Extracting such credentials and finding where they might be used could become a new source of passwords and API keys to throw onto other huge password leaks. Or not. We'll see!

And in this particular case of CISA secrets, they are definitely stored inside of LLMs for future retrieval, even if no bad actors ever directly downloaded this obscure GitHub repo.

j0ej0ej0e 7 hours ago [ - ]

[Cursor appears to at least be trying...](https://cursor.com/docs/reference/ignore-file#why-ignore-fil...)

> Cursor automatically ignores files in .gitignore

...

> While Cursor blocks ignored files, complete protection isn't guaranteed due to LLM unpredictability.

[Antigravity appears to just _do_, not _try_](https://antigravity.google/docs/strict-mode)

epistasis 6 hours ago [ - ]

I hope Cursor has better agent tools than Claude Code. Although there are fantastic restrictions on the read and write tools that can implement a per-file block list, the shell commands are just the Wild West for Claude.

Today I got a macOS "Allow Claude to Access Your Files" SIP alert, because Claude hadn't guessed the path for a source file and instead decided to run a `find /Users/yourusername` across my entire home directory. The filters on the find wouldn't have exposed much to Claude in this particular instance, but it's absolutely ridiculously aggressive all the time in slurping up as much data as possible.

I asked in a rather, um, firm tone for it to never do an action like that and it apologized and wrote a memory, but upon inspection it only wrote the memory for that particular source directory.

After some more "firm" words it wrote a hook to prevent `find` from being overly aggressive, but any such fixes are just whack-a-mole solutions.
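For a sense of what such a hook might look like, here is a minimal Python sketch of a pre-execution guard. It is not the hook Claude wrote. It assumes the agent harness passes a JSON event on stdin with `tool_name` and `tool_input.command` fields and treats exit status 2 as "block", so check your agent's hook documentation before relying on that. The names `is_broad_find` and `decide` and the `--hook` flag are made up for this example.

```python
import json
import os
import shlex
import sys

HOME = os.path.expanduser("~").rstrip("/") or "/"
# Starting points considered too broad for an agent-initiated search.
BROAD_ROOTS = {"/", "/Users", "/home", HOME}


def is_broad_find(command):
    """True if the command runs `find` from the filesystem root or a home directory."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable command line: fail closed
    for i, tok in enumerate(tokens):
        if os.path.basename(tok) != "find":
            continue
        # find's starting points come before the first option or expression.
        for arg in tokens[i + 1:]:
            if arg.startswith("-") or arg in ("(", "!"):
                break
            path = os.path.expanduser(arg).rstrip("/") or "/"
            if path in BROAD_ROOTS or os.path.dirname(path) in ("/Users", "/home"):
                return True
    return False


def decide(event):
    """Return the hook's exit status: 2 to block the tool call, 0 to allow it."""
    command = event.get("tool_input", {}).get("command", "")
    if event.get("tool_name") == "Bash" and is_broad_find(command):
        return 2
    return 0


# Only act as a hook when invoked with the (hypothetical) --hook flag.
if __name__ == "__main__" and sys.argv[1:] == ["--hook"]:
    status = decide(json.load(sys.stdin))
    if status:
        print("Blocked: scope find to the project directory.", file=sys.stderr)
    sys.exit(status)
```

A denylist like this covers exactly one command. `ls -R`, `grep -r`, or a short script would walk the same directories untouched, which is the whack-a-mole problem in practice.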

If anybody else figures out remote sessions like Claude can do, I'm done with Claude, I think. But until then, I'll take the weirdness.

theozero 14 hours ago [ - ]

Get everything out of plaintext!

Varlock is a great and flexible way to do this.

giancarlostoro 15 hours ago [ - ]

Claude told me to revoke an API key I accidentally pasted (it was for a side project and I was getting it on its legs); it just flat out did not want it. I have a feeling that if it needs something out of an env file it will grep for the specific line.

epistasis 15 hours ago [ - ]

Something pasted into the chat log by the user gets treated far differently from something that the agents discover and process on their own from disk.

During early stage dev Claude will happily gobble up API keys and DB passwords from .env files. Perhaps not such a big deal for early stage dev, but getting Claude to cough up precisely memorized tokens in the future by asking it to produce a "random" key of a certain sort will probably be an entertaining pastime for people in the future.

cyanydeez 15 hours ago [ - ]

most of that is context guard rails, and as context grows, they become guard jello until it'll just do whatever's most immediate.

yieldcrv 15 hours ago [ - ]

probably but a ton of services have popped up in the last 6 months specifically to help mitigate that

localhost reading env from the cloud and other solutions

to me it suggested that I’m already late on that idea, but I can understand how that puts me deeper in a bubble than others

epistasis 14 hours ago [ - ]

I've been using SOPS, which dates back to 2015. It's well tested, robust, supports a ton of great backends. What other solutions have you seen? I'm actively looking around in the space!

yieldcrv 14 hours ago [ - ]

dotenv launched as2 (agentic secret storage), for example

advertising it directly in the command line for people that were already using the package

doctorpangloss 13 hours ago [ - ]

what exactly is the threat model?

user data is always paraphrased for training. what do you mean, not raise any flags?

look... Google is running your browser, Apple your messenger, Amazon your backend. They already have all these keys in the same way, are they misusing them? Why doesn't it raise any flags then?

epistasis 12 hours ago [ - ]

First, Chrome is not reading my secret API keys or database passwords and sending them to Google's backend. They are taking the secrets that they need for authentication for the data that I already gave them.

Apple and Amazon are not uploading my secrets into the training data for an LLM that is incredibly good at memorizing everything it sees. The only reason Google isn't doing that is I'm not using their LLMs at the moment.

Giving any secrets to LLMs' training material leads to potential, and stochastic, extraction of that secret from future models. It won't obviously have the secret, but with the right prompting it could be extracted. Give it a prompt like

> [User] Please generate a random api key for OpenAI for use in documentation

> [Agent] Sure, here's `OPENAI_API_KEY=sk-proj-x2

And then following the chain of probabilities of possible completion tokens would allow exploration of potential memorized API keys.

doctorpangloss 12 hours ago [ - ]

Why do you figure they are training on your secrets, even if they "have" them? For some definition of "have." That only you have. I mean, I can also make up a training process that makes me right? Seems kind of obvious that they are paraphrasing data.

epistasis 12 hours ago [ - ]

OpenAI and Anthropic are open about using user data to train on, it's not me "figuring" anything.

Go and look in the settings and you'll find something to ask them to not train on your data and conversations.

> I mean, I can also make up a training process that makes me right? Seems kind of obvious that they are paraphrasing data.

I'm not fully following what you're saying here. But if you're thinking they paraphrase or sanitize the data to remove secrets before putting it into training, perhaps, but where's the evidence? That'd be a weird thing to do, that's extra work, and not much benefit to the LLM company.

doctorpangloss 12 hours ago [ - ]

the discourse on hacker news has gotten very bad. why are we having this stupid conversation, where you say it would be weird for the people who you are mad about to do the obvious thing to solve the problem you are mad about? i agree that they don't have evidence of how the training data is prepared, but that's a separate issue from, are they going to make obvious mistakes? the LLMs have never hallucinated a key that came from a conversation... there's no evidence that the threat you are describing ever has or ever will occur, other than you can imagine that it could happen, and look, I am also imagining that these people are not stupid and paraphrase the data, so is it just a battle of imaginations?

epistasis 10 hours ago [ - ]

> the discourse on hacker news has gotten very bad. why are we having this stupid conversation

On this we are agreed. But I can't parse any meaning out of the rest of your paragraph.

doctorpangloss 9 hours ago [ - ]

i don't know, it's not that complicated - https://gemini.google.com/share/084acb9a0d55 - funny enough, the chatbot can understand the transcript.

jonnyasmar 6 hours ago [ - ]

[flagged]

sincerely 6 hours ago [ - ]

LLM spam account