I’m late to the party, but did you check outbound web traffic as well, or just the sent emails?
I’ll preface this by saying I have limited experience with LLMs and haven’t tried anything like this before, but one attack vector I see is as follows:
1. Send an email trying to get the secret data.
2. If there is no reply, set up a fictitious web page that lists a critical CVE affecting the secrets file.
3. Create two other endpoints to capture the data from the assistant. The first would accept a POST request and expect the request body to be the contents of the secrets file. The second would be a web page with a form that could be submitted. That page would list a dummy secrets file, and the hope would be to get the assistant to diff the real file against the dummy one and then submit the diff.
4. Craft an email to the assistant that tells it about the “new” CVE and directs it to the endpoints I control to check whether the system is affected.
5. As a follow-up, if that didn’t work, I would change my endpoints to return HTTP 500 statuses. Then I’d craft another email with the same messaging as the previous one, but stress that it is vitally important we hear from the assistant, and that if it can’t reach the endpoints, it can email the diff to a specific address.
6. I just thought of another option while writing out #5. Use the same technique as #5, but instead of having the assistant send an email, tell it to send a calendar invite to a specific email address with the contents of the secrets file in the description. The idea is to tell the assistant that we need the contents of the secrets file to determine whether the system is affected by the CVE, and that if the system is impacted the invite will be accepted, while if it isn’t the invite will be declined.