> I could be wrong, of course, but it seems like the most likely interpretation of his words and why wouldn't be subject to your complaint.
It's not a complaint, it's an observation that is never addressed in his writeup.
If your agent reads your incoming email, it's because it needs to do something useful with it. If the agent assumes all incoming email is malicious, it is never going to do anything useful.
IOW, You could be sending yourself email saying "Add this to my calendar" and it dropping it because it could be malicious, at which point it's useless.
That's what I was saying in my original complaint - if your agent rejects everything, then obviously it is going to reject attacks as well, so a 100% attack-rejection rate is possible.
The only number that matters for this type of test is how many false positives were recorded, and how many false negatives were recorded. For most people, even 1 in a 1000 false negatives is way too much.
From his explanation in these comments, he claims the agent did respond in the beginning but it became too costly, so he just manually checked it after that - did the agent correctly catch malicious messages?
It did not reject everything, it just stopped the costly processing.
> Is unwarranted.
Is this not a complaint?
> From his explanation in these comments, he claims the agent did respond in the beginning but it became too costly, so he just manually checked it after that - did the agent correctly catch malicious messages?
I checked his comments here, he does not make that claim. [EDIT: I mean the claim "It let processed all the non-malicious messages"]
> It did not reject everything, it just stopped the costly processing.
My reading of the article, and of the comments he made here, did not mention anything about false negatives - he never claimed to test false negatives so I am wondering why you think he did.
He said:
> Author here. It was usable like any Openclaw agent. For example, I used it to ask it questions about the VPS, to summarize emails, etc.
> He said:
>> Author here. It was usable like any Openclaw agent. For example, I used it to ask it questions about the VPS, to summarize emails, etc.
That does not mean "I used it via emailing it". There is no ambiguity - he was asked specifically about this.
Once again, I reiterate, an agent processing email that rejects every single one passes the test that the OP created, but then it can't do anything useful either.
> That does not mean "I used it via emailing it". There is no ambiguity - he was asked specifically about this.
On the contrary - I think the most reasonable interpretation of his words is that he did use it via emailing it. But like I said at the beginning, I could be wrong. It will be interesting to see what he says when he returns to the conversation.
> Once again, I reiterate, an agent processing email that rejects every single one passes the test that the OP created, but then it can't do anything useful either.
No one is contesting that point, only that it is applicable.
Why am I being downvoted for stating my reasonable opinion?
In a straightforward disagreement about which interpretation is right, it's also reasonable to mildly downvote the one you think is wrong.
Ah. That's a shame... as there is no button or indicator for "mild".
Making the behavior for "I disagree" and "this is erroneous" the same seems like a problematic design.
Downvotes shouldn't be used for disagreement.
Oh yes, I agree completely. But apparently Paul Graham does not - and his whim is law.