This makes a good benchmark for LLMs:

```
look at this paper: https://arxiv.org/pdf/2603.21852

now please produce 2x+y as a composition on EMLs
```

Opus (paid) - claimed that "2" is circular. Once I told it that ChatGPT had already done this, it finished successfully.

ChatGPT (free) - did it on the first try.

Grok - produced an estimate of the depth of the formula.

Gemini - success

Deepseek - assumed some pre-existing knowledge of what an EML is. Unable to fetch the PDF from the link, and unable to consume the PDF via "Attach file".

Kimi - produced a long output, then stopped and asked me to upgrade.

GLM - looks OK.

> Once I told it that ChatGPT had already done this, it finished successfully.

TIL you can taunt LLMs. I guess they exhibit more competitive spirit than I thought.

Opus seems to be wired these days to get you to spend more money. Once you tell it "Stop defrauding me, just get to the right solution", it often gets it.

I'm like, "Yeah, OK, use the Arcee Trinity models!" and it's like, "You got it, boss: 3 Opus agents in parallel, got it!"

I always start the chat with "we have been going in circles" before giving any context.


I copied and pasted the abstract into DeepSeek and asked your question. It's a bit unfair to penalise it for not being able to read PDFs.

It got a result.

If you like creating such things, consider contributing to Terminal Bench Science: https://www.tbench.ai/news/tb-science-announcement.

I changed the prompt to this:

""" Consider a mathematical function EML defined as `eml(x,y)=exp(x)−ln(y)`

Please produce `sin(x)/x` as a composition on EMLs and constant number 1 (one). """
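That definition is enough to sanity-check candidate compositions numerically. Here's a minimal Python sketch; the `ln_via_eml` composition is my own illustration of building an elementary function out of `eml` and the constant 1, not something taken from the paper:

```python
import math

# eml as defined in the prompt above: eml(x, y) = exp(x) - ln(y)
def eml(x, y):
    return math.exp(x) - math.log(y)

# eml(x, 1) = exp(x), since ln(1) = 0
assert abs(eml(2.0, 1.0) - math.exp(2.0)) < 1e-9

# A composition of emls and the constant 1 that recovers ln(t):
#   eml(1, t)          = e - ln(t)
#   eml(e - ln(t), 1)  = e^e / t
#   eml(1, e^e / t)    = e - (e - ln(t)) = ln(t)
def ln_via_eml(t):
    return eml(1.0, eml(eml(1.0, t), 1.0))

for t in (0.5, 1.0, 3.0, 10.0):
    assert abs(ln_via_eml(t) - math.log(t)) < 1e-9
```

Checks like this also make it easy to spot when a model's "composition" quietly uses operations outside eml and the constant 1.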


meta.ai in instant mode gets it on the first try too (I think?)

```
2x + y = \operatorname{eml}\Big(1,\; \operatorname{eml}\big(\operatorname{eml}(1,\; \operatorname{eml}(\operatorname{eml}(1,\; \operatorname{eml}(\operatorname{eml}(L_2 + L_x, 1), 1) \cdot \operatorname{eml}(y,1)), 1)), 1\big)\Big)
```
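Whether an output like this is right is easy to check numerically. A small harness sketch, assuming the `eml(x, y) = exp(x) - ln(y)` definition from a sibling comment; I can't evaluate the formula above directly because its `L_2 + L_x` term is ambiguous:

```python
import math
import random

# eml as defined in a sibling comment: eml(x, y) = exp(x) - ln(y)
def eml(x, y):
    return math.exp(x) - math.log(y)

# Compare a candidate composition f(x, y) against 2x + y at random points.
def matches_2x_plus_y(f, trials=100):
    for _ in range(trials):
        x, y = random.uniform(0.1, 3.0), random.uniform(0.1, 3.0)
        if abs(f(x, y) - (2 * x + y)) > 1e-6:
            return False
    return True

# The target itself passes, and a wrong composition fails, e.g.
# eml(x, 1/exp(y)) = exp(x) + y, which never equals 2x + y on (0.1, 3):
# matches_2x_plus_y(lambda x, y: 2 * x + y)                   -> True
# matches_2x_plus_y(lambda x, y: eml(x, 1.0 / math.exp(y)))   -> False
```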

For me, Gemini hallucinated EML to mean something else, "elementary mathematical layers", despite the paper link being provided.

This should be tangential proof for the dying bunch of people who still believe that LLMs are just parrots: EMLs are literally a new invention.

So what is the correct answer?