My first take is that you could have 10 TB of logs with just a few unique lines that are actually interesting. So I am not thinking "Wow, what impressive big data you have there" but rather "if you have an accuracy of 1 - 10^-6 you are still overwhelmed with false positives" or "I hope your daddy is paying for your tokens"
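To put a rough number on the false-positive worry, here is a back-of-the-envelope sketch. The 200-byte average line size is purely an assumption for illustration (nothing in the thread states it), and it treats every misclassification as a false positive on an uninteresting line.

```python
def expected_false_positives(total_bytes: int, avg_line_bytes: int, error_rate: float) -> float:
    """Estimate how many lines get misclassified at a given per-line error rate."""
    lines = total_bytes / avg_line_bytes  # approximate number of log lines
    return lines * error_rate


# 10 TB of logs, ASSUMED 200 bytes per line, accuracy of 1 - 10^-6
fp = expected_false_positives(10 * 10**12, 200, 1e-6)
print(f"roughly {fp:,.0f} false positives")
```

Under those assumptions that is on the order of tens of thousands of flagged lines, against "just a few" that are actually interesting.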

Mendral co-founder and post author here.

I agree with your statement and explained in a few other comments how we're doing this.

tldr:

- Something happens that needs investigating

- Main (Opus) agent makes a focused plan and spawns sub-agents (Haiku)

- They use ClickHouse queries to grab only relevant pieces of logs and return summaries/patterns

This is what you would do manually: you're not going to read through 10 TB of logs when something happens; you make a plan, open a few tabs and start doing narrow, focused searches.

Yeah this is my experience with logs data. You only actually care about O(10) lines per query, usually related by some correlation ID. Or, instead of searching you're summarizing by counting things. In that case, actually counting is important ;).
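A minimal sketch of those two access patterns, using a made-up in-memory list instead of a real log store. The record fields (`correlation_id`, `level`, `msg`) and the helper names are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import islice

# Hypothetical log records; a real system would query a database instead.
LOGS = [
    {"correlation_id": "req-1", "level": "INFO", "msg": "start"},
    {"correlation_id": "req-2", "level": "INFO", "msg": "start"},
    {"correlation_id": "req-1", "level": "ERROR", "msg": "timeout"},
    {"correlation_id": "req-2", "level": "INFO", "msg": "done"},
    {"correlation_id": "req-1", "level": "INFO", "msg": "retry"},
]


def lines_for(logs, correlation_id, limit=10):
    """Return at most `limit` records that share one correlation ID."""
    matches = (r for r in logs if r["correlation_id"] == correlation_id)
    return list(islice(matches, limit))


def count_by_level(logs):
    """Summarize by counting instead of reading individual lines."""
    return Counter(r["level"] for r in logs)


print(lines_for(LOGS, "req-1"))
print(count_by_level(LOGS))
```

The `limit` argument is the important part: the caller sees a bounded handful of lines no matter how large the underlying data is, and the counts are computed exactly rather than estimated.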

In this piece though--and maybe I need to read it again--I was under the impression that the LLM's "interface" to the logs data is queries against ClickHouse. So long as the queries return sensibly limited results, and it doesn't go wild with the queries, that could address both concerns?

What does O(10) mean?

Mathematically, it means that the number of lines read is bounded by 10*M, where M is some constant. So it's basically equivalent to saying that it's O(1).
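Spelled out with the standard definition (the threshold $n_0$ and the rescaled constant $M'$ are names introduced here for illustration):

```latex
f(n) = O(g(n)) \text{ as } n \to \infty
\iff
\exists\, M > 0,\ \exists\, n_0 :\quad |f(n)| \le M\,|g(n)| \quad \text{for all } n \ge n_0 .

\text{With } g(n) = 10:\qquad |f(n)| \le 10\,M = M' \cdot 1 \quad (M' = 10M),
\qquad \text{hence } O(10) = O(1).
```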

I'm guessing the intention was to say "around 10 lines", though it kind of stretches the definition if we're being picky.

I normally see that from engineers using "O(x)" as "approximately x" whenever it's clear from context that you're not actually talking about asymptotic complexity.

I've always thought it was like this, maybe I'm wrong:

O(some constant) -- "nearby" that constant (maybe "order of magnitude" or whatever is contextually convenient)

O(some parameter) -- denotes the asymptotic behavior of some parametrized process

O(some variable representing a small number) -- denotes the negligible part of something that you're deciding you don't have to care about--error terms with exponent larger than 2 for example

Those last two notations are, formally, the same. To call a part negligible, we say it's asymptotically bounded above by a constant multiple of this expression, which obviously goes away as we approach the limit. The first one is a colloquial alternative definition that would probably be considered "wrong" in formal writing.
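To make the "formally the same" point concrete: both uses share one definition and differ only in which limit is taken. The symbols $M$, $n_0$, and $\delta$ are the usual constants from the textbook definition, and the Taylor expansion is just a familiar example of an error term being dropped.

```latex
\text{As } n \to \infty:\quad f(n) = O(g(n))
\iff \exists\, M, n_0 :\ |f(n)| \le M\,|g(n)| \ \text{ for all } n \ge n_0 .

\text{As } x \to 0:\quad f(x) = O(g(x))
\iff \exists\, M, \delta > 0 :\ |f(x)| \le M\,|g(x)| \ \text{ for all } 0 < |x| < \delta .

\text{Example of the second use:}\quad
e^{x} = 1 + x + \frac{x^{2}}{2} + O(x^{3}) \quad \text{as } x \to 0 .
```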

Agreed

I think the O means order of magnitude. It looks like Big O notation, but O(10) would collapse to O(1) and OP is not talking about efficiency anyway.

"about 10"