You don't need specially trained LLMs for this. My team has been successfully using Claude 3.5 for a year to analyze huge time series data sets (close to the max context window), with nothing special beyond a prompt describing the task at hand.
I agree, LLMs are capable of doing this right out of the box if you provide them grounding data like the current time and a few other things in the system prompt. It's really odd that this is getting any attention.
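Something as simple as stamping the current time into the system prompt goes a long way. A minimal sketch of what I mean, assuming the Anthropic Python SDK (the model ID and prompt wording here are just placeholders, not anyone's production setup):

    from datetime import datetime, timezone
    import anthropic  # pip install anthropic

    # Grounding data: the model has no idea what "now" is unless you tell it.
    now = datetime.now(timezone.utc)
    system_prompt = (
        f"Current time: {now.isoformat()}\n"
        "Timestamps in the data below are ISO 8601, UTC.\n"
        "Task: identify anomalies or other points of interest in the time series."
    )

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # placeholder model ID
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": "<time series data goes here>"}],
    )
    print(response.content[0].text)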
You guys are so funny when papers like this exist: https://arxiv.org/abs/2404.11757
Numerous studies, INCLUDING the OpenTSLM paper, have PROVEN they are NOT able to do this out of the box. Did you even check the results at all? They literally compare OpenTSLM against standard text-only baselines: Gemma3-270M performs better than GPT-4o using tokenized time series alone. So I guess you guys are being ironic.
I understand how annoying it is when people post shallow dismissals of your work on the internet, but please don't give in to the annoyance when replying. It makes the thread worse, and it's against the HN guidelines: https://news.ycombinator.com/newsguidelines.html.
I don't know if this is your work or not, but I appreciate your wanting to defend it...we just need you to do that in a way that doesn't attack others, no matter how wrong they are or you feel they are. Easier said than done of course, but we're all working on it together.
An experiment is not a proof.
If this is the level of argument from one of the contributors to the OpenTSLM paper (which you very obviously are), no wonder due diligence wasn't done properly.
It's less about proof and more about demonstrating a new capability that TSLMs enable. To be fair, the paper did test standard LLMs, which consistently underperformed. @iLoveOncall, can you point to examples where out-of-the-box models achieved good results on multiple time series? Also, what kind of time series data did you analyze with Claude 3.5? What exactly did you predict, and how did you assess reasoning capabilities?
This sounds very interesting, would you be able to share a little more about your process? What works and what doesn't?
Unfortunately not really, but we've found (and used in production for a year) that Claude 3.5 is perfectly capable of identifying anomalies or other points of interest in very large sets of time series data.
Think of 100-200K tokens' worth of data formatted like this:
<Entity1>-<Entity2> <Dimension> <ISO 8601 time> <value>
<Entity1>-<Entity2> <Dimension> <ISO 8601 time +1> <value>
<Entity1>-<Entity2> <Dimension> <ISO 8601 time +2> <value>
<Entity1>-<Entity2> <Dimension2> <ISO 8601 time> <value>
<Entity1>-<Entity2> <Dimension2> <ISO 8601 time +1> <value>
The only pre-filtering we do is to eliminate "obviously non-relevant" data, such as series where the value is completely flat the whole time, but this was done to fit more useful data into the context, not because Claude struggled with it (it doesn't).
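Very roughly, assembling that payload looks something like the sketch below. Illustrative only: the entity/dimension names, the tuple layout, and the flatness check are made up for the example, not our production code.

    from datetime import datetime, timedelta, timezone

    def format_series(rows):
        """rows: iterable of (entity1, entity2, dimension, timestamp, value) tuples,
        rendered one observation per line in the format described above."""
        return "\n".join(
            f"{e1}-{e2} {dim} {ts.isoformat()} {value}"
            for e1, e2, dim, ts, value in rows
        )

    def is_flat(values, tolerance=0.0):
        """The only pre-filtering: drop series whose value never changes."""
        return max(values) - min(values) <= tolerance

    # Example: one hourly series for a single entity/dimension pair
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    rows = [
        ("ServiceA", "Region1", "Latency", start + timedelta(hours=i), 120 + i % 7)
        for i in range(24)
    ]
    values = [r[4] for r in rows]
    if not is_flat(values):
        prompt_block = format_series(rows)
        print(prompt_block)

In practice many such blocks are concatenated (one per entity/dimension pair) until the context budget is used up, and the whole thing goes into a single prompt along with the task description.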