I agree. LLMs are capable of doing this right out of the box if you provide them grounding data like the current time and a few other things in the system prompt. It's really odd that this is getting any attention.

You guys are so funny when papers like these exist: https://arxiv.org/abs/2404.11757

Numerous studies, INCLUDING the OpenTSLM paper, have PROVEN that they are NOT able to do this out of the box. Did you even look at the results? They literally compare OpenTSLM against standard text-only baselines. Gemma3-270M performs better than GPT-4o using tokenized time series alone. So I guess you guys are being ironic.

I understand how annoying it is when people post shallow dismissals of your work on the internet, but please don't give in to the annoyance when replying. It makes the thread worse, and it's against the HN guidelines: https://news.ycombinator.com/newsguidelines.html.

I don't know if this is your work or not, but I appreciate your wanting to defend it. We just need you to do that in a way that doesn't attack others, no matter how wrong they are or you feel they are. Easier said than done, of course, but we're all working on it together.

An experiment is not a proof.

If this is the level of one of the contributors to the OpenTSLM paper (which you very obviously are), it's no wonder due diligence wasn't done properly.

It's less about proof and more about demonstrating a new capability that TSLMs enable. To be fair, the paper did test standard LLMs, which consistently underperformed. @iLoveOncall, can you point to examples where out-of-the-box models achieved good results on multiple time series? Also, what kind of time-series data did you analyze with Claude 3.5? What exactly did you predict, and how did you assess its reasoning capabilities?
