I understand this provides a way to interact with ts data via natural language, but is there any benefit to this over tool calling to a library that uses signal processing and/or rule based algos (or using machine learning if the data is noisy/variable)?
For example, you ask an off-the-shelf LLM to analyze your ECG data. The LLM uses a tool to call out to your ECG ts analysis library. The library iterates over the data and finds stats & ECG events. It returns something like "Average heart rate: 60bpm, AFib detected at <time>, etc...". The LLM has all the info it needs to give an accurate analysis at a fraction of computational cost.
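To make the tool-call idea concrete, here is a minimal sketch of what such a library function might look like. Everything in it is an assumption for illustration: the names `summarize_rhythm` and `format_for_llm`, the 0.15 threshold, and the use of RR-interval variability as a stand-in for an "irregular rhythm" flag. It is not a clinical AFib detector, and it assumes R-peak timestamps have already been extracted.

```python
from statistics import mean, pstdev


def summarize_rhythm(r_peak_times_s, irregularity_threshold=0.15):
    """Summarize rhythm from R-peak timestamps given in seconds.

    Hypothetical helper: the threshold and the irregularity rule are
    illustrative assumptions, not a validated clinical algorithm.
    """
    if len(r_peak_times_s) < 3:
        raise ValueError("need at least 3 R-peaks")

    # RR intervals are the gaps between consecutive beats.
    rr = [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]
    if any(interval <= 0 for interval in rr):
        raise ValueError("R-peak times must be strictly increasing")

    mean_rr = mean(rr)
    # Coefficient of variation of the RR intervals: 0 for a perfectly regular rhythm.
    rr_cv = pstdev(rr) / mean_rr

    return {
        "average_heart_rate_bpm": round(60.0 / mean_rr, 1),
        "rr_variability": round(rr_cv, 3),
        "irregular_rhythm_flag": rr_cv > irregularity_threshold,
    }


def format_for_llm(summary):
    """Render the summary as the short text a tool call would hand back to the LLM."""
    return (
        f"Average heart rate: {summary['average_heart_rate_bpm']} bpm; "
        f"RR variability (CV): {summary['rr_variability']}; "
        f"irregular rhythm flag: {summary['irregular_rhythm_flag']}"
    )


if __name__ == "__main__":
    print(format_for_llm(summarize_rhythm([0.0, 1.0, 2.0, 3.0, 4.0])))
```

The LLM never sees the raw samples here, only the short string, which is where the cost saving in this argument would come from.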
On top of that, this requires a large annotated dataset and a pre-trained model. And correct me if I'm wrong, but I don't think it's possible to have a "general" model that could handle arbitrary time series data. I.e. a model that is trained on ECG data would not be compatible with stock market data. And there isn't a way to have a model that understands both stock market data and ECG data.
You couldn’t run that on the edge, though.
The point is to run it reliably on the edge. Nobody sane would want their heart rate monitor to run via the cloud, with the uptime and reliability problems that come with any remote service plus the extra challenges of LLM inference.
The goal would be to run on the edge alongside the standard rules-based detection these machines already have, and to add the advanced pattern detection that LLMs can provide, both to reduce alert fatigue and to detect a new class of complex patterns that these sensors typically don’t.
> advanced pattern detection... detect new class of complex patterns
This sounds great and all, but it's wishful thinking. Nothing in the paper shows that it can find any meaningful patterns beyond what existing solutions find (i.e. standard rules-based detection/machine learning as mentioned above).
What they've essentially done is take a dataset in which each record was "annotated with a report string (generated by cardiologist or automatic interpretation by ECG-device)" [1], combine it with a series of templates (i.e. questions to ask the LLM) from the ECG-QA paper [2], and fine-tune a model on the result. That gets them 65% accuracy with pattern recognition alone and 85% accuracy with pattern + clinical context (i.e. patient history).
The 42 template questions they used (as mentioned in 4.1 in the paper) can each be evaluated deterministically via code and retrieved via a tool call for any llm to parse. And I argue that the results would be the same, if not better, for a fraction of the cost. Doing calculations like this on time series data is very very quick. A couple ms at most. I don't see why this couldn't be run on the edge.
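A rough sketch of what "evaluated deterministically via code" could look like: each template maps to a plain function over precomputed features, and the tool call just dispatches to it. The template IDs, feature names, and example values below are hypothetical placeholders, not the actual ECG-QA templates or real patient data.

```python
# Hypothetical template IDs mapped to deterministic checks.
# The real ECG-QA templates are worded differently; these only show the shape.
TEMPLATE_HANDLERS = {
    "is_heart_rate_above": lambda f, limit: f["heart_rate_bpm"] > limit,
    "is_qrs_duration_within": lambda f, low, high: low <= f["qrs_duration_ms"] <= high,
    "which_lead_has_max_amplitude": lambda f: max(
        f["lead_amplitude_mv"], key=f["lead_amplitude_mv"].get
    ),
}


def answer_template(template_id, features, *args):
    """Answer one template question from precomputed ECG features."""
    handler = TEMPLATE_HANDLERS.get(template_id)
    if handler is None:
        raise KeyError(f"unknown template: {template_id}")
    return handler(features, *args)


if __name__ == "__main__":
    # Made-up feature values, assumed to come from an upstream signal-processing step.
    features = {
        "heart_rate_bpm": 72,
        "qrs_duration_ms": 96,
        "lead_amplitude_mv": {"I": 0.8, "II": 1.4, "V1": 0.6},
    }
    print(answer_template("is_heart_rate_above", features, 100))
    print(answer_template("which_lead_has_max_amplitude", features))
```

Each lookup is a dictionary access plus a comparison, so the expensive part would be the feature extraction, not answering the question.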
Plus, Table 9 shows this thing uses a minimum of 7GB of RAM with a 270M-parameter model and ~15-20GB with a 1B model. I don't see how this could be run on the edge considering most phones have 6-8GB of RAM.
[1]: https://physionet.org/content/ptb-xl/1.0.3/
[2]: https://arxiv.org/pdf/2306.15681
Thank you for the detailed breakdown with references.
I just wanted to show the motivation behind this line of research, i.e. building fine-tuned, lightweight foundation models like this. I didn’t mean to imply this paper already achieves those goals.
The tech and hardware are not ready yet, as you point out, both in terms of performance and in terms of what it can actually do today. The key thing to be excited about is that the gap is within the realm of possibility to close in the next few years with the right funding.
I understand this provides a conversation interface for interacting with internet scale data (ChatGPT), but is there any benefit to this over searching in Google then clicking on the top link, (avoiding the ad) clicking accept my cookies, reading the header, scrolling down, Xing out of premium subscription, reading rest of article, repeat for the 4 next links?
Ok bro.
> Limitations: ... Finally, while we report strong results on individual datasets, we have not yet demonstrated generalization to unseen data, an essential step toward general TSLMs.
From the paper itself...
Imagine you asked ChatGPT a question but it could only give you answers from a single blog.