Guaranteed there are hedge funds with language models that can predict time series. A lot of really good time series research has never been published, and is locked in the head of some guy who lives in a $20 million apartment in NYC.
When I worked at an ML hedge fund 6 years ago, t-SNE performed the best and momentum was the feature that best predicted stock movements.
The actual algorithms for predicting price movement were fairly simplistic; most of the work was around strategies for dealing with overfitting and how to execute the trades. Accuracy was around 51-55% (a bit better than a coin toss), so it was a big challenge to actually execute the trades and still make a profit after fees and other nonsense. Finding alpha is what ML is used for, but that's just the first step.
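To make the fees point concrete, here's a back-of-the-envelope sketch (the numbers are made up for illustration, not from any fund): a 52% hit rate only pays off if the edge per trade exceeds round-trip transaction costs.

```python
# Back-of-the-envelope sketch (illustrative numbers only): when does a small
# directional edge survive transaction costs?

def expected_pnl_per_trade(hit_rate: float, avg_move: float, cost: float) -> float:
    """Expected PnL per trade, assuming a symmetric win/lose move of avg_move.
    All values are fractions of notional; cost is the round-trip fee."""
    edge = (hit_rate - (1 - hit_rate)) * avg_move  # net directional edge
    return edge - cost

# 52% accuracy on a 0.5% average move:
print(expected_pnl_per_trade(0.52, 0.005, 0.0001))  # 1 bp cost: barely positive
print(expected_pnl_per_trade(0.52, 0.005, 0.0003))  # 3 bp cost: a losing strategy
```

The same model can be profitable or unprofitable depending entirely on execution costs, which is why so much of the work goes into execution rather than prediction.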
This makes intuitive sense to me, because the system you are modeling is wide open and you’re competing against others who have the same information. Achieving much more than 51% accuracy would be extraordinary. But if you get 51% consistently over time, with leverage, you can make a good amount of money.
My experience as well; seemed more accurate while prices were rising.
One of the difficulties with these models would be backtesting investment strategies. You always need to make sure that you are only using data that would have been available at the time to avoid look-ahead bias.
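A minimal sketch of that point-in-time discipline, with made-up prices and a hypothetical `momentum_asof` helper: the feature for a given trade date is computed only from data strictly before that date.

```python
# Minimal sketch of avoiding look-ahead bias in a backtest: every feature is
# computed only from rows strictly before the trade date. Prices and column
# names are made up for illustration.
import pandas as pd

prices = pd.DataFrame(
    {"close": [100.0, 101.0, 99.5, 102.0, 103.5]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

def momentum_asof(prices: pd.DataFrame, trade_date: pd.Timestamp, lookback: int = 3) -> float:
    """Point-in-time momentum: uses only data available BEFORE trade_date."""
    visible = prices.loc[: trade_date - pd.Timedelta(days=1)]  # exclude same-day data
    window = visible["close"].tail(lookback)
    return window.iloc[-1] / window.iloc[0] - 1.0

# The feature for a trade on Jan 5 sees only the Jan 1-4 closes.
print(round(momentum_asof(prices, pd.Timestamp("2024-01-05")), 4))
```

The subtle failure mode is using data timestamped on the trade date that wouldn't actually have been published yet (earnings restatements, revised fundamentals), which is why serious shops backtest against point-in-time snapshots of the data, not the latest cleaned version.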
Can confirm, kdb+ exists… and you'll probably never be able to get your hands on it. There are lots of models that use it. And they are indeed locked inside some guy's head high up in the towers of midtown.
kdb+ is no secret, but what does this have to do with anything? It's just a database optimized for time series data and has nothing to do with AI. It's widely used in the financial business and even for non-financial things like Formula 1 race analysis.
Cool. You missed the part where I said there are models using that. Those models are shhhhhhh…
PyTorch is no secret either yet…
The point I'm making is that there are models, based on database stream data, that you'll never get access to even if you had $100m.
I see.
Why would they use an LLM for this?
Predicting the future is valuable. If a model applies the same underlying world model to accurately predicting OHLC series as it does to producing English, then you can interrogate and expand on that world model in complex and very useful ways. Prompting it lets you describe a scenario, or uncover hidden influences that wouldn't be apparent from a simple accurate prediction. That allows sophistication in the tools: instead of an accurate chart with all sorts of complex indicators, you can get an English explanation and variations on scenarios.
You can't tell a numbers-only model "ok, with this data, but now you know all the tomatoes in the world have gone rotten and the market doesn't know it yet, what's the best move?" You can use an LLM like that, however, and with RL, which allows you to branch and layer strategies dependent on dynamic conditions and private data, for arbitrary outcomes. Deploy such a model at scale and run tens of thousands of simulations, iterating through different scenarios, and you can start to apply confidence metrics and complex multiple-degree-of-separation strategies to exploit arbitrage opportunities.
Any one of the big labs could do something like this, including modeling people, demographic samples, distributions of psychological profiles, cultural and current events, and they'd have a manipulation engine to tell them exactly who, when, and where to invest, candidates to support, messages to push and publish.
A fundamental measure of intelligence is how far into the future a system can predict, and across how many domains. The broader the domains and the farther into the future, the more intelligence, and things like this push the boundaries.
We should probably get around to doing a digital bill of rights, but I suspect it's too late already anyway, and we're full steam ahead into Snow Crash territory.
Automated hypothesis testing in the form of a search for alpha in the market is certainly being used right now. An LLM can ask new questions about correlations between assets, and run statistical tests on those correlations, in ways that were previously only possible by employing a PhD statistician.
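As a sketch of what one iteration of such an automated test might look like (synthetic data, not a real pipeline): propose a correlation between two assets' returns, test it, and only then worry about whether it survives multiple-testing corrections.

```python
# Hedged sketch of a single automated hypothesis test: are two assets' daily
# returns correlated? The data here is synthetic and built to correlate.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
asset_a = rng.normal(0.0, 0.01, 250)                  # ~one year of daily returns
asset_b = 0.6 * asset_a + rng.normal(0.0, 0.01, 250)  # deliberately correlated

r, p_value = pearsonr(asset_a, asset_b)
print(f"r={r:.2f}, p={p_value:.1e}")

# An automated search would generate thousands of such hypotheses, so any
# real pipeline must correct for multiple testing (e.g. Bonferroni) before
# declaring anything alpha -- otherwise it's just data mining.
```

The statistics here are trivial; the interesting part is the hypothesis *generation*, which is where an LLM's world knowledge could plausibly propose pairings a brute-force scan would rank too low to test.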
The emergent behavior of LLMs being amazing at accurately predicting tokens in previously unseen conditions might be more powerful than more rigorous machine learning extrapolations.
Especially when you throw noisy subjective context at it.
The “prediction” in this case is I think some approximation of “ingest today’s news and social media buzz as it’s happening and predict what the financial news tomorrow morning will be.”
Hypothetically, an LLM has absorbed lots of world knowledge, and it can trace lots of deep correlations between various factors.
This isn't (just) time series forecasting, it is about interacting with time series data through natural language.
I doubt those are language models.
Check it out, they are completely based on Llama and Gemma, outputting text. Models are open-source.