Hi HN,
I’ve been working on a small research prototype called Epistemic Motion Engine (EME).
The idea is simple: instead of treating an LLM answer as a verdict, treat reasoning as a process you can observe.
Given a prompt (decision, claim, plan, or messy situation), EME runs the model through a sequence of small, controlled perturbations: assumption stress-tests, counterfactual shifts, and alternating consolidation vs. challenge. It records what stays stable, what breaks, where uncertainty grows, and what evidence would actually change the direction.
The output is a trace you can inspect. There are no hard-coded reasoning rules and it’s model-agnostic. The goal is not to “improve” answers, but to make uncertainty and load-bearing assumptions visible.
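To give a concrete (if oversimplified) picture, the core loop is roughly the following. This is a sketch, not the actual implementation: the prompt templates, step names, and `call_model` hook below are illustrative placeholders, and the real system sequences consolidation/challenge steps rather than treating them as independent rewrites.

```python
# Minimal sketch of the perturbation loop (all prompt wording and names are hypothetical).
PERTURBATIONS = [
    lambda p: p,  # baseline run, unmodified prompt
    lambda p: p + "\n\nStress-test: name your key assumptions, then answer as if the most load-bearing one is false.",
    lambda p: p + "\n\nCounterfactual: re-answer assuming the main constraint no longer holds.",
    lambda p: p + "\n\nConsolidate: restate only the conclusions that have held up so far.",
    lambda p: p + "\n\nChallenge: argue against your previous answer as strongly as you can.",
]

def run_trace(prompt, call_model):
    """Run the same model under each controlled input transformation and record every output."""
    trace = []
    for step, perturb in enumerate(PERTURBATIONS):
        perturbed = perturb(prompt)
        trace.append({"step": step, "input": perturbed, "output": call_model(perturbed)})
    return trace
```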
This is an early research prototype, not a product. I’m especially interested in failure modes: where this adds signal vs. where it’s just noise or model artifacts.
Public demo (no login, uses real models): https://eme.eagma.com
I’d appreciate blunt technical feedback, especially from people working on evals, interpretability, or reasoning under uncertainty.
So it's a diary.
AI cannot reason. It is not human.
Using a posh word like "perturbations", which really means anxiety, uneasiness, or a state of agitation, for something that is not human (AI) comes across to me as deceptive.
If you are going to sell this stuff, at least have the common decency and courtesy to use computer language when explaining what your AI can do, not language that is only fit for humans.
Thanks for the pushback.
I’m not claiming human reasoning or feelings here. The model generates tokens; “reasoning” only happens in the reader’s head when you interpret the text. EME doesn’t inject any reasoning rules either. It just runs the same model multiple times under small, controlled input transformations (assumption flips, counterfactual constraints, consolidate vs. challenge) and logs the deltas.
What’s useful (when it is useful) is the comparative structure across runs: what stays stable, what flips, and which assumptions look load-bearing.
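As a rough illustration only (the real comparison does more than surface-level diffing, and the record format here is assumed), the idea of "logging the deltas" is conceptually something like:

```python
# Hypothetical sketch: compare each perturbed run against the baseline run.
from difflib import SequenceMatcher

def stability_deltas(trace, threshold=0.5):
    """Score surface agreement with the baseline; low overlap flags a possible flip."""
    baseline = trace[0]["output"]
    deltas = []
    for step in trace[1:]:
        similarity = SequenceMatcher(None, baseline, step["output"]).ratio()
        deltas.append({"step": step["step"],
                       "similarity": similarity,
                       "flipped": similarity < threshold})
    return deltas
```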
If you think this is just a diary/log, it’s super easy to test. Paste a real problem you actually care about right now (decision, plan, argument) and see whether the trace adds anything beyond a one-shot answer. Public demo, no login: https://eme.eagma.com