For UI purposes, sub-150ms animations can be very effective as "pro" interface behaviours. That's close to our best reaction time [1]. Good UI personality doesn't have to get in the way of pro-level efficiency.
One of the ways to achieve this is to not actually transition between states, but simply animate the "end bounce" of an introduced element, as if it was eased into position. So not actually slid from the left, for example, but rebounding the last few pixels from an imaginary slide. Our eyes just draw their conclusions to inform us of a movement, and in exchange the component is readable and usable immediately.
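To make the "end bounce" idea concrete, here is a minimal sketch. It assumes the element is already rendered at its final position and only a small pixel offset is animated. The function name, the 120 ms duration, the 6 px amplitude, and the damping curve are all illustrative choices, not anything prescribed above.

```typescript
// Hypothetical helper: horizontal offset (in px) to apply to an element
// `elapsedMs` after it appears. The element sits at its final position
// from the first frame; only the last few pixels of an imaginary slide
// are animated.
function endBounceOffset(
  elapsedMs: number,
  durationMs: number = 120, // assumed value, kept under 150 ms
  amplitudePx: number = 6   // assumed size of the "last few pixels"
): number {
  if (elapsedMs <= 0) return amplitudePx;
  if (elapsedMs >= durationMs) return 0;
  const t = elapsedMs / durationMs; // normalized time in (0, 1)
  // Damped cosine: starts at amplitudePx, overshoots slightly past 0,
  // and the (1 - t) factor forces it to settle exactly at 0.
  return amplitudePx * (1 - t) * Math.exp(-3 * t) * Math.cos(2 * Math.PI * t);
}
```

In a browser this would typically be called from a `requestAnimationFrame` callback and applied as a `translateX` transform, so layout and hit-testing never depend on the animation.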
[1] ~100ms represents optimal reflex time in recent research. [2] Anything that requires user attention to interact after the component appears is very comfortable with a 150ms transition. One important note is that for components you can navigate across (e.g. one key shortcut invokes a modal state, another key runs a command in that modal), experienced users will "type" consecutive shortcuts in one go, and you must have the second behaviour responsive from frame 1.
[2] Some athletes seem to train down to ~80ms on very specific reflexes, which recently led to race-start controversies when block timers disqualify sub-100ms reactions for runners.
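The point about consecutive shortcuts in footnote [1] could be sketched as follows: the logical mode switches synchronously on the first key press, and the 150 ms transition is purely visual. The `ShortcutRouter` class, the `"g"` trigger key, and the command naming are hypothetical, chosen only for illustration.

```typescript
type Mode = "normal" | "modal";

// Hypothetical router: input state changes immediately; the animation
// is derived from a timestamp and never gates key handling.
class ShortcutRouter {
  mode: Mode = "normal";
  modalOpenedAtMs: number | null = null;
  readonly executed: string[] = [];

  handleKey(key: string, nowMs: number): void {
    if (this.mode === "normal") {
      if (key === "g") { // assumed shortcut that invokes the modal state
        this.mode = "modal"; // active from frame 1
        this.modalOpenedAtMs = nowMs;
      }
      return;
    }
    // In modal mode: Escape dismisses, any other key runs a command.
    if (key !== "Escape") {
      this.executed.push(`modal:${key}`);
    }
    this.mode = "normal";
    this.modalOpenedAtMs = null;
  }

  // Visual progress of the open transition in [0, 1]; 0 when closed.
  modalAnimationProgress(nowMs: number, durationMs: number = 150): number {
    if (this.modalOpenedAtMs === null) return 0;
    const t = (nowMs - this.modalOpenedAtMs) / durationMs;
    return Math.min(1, Math.max(0, t));
  }
}
```

With this shape, a user who presses the two keys 20 ms apart gets the command executed even though the modal's transition is barely under way.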
Reaction time is unrelated to perceptible latency. You're not reacting to things; you are seeing the result of an action you requested. You already know it's coming. To say that delays less than your reaction time don't matter is like saying it doesn't matter if your flight is delayed by an hour because it takes 8 hours to cross the Atlantic.
Watching your own hand movements through your phone camera is a good demonstration of this. Set 60 Hz video mode, and the latency is probably less than 30 ms - but still extremely obvious.
it's quite a lot more than 30ms, as phone cameras do some real heavy-weight image processing to compensate for their tiny size, I'm talking neural networks and such. the throughput might be 60fps when it's all conveyor-ed but the latency sure isn't
> ~100ms represents optimal [human] reflex time in recent research.
For unpredictable inputs. Intervals between a human’s own actions or discrepancies in delays between successive external events can be effected or perceived with significantly greater precision, especially for people with e.g. music training, especially for percussionists. I’d bet on somewhere between one and two orders of magnitude more precision, that is single-digit milliseconds, at higher skill levels. (Chopin’s Fantaisie-Impromptu is among the easier rhythm-based parlour tricks and already requires staying below ~30ms of error. Alternatively, a single frame at 60fps is 17ms, and speedrunners can hit single frames of a game pretty reliably.)
If the animations are effectively 'cancellable', i.e. they don't block input or delay the change in state, this can be reasonable. You can put a sequence of actions into a UI at a much faster pace than 100ms, if you have the muscle memory for it.
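One way to read 'cancellable' is sketched below: the logical state (the target) updates the moment the action happens, and the displayed value merely chases it, so a new action mid-animation retargets from wherever the display currently is. The `CancellableTween` class, linear easing, and 100 ms default are assumptions made for the example.

```typescript
// Hypothetical tween: `target` is the real state and changes instantly;
// `valueAt` is only what gets drawn.
class CancellableTween {
  private from: number;
  private to: number;
  private startMs: number = 0;
  private durationMs: number;

  constructor(initial: number, durationMs: number = 100) {
    this.from = initial;
    this.to = initial;
    this.durationMs = durationMs;
  }

  // Logical state, available to input handling with no delay.
  get target(): number {
    return this.to;
  }

  // Interrupt any running animation and head for a new value,
  // starting from the currently displayed position.
  retarget(value: number, nowMs: number): void {
    this.from = this.valueAt(nowMs);
    this.to = value;
    this.startMs = nowMs;
  }

  // Displayed value at `nowMs` (linear interpolation, clamped).
  valueAt(nowMs: number): number {
    const t = Math.min(1, Math.max(0, (nowMs - this.startMs) / this.durationMs));
    return this.from + (this.to - this.from) * t;
  }
}
```

Because input handlers read `target` rather than the displayed value, a burst of actions faster than the animation duration is never dropped or delayed; the visuals simply catch up.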