Of course. You can do it in a single pass, just emitting a stream of tokens as you scan the input. There are various implementations, for example jsmn: https://zserge.com/jsmn/

It requires the caller to allocate the array of tokens up front, so it needs a backing "stack vector" of sorts.
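A rough sketch of the idea in C, in the spirit of jsmn but not its actual code or API: the names `token`, `tokenize` and `MAX_DEPTH` are hypothetical. The caller supplies a fixed token array, each token only records byte offsets into the original buffer (strings keep their raw, still-escaped span), and running out of tokens is reported as an error rather than triggering an allocation. Open objects and arrays are tracked on a small fixed-depth stack of token indices; it does not validate full JSON grammar.

```c
#include <string.h>

typedef enum { TOK_OBJECT, TOK_ARRAY, TOK_STRING, TOK_PRIMITIVE } tok_type;

typedef struct {
    tok_type type;
    int start; /* offset of the first byte (for strings: after the quote) */
    int end;   /* offset one past the last byte; -1 while a container is open */
} token;

#define MAX_DEPTH 64 /* hypothetical nesting limit for the container stack */

/* Single pass over js[0..len). Returns the number of tokens written,
   -1 if toks[] is too small, -2 on malformed or unbalanced input. */
int tokenize(const char *js, int len, token *toks, int max_toks) {
    int n = 0;
    int stack[MAX_DEPTH]; /* indices of currently open containers */
    int depth = 0;
    for (int i = 0; i < len; i++) {
        char c = js[i];
        switch (c) {
        case '{': case '[':
            if (n >= max_toks) return -1;
            if (depth >= MAX_DEPTH) return -2;
            toks[n] = (token){ c == '{' ? TOK_OBJECT : TOK_ARRAY, i, -1 };
            stack[depth++] = n++;
            break;
        case '}': case ']': {
            if (depth == 0) return -2;
            token *open = &toks[stack[--depth]];
            if (open->type != (c == '}' ? TOK_OBJECT : TOK_ARRAY)) return -2;
            open->end = i + 1; /* close the matching container */
            break;
        }
        case '"': {
            int start = i + 1;
            for (i = start; i < len && js[i] != '"'; i++)
                if (js[i] == '\\') i++; /* skip the escaped character */
            if (i >= len) return -2;    /* unterminated string */
            if (n >= max_toks) return -1;
            toks[n++] = (token){ TOK_STRING, start, i };
            break;
        }
        case ' ': case '\t': case '\n': case '\r': case ':': case ',':
            break; /* separators carry no token of their own here */
        default: {
            int start = i; /* number, true, false or null */
            while (i < len && !strchr(" \t\n\r,:]}", js[i])) i++;
            if (n >= max_toks) return -1;
            toks[n++] = (token){ TOK_PRIMITIVE, start, i };
            i--; /* let the outer loop see the delimiter */
            break;
        }
        }
    }
    return depth == 0 ? n : -2;
}
```

For `{"a": [1, 2], "b": "x\"y"}` with a 16-slot array this yields 7 tokens; with only 3 slots it returns -1, which is where a growable backing vector would come in. As far as I know, jsmn itself avoids the explicit stack by scanning back through the token array for the nearest still-open container.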

And what about escapes?

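For escapes you can decode them in place in the raw buffer, since every escape sequence decodes to fewer bytes than the sequence itself occupies (\n is 2 characters but 1 byte; \uXXXX is 6 characters but at most 3 UTF-8 bytes; a surrogate pair is 12 characters but 4 bytes). The write position therefore never overtakes the read position.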
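A minimal sketch of in-place unescaping in C, assuming the buffer holds the contents of one JSON string (without the surrounding quotes) and is writable. The function names `unescape_in_place`, `hex4` and `put_utf8` are hypothetical, not from jsmn. A read index `r` and a write index `w` walk the same buffer; because each escape shrinks when decoded, `w <= r` holds throughout and no unread byte is overwritten.

```c
#include <stddef.h>

/* Parse 4 hex digits at s; returns the value, or -1 on a bad digit. */
static long hex4(const char *s) {
    long v = 0;
    for (int k = 0; k < 4; k++) {
        char c = s[k];
        v <<= 4;
        if (c >= '0' && c <= '9') v |= c - '0';
        else if (c >= 'a' && c <= 'f') v |= c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v |= c - 'A' + 10;
        else return -1;
    }
    return v;
}

/* Encode code point cp as UTF-8 at out; returns bytes written (1..4). */
static size_t put_utf8(unsigned char *out, unsigned long cp) {
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Decode JSON escapes in s[0..len) in place. Returns the new length,
   or (size_t)-1 on a malformed escape. */
size_t unescape_in_place(char *s, size_t len) {
    size_t r = 0, w = 0;
    while (r < len) {
        if (s[r] != '\\') { s[w++] = s[r++]; continue; } /* plain byte */
        if (r + 1 >= len) return (size_t)-1;
        char e = s[r + 1];
        r += 2; /* consume the backslash and the escape letter */
        switch (e) {
        case '"': case '\\': case '/': s[w++] = e; break;
        case 'b': s[w++] = '\b'; break;
        case 'f': s[w++] = '\f'; break;
        case 'n': s[w++] = '\n'; break;
        case 'r': s[w++] = '\r'; break;
        case 't': s[w++] = '\t'; break;
        case 'u': {
            if (r + 4 > len) return (size_t)-1;
            long cp = hex4(s + r);
            if (cp < 0) return (size_t)-1;
            r += 4;
            if (cp >= 0xD800 && cp <= 0xDBFF) {
                /* High surrogate: must be followed by \uDC00..\uDFFF. */
                if (r + 6 > len || s[r] != '\\' || s[r + 1] != 'u') return (size_t)-1;
                long lo = hex4(s + r + 2);
                if (lo < 0xDC00 || lo > 0xDFFF) return (size_t)-1;
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                r += 6;
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                return (size_t)-1; /* lone low surrogate */
            }
            /* The code point is fully read before any byte is written. */
            w += put_utf8((unsigned char *)s + w, (unsigned long)cp);
            break;
        }
        default: return (size_t)-1; /* unknown escape letter */
        }
    }
    return w;
}
```

Combined with a tokenizer that records raw string spans, you would call this on `js + tok.start` with length `tok.end - tok.start`; the buffer must be mutable, so not a string literal.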