Absolutely not.

500-1000ms is borderline acceptable.

Sub-300ms is closer to SOTA.

2000ms or more means people will hang up.

Play "Just a second, one moment please <sounds of typing>".wav as soon as the input goes quiet.

The ChatGPT app has an audio version of the spinner icon for when you ask it a question and it needs a second before answering.

I haaaaate the fake typing noises.

Alternate between that and

    play "ehh".wav
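
For what it's worth, here is a rough sketch of how that alternation could be wired up. It only decides which clip to play and does no audio playback itself. The class name, the clip file names, and the 300ms quiet threshold are all made up for illustration and are not taken from any particular system.

```python
from itertools import cycle


class FillerPicker:
    """Alternates between filler clips once the caller has gone quiet.

    Hypothetical helper: the names and the default threshold are assumptions.
    """

    def __init__(self, clips, quiet_after_ms=300):
        # cycle() repeats the clip list forever, so consecutive picks alternate.
        self._clips = cycle(clips)
        self.quiet_after_ms = quiet_after_ms

    def next_clip(self, silence_ms, reply_ready):
        """Return the clip to play, or None if no filler is needed."""
        # No filler if the real reply is ready or the caller may still be talking.
        if reply_ready or silence_ms < self.quiet_after_ms:
            return None
        return next(self._clips)


if __name__ == "__main__":
    picker = FillerPicker(["one_moment_typing.wav", "ehh.wav"])
    # Simulated turns: (ms of silence so far, is the real reply ready?)
    for silence_ms, reply_ready in [(100, False), (400, False), (450, False), (500, True)]:
        print(silence_ms, reply_ready, picker.next_clip(silence_ms, reply_ready))
```

Keeping the decision separate from playback means you can tune the threshold against the latency numbers above without touching the audio path.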

160ms is essentially optimal, and in practice you can get down to about 200ms AFAIK.

With what system?