accurate memory estimation is key here. it will crash if that accurate and it cant be generic for all local llm. each local llm has different context estimates.