Hacker News

At a high level, it's a rolling buffer that uses the spare compute we're allocated for a conversation to get results at <80ms p50. We label signals from raw conversation data and use them to align a small language model to produce these natural-language descriptions.
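
For readers unfamiliar with the terms, here is a minimal sketch of what a "rolling buffer" and a "p50" measurement could look like. It is not the actual system: `RollingBuffer`, `max_turns`, `p50`, and the sample latencies are all hypothetical names and made-up numbers chosen for illustration, and the small-model alignment step is left out entirely.

```python
from collections import deque
from statistics import median


class RollingBuffer:
    """Holds only the most recent `max_turns` conversation turns (hypothetical)."""

    def __init__(self, max_turns: int) -> None:
        # A deque with maxlen silently drops the oldest item on append.
        self._turns = deque(maxlen=max_turns)

    def append(self, turn: str) -> None:
        self._turns.append(turn)

    def window(self) -> list[str]:
        # Snapshot of the current window, oldest turn first.
        return list(self._turns)


def p50(latencies_ms: list[float]) -> float:
    """p50 is the median: half of the requests finish at or below this value."""
    return median(latencies_ms)


buf = RollingBuffer(max_turns=3)
for turn in ["hi", "how are you", "fine", "what's next"]:
    buf.append(turn)

print(buf.window())  # ['how are you', 'fine', "what's next"]

# Made-up latencies: a slow outlier (140 ms) does not move the p50.
print(p50([62.0, 71.0, 75.0, 79.0, 140.0]))  # 75.0
```

Note that a p50 target says nothing about tail latency, which is why the 140 ms outlier above still leaves the median under 80 ms.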