Hacker News

Yeah, I'm really curious about how they train stacked multi-tenant LoRAs at the same time. If this gets commoditised enough, it could be interesting to try "end-of-the-day fine-tunes on daily conversations" and see where that leads. Or targeted RL on "missed / rejected tasks" for an agent, once you've collected enough samples for a run, and so on.