Hacker News

I tested the system LLM with a long article using two prompts: one asking for a summary of at most 20 words, and another asking for a one-sentence summary. In both cases, the model followed the instructions correctly.

Regarding your second point in the link above: maximumResponseTokens: 500 corresponds to roughly 1,500–2,000 characters in English, since a token in the AFM tokenizer typically represents 3–4 characters. Could that be why you are getting large outputs?

If you share your prompt(s), we’d be happy to take a closer look. You can reach us on Slack, Discord, or privately at root@mi12.dev
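
To make the token arithmetic above concrete, here is a minimal sketch in Python. It assumes only the rough 3–4 characters-per-token figure quoted above; the function name estimate_char_budget is hypothetical and not part of any Apple API, and real output length will vary with the text.

```python
# Rough ratio quoted in the comment: one AFM token covers about 3-4 English characters.
# This is an approximation, not a property guaranteed by the tokenizer.
CHARS_PER_TOKEN_RANGE = (3, 4)


def estimate_char_budget(max_tokens: int) -> tuple[int, int]:
    """Return a rough (low, high) character count allowed by a token cap."""
    low, high = CHARS_PER_TOKEN_RANGE
    return max_tokens * low, max_tokens * high


low, high = estimate_char_budget(500)
print(f"500 tokens is roughly {low}-{high} characters")
```

So a cap of 500 tokens still leaves room for several paragraphs; to hold the model to about 20 words you would need a much lower cap, or rely on the prompt instruction itself.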