Mind me asking what your thoughts are on the overall quality of Apple's on-device LLMs? I've found that LanguageModelSession always returns very lengthy responses:
https://developer.apple.com/forums/thread/789182?answerId=85...
I tested the system LLM on a long article with two prompts: one asking for a summary of at most 20 words, and another asking for a one-sentence summary. In both cases, the model followed the instructions correctly. Regarding your second point in the link above: maximumResponseTokens: 500 corresponds to roughly 1,500–2,000 characters of English, since a token in the AFM tokenizer typically represents 3–4 characters. Could that be why you are getting long outputs? If you share your prompt(s), we’d be happy to take a closer look. You can reach us on Slack, Discord, or privately at root@mi12.dev
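To make the token arithmetic above concrete, here is a rough back-of-the-envelope sketch in plain Python. It only restates the 3–4 characters-per-token rule of thumb mentioned above; the function names are made up for illustration and are not part of any Apple API, and the real ratio will vary with the text.

```python
# Rough estimate only: assumes the 3-4 characters-per-token rule of thumb
# for English text quoted above. Both helper names are hypothetical.

def estimate_char_range(max_tokens, chars_per_token=(3, 4)):
    """Return the (low, high) number of characters a token cap allows."""
    low, high = chars_per_token
    return max_tokens * low, max_tokens * high


def tokens_for_chars(target_chars, chars_per_token=3):
    """Return the token cap needed to cover target_chars, rounded up."""
    return -(-target_chars // chars_per_token)  # ceiling division


if __name__ == "__main__":
    print(estimate_char_range(500))   # the 500-token cap from the thread
    print(tokens_for_chars(150))      # cap needed to cover ~150 characters
```

Under these assumptions, a 500-token cap still leaves room for several paragraphs, so if short answers are the goal, the length instruction in the prompt is likely to matter more than the cap.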