I may be wrong here, but the blog post seems AI-written, with repetition of stock phrases like "the inference pipeline was rebuilt using architecture-aware fused kernels, optimized scheduling, and disaggregated serving". I don't know what that means without some code and proper context.

They also claim 3-6x inference throughput compared to Qwen3-30B-A3B without referring back to any code or paper; all I could see in the Hugging Face repo is usage of a standard inference stack like vLLM. I have looked at earlier models that were trained with help from Nvidia, but the actual nature of that "help" was never clear. There is no release of the India-specific datasets they would be using. Releases like this muddy the waters rather than being a helpful addition, at least according to me!
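To be concrete about what "standard inference stack" means here: model cards on Hugging Face typically ship a boilerplate vLLM snippet along the lines of the sketch below (the model ID is a placeholder, not the actual repo path):

    # Minimal vLLM usage, the kind of snippet most model cards ship.
    # The model ID is hypothetical; substitute the actual repo path.
    from vllm import LLM, SamplingParams

    llm = LLM(model="example-org/example-model")
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate(["Translate to Hindi: good morning"], params)
    print(outputs[0].outputs[0].text)

Nothing in a snippet like that would explain a 3-6x throughput difference; verifying that claim would need kernel-level code, serving configs, or at minimum a reproducible benchmark setup.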

Disagree: the post makes punctuation mistakes that only an Indian can make. So does your own comment.

Not a given. We've already seen LLMs that got SFT'd by "national teams" adopt ESL speech patterns.

They won’t make punctuation mistakes though.

Wouldn't they do exactly that if they were trained on enough text with punctuation mistakes?

No, because of post-training.
