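It means applying specific rules about what text can be generated, for example reliably generating valid JSON. Currently we use constrained decoding to accomplish this: at each step, any token that would break the rules is ruled out, so the model can only pick from the valid ones (e.g. the next token must be one of three valid options).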
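A minimal sketch of one constrained decoding step, assuming a toy vocabulary and made-up model scores (the function name `constrained_pick` and all the numbers are hypothetical, not any library's API). Tokens outside the allowed set are masked to -inf before the greedy pick, so the model's favorite token loses if it would produce invalid JSON:

```python
import math

def constrained_pick(scores: dict[str, float], allowed: set[str]) -> str:
    """Greedy pick where every token outside `allowed` is masked to -inf."""
    masked = {tok: (s if tok in allowed else -math.inf) for tok, s in scores.items()}
    return max(masked, key=masked.get)

# Made-up model scores for the token that follows '{"name": "Ada"'
scores = {"hello": 3.5, "]": 2.0, "}": 1.2, ",": 0.4, " ": 0.1}
# ',', '}' and a space are valid JSON continuations at this point
allowed = {"}", ",", " "}

print(max(scores, key=scores.get))        # unconstrained pick: hello
print(constrained_pick(scores, allowed))  # constrained pick: }
```

Greedy argmax is used only to keep the output deterministic; the same mask can be applied to the model's logits before sampling.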
Now imagine giving an LLM arbitrary validity rules for the text it generates, not just "must be valid JSON". I think that's what they mean by “grammar”.
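To make the "grammar" idea concrete, here is a rough sketch under stated assumptions: a tiny hypothetical grammar (arrays of single digits like `[1,2,3]`) written as a state machine, plus a fake scoring function standing in for the model. At every step the grammar's current state decides which tokens are allowed, and the decoder picks the best-scoring allowed token. All names here (`transitions`, `toy_scores`, `generate`) are made up for illustration; real systems typically compile richer grammars (e.g. context-free ones or JSON schemas) into the same kind of allowed-token check.

```python
DIGITS = set("0123456789")
VOCAB = sorted(DIGITS | {"[", "]", ",", "x"})

def transitions(state: str) -> dict[str, str]:
    """Hypothetical grammar: maps each allowed next token to the next state."""
    if state == "start":
        return {"[": "open"}
    if state == "open":    # right after '[': a digit, or ']' for an empty array
        return {**{d: "item" for d in DIGITS}, "]": "done"}
    if state == "item":    # after a digit: another element or close the array
        return {",": "comma", "]": "done"}
    if state == "comma":   # after ',': a digit must follow
        return {d: "item" for d in DIGITS}
    return {}              # "done": nothing more may be generated

def toy_scores(step: int) -> dict[str, float]:
    """Stand-in for a model: loves the invalid token 'x', closes after ~5 steps."""
    scores = {tok: 0.0 for tok in VOCAB}
    scores["x"] = 10.0
    scores["7"] = 2.0
    scores[","] = 1.0
    scores["]"] = 3.0 if step >= 5 else -1.0
    return scores

def generate(max_steps: int = 20) -> str:
    state, out = "start", []
    for step in range(max_steps):
        allowed = transitions(state)
        if not allowed:  # grammar says the output is complete
            break
        scores = toy_scores(step)
        tok = max(allowed, key=lambda t: scores[t])  # best token the grammar allows
        out.append(tok)
        state = allowed[tok]
    return "".join(out)

print(generate())  # [7,7,7]
```

Even though the fake model scores `x` highest at every step, it never appears in the output, because the grammar never allows it.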