Can you mention what inference stack you're using? I've tried MTP several times with that model and it always seems to significantly cut my token generation speed from ~60 tokens/sec to ~40 (M3 Max).