Since it's a tool itself, I don't see the benefit of relying on Anthropic for this. If anything, it now becomes vendor lock-in.

Correct, I wouldn't use it myself, as it's a trivial addition to your own implementation. Personally, I keep all my work in this space as provider-agnostic as I can. When the bubble eventually pops there will be victims, and you don't want a stack that's hard-coded to one of the casualties.

They can post-train the model on usage of their specific tool along with the specific prompt they're using.

LLMs obviously generalize, but I also wouldn't be shocked if it performs better than a "normal" implementation.