Did you build your own model? If not, which model performs the best so far?

No, we didn't build one; we use the main foundation models. We have evals for each part of the workflow, and different models perform better on different tasks. Overall, the majority of it uses Sonnet 4.