I've found mostly for context reasons its better to just have a grand overview of the systems and how they work together and feed that to the agent as context, it will use the additional files it touches to expand its understanding if you prompt well.

Does this essentially give the companies controlling these models access to our source code? That is, it goes into training future versions of the model?

Depends on the privacy practices of the people hosting the models, or are providing access to them. Most have an 'opt-out' of helping to train models.