Yes, they can. All Bluesky data (and ATProto data in general) is publicly available.

It is already happening, and nothing can be done about it at the protocol level: https://mashable.com/article/bluesky-ai-dataset-using-one-mi...
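To make "publicly available" concrete, here is a rough sketch of how a record listing can be requested without logging in. It only builds the request URL for the `com.atproto.repo.listRecords` XRPC endpoint. The helper name `list_records_url`, the host `pds.example.com`, and the handle `alice.example.com` are placeholders I made up for illustration; the real host depends on where the account's repository is hosted.

```python
from urllib.parse import urlencode


def list_records_url(
    pds_host: str,
    repo: str,
    collection: str = "app.bsky.feed.post",
    limit: int = 50,
) -> str:
    """Build an unauthenticated listRecords URL for one account's repo.

    pds_host and repo are caller-supplied; the values used in the demo
    below are hypothetical placeholders, not real accounts.
    """
    # repo may be a handle or a DID; urlencode escapes it as needed.
    query = urlencode({"repo": repo, "collection": collection, "limit": limit})
    return f"https://{pds_host}/xrpc/com.atproto.repo.listRecords?{query}"


if __name__ == "__main__":
    # No token or session is involved; this is a plain GET URL.
    print(list_records_url("pds.example.com", "alice.example.com"))
```

Fetching that URL with any HTTP client should return the account's posts as JSON, assuming the host actually serves that repo. Nothing in the request identifies the reader as a person, a crawler, or a dataset builder, which is why the protocol itself cannot block this.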