You can scale down as much as you want. You don't need to run a full relay if you only want to follow a dozen accounts. I bet you could run something like that on a Raspberry Pi or similar. You won't get search over the whole network, but you don't get that with a personal Mastodon instance either.

Wouldn't you then also be unable to see replies from anyone besides the dozen accounts your relay follows? If I run a personal Mastodon instance and someone replies to one of my posts, their instance will send it directly to mine and I'll see it. My understanding of the ATProto architecture is that it doesn't support directed messaging like that.

Consuming the firehose of the entire network is very cheap. The costs that can actually blow up are storage and computation.

If you want to filter events based on some heuristic (e.g. only from accounts on your server's follow list), you can do that. You can then specialize that further. For example, for ongoing threads that already pass your filter, you could add their IDs to an array and also accept all replies to those threads into your DB.

You already get a stream of everything, so you can scale down what you write to the DB to exactly the characteristics you need, including keeping threads cohesive.
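
To make the filtering idea concrete, here's a rough sketch in Python. It is not tied to any real ATProto client library: the event shape (`author`, `uri`, `reply_root`) and the `ThreadAwareFilter` name are made up for illustration, and it uses a set rather than an array for the tracked thread IDs so lookups stay cheap.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ThreadAwareFilter:
    """Decides which firehose events are worth writing to the DB.

    The event fields used here are a simplified, hypothetical shape,
    not the actual firehose wire format.
    """

    followed: set[str]                                      # accounts we care about
    tracked_threads: set[str] = field(default_factory=set)  # thread root IDs we keep

    def accept(self, event: dict) -> bool:
        # A top-level post is the root of its own thread.
        root = event.get("reply_root") or event["uri"]
        if event["author"] in self.followed:
            # A followed account posted: keep it and start tracking the thread.
            self.tracked_threads.add(root)
            return True
        # Otherwise keep the event only if it belongs to a tracked thread.
        return root in self.tracked_threads


flt = ThreadAwareFilter(followed={"did:example:alice"})
firehose = [
    # stranger, untracked thread
    {"author": "did:example:bob", "uri": "at://bob/1", "reply_root": None},
    # followed account, starts a tracked thread
    {"author": "did:example:alice", "uri": "at://alice/1", "reply_root": None},
    # stranger replying inside the tracked thread
    {"author": "did:example:bob", "uri": "at://bob/2", "reply_root": "at://alice/1"},
    # stranger replying inside an untracked thread
    {"author": "did:example:carol", "uri": "at://carol/1", "reply_root": "at://bob/1"},
]
stored = [e["uri"] for e in firehose if flt.accept(e)]
```

Here `stored` ends up with alice's post plus bob's reply to it, and the other two events are dropped. One caveat with this sketch: replies that arrive before a thread is tracked are missed, so you'd have to backfill those some other way if you care about them.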