The database-per-tenant angle resonates. I hit a similar tradeoff building a financial data API—highly uneven access patterns where most entities are rarely queried but a few hundred get hit constantly.
We went a different direction: BigQuery for durable storage, plus an in-memory layer that pre-loads the "gold" aggregated tables (~26MB of pre-computed data for ~4,000 companies) at startup. Cold queries fall back to BigQuery directly; warm queries skip it entirely. That cut our query costs by ~88%, though the win is partly a function of the tables being small enough to fit in memory.
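The warm/cold split is roughly this (a minimal sketch; all names here are hypothetical, and `query_bigquery` stands in for the real BigQuery client call):

```python
# Sketch of a warm cache over a warehouse. The "gold" tables are small
# enough to load fully into memory at startup; misses fall through to
# the warehouse on the rare cold path.

class GoldCache:
    def __init__(self, preload_rows, cold_fallback):
        # preload_rows: dict of key -> row, loaded once at startup
        self._warm = dict(preload_rows)
        self._cold_fallback = cold_fallback  # invoked only on cache miss

    def get(self, key):
        row = self._warm.get(key)
        if row is not None:
            return row, "warm"  # served from memory, zero warehouse cost
        return self._cold_fallback(key), "cold"  # rare path: hit the warehouse


def query_bigquery(key):
    # placeholder for a real warehouse query
    return {"ticker": key, "source": "bigquery"}

cache = GoldCache({"AAPL": {"ticker": "AAPL", "revenue": 394e9}}, query_bigquery)
row, tier = cache.get("AAPL")     # warm hit, skips the warehouse
row2, tier2 = cache.get("TSLA")   # cold miss, falls back to BigQuery
```

The cost savings come almost entirely from how skewed the access distribution is: the pre-loaded keys absorb the bulk of traffic, so the cold path stays rare.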
The scans-are-slow limitation you mention is where we'd still hit BigQuery directly anyway—full-text search across 9M+ document sections can't be cached that way. So there's probably a hybrid approach where hot structured data lives in memory and raw text stays in the warehouse.
What's your strategy for tables that grow beyond what fits in a memory budget? Does the page-group layout still help when you can't fit the whole table—do you see a way to do partial pre-loading by access frequency?