The appeal of a single queryable database for personal data makes a lot of sense - the hard part is always the ingestion and normalization, not the storage.
One pattern I have seen work well for the business version of this: a "company intelligence" database where everything known about a prospect company gets accumulated in one place over time. Homepage content, job postings, news mentions, funding history, tech stack signals, all deduplicated and queryable.
The challenge on the B2B side is the same as with personal data: the data comes in from 8 different sources in 8 different formats, often with conflicts (two sources disagree on headcount, three sources have different founding dates). Your approach of controlling the schema from the start, rather than trying to normalize later, is the right call. Schema drift is what kills most long-term data projects.
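For what it's worth, a minimal sketch of how the controlled-schema-plus-conflict-resolution part can look: a fixed record type every source maps into, and a per-field merge that resolves disagreements by source precedence. All the names here (the sources, the priority numbers, `CompanyRecord`) are hypothetical, just to illustrate the shape:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical source ranking: higher number wins a conflict.
SOURCE_PRIORITY = {"regulatory_filing": 3, "data_vendor": 2, "homepage_scrape": 1}

@dataclass
class CompanyRecord:
    # Schema controlled up front; every source maps into these fields.
    domain: str                          # dedup key
    name: Optional[str] = None
    headcount: Optional[int] = None
    founded_year: Optional[int] = None
    provenance: dict = field(default_factory=dict)  # field -> source that set it

def merge_field(record: CompanyRecord, field_name: str, value, source: str) -> None:
    """Overwrite a field only if the new source outranks the one that set it."""
    if value is None:
        return
    current = record.provenance.get(field_name)
    if current is None or SOURCE_PRIORITY.get(source, 0) > SOURCE_PRIORITY.get(current, 0):
        setattr(record, field_name, value)
        record.provenance[field_name] = source

# Two sources disagree on headcount; precedence decides, provenance records why.
rec = CompanyRecord(domain="example.com")
merge_field(rec, "headcount", 120, "homepage_scrape")
merge_field(rec, "headcount", 95, "regulatory_filing")  # outranks the scrape
merge_field(rec, "headcount", 200, "data_vendor")       # lower rank, ignored
```

Keeping per-field provenance rather than per-record turns "why does it say 95?" from archaeology into a lookup, which matters once you have 8 sources feeding the same entity.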
What storage engine are you using? And how do you handle temporal data - do you snapshot state over time or just keep the latest version of each entity?