Hacker News

I love the idea of personal data storage and I want it to be the default, but I think there are some possibly insurmountable technical problems. This article doesn't mention schemas once, and schemas make seamless data portability virtually impossible. I've spent a week making sure a simple CRUD app could change a string field to a UUID field without causing any outages or bugs.

You can export your data from Google or Facebook today, but then you need to write a copy of the source UI that faithfully replicates the way all those data fields are supposed to display. Then tomorrow the source changes its schema: what used to be one field is now two, and another field has been removed entirely, so that data is just gone, at least from future dumps. Do you build your display against the old schema or the new one? Is it possible to support both?
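To make the "old schema or new schema" question concrete, here is a minimal sketch of a display layer that accepts both. Everything in it is hypothetical: the `schema_version` key, the field names, and the assumption that dumps are versioned at all are mine, not anything a real export gives you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisplayProfile:
    first_name: str
    last_name: str
    nickname: Optional[str]  # only old dumps can ever fill this in


def normalize(record: dict) -> DisplayProfile:
    """Map a record from either hypothetical schema version to one display shape."""
    version = record.get("schema_version", 1)
    if version == 1:
        # v1 stored a single "name" field; splitting on the first space is a lossy guess.
        first, _, last = record["name"].partition(" ")
        return DisplayProfile(first, last, record.get("nickname"))
    if version == 2:
        # v2 split the name in two and dropped "nickname", so that data is simply absent.
        return DisplayProfile(record["first_name"], record["last_name"], None)
    raise ValueError(f"unknown schema version: {version}")


old = normalize({"schema_version": 1, "name": "Ada Lovelace", "nickname": "AAL"})
new = normalize({"schema_version": 2, "first_name": "Ada", "last_name": "Lovelace"})
```

So supporting both is possible in this toy case, but every future schema change adds another branch, and the v1 name split is already guessing at what the source meant.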

When everything is in data silos, you can freely and safely change the data format, which is something that needs to happen a lot as applications evolve. Even in a data silo, doing this is pretty tricky, and bugs and data loss are significant risks. If you're trying to sync between an unbounded number of data repositories, each of which may be working from a different and potentially conflicting version of the schema, data loss is practically assured.
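One plausible way that loss happens, sketched under made-up assumptions: a repository that only knows an older schema keeps the fields it recognizes and silently drops the rest, so a record synced through it comes back smaller. The field names and the `store_v1` helper are hypothetical.

```python
# Fields a hypothetical repository on the old schema knows how to store.
V1_FIELDS = {"id", "name"}


def store_v1(record: dict) -> dict:
    """Simulate an old-schema repository: unknown fields are silently dropped."""
    return {key: value for key, value in record.items() if key in V1_FIELDS}


# A record written by a newer client that has split "name" into two fields.
original = {"id": "42", "first_name": "Ada", "last_name": "Lovelace"}

# Sync it through the old repository and see what survives the round trip.
round_tripped = store_v1(original)
lost = set(original) - set(round_tripped)
```

Here `lost` ends up holding both name fields. The old repository never raised an error; it just didn't know those fields existed.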

Another big problem is schema permissions and identity. I might have some piece of data that says "person A is allowed to see this set of fields" and another piece that says "person A is blocked from seeing this other set of fields." This gets synced to 3 different servers, and one of those servers has no idea that userA is in fact person A. So you fail closed, but then the data on that server practically does not exist as far as person A is concerned, which defeats the purpose if the goal of the repository is to share some data with person A. You really can't do any sort of fine-grained access control in a system where trust, identity, and auditing are decentralized.
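A rough sketch of that failure mode, with every name invented for illustration: the allow and deny rules are keyed by person, each server has its own mapping from accounts to people, and a server that can't resolve the account denies everything.

```python
# Hypothetical field-level rules, keyed by person rather than by account.
ALLOW = {"person_a": {"email", "photos"}}
DENY = {"person_a": {"address"}}


def can_see(identity_map: dict, account: str, field: str) -> bool:
    """Return True only if this server can prove the account may see the field."""
    person = identity_map.get(account)
    if person is None:
        return False  # fail closed: unknown identity sees nothing
    if field in DENY.get(person, set()):
        return False  # an explicit block wins over any allow rule
    return field in ALLOW.get(person, set())


server_1 = {"userA": "person_a"}  # knows that userA is person A
server_3 = {}                     # has never heard of userA

on_server_1 = can_see(server_1, "userA", "email")
on_server_3 = can_see(server_3, "userA", "email")
```

The rules are identical on both servers, but `on_server_1` is true and `on_server_3` is false. Failing closed is the safe choice, and it is also exactly what makes the shared data invisible on the third server.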