This seems very cool. Was going to suggest submitting it, but I see there was a fairly popular thread 5 months ago for anyone interested: https://news.ycombinator.com/item?id=45492803
The biggest savings for a service like GMail are going to be based around deduplication - e.g. if you can recognize that a newsletter went out to a thousand subscribers and store those all as deltas from a "canonical" copy - congratulations, that's >1000:1 compression, better than you could achieve with any general-purpose compression. Similarly, if you can recognize that an email is an Amazon shipping confirmation or a Facebook message notification or some other commonly repeated "form letter", you can achieve huge savings by factoring out all the common elements in them, like images or stylesheets.
This seems very cool. Was going to suggest submitting it, but I see there was a fairly popular thread 5 months ago for anyone interested: https://news.ycombinator.com/item?id=45492803
That is an interesting link.
Does gmail use a special codec for storing emails ?
The biggest savings for a service like GMail are going to be based around deduplication - e.g. if you can recognize that a newsletter went out to a thousand subscribers and store those all as deltas from a "canonical" copy - congratulations, that's >1000:1 compression, better than you could achieve with any general-purpose compression. Similarly, if you can recognize that an email is an Amazon shipping confirmation or a Facebook message notification or some other commonly repeated "form letter", you can achieve huge savings by factoring out all the common elements in them, like images or stylesheets.