Speaking as long-term (>15 years) user of Postgres in science, I am getting worried about the lack of columnar type of storage in Postgresql. As the datasets become bigger and bigger, the limitations of PG's storage are becoming more and more significant. I know there are various extensions (i.e. cetus) that may offer such functionality, but then you depend on that extension being supported in the future, as well additional complexity.

A bit of a shameless plug, but I've been working on this for a few months in the form of an extension: https://github.com/xataio/deltax

When I started I thought there's too much inherent overhead in using Postgres tables for storage and using the Postgres executor, so figured it would be pretty cool to match Timescale in performance. I didn't think it would be possible to get close to dedicated analytical DBs. But as the project progressed and the performance got better and better, I'm now firmly in the camp of doing analytics with Postgres + an extension.

Thanks for sharing. That looks promising, good luck!

You might be using the wrong database if that’s what you’re hoping for. Columnar databases are a separate category.

It’s like saying that you’re getting worried Apple doesn’t sell washing machines.

Postgres's whole origin story is basically to think outside the box and don't be constrained by existing thinking. Stonebraker thought existing databases were too limited in terms of their data types and expressiveness. He started Postgres as an evolution of Ingres (Postgres = Post Ingres) that added rich custom data types and a rewriting system based on rules.

Columnar and all the other fun stuff (JSON, GIS, inverted indexes, embedding vectors) is a natural progression of that thinking. With TimescaleDB, Hydra, Citus, pg_mooncake, etc. becoming very popular the last few years, there is a clear demand for an integrated experience.

(Stonebraker also thought one database shouldn't do everything, as described in his early 2000s "One Size Does Not Fit All" paper, and Stonebraker branched out into HStore/Vertica for columnar. In hindsight, I think that was appropriate for the time, but no longer a significant concern.)

> You might be using the wrong database if that’s what you’re hoping for. Columnar databases are a separate category.

I don't want a columnar database, I have a few tables which would be excellent fits for columnar storage. And the relational model doesn't say anything about the way data is stored.

PostgreSQL was a right tool for my task for many years. It is a question for PostgreSQL can adapt to a new reality of much bigger datasets or I have to switch to a new tool. And I am not the only user of Postgresql in this context. So it is easy to say in vacuum 'you are using the wrong database', but it's not something that can be easily changed with 100s of Tb of data, existing user workflows etc.

Columnar vs row based is a large architectural distinction. In theory it'd be great to support both, but this has also been an elusive and complex problem to solve. You're not going to see a simple "columnar type" on a row based system like this any time soon, that's trivializing the situation.

Andy Pavlo teaches about this and has done real work in this space, and certainly considers it to be non-trivial. His YouTube channel is spicy and worth looking into if you want to dive deep on the topic.

"Not something that can be easily changed" is not a valid rebuttal to "the tool no longer supports how the task I am working on has evolved over many years". This isn't 'saying in a vacuum'; 'has this tool ceased to be the right answer and how do I move to a better solution' is an exercise responsible app owners should engage in all the time. Shouldn't be different than things I presume your job touches on: if your research evolves to need to see in radio waves, do you complain the optical telescope you started with back in the day doesn't see radio waves and expect someone else to make that happen, or do you find a radio telescope?

You can either 1) accept the limits of the tool, at least until it becomes a priority for the tool developer to add whatever feature you want, 2) pay someone to develop the features you want, or 3) buck up and do what vast numbers of ops folks have done before you and move to something that does what you need.

Age old product question. The Honda Civic is much larger today than the original, because the original targeted young people on a budget. As they aged and had kids, they needed more room in the car. Most peoples’ insinct is to buy a newer version of the car they love, so the civic grew to accommodate the aging market.

No real point here other than an observation about how the installed base’s needs change, across industries.

Just because it's easy to say doesn't mean it's wrong.

By the same logic, you could say Microsoft Access should have all the capabilities of Postgres because it's painful for small businesses to move off of it when it's no longer a good fit for their needs.

The thing is that PG had introduced pluggable storage engines exactly for the reason I am talking about, and there have been few implementations of columnar storage using PG functionality, it's just they always stayed out of the tree. So I wasn't talking about some functionality that is completely out of scope.

But I agree in the end I may be forced to move elsewhere...

I don't agree, a relational database is not required to have a particular storage engine or format, that is an implementation detail.

There are other relational databases that have both kinds of storage engines and some use both on the same table (row based insert with column based migration and secondary column store indexes: https://learn.microsoft.com/en-us/sql/relational-databases/i...

Just like you can have b-tree based table storage vs heap in the case of index organized tables / clustered indexes (which pg doesn't have) you can choose column based instead the logical data model is still the same relational model.

I mean I don't disagree with you, but they did just add a graph database feature, which is about as orthogonal to relational database design as you can get.

From a computer science perspective, I'm not exactly sure how a transactional database would implement a columnar type. Postgres + CDC + an actual analytical database like ClickHouse would be your strongest bet at scale.

"As the datasets become bigger and bigger,"

I think the better way is to use Postgresql for new data and routinely archive off older data to data warehouse type database, to keep the Postgres one small.

(Many companies also now use a RDBMS alongside either a KV database or document store in main app)

Which document store do you prefer? And do you use it via Postgres using a foreign wrapper?

Two most recent companies:

The first was using SQL Server alongside CouchDB, an older company. CRUD Data into SQL, documents into Couch. (used separately, document Url in Couch stored in SQL and brought in by front end)

The second, startup with funding, was using Azure SQL Server (and complaining of the cost) and using Azure Cosmo (Key Value NoSQL) alongside it. Same relationship, no link between the two, it's the front end that drew data from both.

Just look up the pro and cons of each database, some are insert only, some are distributed, some are faster read then write etc. LLM will give you a quick comparison.

Also: https://db-engines.com/en/ranking

In this regard I do find DuckDB amazing.

Fine by me, as a 25 years user. More is not necessarily better, see redis.

I was about to post something similar.

There was a big fanfare about orioledb a while ago, and i think it got bought by people that wanted to push that into mainstream postgres?

Did it die somewhere along the road?

Supabase customers run it (marked alpha).

https://supabase.com/blog/orioledb-launch

And they continue to work on it.

https://supabase.com/blog/orioledb-patent-free

(supabase employee)

confirming that we're still working hard on this. aiming to be production-ready this year. benchmarks and compatibility testing are on track