I’m not saying it doesn’t have flaws, but the tidyverse is still the most coherent and functional ML/stat computing ecosystem I’ve ever used. R packages outside of the tidyverse can get pretty gnarly. Even the R stdlib is usually considered to be inconsistent and riddled with legacy cruft.
I may be in the minority, but I don't like the tidyverse ecosystem. I prefer data.table for most of my uses.
data.table is just so much faster, and the SQL-like syntax is easier to understand.
I can never remember the data.table syntax; every time I use it I have to re-learn it. It doesn't feel very SQL to me either. There is an interface that lets you use tidy syntax on data.tables and gets you 90% of the speed.
The core of the problem is that the tidyverse is trying to turn R into a user-friendly real-time calculator, rather than a tool for stable, deterministic, and literate data analysis.
"The real problem with this package is that it makes things easy, instead of hard"
That’s a rather glib interpretation of what I said.
I’m being rather charitable when I call the tidyverse “user-friendly”. That might be the goal of tidyverse devs, but it hasn’t been the reality since inception.
The point is even if we assume it is or will eventually become user-friendly, are we willing to accept the trade-offs that come with that?
I think the users have clearly delivered the verdict that it is friendlier than base R. Admittedly a low bar. Non-R users, which do you prefer:
The first one by a mile.
Is this a joke?
It's certainly quite pleasant to work with... but I would rather use SQL for ETL and let the backend be whatever it needs to be...
Real-world data transformations can get gnarly very quickly, and SQL is the perfect common denominator compared to dplyr, which is still niche...
How do you feel about polars?
I’m a big fan of Polars. It’s really fast and memory efficient. With the lazy streaming functionality, I’ve been able to easily process 1 TB+ of data on a single machine (you do have to be careful not to do any operation that would cause the whole DF to materialize in that case).
It’s certainly miles better than pandas, which has a terrible API in addition to being comically inefficient. In my group, we generally use it for any new work, and have also swapped out pandas for Polars in critical spots of our existing code - the latter giving a huge benefit relative to the amount of work it took.
I largely agree with you on SQL being the common denominator, but there are some things that are just awkward in SQL, and much easier to do in Python or another general-purpose language.
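One illustrative case (my own pick, with invented data and a hypothetical `collapse_runs` helper): collapsing consecutive rows with the same status into runs. In SQL this is the classic "gaps and islands" problem and typically needs window functions; in Python it is a short loop over `itertools.groupby`:

```python
from itertools import groupby


def collapse_runs(events):
    """Collapse consecutive events with the same status into (status, start, end, count)."""
    runs = []
    # groupby only merges adjacent items, which is exactly the "run" semantics wanted.
    for status, group in groupby(events, key=lambda e: e[1]):
        group = list(group)
        runs.append((status, group[0][0], group[-1][0], len(group)))
    return runs


# (timestamp, status) pairs, already sorted by timestamp; values are invented.
events = [(1, "up"), (2, "up"), (3, "down"), (4, "up"), (5, "up"), (6, "up")]
runs = collapse_runs(events)
print(runs)
```

This assumes the input is already ordered by timestamp; unsorted input would need a sort first.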