People would typically choose based on CRAN Task Views or follow conventional methodologies, but what I notice from this is that R is truly a language used only by a narrow group of people. And the people who use it are usually master's students or professors; it's rarely used at the undergraduate level. So even those with that level of academic background and training must have had their own implementation roadblocks. Could that be why the use of R has exploded with the help of AI? Looking at this, I think it's fair to conclude that even domain experts found programming difficult. Seeing this, can we really say that AI is always bad? For some people, it has become both the hands and a voice for their words.
> Seeing this, can we really say that AI is always bad?
Is anyone arguing “AI is always bad”? I think the argument is clearly “the negatives outweigh the positives”.
You're right. I think I overstated it. Since English isn't my native language, I might have used stronger words than intended. Thank you for pointing that out.
There is some great stuff in R, but from a software engineering standpoint I'd much rather data scientists work in Python.
At risk of sounding like ChatGPT, it's not an R thing, it's a general thing. Turn [showdead] on in your profile and see how Show HN is flooded with AI slop projects and we all know GitHub is drowning in it.
I also think Python is a bit better. (Though, unlike you, my programming skills are directly tied to my livelihood, so it benefits me if one language can cover as much ground as possible. Being locked into a specific domain just narrows the number of jobs I can take on.) You're not wrong, but it makes me pretty sad that all my homepage submissions are marked as 'showdead' and no one ever sees them. Maybe my submissions would look like rubbish by your standards. But looking at it that way, there's also the gap between what people expect and what the site's filters decide.
I've got a very Clark Kent kind of job doing very ordinary work at a university unit that is authoritative in its domain, and I don't talk a lot about what I do there because the last thing I want is for people to think my opinions have anything to do with my employer (and the second to last thing I want to do is post statements to that effect!).
I code Java and JavaScript by day and mostly Python for my side projects because of the practicality. I've always been the guy who can finish projects that other people couldn't by attending to the essential details that everybody else feels entitled to ignore.
As for your problem, you are making the classic mistake of repeatedly posting links to your blog and nothing but links to your blog. If you were finding articles from other sources and posting (say) 10% from your own blog, you wouldn't be tripping up the filters.
Sure, I seem to have led a glamorous life of enterprise sales, industrial espionage and always being ready to write and give a talk in 48 hours if a TED speaker gets kidnapped, but the reality is I haven't had time to fix the busted Python packages that my autoposter depends on. I am way too busy transforming into a fox when I go down the elevator and casting glamours on people, and I am always telling this witch that I am the familiar of that witch (and the other way around), but on the 6th floor they have no idea who they are dealing with.
thanks!
A considerable amount of work for grad students is answering the question: "How the f#$% do I get this code to compile and run?"
Some other researcher, often with limited skills in your native tongue, even more limited skills in software development best practices, wrote some code for a paper between 5 and 50 years ago and your PI has told you to use that code and some OTHER code together at the same time to validate some experiment he wants you to do.
In the past you would take days/weeks/months to get this to work, but with an LLM?
I'm envious of the grad students of today for the amount of nonsense which is bypassable.
The other half is: "What combination of packages and task views do I actually need to not reinvent the wheel for this particular type of analysis?"
And the third half is: "What preprocessing and analysis methods do I actually need?"
I have never met a person who is great at that last part (methods theory) and sucks at the others (technical implementation), because the same work and effort trains both. The issue is that AI solves all these problems at once, which will probably result in more academics understanding their methods and preprocessing choices even less. At least this is what I have seen, and I have seen it getting worse.
I wish the problem were just finding the right packages. Web search and mentoring/talking to colleagues are pretty good solutions to that. LLMs are more of a gamble here if one uses them as an authoritative source: they may suggest the right package, or they may take you on a long trip to nowhere, depending on random factors.
One of the ways we can be hopeful that LLMs will have a positive effect is the ability to interrogate a researcher "so the LLM did this and this and this for you, what did YOU do?" and whatever that is can be the focused judgement, quality, and purpose of a researcher.
In other words, once relieved of the slog of editing proposals, figuring out compiler configuration, and locating esoteric code from years past, researchers can be judged on the quality of their actual contribution, now that the menial tasks can be delegated to an LLM and, hopefully, reviewed by the expert researcher.
Programming is a lot easier than statistics bc it’s deterministic, whereas statistics is stochastic (and stochastic functions extend and encompass deterministic ones).
AI speeds up learning, so I bet that’s what you’re noticing with R.
As an aside, the best programmers these days are probabilistic programmers (who write stochastic functions). Our languages are Stan and PyMC. Both can be called by Python or R, and AI writes all of them extremely well. So it seems to me that the underlying language matters less than ever.
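To make the deterministic/stochastic distinction above concrete, here is a minimal Python sketch. It does not use Stan or PyMC; the function names `deterministic_double` and `stochastic_double` and the Gaussian noise model are assumptions made purely for illustration. It only tries to show that a stochastic function returns a draw from a distribution, and that setting the noise to zero recovers the deterministic case.

```python
import random
import statistics


def deterministic_double(x: float) -> float:
    """Same input always yields the same output."""
    return 2.0 * x


def stochastic_double(x: float, rng: random.Random, noise_sd: float = 1.0) -> float:
    """Hypothetical stochastic counterpart: 2x plus Gaussian noise.

    Each call draws a fresh value, so the output is a sample, not a constant.
    """
    return 2.0 * x + rng.gauss(0.0, noise_sd)


# A seeded generator makes the run reproducible without removing the randomness.
rng = random.Random(42)
draws = [stochastic_double(3.0, rng) for _ in range(10_000)]

print(deterministic_double(3.0))   # always 6.0
print(statistics.mean(draws))      # near 6.0, but only on average
print(statistics.stdev(draws))     # near the assumed noise_sd of 1.0

# With zero noise the stochastic function collapses to the deterministic one.
print(stochastic_double(3.0, rng, noise_sd=0.0))
```

Probabilistic programming languages go much further than this (they infer the parameters of such functions from data), but the sketch captures the sense in which the deterministic function is a special case.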
I partially agree, but I also differ on some points. The part I agree with is that probabilistic programming is difficult and that advanced programmers tend to enjoy it. Where I differ is on the claim that programming is deterministic. At the script level, programming is deterministic and sequential, but once it crosses a certain threshold, it becomes effectively probabilistic. That's because latency, locks, and asynchronous communication start to intervene. If programming were fully deterministic, C's undefined behavior wouldn't be a problem; everyone would have prevented it.
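As a rough illustration of the point about locks and scheduling, the following Python sketch increments a shared counter from several threads, with and without a lock. The helper `run_counter` and its parameters are hypothetical names chosen for this example. Whether the unsynchronized version actually loses updates on a given run depends on the interpreter and the thread scheduler, which is exactly the non-determinism being described; only the locked version has a guaranteed result.

```python
import threading


def run_counter(n_threads: int, n_increments: int, use_lock: bool) -> int:
    """Increment a shared counter from several threads and return the final value."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(n_increments):
            if use_lock:
                # The lock makes the read-modify-write step atomic.
                with lock:
                    counter += 1
            else:
                # Unsynchronized read-modify-write: another thread may write
                # between these two statements, and that update is then lost.
                current = counter
                counter = current + 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


expected = 4 * 50_000
locked_total = run_counter(4, 50_000, use_lock=True)
unlocked_total = run_counter(4, 50_000, use_lock=False)

print(expected, locked_total)  # these two always match
print(unlocked_total)          # may be lower, and may differ between runs
```

The code itself is the same on every run; what varies is the interleaving chosen by the scheduler, which the program does not control.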
R these days mostly uses the tidyverse, which feels like a variant of DOP (Data-Oriented Programming). It's a kind of data flow, so it's different from typical OOP. I also occasionally work with statisticians (being a freelancer, ETL work is more common than you'd think), and I know what you mean by Stan and PyMC. I know they're powerful tools for Bayesian statistics and multilevel modeling. I know the basic syntax and examples, but I wouldn't say I know them well. My level is mainly focused on the scientists who hire me, and those tools still don't come up often in my country.
That said, I think we differ on the bigger picture because academic code isn't everything. Academic code is typically algorithm‑centric, like LeetCode problems, but most production work revolves around code hygiene and responsibility (the algorithms are usually already established ones). Anyway, that's not the main point. What you said is mostly correct, but my focus was on something else: even people who studied at that level can be surprisingly clumsy at expressing themselves through programming. Regardless, thanks for your input, and I agree that AI is good at programming. But using a programming language generally means understanding its tradeoffs, and R is tricky in that regard since it feels like a mix of OOP and DOP variants.
You don't have to use the tidyverse. I never reached for it since base R does it all already. I write in a functional style, though, and I understand a lot of Pythonistas and others prefer the object-oriented approach. R written in a functional style is lean and mean. Use your apply functions, kids.
I think saying academic code is algorithm centric is perhaps missing the larger userbase of academic code: not people writing the functions and vetting on simulated data, but the people actually using the functions on real world data.
This is why there is a seemingly uncaring attitude towards typical programming conventions. They do not matter. The code is pretty much a one-off for the given analysis. It doesn't matter if it takes an hour to run or two weeks on the cluster. You are chasing the wrong dragon when you try to make your two-week run time into something sensible: you end up spending effort on process for process's sake rather than on hypothesis building, discovery, and analysis.
It is a different planet than the world of professional CS where it is really about process and saving time and money, and results that aren't highly convenient to the bottom line are largely ignored. There is no bottom line to satisfy with research compute, only reporting what the evidence suggests and publishing this information.
Picking up on some Dunning-Kruger effect here.
Programming isn’t even a field in the same way as prob&stats. Computer science does in fact have non-deterministic subfields such as information theory.
There’ll always be boundary tending, true. Only a portion of CS deals with stochastic functions though, whereas all of statistics is stochastic. That makes a big difference, bc the world is complex.
Information theory doesn’t even incorporate utility.