Two talks come to mind here: Mike Acton's Data-Oriented Design and C++ [1] and Bryan Cantrill's The Complexity of Simplicity [2].
Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions. I can't do the talk justice, but it's had a big impact on me.
Bryan's talk is about abstraction generally, and how difficult it is to find the "right" abstraction.
> Mike's talk argues that code solutions need not be modelled on the real world, and that different data creates different problems, which need different solutions.
I've always found it odd when even fairly smart engineers sometimes prioritize real-world metaphors over the actual needs of the codebase. Years ago, when I was only a few years out of school, I was implementing a connection pool in Rust, and the most reasonable way to implement it was to have the connection hold a weak reference to the pool so that it could get checked back in automatically when dropped. My manager (an extremely experienced engineer) didn't like this idea because "a library holds library books, not the other way around". I didn't feel like this was a compelling reason to design things differently, but he refused to engage with the issue in any way other than through the lens of that metaphor. Eventually the impasse was resolved by one of the other managers in my department suggesting that while library books don't contain libraries, they do have the name of the library stamped in the back as a reference to where they should be returned, and I guess my manager found this to be a reasonable extension of the analogy. If I had been more experienced, maybe I would have recognized that I could engage with the analogy like the other manager did without ceding the point, but even today I still feel that it was completely bizarre to insist on that as the canonical way to frame things rather than just considering the ramifications of the abstraction in the code and the experience of using the library based on it.
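For anyone who hasn't seen the pattern, here is a minimal sketch of what "the connection holds a weak reference to the pool" can look like. It is not the code from that job: `RawConn`, `PoolInner`, `Pool`, and `PooledConn` are made-up names, and it assumes a single-threaded pool (`Rc`/`RefCell`) to stay short, where a real client library would more likely use `Arc` and a `Mutex`.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical stand-in for a real network connection.
#[derive(Debug)]
pub struct RawConn {
    pub id: u32,
}

// Shared pool state: the idle connections waiting to be checked out.
#[derive(Default)]
pub struct PoolInner {
    idle: RefCell<Vec<RawConn>>,
}

#[derive(Clone, Default)]
pub struct Pool {
    inner: Rc<PoolInner>,
}

// A checked-out connection. It holds only a Weak reference to the pool,
// so an outstanding connection never keeps the pool alive.
pub struct PooledConn {
    raw: Option<RawConn>,
    pool: Weak<PoolInner>,
}

impl Pool {
    pub fn with_connections(n: u32) -> Self {
        let pool = Pool::default();
        pool.inner
            .idle
            .borrow_mut()
            .extend((0..n).map(|id| RawConn { id }));
        pool
    }

    pub fn idle_count(&self) -> usize {
        self.inner.idle.borrow().len()
    }

    // Returns None when no idle connection is available.
    pub fn checkout(&self) -> Option<PooledConn> {
        let raw = self.inner.idle.borrow_mut().pop()?;
        Some(PooledConn {
            raw: Some(raw),
            pool: Rc::downgrade(&self.inner),
        })
    }
}

impl PooledConn {
    pub fn id(&self) -> u32 {
        self.raw.as_ref().expect("present until drop").id
    }
}

impl Drop for PooledConn {
    fn drop(&mut self) {
        // If the pool is still alive, check the connection back in;
        // otherwise the raw connection is simply dropped (closed).
        if let (Some(raw), Some(pool)) = (self.raw.take(), self.pool.upgrade()) {
            pool.idle.borrow_mut().push(raw);
        }
    }
}

fn main() {
    let pool = Pool::with_connections(2);
    {
        let conn = pool.checkout().expect("pool has idle connections");
        println!("using connection {}", conn.id());
        assert_eq!(pool.idle_count(), 1);
    } // conn is dropped here and returns itself to the pool
    assert_eq!(pool.idle_count(), 2);
}
```

The weak reference is the "stamp in the back of the book": it says where to return the connection without making the connection own the pool.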
This is somewhat related:
I mention this a lot, but in researching Data-Oriented Design (what Mike was talking about), I came across Richard Fabian's DoD book [1] which talks a lot about database normalization and the like. I found that odd, because the low-level high-performance game code he was talking about certainly wasn't going to marshal data into a DB to run SQL queries on it.
It turns out the relational model has a lot of advantages though. Programmers use trees all the time: in OO, in structs containing structs, in objects pointing to other objects. It's easy to forget that trees are just a special case of graphs (i.e. networks), and that there are many ways to represent networks that don't rely on encoding a tree structure directly.
So, I've been doing what Richard Fabian suggested and I lay out my data (on paper) into tables, then attempt to normalize it and see the connections. I really like this way of designing things.
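To make that concrete, here is a rough sketch of the "tables instead of trees" layout in plain Rust. The `Player`, `Item`, and `World` types are invented for illustration and are not taken from the book: rather than each player owning a `Vec` of items, items sit in their own flat table and point at their owner by id.

```rust
use std::collections::HashMap;

// Hypothetical tables for illustration; each struct is one row.
#[derive(Debug, Clone, PartialEq)]
pub struct Player {
    pub id: u32,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: u32,
    pub owner_id: u32, // foreign key into the players table
    pub weight: u32,
}

// The "database": flat tables rather than players that contain items.
pub struct World {
    pub players: Vec<Player>,
    pub items: Vec<Item>,
}

impl World {
    // Roughly: SELECT p.name, SUM(i.weight) FROM players p
    //          JOIN items i ON i.owner_id = p.id GROUP BY p.id
    pub fn carried_weight(&self) -> Vec<(String, u32)> {
        // Group the items table by its foreign key.
        let mut by_owner: HashMap<u32, u32> = HashMap::new();
        for item in &self.items {
            *by_owner.entry(item.owner_id).or_insert(0) += item.weight;
        }
        // Join against players in table order; players with no items
        // are omitted, as with an inner join.
        self.players
            .iter()
            .filter_map(|p| by_owner.get(&p.id).map(|w| (p.name.clone(), *w)))
            .collect()
    }
}

fn main() {
    let world = World {
        players: vec![
            Player { id: 1, name: "ann".to_string() },
            Player { id: 2, name: "bob".to_string() },
        ],
        items: vec![
            Item { id: 10, owner_id: 1, weight: 3 },
            Item { id: 11, owner_id: 2, weight: 5 },
            Item { id: 12, owner_id: 1, weight: 4 },
        ],
    };
    for (name, weight) in world.carried_weight() {
        println!("{name} carries {weight}");
    }
}
```

Moving an item to another player is then a single field update on one row, not a removal from one nested collection and an insertion into another.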
My big issue is that doing DB-like operations is hellish in most programming languages, and if you really want to try and marshal your data into a real DB (say, SQLite or DuckDB via a library), then you have a big messy translation layer where you're trying to match things to SQL types and you have giant SQL strings everywhere.
I see C# has LINQ, which is a query language embedded in the language. I wonder if that approach is best, and why hasn't it been adopted more broadly? It seems like there's a lot for programming language designers to explore in this dimension, though I wonder if it even matters now with the superintelligence tidal wave.
1. https://www.dataorienteddesign.com/dodmain/
I read about it just a few days ago! I don't use Java, either, but it looked awesome. Right now I'm using Rust, which feels limited in this capacity outside of ORMs.
This is something I've thought about a lot over the years, in no small part because the connection pooling work that I mentioned above was during my first few years out of college, when I worked at MongoDB on some of their database client libraries. I know MongoDB gets a lot of criticism on these parts of the internet (which, at least on the technical side, I think is a mix of stuff that's warranted, stuff that's a bit more nuanced than internet arguments might make it seem, and some stuff that's mostly just holdovers from the very early days and hasn't applied to any version of the database people have used in the past decade). Still, one of the things I always found interesting about it is how it changes the experience from what you describe to one where the bulk of the work is figuring out the best way to model the data (where you have to care about things like "how 'many' is this 'one to many' relation" and "when I access this data, is there any other data I'd almost always expect to need to access at the same time?"). If you've done that right, the queries themselves end up being a lot more straightforward to come up with (either single operations like "find this" or a pipeline of transformations starting from "find this" and then "do this to the output of the last stage", compared to the "inside out" way you sometimes have to wrap up subqueries in SQL with outer queries).
It's a reasonable take that changing the entire way that the database modeled everything under the hood is an overkill solution to the specific problem you mention compared to something like LINQ that can work on top of existing databases, but I can't help but wonder if there's a bit of inertia in how willing people are to challenge their usual ways of thinking about how data modeling might be possible to improve because a lot of people don't get exposed very much to anything other than the raw, string-like handling that you mention (which is annoying but at least SQL injections are a well-known thing nowadays and tend to be possible to avoid) or a full-blown ORM (which quite often ends up either being wildly inefficient or needing to drop back down into the raw SQL in some places to avoid the performance bottlenecks, which kinda defeats the entire point). A startup I worked at a few years ago actually had what I thought was a pretty clever solution to this problem, with their product generating OpenAPI/GraphQL APIs for a given database by inspecting the schema (with optional parameters to get back EXPLAIN data in the responses to verify that the query was what you wanted, and the ability to define custom routes with raw queries that were checked into shared version control with the schema migrations if you weren't happy with the query it generated as a way to properly separate concerns as an improvement over the traditional ORM workflow), but despite the idea seeming quite enticing to me from a technical standpoint, I guess it didn't show enough traction to be able to survive.
> My big issue is that doing DB-like operations is hellish in most programming languages, and if you really want to try and marshal your data into a real DB (say, SQLite or DuckDB via a library), then you have a big messy translation layer where you're trying to match things to SQL types and you have giant SQL strings everywhere.
I prefer having that translation layer, especially when it's domain oriented. All the SQL strings are collected in one isolated module, and the only exported symbols are a set of functions.
From Domain-Driven Design, what I learned is to be comfortable having different representations of the same data in different layers/subdomains. Something may be a fat object from the API, but I prefer having a collection of functions that each use a different part and have a caching layer to not actually do the expensive network call. That network call and the caching layer will be encapsulated in one module and the collection of functions will be the only thing visible.
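A rough sketch of the shape I mean, with loud caveats: the `Executor` trait, the `users` module, the table and column names, and `FakeDb` are all invented here so the example runs without a real driver, and the caching layer is left out. In a real project the trait would be replaced by whatever database library is in use.

```rust
// Hypothetical minimal database interface; a real project would wrap
// an actual driver here instead.
pub trait Executor {
    fn query(&self, sql: &str, params: &[&str]) -> Vec<Vec<String>>;
}

pub mod users {
    use super::Executor;

    // SQL strings stay private to this module.
    const FIND_EMAIL_BY_NAME: &str = "SELECT email FROM users WHERE name = ?1";
    const LIST_NAMES: &str = "SELECT name FROM users ORDER BY name";

    // Only these domain-level functions are exported.
    pub fn email_for(db: &dyn Executor, name: &str) -> Option<String> {
        db.query(FIND_EMAIL_BY_NAME, &[name])
            .into_iter()
            .next()
            .and_then(|row| row.into_iter().next())
    }

    pub fn all_names(db: &dyn Executor) -> Vec<String> {
        db.query(LIST_NAMES, &[])
            .into_iter()
            .filter_map(|row| row.into_iter().next())
            .collect()
    }
}

// In-memory fake with canned rows, used only to make the sketch runnable.
pub struct FakeDb;

impl Executor for FakeDb {
    fn query(&self, sql: &str, params: &[&str]) -> Vec<Vec<String>> {
        if sql.starts_with("SELECT email") {
            if params.first().copied() == Some("ann") {
                return vec![vec!["ann@example.com".to_string()]];
            }
            return Vec::new();
        }
        vec![vec!["ann".to_string()], vec!["bob".to_string()]]
    }
}

fn main() {
    let db = FakeDb;
    println!("{:?}", users::email_for(&db, "ann"));
    println!("{:?}", users::all_names(&db));
}
```

Callers only ever see `users::email_for` and `users::all_names`; the SQL and the row-to-value mapping can change without touching them.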
Didn't we all spend years and years learning that OOP is about writing "class Car extends Vehicle"?
That's one of the things Mike was pushing back on (in the context of C++ / game dev; there's a good example ~20 min in). In case you haven't seen it, Casey Muratori's The Big OOPs: Anatomy of a Thirty-five-year Mistake digs into the different ideas and viewpoints surrounding OOP and how it coalesced into a dogma:
https://www.youtube.com/watch?v=wo84LFzx5nI
Honestly, since college I've mostly learned that inheritance is more trouble than it's worth.
See also:
Data-Oriented Programming: Reduce software complexity
by Yehonathan Sharvit
https://www.manning.com/books/data-oriented-programming
and from SICP:
2.4.3 Data-Directed Programming and Additivity
https://sarabander.github.io/sicp/html/2_002e4.xhtml#g_t2_00...
There's a reason why "naming things" is one of the two hard problems in Computer Science [1]. Data-Oriented Design (DOD) and Data-Oriented Programming (DOP) are two different things, and the similar names have caused a fair amount of confusion on HN before. Data-Directed Programming (DDP) appears to be a third, different thing.
In searching those threads, I came across a post from the author of the DOP book describing the difference between DOD, DOP, and DDP (a different DDP! Data Driven Programming; a fourth thing!) [2], and I see he also made the "naming things" joke in the first paragraph, so I guess my humour isn't that unique!
There's been quite a bit of discussion about DOD and some about DOP (and much conflation between the two) on HN; it can be interesting to read [3].
1. https://martinfowler.com/bliki/TwoHardThings.html
2. https://blog.klipse.tech/visualization/2021/02/16/data-relat...
3. https://hn.algolia.com/?q=Data+Oriented
this one comes to mind https://youtu.be/17KCHwOwgms