When I searched for papers on using LLMs, the typical approach I found was to have an LLM generate code and then ask it to find GitHub projects similar to that code. You can then learn by reading the pull requests and seeing how they structure things.
In the old days, if I wanted to understand why memory offsets, padding techniques, or data layout structures were written a certain way, I had to stare at a senior programmer's code all day or wait for them to reply. But LLMs, while they do flatter me, explain things at a level I can actually understand. And LLMs don't get annoyed.
There's a lot of tacit knowledge in programming.
-Why do you cut API boundaries this way?
-Why do you change the order of struct fields?
-Why do you deliberately insert padding?
Most of it depends on background and context: sometimes you add padding, sometimes you don't. To understand this tacit knowledge, you need access to senior developers. But their attitude often depends on how promising the student is and what background they come from. With an LLM, you don't have to rely on the respondent's mood, authority, or availability.
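To make the struct questions concrete, here is a minimal sketch using Python's `ctypes`, which lays out structures the way the platform's C compiler would. The struct names (`Naive`, `Reordered`, `PaddedCounters`), the field names, the `layout` helper, and the 64-byte cache line are all assumptions made for illustration; none of them come from the thread.

```python
import ctypes


class Naive(ctypes.Structure):
    # Declaration order small, large, small: padding is needed after
    # `flag` so that `value` starts on its own alignment boundary.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("value", ctypes.c_uint64),
        ("kind", ctypes.c_uint8),
    ]


class Reordered(ctypes.Structure):
    # Same fields, largest first: the two one-byte fields sit together
    # after `value`, so less space is lost to padding.
    _fields_ = [
        ("value", ctypes.c_uint64),
        ("flag", ctypes.c_uint8),
        ("kind", ctypes.c_uint8),
    ]


CACHE_LINE = 64  # assumption: common on x86-64, not universal


class PaddedCounters(ctypes.Structure):
    # Deliberate padding: place `b` one full cache line after `a`, so two
    # threads updating them do not contend for the same line (assuming the
    # struct itself starts on a cache-line boundary).
    _fields_ = [
        ("a", ctypes.c_uint64),
        ("pad", ctypes.c_uint8 * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
        ("b", ctypes.c_uint64),
    ]


def layout(struct_type):
    """Return (total size, {field name: byte offset}) for a ctypes structure."""
    offsets = {
        name: getattr(struct_type, name).offset
        for name, *_ in struct_type._fields_
    }
    return ctypes.sizeof(struct_type), offsets


if __name__ == "__main__":
    for struct_type in (Naive, Reordered, PaddedCounters):
        print(struct_type.__name__, *layout(struct_type))
```

On a typical 64-bit platform `Naive` should come out larger than `Reordered` even though both hold the same three fields, which is the usual reason for reordering. `PaddedCounters` goes the other way and spends extra bytes on purpose. Whether either trade is worth it depends on context, which is exactly the tacit part.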
Programming is fundamentally a field that requires seniors. In my case, I had no such seniors at all. I learned to code by buying codebases from failed companies and studying them. At my first job I wasn't hired as an employee; I was brought on as the CEO of a subcontracting company (because that was structurally more advantageous for the contract). So I wasn't given the time to learn programming fundamentals gradually. I had to pay penalties if I failed. Most of the projects I worked on were the kind where failure meant bankruptcy for me. Naturally, there was no one to teach me.
Most of my knowledge comes from reverse-engineering the code I purchased.
People say LLM code contains falsehoods, but commercially sold code has always had falsehoods too. Honestly, if we're just talking ratios, LLM code has fewer falsehoods.
In that sense, I still think it's a matter of context. If LLM code is false, was human code ever really true? LLMs do lie. They generate plenty of incorrect code. But humans do the same thing. If a problem comes up, you just look it up then and there. For me, LLMs and humans aren't all that different.
What do you think of modern open-source codebases presently available to the public? Is closed-source/proprietary code that much better?
Closed, proprietary code is way, way worse.
Good programmers are ashamed to push anything less than good (at least in their own opinion) to popular public repos. Some of those same pedantic programmers have no problem pushing crap to enterprise repos, and feel absolved because they are pushed to focus on deadlines and new features, and refactoring is very rarely planned for. I did and managed a lot of corporate software development in companies big and small, did my fair share of M&As, and looked at the codebases of successful companies. I don't ever recall feeling impressed. And I am regularly impressed by the aesthetic qualities of popular open source packages. I think commercial code is mostly shit, with the exception of regulated, serious industries (power, space, flight, etc.).
Open source is much better. Closed source is mostly considered 'done' as long as it just works.
One is a 'craft,' the other is 'survival for delivery.'
To elaborate a bit more: open source is about 'symbolic capital' — it's about building a reputation that says, 'I can write code at this level.'
Commercial closed source, on the other hand, is about 'I need to make money by writing this.'
Generally, open source projects accumulate code more slowly, especially when the contributors aren't depending on the project for their livelihood. But with commercial closed source, it's not uncommon to have to write 60,000 lines of code per month.
On top of that, open source rarely has to deal with requirements changing dramatically mid-development. With closed source, requirements often shift from the original plan, and you end up compromising code quality just to meet those changing specs. As a result, if you're comparing purely in terms of logical completeness, open source tends to be better.
For example, singletons are rarely used in modern open source, but they're still pretty common in commercial code these days.
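For anyone unsure what the singleton contrast looks like in practice, here is a minimal sketch of the two styles. `Config`, `ConfigSingleton`, and both `fetch_*` functions are hypothetical names invented for illustration. The comments give one commonly cited objection to singletons, not a reason stated in the comment above.

```python
class Config:
    """Hypothetical settings object, used only for illustration."""

    def __init__(self, retries: int = 3):
        self.retries = retries


# Style 1: singleton. One hidden, process-wide instance.
class ConfigSingleton:
    _instance = None

    @classmethod
    def get(cls) -> Config:
        # Created lazily on first use, then shared by every caller.
        if cls._instance is None:
            cls._instance = Config()
        return cls._instance


def fetch_with_singleton() -> int:
    # The dependency on Config is invisible in the signature.
    return ConfigSingleton.get().retries


# Style 2: explicit dependency. The caller decides which Config is used.
def fetch_with_injection(config: Config) -> int:
    return config.retries


if __name__ == "__main__":
    print(fetch_with_singleton())
    # Any code anywhere can change what every other caller sees.
    ConfigSingleton.get().retries = 10
    print(fetch_with_singleton())
    # The injected version is only affected by what it is handed.
    print(fetch_with_injection(Config(retries=1)))
```

The singleton version is quicker to write under deadline pressure. The injected version is easier to test in isolation, since each call can be handed its own `Config`.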