In my experience, Go is one of the best LLM targets due to the simplicity of the language (no complex reasoning in the type system or borrow checker), a high-quality, unified, and language-integrated dependency ecosystem[1] for which source is available, and vast training data.

[1]: Specifically, the Go community was trained for the longest time not to make backward-incompatible API updates, which helps quite a bit with consistency of dependencies across time.

I have never understood why people want to use LLMs for programming outside of learning. I have written Perl, C, C#, Rust, and Ruby professionally and to this day I feel like they would slow me down.

I have used golang in the past and I was not, and am still not, a fan. But I recently had to break it out for a new project. LLMs actually make golang not a totally miserable experience to write, to the point I’m honestly astonished that people have found it pleasant to work with before they were available. There is so much boilerplate and unnecessary toil. And the LLMs thankfully can do most of that work for you, because most of the time you’re hand-crafting artisanal reimplementations of things that would be a single function call in every other language. An LLM can recognize that pattern before you’ve even finished the first line of it.
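
For illustration, a minimal sketch of the kind of pattern being described (a hypothetical example): slice membership was a hand-rolled loop for years, until Go 1.21 added `slices.Contains`:

    package main

    import "slices"

    // containsString is the artisanal version: the kind of loop Go
    // programmers hand-wrote for years before generics and the slices
    // package existed.
    func containsString(haystack []string, needle string) bool {
        for _, s := range haystack {
            if s == needle {
                return true
            }
        }
        return false
    }

    func main() {
        langs := []string{"go", "rust", "perl"}
        println(containsString(langs, "rust"))  // hand-rolled: true
        println(slices.Contains(langs, "rust")) // one call since Go 1.21: true
    }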

I’m not sure that speaks well of the language.

> I have never understood why people want to use LLMs for programming outside of learning

"I have never understood why people want to use C for programming outside of learning m. I have written PDP11, Motorola 6800, 8086 assembly professionally and to this day I feel like they would slow me down. I have used C in the past and I was not am still not a fan. But I recently had to break it out for a new project. Turbo C actually make C not a totally miserable experience to write, to the point I’m honestly astonished that people have found it pleasant to work with before they were available. There is so much boilerplate and unnecessary toil. And Turbo C with a macro library thankfully can do most of that work for you, because most of the time you’re hand-crafting artisanal reimplementations of things that would be a single function call in every other language. A macro can recognize that pattern before you’ve even finished the first line of it. I’m not sure that speaks well of the language."

They are enormously powerful tools. I cannot imagine LLMs not being one of the primary tools in a programmer's toolbox, well... for as long as coding exists.

Right now they are fancy autocompletes. That is enormously useful for a language where 90% of the typing is boilerplate in desperate need of autocompletion.

Most of the “interesting” logic I write is nowhere close to being autocompleted successfully, and most of what does get suggested needs to be thrown out. If you’re spending most of your days writing glue that translates one set of JSON documents or HTTP requests into another, I’m sure they’re wildly useful.
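
For instance, a minimal sketch of that kind of glue (hypothetical types and field names):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // Two hypothetical wire formats: translate one JSON shape into another.
    type Inbound struct {
        UserID   int    `json:"user_id"`
        FullName string `json:"full_name"`
    }

    type Outbound struct {
        ID   int    `json:"id"`
        Name string `json:"name"`
    }

    func main() {
        raw := []byte(`{"user_id": 7, "full_name": "Ada Lovelace"}`)
        var in Inbound
        if err := json.Unmarshal(raw, &in); err != nil {
            panic(err)
        }
        out, err := json.Marshal(Outbound{ID: in.UserID, Name: in.FullName})
        if err != nil {
            panic(err)
        }
        fmt.Println(string(out)) // {"id":7,"name":"Ada Lovelace"}
    }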

I don't know which models you are using, but in my experience they have been way more than fancy autocomplete today. I have had thousand-line programs written and refined with just a few prompts. On the analysis and code review side, they have been even more impressive, finding issues and potential impacts of changes and describing the intent behind the code. I implore you to revisit good models like Gemini 2.5 Pro. To wit, an actual Linux kernel vulnerability in the SMB protocol stack was discovered with an LLM a few days ago.

Even if we take the narrow use case of boilerplate glue code that transforms data from one place to another, that encompasses almost all programs people write, statistically. There was a running joke at Google: "we are just moving protobufs." I would not call this "fancy autocomplete."

It comes back to the nature of the work; I've got a hobby project which is basically an emulator of CP/M, a system from the 70s, and there is a bug in it.

My emulator runs BBC Basic, Zork, Turbo Pascal, etc, etc, but when it is used to run a vintage C compiler from the 80s it gives the wrong results.

Can an LLM help me identify the source of this bug? No. Can I say "fix it"? No. In the past I said "Write a test-case for this CP/M BDOS function, in the same style as the existing tests" and it said "Nope" and hallucinated functions in my codebase which it tried to call.

Basically, if I use an LLM as an auto-completer it works slightly better than my Emacs setup already did, but anything more than that, for me, fails, and worse still, fails in a way that eats my time.

> Can an LLM help me identify the source of this bug? No. Can I say "fix it"? No. In the past I said "Write a test-case for this CP/M BDOS function, in the same style as the existing tests"

These are all things I've done successfully with ChatGPT o1 and o3 in a 7.5kloc Rust codebase.

I find the key is to include all information which may be necessary to solve the problem in the prompt. That simple.

I wrote a summary of my issue in a GitHub comment, and I guess I will try again:

https://github.com/skx/cpmulator/issues/234#issuecomment-291...

But I'm not optimistic; all previous attempts at "identify the bug", "fix the bug", "highlight the area where the bug occurs" just turn into timesinks and failures.

It seems like your problem may be related to asking it to analyze the whole emulator _and_ compiler to find the bug. I'd recommend working first to pare the bug down to a minimal test case which triggers the issue - the LLM can help with this task - and then feed the LLM the minimal test case along with the emulator source and a description of the bug and any state you can exfiltrate from the system as it experiences the issue.

Indeed, running a vintage, closed-source binary under an emulator, it's hard to see what it is trying to do, short of decompiling and understanding it. Then I can use that knowledge to improve the emulation until it successfully runs.

I suggested in my initial comment that I'd had essentially zero success in using LLMs for these kinds of tasks, and your initial reply was "I've done it, just give all the information in the prompt", and I guess here we are! LLMs clearly work for some people, and some tasks, but for these kinds of issues I'd say we're not ready, and my attempts just waste my time and give me a poor impression of the state of the art.

Even "Looking at this project which areas of the CP/M 2.2 BIOS or BDOS implementations look sucpicious?", "Identify bugs in the current codebase?", "Improve test-coverage to 99% of the BIOS functionality" - prompts like these feel like they should cut the job in half, because they don't relate to running specific binaries also do nothing useful. Asking for test-coverage is an exercise in hallucination, and asking for omissions against the well-known CP/M "spec" results in noise. It's all rather disheartening.

> Indeed, running a vintage, closed-source binary under an emulator, it's hard to see what it is trying to do, short of decompiling and understanding it.

Break it down. Tell the LLM you're having trouble figuring out what the compiler running under the emulator is doing to trigger the issue, tell it what you've done already, and ask for its help using a debugger and other tools to inspect the system. When I did this, o1 taught me some new LLDB tricks I'd never seen before. That helped me track down the cause of a particularly pernicious infinite recursion in the geometry processing code of a CAD kernel.

> Even "Looking at this project which areas of the CP/M 2.2 BIOS or BDOS implementations look sucpicious?", "Identify bugs in the current codebase?", "Improve test-coverage to 99% of the BIOS functionality" - prompts like these feel like they should cut the job in half, because they don't relate to running specific binaries also do nothing useful.

These prompts seem very vague. I always include a full copy of the codebase I'm working on in the prompt, along with a full copy of whatever references are needed, and rarely ask it questions as general as "find all the bugs". That is quite open ended and provides little context for it to work with. Asking it to "find all the buffer overflows" will yield better results. As it would with a human. The more specific you can get the better your results will be. It's also a good idea to ask the LLM to help you make better prompts for the LLM.

> Asking for test-coverage is an exercise in hallucination, and asking for omissions against the well-known CP/M "spec" results in noise.

In my experience hallucinations are a symptom of not including the necessary relevant information in the prompt. LLM memories, like human memories, are lossy and if you force it to recall something from memory you are much more likely to get a hallucination as a result. I have never experienced a hallucination from a reasoning model when prompted with a full codebase and all relevant references. It just reads the references and uses them.

It seems like you've chosen a particularly extreme example - a vintage, closed-source, binary under an emulator - didn't immediately succeed, and have written off the whole thing as a result.

A friend of mine only had an ancient compiled java app as a reference, he uploaded the binary right in the prompt, and the LLM one-shotted a rewrite in javascript that worked first time. Sometimes it just takes a little creativity and willingness to experiment.

7.5 kloc is pretty tiny; sounds like you may be able to get the entire thing into the context window.

Lots of Rust libraries are relatively small, since Cargo makes using many libraries in a single project relatively easy. I think that works in favor of both humans and LLMs. Treating the context window as a hint that splitting code up into smaller chunks might be a good idea is an interesting practice.

I generally have to maintain the code I write, often by myself; thousands of lines of uninspired slop code is the last thing I need in my life.

Friction is the birthplace of evolution.

Some people go camping now and then to hunt their own food, feel connected to nature, and feel that friction. They just won't want it every day. Just like they don't tend to generate the underlying uninspired assembly themselves. FWIW, if your premise is that the code LLMs generate is necessarily unmaintainable compared to an average CS college graduate human baseline, I'd argue against that premise.

I've always found it fascinating how frequently I've seen the complaint about Go re: boilerplate and unnecessary toil, while in the same breath Rust is mentioned uncritically. I agree with the complaint about Go, but I have the same problem with Rust. LLMs have made Rust much more joyful for me to write, though I am sure much of this is subjective.

I do like automating all the endless `Result<T, E>` plumbing, `?` operator chains, custom error enums, and `From` conversions. Manual trait impls for simple wrappers like `Deref`, `AsRef`, `Display`, etc. 90% of this is structural too, so it feels like busy work. You know exactly what to write, but the compiler can't/won’t do it for you. The LLM fills that gap pretty well a significant percentage of the time.

But to your original point, the LLM is very good at autocompleting this type of code zero-shot. I just don't think it speaks ill of Rust as a consequence.

This is akin to saying that you prefer a horse to a car because you don't have to buy gas for a horse; it can eat for free, so why use a car?

The first cars were probably much less useful than horses. They didn’t go very far, gas pumping infrastructure wasn’t widely available, and you needed specialized knowledge to operate them.

Sure, they got better. But at the outset they were a pretty poor value proposition.

Well, it certainly makes error handling easy. No need to reason about complex global exception handlers and non-linear control structures. If you see an error, return it as a value and eventually it will bubble up. `if err != nil` is verbose, but it makes LLMs and type checkers happy.
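
A minimal sketch of that style (hypothetical file and function names):

    package main

    import (
        "fmt"
        "os"
    )

    // Errors are plain values: annotate them with context and return
    // them up the stack until something can handle them.
    func loadConfig(path string) ([]byte, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return nil, fmt.Errorf("loading config: %w", err)
        }
        return data, nil
    }

    func main() {
        if _, err := loadConfig("app.conf"); err != nil {
            fmt.Fprintln(os.Stderr, err) // the error bubbled up as a value
        }
    }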

I have never seen any AI system correctly explain the following Go code:

    package main

    func alwaysFalse() bool {
        return false
    }

    func main() {
        switch alwaysFalse() // don't format the code
        {
        case true:
            println("true")
        case false:
            println("false")
        }
    }

> the Go community was trained for the longest time not to make backward-incompatible API updates, which helps quite a bit with consistency of dependencies across time

Not true for Go 1.22 toolchains. Go 1.22 changed `for` loop variables to be per-iteration rather than per-loop, so when you use Go 1.21-, 1.22, and 1.23+ toolchains to build the following Go code, the outputs are not consistent:

    //go:build go1.21

    package main

    import "fmt"

    func main() {
        for counter, n := 0, 2; n >= 0; n-- {
            defer func(v int) {
                fmt.Print("#", counter, ": ", v, "\n")
                counter++
            }(n)
        }
    }

You're bringing up exceptions rather than the rule. Sure, you can find things they mess up. The whole premise of a lot of the "AI" stuff is approximately solving hard problems rather than precisely solving easy ones.

The opposite is true: they sometimes guess correctly, but even a broken watch is right twice a day.

I believe future AI systems will be able to give correct answers. The rule is clearly specified in the Go specification.

BTW, I haven't found an AI system that can get the correct output for the following Go code:

    package main

    import "fmt"

    func main() {
        for counter, n := 0, 2; n >= 0; n-- {
            defer func(v int) {
                fmt.Print("#", counter, ": ", v, "\n")
                counter++
            }(n)
        }
    }
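
For reference, tracing it by hand (which behavior you get depends on the language version the toolchain selects): before Go 1.22 the loop variables are shared across iterations, so the deferred closures all increment the same `counter`; from Go 1.22 on, each iteration gets its own copy, which nothing increments before the defers run. The two possible outputs:

    Pre-1.22 semantics (shared loop variables):
    #0: 0
    #1: 1
    #2: 2

    Go 1.22+ semantics (per-iteration loop variables):
    #0: 0
    #0: 1
    #0: 2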

What do you base that prediction on? Without a fundamental shift in the underlying technology, they will still just be guessing.

Because I am indeed seeing AI systems get better and better.

It can easily explain it with a little nudge.

Not sure why you feel smug about knowing such a small piece of trivia; automatic semicolon insertion turns it into `switch alwaysFalse(); {`, a tagless switch that matches `case true`, and ‘gofmt’ would make that semicolon explicit anyway.

I write code in Notepad++ and never format my code. :D

Go is a great target for LLMs because it needs so much boilerplate, and LLMs are good at generating that.

AFAIK the borrow checker is not strictly needed to compile Rust. I think one of the GCC Rust projects started with only a compiler and deferred adding borrow checking later.

The borrow checker does not change behavior, so any correct program will be fine without borrow checking; its only job is to reject programs.

mrustc also does not implement a borrow checker.

Not that much different from a type checker in any language (arguably it is the same thing).

I have been using various LLMs extensively with Rust. It's not just the borrow checker; the dependencies are ever-changing too. Go and Python seem to be the RISC of LLM targets. Comparatively, the most problematic thing about generated Go code is the requirement that every imported package and declared variable actually be used.
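
For example (a minimal sketch), both of these are hard compile errors in Go, and generated code trips over them constantly:

    package main

    import "fmt"

    func main() {
        x := 42
        // This program does not compile. Both are hard errors, not warnings:
        //   "fmt" imported and not used
        //   declared and not used: x
    }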
