For those of you for whom it is working: show your code, please.

I'll bite. Here's a 99.9% vibe-coded raw Git repository reader suitable for self-hosted or shared host environments:

https://repo.autonoma.ca/treetrek

There's still some work to do on the rendering side of model objects. Developing the syntax highlighting rules for 40 languages and file formats in about 10 minutes was amazing to see.

https://repo.autonoma.ca/repo/treetrek/tree/HEAD/render/rule...

Cool, thank you.

Edit: great example. What is your long-term maintenance strategy? Do you keep the original prompts around so you can refine them later, or do you dig into the source?

Would love to see more of your workflow.

Here's one success I had -

https://github.com/sroerick/pakkun

It's git for ETL. I haven't looked at the code, but I've been using it pretty effectively for the last week or two. I wouldn't feel comfortable recommending it to anybody else, but it was basically one-shotted. I've been dogfooding it on a number of projects, had the LLM iterate on it a bit, and I'm generally very happy with the ergonomics.

That's a nice example, can you explain your 'one shot' setup in some more detail?

I don't have the prompt, but I used codex. I probably wrote a medium-sized paragraph explaining the architecture. It scaffolded out the app, and I think I prompted it twice more with some very small bugfixes. That got me to an MVP which I used to build LaTeX pipelines. Since then, I've added a few features as I've dogfooded it.

It's a bit challenging / frustrating to get LLMs to build out a framework/library and the app that uses the framework at the same time. If it hits a bug in the framework, sometimes it will rewrite the app to match the bug rather than fixing the bug. It's kind of a context balancing act, and you have to have a pretty good idea of how you're looking to improve things as you dogfood. It can be done; it just takes some juggling.

I think LLMs are good at golang, and also good at that "lightweight utility function" class of software. If you keep things skeletal, I think you can avoid a lot of the slop feeling when you get stuck in a "MOVE THE BUTTON LEFT" loop.

I also think that dogfooding is another big key. I coded up a calculator app for a dentist office which 2-3 people use about 25 times a day. Not a lot of moving parts, it's literally just a calculator. It could basically be an excel spreadsheet, except it's a lot better UX to have an app. It wouldn't have been software I'd have written myself, really, but in about 3 total hours of vibecoding, I've had two revisions.

If you can get something to a minimal functional state without a lot of effort, and you can keep your dev/release loop extremely tight, and you use it every day, then over time you can iterate into something that's useful and good.

Overall, I'm definitely faster with LLMs. I don't know if I'm that much faster. I was probably most fluent building web apps in Django, and I was pretty dang fast with that. LLMs are more about things like "How do you build tests to prevent function drift" and "How can I scaffold a feedback loop so that the LLM can debug itself".
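The "tests to prevent function drift" idea above can be sketched as a golden-value regression test: you pin a function's observed outputs so that an LLM edit that silently changes behavior fails the suite. Everything here is a hypothetical illustration, not the commenter's actual setup; `slugify` and its golden cases are made up for the sketch.

```python
def slugify(title: str) -> str:
    # Stand-in for a real project function an LLM might be asked to modify.
    return "-".join(title.lower().split())

# Golden cases recorded from a known-good version of the function.
# If a later edit changes any of these outputs, the suite fails loudly.
GOLDEN = {
    "Hello World": "hello-world",
    "  Spaced   Out  ": "spaced-out",
    "already-a-slug": "already-a-slug",
}

def test_no_drift():
    for raw, expected in GOLDEN.items():
        got = slugify(raw)
        assert got == expected, f"drift: slugify({raw!r}) -> {got!r}, expected {expected!r}"

if __name__ == "__main__":
    test_no_drift()
    print("no drift detected")
```

Pointing the LLM at a suite like this after every change gives it a concrete, self-checkable definition of "don't break existing behavior."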

I like your pragmatic attitude to all this.

I think your prompts are 'the source' in a traditional sense, and the result of those prompts is almost like 'object code'. It would be great to have a higher level view of computer source code like the one you are sketching, but then to distribute the prompt and the AI (toolchain...) used to create the code, with the code itself as just one of many representations. This would also solve some of the copyright issues, as well as possibly some of the longer term maintainability challenges: if you need to make changes to the running system in a while, the tool that got you there may no longer be suitable, unless there is a way to ingest all of the code it produced previously and then to suggest surgical strikes instead of wholesale updates.

Thank you for taking the time to write this all out, it is most enlightening. It's a fine line between 'nay sayer' and 'fanboi' and I think you've found the right balance.

Thanks for reading it! I didn't use an LLM, lol.

On documentation, I agree with you, and have gone down the same road. I actually built out a little chat app which acts as a wrapper around the codex app and does exactly this. Unfortunately, the UI sucks pretty bad, and I never find myself using it.

I actually asked codex if it could find the chat where I created this in my logs. It turns out, I used the web interface and asked it to make a spec. Here's the link to the chat. Sorry, the way I described it wasn't really what happened at all! lol. https://chatgpt.com/share/69b77eae-8314-8005-99f0-db0f7d11b7...

As it happens, I actually speak-to-texted my whole prompt. And then gippity glazed me saying "This is a very good idea". And then it wrote a very, very detailed spec. As an aside, I kind of have a conspiracy theory that they deploy "okay" and "very very good" models. And they give you the good model based on if they think it will help sway public opinion. So it wrote a pretty slick piece of software and now here I am promoting the LLM. Oof da!

I didn't really mention - spec first programming is a great thing to do with LLMs. But you can go way too far with it, also. If you let the LLM run wild with the spec it will totally lose track of your project goals. The spec it created here ended up being, I think, a very good spec.

I think "code readability" is really not a solved problem, either pre or post LLM. I'm a big fan of "Code as Data" static analysis tools. I actually think that the ideal situation is less of "here is the prompt history" and something closer to Don Knuth's Literate Programming. I don't actually want to read somebody fighting context drift for an hour. I want polished text which explains in detail both what the code does and why it is structured that way. I don't know how to make the LLMs do literate programming, but now that I think about it, I've never actually tried! Hmmm....

Here's one: https://github.com/mohsen1/fesh

It beats the best compression out there by 6% on average. Yet nobody will care because it was not hand-written.

That's a very interesting case. If you want I will look into this in more detail, I'm waiting for some parts so I have some time to kill.

Are you an expert in this field? I'm curious if the AI generated code here is actually good.

I've done some work on compression really long ago but I am very far from an expert in the field, in fact I'm not an expert in any field ;) The best I ever did was a way to compress video better than what was available at the time but wavelets overtook that and I have not kept current.

I'm curious about two things:

- is it really that much better (if so, that would by itself be a publishable result) where better is

  - not worse for other cases

  - always better for the cases documented

I think that's a fair challenge.

- is it correct?

And as a sidetrack to the latter: can it be understood to the point that you can prove it is correct? Unfortunately I don't have experience with your toolchain but that's a nice learning opportunity.

Question: are you familiar with

https://www.esa.int/Enabling_Support/Space_Engineering_Techn...

https://en.wikipedia.org/wiki/Calgary_corpus

https://corpus.canterbury.ac.nz/

As a black box it works. It produces smaller binaries that, when extracted, match the original file bit for bit.

I tested it across 100 packages: better efficiency across the board.

But I don't know if I (or anyone) would want to maintain software like this, where it's a complete black box.

It was a fun experiment, though. It proves that with a robust testing harness you can do interesting things with pure AI coding.
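The black-box verification described above (smaller output, bit-for-bit identical after extraction) can be sketched as a round-trip harness. This is a generic illustration only: `zlib` stands in for the AI-written compressor, since the real tool's interface isn't shown in the thread.

```python
import zlib

def roundtrip_ok(data: bytes) -> bool:
    # Compress, decompress, and require an exact byte-for-byte match
    # with the original input.
    packed = zlib.compress(data, level=9)
    return zlib.decompress(packed) == data

def ratio(data: bytes) -> float:
    # Compressed size relative to the original (lower is better).
    return len(zlib.compress(data, level=9)) / max(len(data), 1)

if __name__ == "__main__":
    sample = b"abc" * 1000
    assert roundtrip_ok(sample)
    print(f"round-trip ok, ratio={ratio(sample):.3f}")
```

Running a check like this over a corpus of real packages is exactly the kind of harness that lets you trust a compressor's output without understanding its internals.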

Why is this the attitude when it comes to AI? Can you imagine someone saying “please provide your code” when they claim that Rust sped up their work repo or TypeScript reduced errors in production?

Yes, I can absolutely imagine that.

Eh, sorry, I may have been too quick to judge, but in the past when I have shared examples of AI-generated code to skeptics, the conversation rapidly devolves into personal attacks on my ability as an engineer, etc.

I think the challenge is to not be over-exuberant nor to be overly skeptical. I see AI as just another tool in the toolbox; the fact that lots of people produce crap is no different from before: lots of people produced crappy code well before AI.

But there are definitely exceptions and I think those are underexposed, we don't need 500 ways to solve toy problems we need a low number of ways to solve real ones.

Some of the replies to my comment are exactly that, they show in a much more concrete way than the next pelican-on-a-bicycle what the state of the art is really capable of and how to achieve real world results. Those posts are worth gold compared to some of the junk that gets high visibility, so my idea was to use the opportunity to highlight those instead.

FWIW, I did a full modernization and redesign of a site (~50k LOC) over a week with Claude. I was able to ensure quality by writing a strong e2e test suite ahead of time (which I also drove with AI), then ensuring Claude ran the suite every time it made changes. I got a bunch of really negative comments about it on HN (alluded to in my previous comment: everything from telling me the site looked embarrassing or didn't deserve to be on HN, to saying the 600ms load time was too slow), so I mostly withdrew from posting more about it. I still think that a robust e2e suite is a really good strategy that can drive AI productivity.
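The run-the-suite-after-every-change workflow above can be sketched as a minimal smoke check. This is a hypothetical illustration, not the commenter's actual suite: the fetcher is injected as a callable so the sketch runs without a live site, and the paths and expected markers are made up.

```python
def check_pages(fetch, expectations):
    # fetch(path) -> page body; expectations maps each path to a
    # substring that must appear in the rendered page.
    failures = []
    for path, marker in expectations.items():
        body = fetch(path)
        if marker not in body:
            failures.append(path)
    return failures

if __name__ == "__main__":
    # Fake "site" standing in for real HTTP fetches during the sketch.
    fake_site = {"/": "<h1>Home</h1>", "/about": "<h1>About</h1>"}
    failures = check_pages(fake_site.__getitem__, {"/": "Home", "/about": "About"})
    assert not failures
    print("smoke suite passed")
```

In a real setup the injected `fetch` would be an HTTP client or a browser driver, and the agent would be instructed to run the suite after every edit and treat any failure as a stop condition.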

Yes, that e2e suite is a must for long term support and it probably would be a good idea to always create something like that up front before you even start work on the actual application.

I think that it pays off to revisit the history of the compiler. Initially compilers were positioned as a way for managers to side step the programmers, because the programmers have too much power and are hard to manage.

Writing assembly language by hand is tedious and it requires a certain mindset and the people that did this (at that time programming was still seen as an 'inferior' kind of job) were doing the best they could with very limited tools.

Enter the compiler, now everything would change. Until the mid 1980s many programmers could, when given enough time, take the output of a compiler, scan it for low hanging fruit and produce hybrids where 'inner loops' were taken and hand optimized until they made optimal use of the machine. This gave you 98% of the performance of a completely hand crafted solution, isolated the 'nasty bits' to a small section of the code and was much more manageable over the longer term.

Then, ca. 1995 or so, the gap between the best compilers and the best humans started to widen, and the only areas where humans still held the edge were the most intricate close-to-the-metal software, for instance in computer games, and some extremely performance-sensitive math code (FFTs, for instance).

A multitude of different hardware architectures, processor variations and other dimensions made consistently maintaining an edge harder and today all but a handful of people program in high level languages, even on embedded platforms where space and cycles are still at a premium.

Enter LLMs

The whole thing seems to repeat: there are some programmers that are - quite possibly rightly so - holding on to the past. I'm probably guilty of that myself to some extent, I like programming and the idea that some two bit chunk of silicon is going to show me how it is done offends me. At the same time I'm aware of the past and have already gone through the assembly-to-high-level track and I see this as just more of the same.

Another, similar effect was seen around the introduction of the GUI.

Initially the 'low hanging fruit' of programming will fall to any new technology we introduce, boilerplate, CRUD and so on. And over time I would expect these tools to improve to the point where all aspects of computer programming are touched by them and where they either meet or exceed the output of the best of the humans. I believe we are not there yet but the pace is very high and it could easily be that within a short few years we will be in an entirely different relationship with computers than up to today.

Finally, I think we really need to see some kind of frank discussion about compensation of the code ingested by the model providers, there is something very basic that is wrong about taking the work of hundreds of thousands of programmers and then running it through a copyright laundromat at anything other than a 'cost+' model. The valuations of these companies are ridiculous and are a direct reflection of how much code they took from others.

Why so every armchair reviewer can yell, "Slop!"?

Guidelines meditation for you.