The constraints enforced in the language still matter. A language which offers certain correctness guarantees may still be the most efficient way to build a particular piece of software even when it's a machine writing the code.
There may actually be more value in creating specialized languages now, not less. Most new languages historically go nowhere because convincing human programmers to spend the time it would take to learn them is difficult, but every AI coding bot will learn your new language as a matter of course after its next update includes the contents of your website.
> every AI coding bot will learn your new language
If there are millions of lines on github in your language.
Otherwise the 'teaching AI to write your language' part will occupy so much context and make it far less efficient that just using typescript.
I have not found this to be the case. My company has some proprietary DSLs we use and we can provide the spec of the language with examples and it manages to pick up on it and use it in a very idiomatic manner. The total context needed is 41k tokens. That's not trivial but it's also not that much, especially with ChatGPT Codex and Gemini now providing context lengths of 1 million tokens. Claude Code is very likely to soon offer 1 million tokens as well and by this time next year I wouldn't be surprised if we reach context windows 2-4x that amount.
The vast majority of tokens are not used for documentation or reference material but rather are for reasoning/thinking. Unless you somehow design a programming language that is just so drastically different than anything that currently exists, you can safely bet that LLMs will pick them up with relative ease.
> Claude Code is very likely to soon offer 1 million tokens as well
You can do it today if you are willing to pay (API or on top of your subscription) [0]
> The 1M context window is currently in beta. Features, pricing, and availability may change.
> Extended context is available for:
> API and pay-as-you-go users: full access to 1M context
> Pro, Max, Teams, and Enterprise subscribers: available with extra usage enabled
> Selecting a 1M model does not immediately change billing. Your session uses standard rates until it exceeds 200K tokens of context. Beyond 200K tokens, requests are charged at long-context pricing with dedicated rate limits. For subscribers, tokens beyond 200K are billed as extra usage rather than through the subscription.
[0] https://code.claude.com/docs/en/model-config#extended-contex...
That’s not true. I’m working on a language and LLMs have no problems writing code in it even if there exists ~200 lines of code in the language and all of them are in my repo.
Uh not really. I am already having Claude read and then one-shot proprietary ERP code written in vintage closed source language OOP oriented BASIC with sparse documentation.... just needed to feed it in the millions of lines of code i have and it works.
I'm sure claude does great at that, but it would be objectively better, for a large variety of reasons, if claude didn't have to keep syntax examples in it's context.
"i haven't been able to find much" != "there isn't much on the entire internet fed into them"
> but every AI coding bot will learn your new language as a matter of course after its next update includes the contents of your website.
That's assuming that your new, very unknown language gets slurped up in the next training session which seems unlikely. Couldn't you use RAG or have an LLM read the docs for your language?
Agreed - unpopular languages and packages have pretty shaky outcomes with code generation, even ones that have been around since before 2023.
Neither RAG nor loading the docs into the context window would produce any effective results. Not even including the grammar files and just few examples in the training set would help. To get any usable results you still need many many usage examples.
My own 100% hallucinated language experiment is very very weird and still has thousands of lines of generated examples that work fine. When doing complex stuff you could see the agent bounce against the tests here and there, but never produced non-working code in the end. The only examples available were those it had generated itself as it made up the language. It was capable of making things like a JSON parser/encoder, a TODO webapp or a command line kanban tracker for itself in one shot.
And yet it works well enough, regardless. I have a little project which defines a new DSL. The only documentation or examples which exist for this little language, anywhere in the world, are on my laptop. There is certainly nothing in any AI's training data about it. And yet: codex has no trouble reading my repo, understanding how my DSL works, and generating code written in this novel language.
In addition, I think token efficiency will continue to be a problem. So you could imagine very terse programming languages that are roughly readable for a human, but optimized to be read by LLMs.
That's an interesting idea. But IMO the real 'token saver' isn't in the language keywords but it's in the naming of things like variables, classes, etc.
There are languages that are already pretty sparse with keywords. e.g in Go you can write 'func main() string', no need to define that it's public, or static etc. So combining a less verbose language with 'codegolfing' the variables might be enough.
I'm not an expert in LLMs, but I don't think character length matters. Text is deterministically tokenized into byte sequences before being fed as context to the LLM, so in theory `mySuperLongVariableName` uses the same number of tokens as `a`. Happy to be corrected here.
Go is one of the most verbose mainstream programming languages, so that's a pretty terrible example.
To you maybe, but Go is running a large amount of internet infrastructure today.
How does that relate to Go being a verbose language?
Its not verbose to some of us. It is explicit in what it does, meaning I don't have to wonder if there's syntatic sugar hiding intent. Drastically more minimal than equivalent code in other languages.
Verbosity is an objective metric.
Code readability is another, correlating one, but this is more subjective. To me go scores pretty low here - code flow would be readable were it not for the huge amount of noise you get from error "handling" (it is mostly just syntactic ceremony, often failing to properly handle the error case, and people are desensitized to these blocks so code review are more likely to miss these).
For function signatures, they made it terser - in my subjective opinion - at the expense of readability. There were two very mainstream schools of thought with relation to type signature syntax, `type ident` and `ident : type`. Go opted for a third one that is unfamiliar to both bases, while not even having the benefits of the second syntax (e.g. easy type syntax, subjective but that : helps the eye "pattern match" these expressions).
Every time I hear complaints about error handling, I wonder if people have next to no try catch blocks or if they just do magic to hide that detail away in other languages? Because I still have to do error handling in other languages roughly the same? Am I missing something?
Maybe not a perfect example but it’s more lightweight than Java at least haha
If by lightweight you mean verbosity, then absolutely no.
In go every third line is a noisy if err check.
Well LLMs are made to be extremely verbose so it's a good match!
I think there's a huge range here - ChatGPT to me seems extra verbose on the web version, but when running with Codex it seems extra terse.
Claude seems more consistently _concise_ to me, both in web and cli versions. But who knows, after 12 months of stuff it could be me who is hallucinating...
I think I remember seeing research right here on HN that terse languages don't actually help all that much
I would be very interested in this research... I'm trying to write a language that is simple and concise like Python, but fast and statically typed. My gutfeeling is that more concise than Python (J, K, or some code golfing language) is bad for readability, but so is the verbosity of Rust, Zig, Java.
Those constraints can be enforced by a library too. Even humans sometimes make a whole new language for something that can be a function library. If you want strong correctness guarantees, check the structure of the library calls.
Programming languages function in large parts as inductive biases for humans. They expose certain domain symmetries and guide the programmer towards certain patterns. They do the same for LLMs, but with current AI tech, unless you're standing up your own RL pipeline, you're not going to be able to get it to grok your new language as well as an existing one. Your chances are better asking it to understand a library.
> every AI coding bot will learn your new language as a matter of course after its next update includes the contents of your website.
How will it "learn" anything if the only available training data is on a single website?
LLMs struggle with following instructions when their training set is massive. The idea that they will be able to produce working software from just a language spec and a few examples is delusional. It's a fundamental misunderstanding of how these tools work. They don't understand anything. They generate patterns based on probabilities and fine tuning. Without massive amounts of data to skew the output towards a potentially correct result they're not much more useful than a lookup table.
It's wild to me the disconnect between people who actually use these tools every day and people who don't.
I have done exactly the above with great success. I work with a weird proprietary esolang sometimes that I like, and the only documentation - or code - that exists for it is on my computer. I load that documentation in, and it works just fine and writes pretty decent code in my esolang.
"But that can't possibly work [based on my misunderstanding of how LLMs work]!" you say.
Well, it does, so clearly you misunderstand how they work.
The reason it works so well is that everyone’s “personal unique language” really isn’t all that different from what’s been proposed before, and any semantic differences are probably not novel. If you make your language C + transactional memory, the LLM probably has enough information about both to reason about your code without having to be trained on a billion lines.
Probably if you’re trying to be esoteric and arcane then yeah, you might have trouble, but that’s not normally how languages evolve.
No, mine's a esoteric declarative data description/transform language. It's pretty damn weird.
You may underestimate the weirdness of existing declarative data transformation languages. On a scale of 1 to 10, XSLT is about a 2 or 3.
Mine's a weird, bad copy of Ab Initio's DML. https://www.google.com/search?q=ab+initio+dml+language
My comment is based precisely on using these tools frequently, if not daily, so what's wild is you assuming I don't.
The impact that lack of training data has on the quality of the results is easily observable. Try getting them to maintain a Python codebase vs. e.g. an Elixir one. Not just generate short snippets of code, but actually assist in maintaining it. You'll constantly run into basic issues like invalid syntax, missing references, use of nonexistent APIs, etc., not to mention more functional problems like dead, useless, or unnecessarily complicated code. I run into these things with mainstream languages (Go, Python, Clojure), so I don't see how an esolang could possibly fair any better.
But then again, the definitions of "just fine" and "decent" are subjective, and these tools are inherently unreliable, which is where I suspect the large disconnect in our experiences comes from.
They don't understand anything, but they sure can repeat a pattern.
I'm using Claude Code to work on something involving a declarative UI DSL that wraps a very imperative API. Its first pass at adding a new component required imperative management of that component's state. Without that implementation in context, I told Claude the imperative pattern "sucks" and asked for an improvement just to see how far that would get me.
A human developer familiar with the codebase would easily understand the problem and add some basic state management to the DSL's support for that component. I won't pretend Claude understood, but it matched the pattern and generated the result I wanted.
This does suggest to me that a language spec and a handful of samples is enough to get it to produce useful results.