YAML is, to me, the #1 failure in DevOps. It's disappointing that so many have resigned themselves to this limitation and no longer try to improve. Our job is to make things run better and more easily, yet so many won't recognize the biggest pains in their own work. Seriously, is text-templating an invisibly scoped language really where you think the field has reached maturity?
Write it in a higher level language and generate the YAML from that. See the YAML as a wire protocol, not something you author things in.
Exactly, which is why interop with everything that exists today is important.
However, you don't want config to be Turing complete; that creates a host of other problems at a layer where you don't want them.
I know what you mean, but there seems to be a misplaced fear about this, which has led us down the garden path of unmaintainable config (or even trying to Jinja-template it!).
If your config is Turing complete and consumed as-is, then without a lot of discipline you can dig yourself into a hole, sure.
If you're producing YAML, which is not Turing complete, that constraint means you have to code in a way that produces deterministic output. It's actually very safe, and YAML maps 1:1 to types in something like Python.
My favourite go-to example is for AWS CloudFormation:
https://github.com/cloudtools/troposphere
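A minimal sketch of the "YAML as a wire protocol" idea, using only the Python standard library (troposphere itself is not used here). YAML 1.2 was revised so that JSON documents are, apart from minor edge cases, valid YAML, so sorted JSON output works as the emitted artifact. The `render_bucket_stack` helper and the bucket names are hypothetical; nothing here talks to AWS.

```python
import json


def render_bucket_stack(bucket_names: list[str]) -> str:
    """Build a CloudFormation-style template as plain Python data, then serialize it.

    Hypothetical helper: the structure mirrors a CloudFormation template,
    but this only produces text; it does not deploy anything.
    """
    resources = {}
    for name in bucket_names:
        # CloudFormation logical IDs must be alphanumeric, so strip dashes.
        logical_id = name.replace("-", "").title() + "Bucket"
        resources[logical_id] = {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": name},
        }
    template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}
    # sort_keys plus a fixed indent gives byte-for-byte identical output for the same input.
    return json.dumps(template, sort_keys=True, indent=2)


print(render_bucket_stack(["app-logs", "app-data"]))
```

Only the generator is authored; the emitted text is treated as a build artifact, which is where the determinism argument comes from.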
YAML has its place, and it is great for describing what your single microservice needs.
YAML is okay for writing structured prose for humans. It’s terrible for anything consumed by programs: even that single microservice has a high likelihood of some problem caused by YAML’s magic typing, silent data loss due to indentation, etc., unless you pair it with a separate validation toolchain, which makes the argument for simplicity increasingly dubious.
Validation is required, yes.
Sure, so at that point how much are we really saving versus using a better alternative? Using YAML correctly is harder, because you need not only the validation everything needs but also extra steps specific to YAML, to avoid problems created by YAML rather than by the problem domain. For example, if typing less is my goal, isn’t it easier to, say, always quote country_name than to run a separate validator that catches the Norway problem?
Why not pick a config language that works with our current config formats, looks like our current config files, and addresses many of the dumb problems that arise only in current config choices?
It doesn't have schemas, nor does it scale. It has no valid place, because invisibly scoped languages are a terrible idea.
It's certainly insufficient; look at what happened to Helm.
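To make the Norway problem concrete: under YAML 1.1 rules (which PyYAML, for example, still follows), plain scalars such as `NO`, `yes`, or `off` resolve to booleans, so an unquoted `NO` country code silently becomes `false`. The sketch below hand-codes only that boolean rule rather than using a YAML library; `resolve_yaml11_scalar` and `check_country_codes` are hypothetical names.

```python
import re

# Boolean words a YAML 1.1 loader such as PyYAML resolves for plain (unquoted)
# scalars. The 1.1 spec also lists y/Y/n/N; PyYAML leaves those as strings.
_YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF)$"
)
_TRUE_WORDS = {"yes", "true", "on"}


def resolve_yaml11_scalar(text: str, quoted: bool = False):
    """Resolve a scalar the way a YAML 1.1 loader would, for booleans only."""
    if not quoted and _YAML11_BOOL.match(text):
        return text.lower() in _TRUE_WORDS
    return text  # everything else stays a string in this simplified sketch


def check_country_codes(raw_values):
    """Flag country codes that a YAML 1.1 loader would turn into booleans.

    raw_values: list of (text, quoted) pairs as they appear in the source file.
    """
    return [text for text, quoted in raw_values
            if not isinstance(resolve_yaml11_scalar(text, quoted), str)]


print(resolve_yaml11_scalar("NO"))               # False
print(resolve_yaml11_scalar("NO", quoted=True))  # NO
print(check_country_codes([("SE", False), ("NO", False), ("DK", False)]))  # ['NO']
```

Quoting at the source (`country_name: "NO"`) sidesteps the resolver entirely, which is the trade-off being weighed above: a habit at authoring time versus an extra validator in the pipeline.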
I’m with you that it’s terrible, but it very much does have schemas! The vast vast majority of YAML-based big APIs (k8s, helm, compose, and so on) all absolutely do check documents against schemas (not just ad-hoc validation rules) internally.
The real issue is two things: the smaller one is that there’s no single or self-describing schema system (like XML supports); the larger thing is that most YAML schema validations prioritize supporting extremely permissive and complex input documents over being predictable and appropriately restrictive. And that’s a harder problem to fix, because it has more to do with priorities and community conventions.
If people wanted strict schemaful YAML to be the norm, they would have consolidated on one of the many tools that does that by now. The issue is, people don’t want that; they want extremely flexible and open-ended APIs. YAML as currently practiced is conducive to that goal, but it’s the goal that leads to issues, not the choice of (bad, I agree) data language.
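The strict-versus-permissive distinction can be sketched in a few lines of plain Python. This is not any real schema system (JSON Schema, CUE, and so on); `SCHEMA` and `validate` are hypothetical, and the only difference between the two modes is whether unknown keys are tolerated.

```python
SCHEMA = {"name": str, "replicas": int, "image": str}


def validate(doc: dict, strict: bool) -> list[str]:
    """Return a list of error messages; an empty list means the document passed."""
    errors = []
    for key, expected in SCHEMA.items():
        if key not in doc:
            errors.append(f"missing key: {key}")
        elif not isinstance(doc[key], expected):
            errors.append(f"{key}: expected {expected.__name__}, got {type(doc[key]).__name__}")
    if strict:
        # A permissive validator skips this check, so typos pass silently.
        for key in sorted(doc.keys() - SCHEMA.keys()):
            errors.append(f"unknown key: {key}")
    return errors


# 'replica' is a typo for 'replicas'; the intended value 3 is silently ignored.
doc = {"name": "web", "replicas": 1, "image": "nginx:1.25", "replica": 3}
print(validate(doc, strict=False))  # []
print(validate(doc, strict=True))   # ['unknown key: replica']
```

Both modes check types; only the strict one catches the document that is well-formed but wrong, which is the "predictable and appropriately restrictive" property described above.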
No YAML schema will save you when your HelmRelease arbitrarily merges your YAML files on top of Kustomize on top of whatever else.
In practice, schemas are mostly useless in my experience, because people bend YAML as if they really, really want a programming language instead.
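Why per-file schemas help less once layers are merged: each document below is individually fine, but the result depends on the merge rules. The `deep_merge` function is a generic, hypothetical sketch (maps merged recursively, lists and scalars replaced); it is not the exact merge semantics of Helm, Kustomize, or Flux, which each have their own rules.

```python
import copy


def deep_merge(base: dict, overlay: dict) -> dict:
    """Return a new dict: nested maps merge recursively, anything else is replaced."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


base = {"env": [{"name": "LOG_LEVEL", "value": "info"}, {"name": "REGION", "value": "eu"}],
        "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}}
# The overlay author only meant to change LOG_LEVEL and the CPU limit.
overlay = {"env": [{"name": "LOG_LEVEL", "value": "debug"}],
           "resources": {"limits": {"cpu": "1"}}}

merged = deep_merge(base, overlay)
print(merged["resources"]["limits"])  # {'cpu': '1', 'memory': '256Mi'}
print(merged["env"])                  # [{'name': 'LOG_LEVEL', 'value': 'debug'}]
```

The map merge does what the author intended, but the list is replaced wholesale and `REGION` disappears, even though both inputs would pass the same schema.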
JSON is so much easier in my experience, and less prone to error.
JSON does not have comments, and no, JSON5 is not the answer either.
Think bigger, it's not something you are using today. The next config language should have schemas built in and support for modules/imports so we can do sharing/caring. It should look and feel like config languages and interoperate with all of those that we currently use. It will be a single configuration fabric across the SDLC.
This exists today for you to try, with CUE
I've been cooking up something the last few weeks for those interested, CUE + Dagger
https://github.com/hofstadter-io/hof/tree/_next/examples/env
Like XML? :)
Like Python?
CUE
Why not Python?
Typing is bolted on rather than a native concept, for one.
Why is that a problem?
Because types are important and having them be a native part of the language creates opportunities for error checking, editor completions, and LLM bounding.
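One concrete reading of "bolted on": Python's type hints are not enforced at runtime, so a dataclass annotated with `port: int` happily accepts a string unless you add a check yourself or run a separate tool such as mypy. The `ServiceConfig` classes are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class ServiceConfig:
    name: str
    port: int


# Annotations are metadata only; this constructs without complaint.
loose = ServiceConfig(name="web", port="8080")
print(type(loose.port).__name__)  # str


@dataclass
class CheckedServiceConfig:
    name: str
    port: int

    def __post_init__(self):
        # Opt-in runtime check. It relies on f.type being a real class, which
        # holds here because there is no `from __future__ import annotations`.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}")


try:
    CheckedServiceConfig(name="web", port="8080")
except TypeError as exc:
    print(exc)  # port: expected int, got str
```

The check works, but it is code you have to write and maintain per class, which is the contrast with a language where the type constraint is part of the config itself.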
Invisible scoping and Turing completeness.
Python is better than Bash in ops, and I've been using more Go in this space.
Config is another beast, and it calls for separate languages.
I’m not sold that config is a complex enough domain to necessitate another language. What problems is CUE solving when compared to python and why are those problems substantial enough to make it worth learning a new language?
That's exactly the thing: complexity. CUE bounds complexity, like JSON, YAML, and TOML, but it offers more composability than any of them.
Given that we now have TOML, JSON, INI, CSV, YAML, etc., it seems we are converging on JSON, YAML, or TOML. There is too much inertia behind those three and not much behind CUE right now.
CUE works with all of those languages, so it doesn't matter what the tools or others are using. I can always apply CUE at any point to output their required format as needed.
Keep your legacy config and mess if you want; you're the one missing out.
Also, I don't see TOML in the wild much, and the others have been around so long that I have to chuckle at these claims about "inertia" rather than take them seriously.
I’m not claiming inertia makes TOML ‘best’, just that it’s clearly not blocked by inertia either. Cargo standardized on TOML years ago, and GitLab Runner has relied on it for a long time. If a format can win in major ecosystems, “people won’t adopt anything new” isn’t the whole story.
I genuinely despise YAML's indentation requirements.
For comments, I use a _comment field in my custom JSON-reading apps.
I dislike the idea of _comment because it’s something that is parsed and becomes part of the data structure in memory. Comments should be ignored and not parsed.
When I wrote a custom deployment tool for some lab deployments, my Python-based tool used JSON as the config language; the comments were parsed, I guess, but never made it into my data structure. They were dropped.
Yeah, this is what I'm talking about: innovation has stopped, and we do dirty hacks like `imports: [...]` in YAML and `_comment` in JSON.
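Keeping comments in the file but out of the in-memory data can be done with the standard `json` module's `object_hook`, which is called for every decoded JSON object. Treating keys named `_comment` as comments is the convention described above, not a JSON feature; `load_config` is a hypothetical helper. Real `//` comments still make plain `json.loads` fail.

```python
import json

COMMENT_KEY = "_comment"  # convention from the thread, not part of the JSON spec


def _drop_comments(obj: dict) -> dict:
    # object_hook receives each decoded JSON object, innermost first.
    return {k: v for k, v in obj.items() if k != COMMENT_KEY}


def load_config(text: str) -> dict:
    return json.loads(text, object_hook=_drop_comments)


raw = '''
{
  "_comment": "lab deployment, rack 3",
  "hosts": [{"_comment": "spare", "name": "node-a"}],
  "retries": 3
}
'''
print(load_config(raw))  # {'hosts': [{'name': 'node-a'}], 'retries': 3}

try:
    json.loads('{"retries": 3 // real comments are not JSON}')
except json.JSONDecodeError:
    print("plain JSON rejects // comments")
```

The `_comment` values are still parsed, as the objection above points out; they are just discarded before the rest of the program sees the data.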
How are people not embarrassed by this complete lack of quality in their work?
I don’t think we need anything as formal as XML when we have JSON. It was originally meant for over-the-wire payloads, and people like me use it for more than that.
You're still thinking "good enough". I'm advocating for the "we can do so much better" attitude
The current popular config choices cause a lot of extra work, bugs, and effort. Is improving the status quo not a worthy goal anymore? Have we reached a point in history where we throw up our hands and say "meh, I'll deal with it"? That seems to be basically where people are today (I'm somewhat a believer in this, based on anecdata and vibes).
The uncomfortable reality is that config formats don’t win by being best. They win by being:
1. already installed everywhere,
2. easy to parse in every language,
3. supported by editors/linters/CI tools,
4. stable enough that vendors bet on them.
The config language we write does not have to be the same thing the programs read. It's the same relationship as between a compiled language and assembly.
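A minimal sketch of the compiler analogy: one authored source (here plain Python data) is "compiled" to whatever format each consumer reads. It uses only the standard library (`json` and `configparser`); the config contents and the `emit_*` helpers are hypothetical, and CUE, discussed earlier in the thread, would play the source role in the setup being argued for.

```python
import configparser
import io
import json

# The single authored source of truth (hypothetical values).
SOURCE = {
    "database": {"host": "db.internal", "port": 5432},
    "cache": {"host": "cache.internal", "port": 6379},
}


def emit_json(cfg: dict) -> str:
    # Target 1: JSON for consumers that read JSON (or YAML 1.2).
    return json.dumps(cfg, sort_keys=True, indent=2)


def emit_ini(cfg: dict) -> str:
    # Target 2: INI for consumers that read INI files.
    parser = configparser.ConfigParser()
    for section, values in cfg.items():
        # INI has no native types, so every value becomes a string here.
        parser[section] = {k: str(v) for k, v in values.items()}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


print(emit_json(SOURCE))
print(emit_ini(SOURCE))
```

The source keeps `port` as an integer, but the INI target loses that and the reader has to recover it (for example with `ConfigParser.getint`), much as assembly loses the type information a compiler worked with.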