Why I built Skir: https://medium.com/@gepheum/i-spent-15-years-with-protobuf-t...
Quick start: npx skir init
All the config lives in one YAML file.
Website: https://skir.build
GitHub: https://github.com/gepheum/skir
Would love feedback especially from teams running mixed-language stacks.
This seems like a Chesterton's fence failure.
Protobuf solved serialization with schema evolution: backward/forward compatibility.
Skir seems to have great devex for the codegen part, but that's the least interesting aspect of Protobuf. I don't see how the serialization this proposes handles that without an equivalent of numeric tagging.
Did you look at other formats like Avro, Ion etc? Some feedback:
1. Dense json
Interesting idea. You can also just keep the compact binary if you just tag each payload with a schema id (see Avro). This also allows a generic reader to decode any binary format by reading the schema and then interpreting the binary payload, which is really useful. A secondary benefit is you never ever misinterpret a payload. I have seen bugs with protobufs misinterpreted since there is no connection handshake and interpretation is akin to 'cast'.
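The schema-id idea the comment describes can be sketched in a few lines. This is a toy in the spirit of Avro's single-object encoding (which prefixes payloads with a CRC-64 schema fingerprint); the function names and the use of SHA-256 here are illustrative only:

```python
import hashlib

# Toy sketch of schema-id tagging: every payload carries a fingerprint
# of the schema it was written with, so a generic reader can look the
# schema up in a registry and can never misinterpret a payload.
def write_payload(schema: str, body: bytes) -> bytes:
    fingerprint = hashlib.sha256(schema.encode()).digest()[:8]
    return fingerprint + body

def read_payload(data: bytes, registry: dict) -> tuple[str, bytes]:
    fingerprint, body = data[:8], data[8:]
    schema = registry.get(fingerprint)
    if schema is None:
        # Unlike a Protobuf-style blind 'cast', this fails loudly.
        raise ValueError("unknown schema fingerprint; refusing to decode")
    return schema, body

schema = '{"type": "record", "name": "User", "fields": [...]}'
registry = {hashlib.sha256(schema.encode()).digest()[:8]: schema}
found, body = read_payload(write_payload(schema, b"\x01\x02"), registry)
```

The key property is the failure mode: an unregistered fingerprint raises instead of silently decoding garbage.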
2. Compatibility checks
+100, there's no reason to allow breaking changes by default
3. Adding fields to a type: should you have to update all call sites?
I'm not so sure this is the right default. If I add a field to a core type used by 10 services, this requires rebuilding and deploying all of them.
4. enum looks great. what about backcompat when adding new enum fields? or sometimes when you need to 'upgrade' an atomic to an enum?
Thanks for the feedback.
0. Yes, I looked at Avro and Ion. I like Protobuf much better because I think using field numbers for field identity (meaning being able to rename fields freely) is a must.
1. Yes. Skir also supports that with the binary format (you can serialize and deserialize a Skir schema to JSON, which then allows you to convert from binary format to readable JSON). It just requires building many layers of extra tooling, which can be painful. For example, if you store your data in some SQL engine X, you won't be able to quickly visualize your data with a simple SELECT statement; you need to build the tooling that lets you visualize it. Now, dense JSON is obviously not ideal for this use case, because you don't see the field names, but for quick debugging I find it's "good enough".
3. I agree there are definitely cases where it can be painful, but I think the cases where it actually is helpful are more numerous. One thing worth noting is that you can "opt-out" of this feature by using `ClassName.partial(...)` instead of `ClassName()` at construction time. See for example `User.partial(...)` here: https://skir.build/docs/python#frozen-structs I mostly added this feature for unit tests, where you want to easily create some objects with only some fields set and not be bothered if new fields are added to the schema.
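The opt-out pattern described above can be illustrated with a plain Python sketch (this is not actual Skir-generated code; the `User` class and its defaults are hypothetical): the regular constructor requires every field, while a `partial` factory fills in defaults.

```python
from dataclasses import dataclass

# Illustration of the pattern, not actual Skir-generated code: the
# regular constructor requires every field, so adding a field breaks
# every construction site at build time; `partial` opts out by filling
# unspecified fields with defaults, which is handy in unit tests.
@dataclass(frozen=True)
class User:
    user_id: int
    name: str
    email: str

    @classmethod
    def partial(cls, **kwargs) -> "User":
        defaults = {"user_id": 0, "name": "", "email": ""}
        defaults.update(kwargs)
        return cls(**defaults)

full = User(user_id=1, name="Ada", email="ada@example.com")  # all fields required
fixture = User.partial(name="Ada")  # unit-test fixture; other fields default
```

If a `phone` field were added to the schema, the `full` line would stop compiling (here: raise at call time), while the `fixture` line would keep working.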
4. Good question. I guess you mean "forward compatibility": you add a new variant to the enum, not all binaries are deployed at the same time, and some old binary encounters the new variant it doesn't know about? I do what Protobuf does: I default to the UNKNOWN variant. More on this:
- https://skir.build/docs/schema-evolution#adding-variants-to-...
- https://skir.build/docs/schema-evolution#default-behavior-dr...
- https://skir.build/docs/protobuf#implicit-unknown-variant
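The fallback behavior described can be sketched in plain Python (names are hypothetical; this is not Skir's actual generated code):

```python
import enum

# Sketch of the implicit-UNKNOWN behavior: an old binary receiving a
# variant it does not know about decodes it as UNKNOWN instead of failing.
class Status(enum.Enum):
    UNKNOWN = "?"
    ACTIVE = "active"
    DELETED = "deleted"

def decode_status(wire_value: str) -> Status:
    try:
        return Status(wire_value)
    except ValueError:
        # Variant added by a newer schema version this binary predates.
        return Status.UNKNOWN

decode_status("active")     # Status.ACTIVE
decode_status("suspended")  # Status.UNKNOWN: variant unknown to this binary
```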
> meaning being able to rename fields freely, is a must.
avro supports field renames though.
3. On second thought, I believe you'd only have to deploy when you choose: the next build will force you to provide values (or opt into the default). So forcing inspection of construction sites seems good.
> Skir is a universal language for representing data types, constants, and RPC interfaces. Define your schema once in a .skir file and generate idiomatic, type-safe code in TypeScript, Python, Java, C++, and more.
Maybe I'm missing some additional features but that's exactly what https://buf.build/plugins/typescript does for Protobuf already, with the advantage that you can just keep Protobuf and all the battle hardened tooling that comes with it.
Buf plus Protobuf already give you multi-language codegen, a compact tag-based binary format with varint encoding, gRPC service generation, and practical tools like protoc, descriptor sets, ts-proto, and buf's breaking-change checks.
If Skir wants to be more than prettier syntax it needs concrete wins, including well-specified schema evolution rules that map cleanly to the wire, clear prescriptions for numeric tag management and reserved ranges, first-class reflection and descriptor compatibility, a migration checker, and a canonical deterministic encoding for signing and deduplication. Otherwise you get another neat demo format that becomes a painful migration when ops and clients disagree on tag semantics.
The entire original post, it seems, is dedicated to explaining why Skir is better than plain Protobuf, with examples of all the well-known pain points. If these are not persuasive for you, staying with Protobuf (or just JSON) should be a fine choice.
If you are fine enough with protobufs so that you're not actively looking for alternatives, maybe you should not spend the effort.
+1
Copying from blog post [https://medium.com/@gepheum/i-spent-15-years-with-protobuf-t...]:
""" Should you switch from Protobuf?
Protobuf is battle-tested and excellent. If your team already runs on Protobuf and has large amounts of persisted protobuf data in databases or on disk, a full migration is often a major effort: you have to migrate both application code and stored data safely. In many cases, that cost is not worth it.
For new projects, though, the choice is open. That is where Skir can offer a meaningful long-term advantage on developer experience, schema evolution guardrails, and day-to-day ergonomics. """
Skir has exactly the same goals as Protobuf, so yes, that sentence can apply to Protobuf as well (and buf.build). I listed some of the reasons to prefer Skir over Protobuf, in my humble opinion, here: https://medium.com/@gepheum/i-spent-15-years-with-protobuf-t... Built-in compatibility checks, the ability to import dependencies from other projects (buf.build offers this, but makes you pay for it), some language design choices (around enum/oneof, and the fact that adding fields forces you to update all constructor call sites), and the dense JSON format are examples.
https://capnproto.org/ has been my goto since forever. Made by the protobuf inventor
*Made by the proto2 implementor, Kenton Varda
https://news.ycombinator.com/user?id=kentonv
I've been dabbling with the newer Cap'n Web, whose nicely descriptive README's first line says:
> Cap'n Web is a spiritual sibling to Cap'n Proto (and is created by the same author), but designed to play nice in the web stack.
It's just JSON, which has upsides and downsides. But things like promise pipelining are such a huge upside versus everything else: you can refer to results (and maybe send them around?) and kick off new work based on those results, before you even get the result back.
This is far far far superior to everything else, totally different ball-game.
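The pipelining idea can be sketched in plain Python. Everything below is a toy; Cap'n Web's real protocol records calls made on unresolved promises and ships the whole chain in one batch, so dependent calls need no extra round trips:

```python
# Toy sketch of promise pipelining: calls on an unresolved result are
# queued locally and executed as one chain on the "server", instead of
# waiting a full round trip per call.
class PipelinedRef:
    def __init__(self, server, ops=None):
        self._server = server
        self._ops = ops or []

    def call(self, method: str, *args) -> "PipelinedRef":
        # Queue a dependent call instead of waiting for the result.
        return PipelinedRef(self._server, self._ops + [(method, args)])

    def resolve(self):
        # One "round trip": the server executes the whole recorded chain.
        value = None
        for method, args in self._ops:
            value = self._server[method](value, *args)
        return value

server = {
    "get_user": lambda _prev, uid: {"id": uid, "name": "ada"},
    "get_name": lambda prev: prev["name"],
}
ref = PipelinedRef(server).call("get_user", 7).call("get_name")
ref.resolve()  # "ada"
```

The point being made above: `get_name` is requested before `get_user`'s result ever comes back, which a request-per-round-trip RPC cannot do.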
I've been a little rebuffed by Wasm when I try; I keep getting too close to some gravitational event horizon, get sucked in, and give up. But for more data-throughput-oriented systems, I'm still hoping wrpc ends up being a fantastic pick: https://github.com/bytecodealliance/wrpc . Also, Apache Arrow Flight, which I know less about, has mad traction in serious data-throughput systems; being adjacent to the amazingly popular Apache Arrow, that makes sense. https://arrow.apache.org/docs/format/Flight.html
> For optional types, 0 is decoded as the default value of the underlying type (e.g. string? decodes 0 as "", not null).
In the "dense JSON" format, isn't representing removed/absent struct fields with `0` and not `null` backwards incompatible?
If you remove or are unaware of an `int32?` field, old consumers will suddenly think the value is present as a "default" value rather than absent.
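Based on the quoted rule, the concern can be made concrete (the decoder below is a hypothetical stand-in, not Skir's actual implementation):

```python
# Hypothetical decoder for one optional-string slot of a dense-JSON
# array, following the quoted rule: 0 decodes as the underlying type's
# default, not as null/absent.
def decode_optional_string(slot):
    if slot == 0:
        return ""  # per the quoted rule: default value, not None
    return slot

# A newer writer that dropped the field emits 0 for the now-unused slot;
# an old reader then sees "" (present and empty) instead of "absent".
old_payload = [3, 4, "P"]
new_payload = [3, 4, 0]
decode_optional_string(old_payload[2])  # "P"
decode_optional_string(new_payload[2])  # "" -- looks present, not absent
```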
That is correct, and that is a good catch. The idea, though, is that when you remove a field, you typically do so only after making sure that no code still reads the removed field and that all binaries have been deployed.
How does this work if, for example, you persist the data in a database?
I found that our work is somewhat similar. However, I mainly focus on HTTP and JSONRPC: https://news.ycombinator.com/item?id=47306983
Unfortunately, I really like postfix types, but the IDL itself doesn't support them.
This vs JTD?
That “compact JSON” format reminds me of the special Protobuf JSON format that Google uses in their APIs, which has very little public documentation. Does anyone happen to know why Google uses that? And to OP: were you inspired by that format?
I don't know the reason TextFormat was invented, but in practice it's way easier to work with TextFormat than JSON in the context of Protos.
Consider numeric types -
JSON: number aka 64-bit IEEE 754 floating point
Proto: signed and unsigned 32- and 64-bit integers (plus fixed-width variants), float, double
I can only imagine the carnage saved by not accidentally chopping off the top 10 bits (or something similar) of every int64 identifier when it happens to get processed by a perfectly normal, standards-compliant JSON processor.
It's true that most int64 fields could be just fine with int54. It's also true that some fields actually use those bits in practice.
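The boundary the comments are pointing at is easy to demonstrate in plain Python, standing in for any parser that stores JSON numbers as IEEE 754 doubles (exact integers only up to 2^53):

```python
# A standards-compliant JSON processor that parses numbers as doubles
# silently loses precision above 2**53 ("int54" in the comment above).
big_id = 2**60 + 1                 # a plausible 64-bit identifier
roundtripped = int(float(big_id))  # what a double-based parser stores
big_id == roundtripped             # False: the low bits were chopped off
float(2**53) == float(2**53 + 1)   # True: 2**53 + 1 is not representable
```

This is why Protobuf's canonical JSON mapping encodes int64/uint64 values as strings rather than JSON numbers.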
Also, the JSPB format references tag numbers rather than field names. It's not really readable. A TextProto might be a log output, a config file, or a test, which all have ways of catching field-name discrepancies (or it doesn't matter). For the primary transport layer to the browser, the field name isn't a forward-compatible/safe way to reference the schema.
So oddly the engineers complaining about the multiple text formats are also saved from a fair number of bugs by being forced to use tools more suited to their specific situation.
I think you may be referring to JSPB. It's used internally at Google but has little support in open source. I know about it, but I wouldn't say I was inspired by it. It's particularly unreadable, because it needs to account for field numbers being possibly sparse. Google built it for frontend-backend communication, when both the frontend and the backend use Protobuf dataclasses: it's more efficient than sending a large JSON object, and it's faster to deserialize than deserializing a binary string on the browser side. I think it's mostly deprecated nowadays.
I don't know, but if I had to guess:
1. Google uses protobufs everywhere, so having something that behaves equivalently is very valuable. For example in protobuf renaming fields is safe, so if they used field names in the JSON it would be protobuf incompatible.
2. It is usually more efficient because you don't send field names. (Unless the struct is very sparse, it is probably smaller on the wire; serialized JS usage may be harder to evaluate, since JS engines are probably more optimized for structs than heterogeneous arrays.)
3. Presumably the ability to use the native JSON parsing is beneficial over a binary parser in many cases (smaller code size and probably faster until the code gets very hot and JITed).
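Points 2 and 3 are easy to illustrate with a toy comparison (the field layout is hypothetical; JSPB's real encoding also has to handle sparse field numbers, as noted elsewhere in the thread):

```python
import json

# Point 2: referencing fields by position instead of name shrinks the
# wire size. Point 3: the dense form still parses with the native JSON
# parser, so no custom binary decoder is needed in the browser.
named = {"userId": 123, "displayName": "Ada", "email": "ada@example.com"}
dense = [123, "Ada", "ada@example.com"]  # slot i <-> field number i

named_wire = json.dumps(named, separators=(",", ":"))
dense_wire = json.dumps(dense, separators=(",", ":"))
len(dense_wire) < len(named_wire)  # True: markedly smaller on the wire
json.loads(dense_wire)             # native parsing still works
```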
Also, flexbuffers.
Apart from the comparison with Protobuf, how does it compare to flatbuffers, capnproto, messagepack, jsonbinpack... ?
Impressive. Some interop with established standards such as OpenAPI or gRPC would make it an easier sell for non-greenfield projects
Thanks. Definitely agree, will try to think about what that could look like.
Looks nice. But what are the use cases of this? I'm still trying to figure that out.
Thanks! The main use case (similarly to Protobuf) is when you need to exchange data types between systems written in different languages. Like Protobuf, it can also be used in a single-language system, when you want to serialize data and have strong guarantees that you will be able to deserialize it in the future. (When you use classic serialization libraries like Pydantic, Java serialization, etc., it's easy to accidentally modify a schema and break the ability to deserialize old data.)
Like this but zero copy, easy migration/versioning, Rust and WASM support.
Notably missing both Go and Rust
Absolutely, I am planning to add these 2 as well as C# by June. Working on it now.
Do you have a newsletter or how should we know when C# will be available?
How does this compare with https://connectrpc.com/ as that project seems to share similar goals
If I may suggest, Swift support will be more than appreciated, to consider it for a viable protocol for connecting backend with mobile applications.
Yeah totally fair. I targeted Dart because of Flutter, but I think I will include Swift in the next wave of languages, after Rust, Go and C#.
I spent some time in the actual compiler source. There's real work here, genuinely good ideas.
The best thing Skir does is strict generated constructors. You add a field, every construction site lights up. Protobuf's "silently default everything" model has caused mass production incidents at real companies. This is a legitimately better default.
Dense JSON is interesting, but the docs gloss over the tradeoff: your serialized data is [3, 4, "P"]. If you ever lose your schema, or a human needs to read a payload in a log, you're staring at unlabeled arrays. Protobuf binary has the same problem, but nobody markets binary as "easy to inspect with standard tools."

The "serialize now, deserialize in 100 years" claim has a real asterisk. Compatibility checking requires you to opt into stable record IDs and maintain snapshots. If you skip that (and the docs' own examples often do), the CLI literally warns you: "breaking changes cannot be detected." So it's less "built-in safety" and more "safety available if you follow the discipline." Which is... also what Protobuf offers.
The Rust-style enum unification is genuinely cleaner than Protobuf's enum/oneof split. No notes there, that's just better language design.
Minor thing that bothered me disproportionately: the constant syntax in the docs (x = 600) doesn't match what the parser actually accepts (x: 600).
The weirdest thing, the one that bugged the heck out of me, was the tagline, "like protos but better"; that's doing the project no favors.
I think this would land better if it were positioned as "Protobuf, but fresh" rather than "Protobuf, but better." The interesting conversation is which opinions are right, not whether one tool is universally superior.
Quite frankly, I don't use protobuf because it seems like an unapproachable monolith, and I'm not at FAANG anymore, just a solo dev. No one's gonna complain if I don't. But I do love the idea of something simpler thats easy to wrap my mind around.
That's why "but fresh" hits nice to me, and I have a feeling it might be more appealing than you'd think. For example, it's hard to believe a 2-month-old project is strictly better than whatever mess and history Protobuf has gone through, with tons of engineers paid to use and work on it. It is easy to believe it covers 99% of what Protobuf does already, and any crazy edge cases that pop up (they always do, eventually :) will be easy to understand and fix.
Thank you so much for taking the time to dig into the compiler source code and for the thorough comment you left.
For dense JSON: the idea is that it is often a good "default" choice because it offers a good tradeoff across 3 properties: efficiency (where it's between binary and readable JSON), persistability (it's safe to evolve the schema without losing backward compatibility), and readability (it's low for the reasons you mentioned, but it's not as bad as a binary string). I tried to explain this tradeoff in this table: https://skir.build/docs/serialization#serialization-formats
I hear your point about the tagline "like protos but better" which I hesitated to put because it sounds presumptuous. But I am not quite sure what idea you mean to convey by "fresh"?
Not the parent but I infer “fresh” as meaning a new approach to an old problem (with the benefits of experience baked in). A synonym of “modern” without the baggage.
Fair. I changed the tagline on the website to "A modern alternative to Protocol Buffer". Thanks for the feedback.
100%, danke
Cheers :) (other replier was right on "fresh", "fresh" definitely wasn't right)
Also, thank you for flagging the constant syntax problem (x = 600) on the website. Fixed.
> Minor thing that bothered me disproportionately: the constant syntax in the docs (x = 600) doesn't match what the parser actually accepts (x: 600).
You’re a better man than me. If the docs can’t even get the syntax right, that’s a hard no from me.
Also, fwiw, you’ve got a few points wrong about protos. Inspecting the binary data is hard, but the tag numbers are present. You need the schema, but at least you can identify each element.
Also, I disagree on the constructor front. Proto forces you to grapple with the reality that a field may be missing. In a production system, when adding a new field, there will be a point where that field isn’t present on only one side of the network call. The compiler isn’t saving you.
Fresh is more honest than better, and personally, I wouldn’t change it.
> Also, I disagree on the constructor front. Proto forces you to grapple with the reality that a field may be missing. In a production system, when adding a new field, there will be a point where that field isn’t present on only one side of the network call. The compiler isn’t saving you.
I agree it's important for users to understand that newer fields won't be set when they deserialize old data -- whether that's with Protobuf or Skir. I disagree with the idea that not forcing you to update all constructor call sites when you add a field will help (significantly) with that. Are you saying that because Protobuf forces you to manually search for all call sites when you add a field, it forces you to think about what happens if the field is not set at deserialization, hence, it's a good thing? I'm not sure that outweighs the cost of bugs introduced by cases where you forget to update a constructor call site when you add a field to your schema.
Respectfully, I’ve never forgotten a call site, but also yes. In a hypothetical HelloWorld service, the HelloRequest and HelloResponse generally aren’t used anywhere except a rpc caller and rpc handler, so it’s not hard to “remember” and find the usage.
Some callers may not need to update right away, or may not need the new feature at all, and breaking existing callers' compilation is bad. If your caller is a different team, for example, and their CI/CD breaks because you added a field, that's bad. Each place it's used, you should think about how it'll be handled, BUT ALSO, your system explicitly should gracefully handle the case where it's not uniformly present. It's an explicit goal of protos to support the use case where heterogeneous schema versions are used over the wire.
If a bug is introduced because the caller and handler use different versions, the compiler wasn’t going to save you anyways. That bug would have shown up when you deploy or update the client and server anyways - unless you atomically update both at once. You generally cannot guarantee that a client won’t use an outdated version of the schema, and if things break because of that, you didn’t guard it correctly. That’s a business logic failure not a compilation failure.
I would recommend exploring OpenRPC for those who have not yet seen it. It brings protocol-buffer-like definitions (components), RPC definitions and centralised error definitions.
Obligatory dense field numbers seem like a massive downside, the problems of which would become evident after a busy repo has been open for a few days.
It's not obligatory. Basically, Protobuf gives you a choice between (1) binary format and (2) readable JSON. Skir gives you a choice between (1) binary format, (2) readable JSON, and (3) dense JSON. It recommends dense JSON as the "default choice", but it does not force it. The reason it's recommended as the default is that it offers a good tradeoff between efficiency (only a bit less compact than binary), backward compatibility (you can rename fields safely, unlike with readable JSON), and debuggability (although it's definitely not as good as readable JSON, because you lose the field names, it's decent and much better than the binary format).
Looks really like Prisma to me: https://www.prisma.io/docs/orm/prisma-schema/overview#exampl...
Why build another language instead of extending an existing one?