343 points by surprisetalk 5 days ago | 36 comments

Surprised how few comments this post has; this is an insane improvement.

I've been using Electric SQL, but Automerge 3.0 seems like the holy grail, combining a local-first approach with CRDTs?

Wondering if I should ditch Electric SQL and switch to this instead. I'm just not sure what kind of hardware I need to run a sync server for Automerge, or how many users and reads/writes it can support.

ElectricSQL is pretty good too, but it's still not quite there, and implementing local-first means some features related to rollback are harder to apply.

I'm still very new to this overall, but that 10x memory boost is welcome, as I find that with very large documents the lag used to be very noticeable.

It really depends on your use case. If you want people collaborating on a rich text document, Automerge or yjs are probably great.

If you want to have local first application data where a server is the authority, ElectricSQL is probably going to serve you best.

That said there are so many approaches out there right now, and they're all promising, but tricky.

  > there are so many approaches out there right now
I'm almost to the point where I'll need one of these solutions. I'm fleshing out the corner cases now. I'd appreciate it if you could mention some of the solutions I should be looking at, and the trade-offs. I'd also appreciate it if you could mention non-obvious pitfalls.

The use case is a voice note aggregation system, the notes are stored on S3 and cached locally to desktops and mobile applications. There are transcriptions, AI summaries, user annotations, and structured metadata associated with each voice note. The application will be used by a single human, but he might not always remember to sync or even have an internet connection when he wants to.

Thank you!

If you're building your app for yourself, you likely don't need CRDTs at all.

I don't know much about automerge or other local-first solutions, but a local-first solution that doesn't deal with CRDTs is likely a much better fit for you.

Thank you. I meant that every user will only be interacting with his own files. But yes, there already are, and will be, additional users with their own files.

The performance improvements are impressive:

> In Automerge 3.0, we've rearchitected the library so that it also uses the compressed representation at runtime. This has achieved huge memory savings. For example, pasting Moby Dick into an Automerge 2 document consumes 700Mb of memory, in Automerge 3 it only consumes 1.3Mb!

> Finally, for documents with large histories load times can be much much faster (we recently had an example of a document which hadn't loaded after 17 hours loading in 9 seconds!).
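For intuition on why keeping the compressed representation at runtime helps so much (a toy sketch, not Automerge's actual encoding): edit histories are dominated by long runs of consecutive operations from the same actor, and those runs compress dramatically with even a simple run-length encoding:

```javascript
// Toy sketch only: NOT Automerge's actual format, just an illustration
// of why a compressed runtime representation saves memory.

// Naive representation: one op object per inserted character.
function naiveOps(text, actor) {
  return [...text].map((ch, i) => ({ actor, counter: i, value: ch }));
}

// Compressed representation: collapse consecutive ops by the same actor
// with contiguous counters into a single run.
function rleOps(ops) {
  const runs = [];
  for (const op of ops) {
    const last = runs[runs.length - 1];
    if (last && last.actor === op.actor &&
        op.counter === last.counter + last.values.length) {
      last.values += op.value;
    } else {
      runs.push({ actor: op.actor, counter: op.counter, values: op.value });
    }
  }
  return runs;
}

const ops = naiveOps("a".repeat(10000), "actor-1");
console.log(ops.length);         // 10000 objects
console.log(rleOps(ops).length); // 1 run
```

A document typed by a handful of actors is mostly long runs like this, which is why column-oriented, run-length-style encodings can shrink it by orders of magnitude.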

I wonder if this is accomplished using controlled buffers in AsyncIterators. I recently built a tool for processing massive CSV files and was able to get the memory usage remarkably low, and control/scale it almost linearly because of how the workers (async iterators) are spawned and their workloads are managed. It kind of blew me away that I could get such fine-tuned control that I'd normally expect from Go or Rust (I'm using Deno for this project).
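The pattern I mean, workers implemented as async iterators pulling from a shared source so only a bounded number of records is in flight at once, looks roughly like this (illustrative only; the names are made up, not from the actual tool):

```javascript
// Illustrative sketch: N workers share one async iterator, so at most
// N records are in flight at once, no matter how large the input is.
async function* lines(n) {
  for (let i = 0; i < n; i++) yield `row-${i}`;
}

async function processAll(iterable, workerCount, handle) {
  const it = iterable[Symbol.asyncIterator]();
  // Async generators queue concurrent next() calls, so workers can
  // safely share the iterator; each call hands out a distinct item.
  const workers = Array.from({ length: workerCount }, async () => {
    for (;;) {
      const { value, done } = await it.next();
      if (done) return;
      await handle(value);
    }
  });
  await Promise.all(workers);
}

(async () => {
  let processed = 0;
  await processAll(lines(1000), 4, async () => { processed++; });
  console.log(processed); // 1000
})();
```

Raising `workerCount` scales throughput roughly linearly until the handler's downstream resource saturates, while memory stays bounded by the number of in-flight items.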

I'm well above 1.3mb, and although I could get it down there, performance would suffer. I'm curious how fast they sync this data with such tiny memory usage. If the resources were available before, despite using 700mb of memory, was it still faster?

These people are definitely smarter than I am, so maybe their solution is a lot more clever than what I'm doing.

edit: Oh, they did this part with Rust. I thought it was written in JS. I still wonder: how'd they get memory usage this low, and did it impact speed much? I'll have to dig into it

> I recently built a tool for processing massive CSV files and was able to get the memory usage remarkably low

Is it OSS? I'd like to benchmark it against my CSV parser :)

No, it's very specific to some watershed sensing data that comes from a bunch of devices strewn about the coast of British Columbia. I'd love to make it (and most of the work I do) OSS if only to share with other scientific groups doing similar work.

Your parser is almost certainly better and faster :) Mine is tailored to a certain schema with specific expectations about foreign keys (well, the concept and artificial enforcement of them) across the documents. This is actually why I've been thinking about using duckdb for this project; it'll allow me to pack the data into the db under multiple schemas with real keys and some primitive type-level constraints. Analysis after that would be sooo much cleaner and faster.

The parsing itself is done with the streams API and orchestrated by a state chart (XState), and while the memory management and concurrency of the whole system is really nice and I'm happy with it, I'm probably making tons of mistakes and trading program efficiency for developer comforts here and there.

The state chart essentially does some grouping operations to pull event data from multiple CSVs; once it has those events, it stitches them together into smaller portions and ensures the tables map to one another by the event's ID. It's nice because grouping occurs from one enormous file, and it carves out these groups for the state chart to then organize, validate, and store in parallel. You can configure how much it'll do in parallel, but only because we've got some funny practices here and it's a safety precaution to prevent tying up too many resources on a massive kitchen-sink server on AWS. Haha. So, lots of non-parsing-specific design considerations are baked in.
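A rough sketch of the "carve out groups" idea (hypothetical names; it assumes rows arrive pre-sorted by event ID, so memory stays bounded to one group at a time):

```javascript
// Hypothetical sketch: stream rows that are assumed pre-sorted by event
// ID and yield one batch per event, so only the current group is ever
// held in memory.
async function* groupByEvent(rows, key) {
  let current = null;
  let batch = [];
  for await (const row of rows) {   // works for sync and async sources
    if (batch.length && row[key] !== current) {
      yield batch;                  // event boundary: hand off the group
      batch = [];
    }
    current = row[key];
    batch.push(row);
  }
  if (batch.length) yield batch;    // final group
}

(async () => {
  const rows = [
    { eventId: "e1", table: "gps" },
    { eventId: "e1", table: "temp" },
    { eventId: "e2", table: "gps" },
  ];
  for await (const group of groupByEvent(rows, "eventId")) {
    console.log(group.map(r => r.table)); // ["gps","temp"], then ["gps"]
  }
})();
```

Each yielded group can then be handed to the state chart to organize, validate, and store in parallel.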

One day I'll shift this off the giga-server and let it run in isolation with whatever resources it needs, but for now it's baby steps and compromises.

thanks!

They say: "In Automerge 3.0, we've rearchitected the library so that it also uses the compressed representation at runtime. This has achieved huge memory savings."

Right, this didn't click at first, but now I understand. I can actually gain similar benefits in my project by switching to storing the data as Parquet/DuckDB files; I had no idea the potential gains from compressed representations were so significant, so I'd been holding off on testing that out. Thanks for the nudge on that detail!

Probably because I still don't understand what exactly this thing does (and I haven't been doing tech since yesterday).

High upvote/comment ratio is a sign of a quality post, honestly. Sometimes all you can do is upvote.

Related. Others?

Show HN: Pg_CRDT – CRDTs in Postgres Using Automerge - https://news.ycombinator.com/item?id=43655920 - April 2025 (4 comments)

Automerge: A library of data structures for building collaborative applications - https://news.ycombinator.com/item?id=40976731 - July 2024 (58 comments)

Automerge-Repo: A "batteries-included" toolkit for local-first applications - https://news.ycombinator.com/item?id=38193640 - Nov 2023 (43 comments)

Automerge 2.0 - https://news.ycombinator.com/item?id=34586433 - Jan 2023 (89 comments)

Automerge CRDT – Build local-first software - https://news.ycombinator.com/item?id=30881016 - April 2022 (8 comments)

Automerge: A JSON-like data structure (a CRDT) that can be modified concurrently - https://news.ycombinator.com/item?id=30412550 - Feb 2022 (69 comments)

Automerge: a new foundation for collaboration software [video] - https://news.ycombinator.com/item?id=29501465 - Dec 2021 (29 comments)

Automerge: A library [..] for building collaborative applications in JavaScript - https://news.ycombinator.com/item?id=24791713 - Oct 2020 (1 comment)

Automerge: JSON-like data structure for building collaborative apps - https://news.ycombinator.com/item?id=16309533 - Feb 2018 (98 comments)

I have a question about Automerge that maybe someone here can answer. I have a lot of code written for some custom CRDTs I've made. (This handles syncing them with my server and between devices and so on.) My data model is that each device gets a unique ID and can then share "events", which must be sequential for a given ID. The events from all the different devices are then collected and replayed. I'm curious if it would be possible to fit Automerge into this framework. All I would need is an `apply` function that takes an event and a document and produces a new document. (I assume I would miss out on the super-efficient compressed representation described in this article, but I'm curious.)

Replaying immutable events in a deterministic order doesn't fit so well with Automerge; Automerge is more designed for apps where you can represent the mutable state of your application as an Automerge doc. https://livestore.dev/ might be a better fit for you.

I’m also interested in this. I have a similar use case, to implement “cross device sync” functionality for a local-only webapp. I tried out automerge but it felt like it’s meant for syncing data when multiple users collaborate, and not data sync for a single user who is expected to use only one device at a time (I could be wrong about this).

I have implemented a POC sync mechanism via central server and I believe it’s simpler as it takes advantage of certain assumptions about the app. I’ve yet to productionize it so I am interested in knowing if my understanding is correct or if there are other existing solutions for this use case.

I am in somewhat of the same boat for https://parture.org. I have a quite large CRDT system with unique IDs that is also type-safe, does not rely on serde_json::Value juggling, ensures every CRDT is structurally valid, and knows which CRDTs cannot be applied to a Rust struct based on some business logic. I am wondering whether such checks (type safety, business logic) can be worked into the CRDT application process. Automerge seems mostly meant for text editing, but they do have Autosurgeon, though it hasn't been updated in a while.

A few questions:

1. I can see there's an example of using it with React and Prosemirror, what's the gap to using it with Tiptap (for those who don't know, it's an abstraction on top of Prosemirror that aims to streamline the task of building editors)?

2. Is there any prior art or room in the design for supporting permissioned blocks of content _within_ a document? i.e things which some users aren't allowed to view (or edit)

1. You can use Tiptap with it: you just have to wrap your existing schema with Automerge attributes. Undo/redo would also swap out.

Is there info anywhere on the structure of the semi-lattice they are using for their CRDT?

Is the map based on a multi-value register or a last-writer-wins register?

See the docs: https://automerge.org/docs/reference/documents/conflicts/

Thank you.

From the doc

> Automerge uses a combination of LWW (last writer wins) and multi-value register. By default, if you read from doc.foo you will get the LWW semantics, but you can also see the conflicts by calling Automerge.getConflicts(doc, 'foo') which has multi-value semantics.

> Note that "last writer wins" here is based on the internal ID of the opeartion [sic], not a wall clock time. The internal ID is a unique operation ID that is the combination of a counter and the actorId that generated it. Conflicts are ordered based on the counter first (using the actorId only to break ties when operations have the same counter value).

Seems like they use LWW with Lamport clocks to order operations, plus a unique ID for each client as a tie-breaker.
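To make the quoted semantics concrete, a toy register (not Automerge's code) would behave like this: writes carry a (counter, actorId) operation ID, reads return the LWW winner, and all concurrent values stay readable as conflicts:

```javascript
// Toy register mirroring the quoted semantics (not Automerge's code).
// Operation IDs are (counter, actorId): higher counter wins, actorId
// breaks ties.
function opIdGreater(a, b) {
  return a.counter !== b.counter ? a.counter > b.counter
                                 : a.actorId > b.actorId;
}

class Register {
  constructor() { this.ops = []; } // simplification: all writes kept as concurrent
  write(counter, actorId, value) {
    this.ops.push({ counter, actorId, value });
  }
  get() { // LWW read: the op with the greatest (counter, actorId) wins
    let winner = null;
    for (const op of this.ops) {
      if (!winner || opIdGreater(op, winner)) winner = op;
    }
    return winner && winner.value;
  }
  conflicts() { // multi-value read
    return this.ops.map(op => op.value);
  }
}

const r = new Register();
r.write(1, "actor-a", "x");
r.write(1, "actor-b", "y");   // same counter, so actorId breaks the tie
console.log(r.get());         // "y"
console.log(r.conflicts());   // ["x", "y"]
```

In the real library, a write that has seen earlier writes supersedes them, so `getConflicts` only surfaces genuinely concurrent values; this sketch skips that bookkeeping.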

What sort of applications is this used for? I'm a technical writer, and my team is facing versioning challenges for sections of documents. I'm wondering if this could be useful.

Can you elaborate on what versioning issues you are facing?

Is this Javascript only?

It's written in Rust, but JavaScript is the primary friendly interface. https://github.com/automerge/automerge

There is also a C API wrapper; not sure of its state with respect to this latest release.

Needs benchmarks with yjs

If you are after performance see jsonjoy.

The new automerge is apparently much faster than it was before. (I haven't run benchmarks though, just been told that by the core developers.)

I'd love some performance benchmarks.

A number of these sync engines have been growing popular, most notably Convex and Zero (although both are of course very different from Automerge). This one's Rust/C API makes it more interesting; I wonder if an implementation for terminal UIs could be possible?

Are move operations for trees implemented now?

IIRC, Kleppmann built a prototype for it but it’s not included in Automerge yet.
