If it relied on OpenZFS, then I wouldn't be too surprised.

The whole ARC thing, for example, sidestepping the general block cache, feels like a major hack resulting from how it was brutally extracted from Solaris at the time...

The way ZFS just doesn't "fit" was why I had hope for btrfs... ZFS is still great for a file server, but I wouldn't use it on a general purpose machine.

Solaris had a unified page cache, and the ARC existed separately, alongside it, there as well.

One huge problem with ZFS is that there is no zero-copy due to the ARC wart. E.g., if you're doing sendfile() from a ZFS filesystem, every byte you send is copied into a network buffer. But if you're doing sendfile() from a UFS filesystem, the pages are just loaned to the network.

This means that on the Netflix Open Connect CDN, where we serve close to the hardware limits of the system, we simply cannot use ZFS for video data due to ZFS basically doubling the memory bandwidth requirements. Switching from UFS to ZFS would essentially cut the maximum performance of our servers in half.
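For context, here's a minimal sketch of that path using the FreeBSD sendfile() prototype (Linux's differs); `sock` is assumed to be an already-connected TCP socket. Whether the pages get loaned zero-copy (UFS) or copied out of the ARC (ZFS) is decided entirely inside the kernel, so the call looks the same either way:

```c
/*
 * Minimal sketch of the sendfile() path discussed above (FreeBSD
 * prototype). The zero-copy vs. copy-out-of-the-ARC difference is
 * invisible at this level.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <unistd.h>

static int send_whole_file(int sock, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    off_t sent = 0;
    /* nbytes = 0 would also mean "until EOF" on FreeBSD. */
    int rc = sendfile(fd, sock, 0, (size_t)st.st_size, NULL, &sent, 0);

    close(fd);
    return rc;
}
```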

I also imagine you wouldn't benefit from ZFS there either, even if the ARC wasn't there. You have a single application and can presumably accept occasional data loss (just fetch the content from upstream again). You just need to handle bitrot detection, but there are ways to do that application-side.
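To illustrate what "application-side" could mean here (nothing CDN-specific is implied), a rough sketch that hashes a file with OpenSSL's SHA-256 and compares it against a digest known from upstream metadata; on a mismatch you'd just re-fetch:

```c
/* Hypothetical application-side bitrot check: hash the file and compare
 * against a digest obtained out of band. Uses OpenSSL's SHA256_* API
 * (link with -lcrypto). */
#include <openssl/sha.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 on match, 0 on mismatch, -1 on I/O error. */
static int file_matches_digest(const char *path,
                               const unsigned char expected[SHA256_DIGEST_LENGTH])
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    SHA256_CTX ctx;
    SHA256_Init(&ctx);

    unsigned char buf[1 << 16];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        SHA256_Update(&ctx, buf, n);

    int err = ferror(f);
    fclose(f);
    if (err)
        return -1;

    unsigned char got[SHA256_DIGEST_LENGTH];
    SHA256_Final(got, &ctx);
    return memcmp(got, expected, SHA256_DIGEST_LENGTH) == 0;
}
```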

Better to have the filesystem get out of the way and just focus on being good at raw I/O scheduling.

I wonder if FreeBSD is going to get something io_uring-esque. That's one of the more interesting developments in kernel space...

There are benefits to ZFS for spinning drives. E.g., a metadata-only L2ARC on an NVMe drive, with the data coming from a spinning drive, would likely perform better than UFS. This is because with UFS the head has to move around at times to read metadata, whereas with ZFS and metadata cached on NAND, it can, in the ideal case, just read data.
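For reference, that kind of setup boils down to two commands; the `secondarycache` property and cache vdevs are real ZFS features, while the pool and device names below are made up:

```sh
zpool add tank cache nvme0n1            # add the NVMe device as an L2ARC vdev
zfs set secondarycache=metadata tank    # only spill metadata into L2ARC
```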

FreeBSD has "fire and forget" behavior in the context of several common hot paths, so the need for io_uring is less urgent. Eg, sendfile is "fire and forget" from the application's perspective. If data is not resident, the network buffers are pre-allocated and staged on the socket buffer. When the io completes, the disk interrupt handler then flips the pre-staged buffers (whose pages now contain valid data) to "ready" and pokes the TCP state machine.

Similarly, FreeBSD has an OpenBSD-inspired splice, which is fire-and-forget once two sockets are spliced.
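For reference, the OpenBSD-style splice is just a setsockopt(); a rough sketch of that usage (FreeBSD's SO_SPLICE is, per the above, modeled on it, but treat the exact struct as an assumption there):

```c
/*
 * OpenBSD-style socket splicing sketch: once set, the kernel moves data
 * from `from` to `to` on its own, with no further copies through userland.
 */
#include <sys/types.h>
#include <sys/socket.h>

static int splice_sockets(int from, int to)
{
    struct splice sp = {
        .sp_fd  = to,   /* drain socket */
        .sp_max = 0,    /* 0 = no byte limit */
        /* sp_idle left zeroed = no idle timeout */
    };
    return setsockopt(from, SOL_SOCKET, SO_SPLICE, &sp, sizeof(sp));
}
```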

For filesystems where the in-memory cache is insufficient, maybe a generic ephemeral inode/dentry cache system - L2ARC without the ZFS-isms - would be useful...

But to be fair, we are approaching the point where spinning rust stops making sense for even the remaining use cases, and so designing new optimizations specifically for it might be a bit silly now.

Even on Solaris the ARC existed. ZFS replaces a lot of systems traditionally not directly related to a filesystem implementation.

For instance, using the `zfs` tool one doesn't just configure file system properties but also controls NFS exports, which was traditionally done via /etc/exports.

This was done as part of a major UI/UX reshaping in Solaris 10 to make sysadmins' lives easier; what it ultimately does is... edit the exports file.
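Concretely, it looks something like this; `sharenfs` is a real property, the dataset name is made up:

```sh
zfs set sharenfs=on tank/home     # export tank/home over NFS, no /etc/exports edit
zfs get sharenfs tank/home        # inspect the resulting share options
```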

The `zfs` and `zpool` tools provide access to multiple different subsystems in ways that make more sense to the end user, a lot like LVM and LUKS do on top of device mapper these days.

Can you elaborate on the last paragraph? In what way doesn't ZFS fit? (I couldn't make it out from the first two paragraphs.) Where did btrfs fall short of your expectations? Why would you avoid ZFS on general purpose machines if you deem it good enough for file servers?

ZFS managing its own cache that sidesteps the existing pagecache infrastructure and requires dedicating more memory for it to function well than you otherwise would with any other filesystem.

ZFS sidestepping conventional device and mount handling with the way it "imports"/"exports" ZFS pools, usually auto-mounting all datasets straight to your system root by default - if you have a dataset named "tank", it automounts to "/tank".

ZFS operation itself being an inexact science of thousands of per-dataset flags and tunables (again not set through the common path of mount flags; a small taste below, after these points), and unless you run something like TrueNAS that sets them for you, it's probably best to pretend it's not there.

Common configurations with decent performance often involve complexities like L2ARC.

It's far too invasive and resource consuming for a general purpose machine, and does not provide notable benefit there. A dedicated file server won't care about the invasiveness and will benefit from the extra controls.
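To give a taste of the knobs mentioned above (the tunables and properties are real; the values, pool, and dataset names are just examples):

```sh
# Cap the ARC so it doesn't claim most of your RAM
sysctl vfs.zfs.arc_max=4294967296                           # FreeBSD (loader.conf on older releases)
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max    # Linux/OpenZFS equivalent

# Pools are "imported"; datasets auto-mount under / by default
zpool import tank                      # top-level dataset lands at /tank

# Per-dataset properties, set outside the usual mount flags
zfs set recordsize=1M tank/media       # large sequential files
zfs set compression=lz4 tank/media
zfs set atime=off tank/media
zfs set primarycache=metadata tank/backups   # what the ARC caches, per dataset
```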

btrfs fell short by still not having its striped RAID game under control, only being "production grade" on single disks and mirrors for the longest time - probably still?
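For anyone unfamiliar with the profiles in question, they're chosen at mkfs or balance time; raid1/raid10 have long been the ones people trust, while the striped parity profiles are the lagging part (device names below are placeholders):

```sh
# Mirrored data + metadata across three devices: the long-stable configuration
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd

# Striped parity RAID, the part that lagged behind
mkfs.btrfs -d raid5 -m raid1 /dev/sdb /dev/sdc /dev/sdd

# Profiles can also be converted later with a balance
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt
```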

@arghwhat: To clarify, this isn't a rhetorical question. I'm interested in your technical insight on the subject - especially the comparisons.

IBM's patent on ARC expires in 2026. Could that make its way into Linux upstream then?

I've been booting off ZFS-on-root for years.