Hacker News

brian-armstrong 7 hours ago [ - ]

Powering the SSD on isn't enough. You need to read every bit occasionally in order to recharge the cell. If you have them in a NAS, then using a monthly full volume check is probably sufficient.
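
For anyone without a NAS, here is a minimal sketch of what "read everything once" could look like from the host side. The function name `read_all` and the 4 MiB chunk size are my own choices, not anything a drive vendor recommends. It also assumes the read actually reaches the flash: the OS page cache can serve recently used data without touching the drive, so this is an illustration rather than a guaranteed refresh.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice


def read_all(path, chunk_size=CHUNK_SIZE):
    """Read every byte of `path` once and return the number of bytes read.

    `path` can be a regular file or a block device (the latter usually
    needs root). Any OSError raised by a failed read propagates to the
    caller, which is the signal that the drive could not return some data.
    """
    bytes_read = 0
    # buffering=0 skips Python's own buffer; the OS page cache may still
    # answer some reads without going to the drive.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of file or device
                break
            bytes_read += len(chunk)
    return bytes_read
```

Calling `read_all` on a large file or a device node returns once the whole thing has been read, so at least you know when the pass has finished. Whether the controller rewrites weak cells as a result is still up to the firmware.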

derkades 7 hours ago [ - ]

Isn't that the SSD controller's job?

brian-armstrong 7 hours ago [ - ]

It would surely depend on the SSD and the firmware it's running. I don't think you can entirely count on it. Even if it were working perfectly, and your strategy was to power the SSD on periodically to refresh the cells, how would you know when it had finished?

ethin 6 hours ago [ - ]

NVMe has read recovery levels (RRLs) and two different self-test modes (short and extended) but what both of those modes do is entirely up to the manufacturer. So I'd think the only way to actually do this is to have host software do it, no? Or would even that not be enough? I mean, in theory the firmware could return anything to the host but... That feels too much like a conspiracy to me?

seg_lol 5 hours ago [ - ]

Do you know any firmware engineers?

Izkata 7 hours ago [ - ]

Huh. I wonder if this is why I'd sometimes get random corruption on my laptop's SSD. I'd reboot after a while and fsck would find issues in random files I haven't touched in a long time.

gruez 6 hours ago [ - ]

If you're getting random corruption like that, you should replace the SSD. SSDs (and also hard drives) already have built-in ECC, so if you're getting errors on top of that, it's not just random cosmic rays. It's your SSD being extra broken, and it doesn't bode too well for the health of the SSD as a whole.

Izkata 4 hours ago [ - ]

I bought a replacement but never bothered swapping it. The weird thing is the random corruption stopped happening a few years ago (confirmed against old backups, so it's not like I'm just not noticing).

brian-armstrong 7 hours ago [ - ]

It's quite possible. Some SSDs are worse offenders for this than others. I have some Samsung 870 EVOs that lost data the way you described. Samsung knew about the issue and quietly swept it under the rug with a firmware update, but once the data was lost, it was gone for good.

PunchyHamster 6 hours ago [ - ]

Huh, I thought I'd just gotten a faulty one. Mine died shortly after the warranty ended (and had a bunch of media errors before that).

ethin 6 hours ago [ - ]

I ran into this firmware bug with the two drives in my computer. They randomly failed after a while -- and by "a while" I mean less than a year of usage. It took two replacements before I finally realized that I should check for a firmware update.

formerly_proven 7 hours ago [ - ]

Unless your setup is a very odd Linux box, fsck will never check the consistency of file contents.

Izkata 4 hours ago [ - ]

It found problems in the tree - lost files, wrong node counts, other stuff - which led to me finding files that didn't match previous backups (and when opened were obviously corrupted, like the bottom half of an image being just noise). Once I knew this was a problem, I also caught files that couldn't be read (IOError), which fsck would then delete on the next run.

I may not have noticed had fsck not alerted me something was wrong.
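
For reference, the comparison against backups can be done with checksums instead of opening files by hand. This is a rough sketch of that idea, not the exact process I used back then. The helper names (`sha256_of`, `build_manifest`, `changed_files`) are made up for illustration, and a changed digest only points to corruption for files you know you haven't modified.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            try:
                manifest[rel] = sha256_of(full)
            except OSError as exc:
                # Unreadable files are recorded rather than skipped, so
                # they show up as differences in the comparison.
                manifest[rel] = f"unreadable: {exc.strerror}"
    return manifest


def changed_files(old, new):
    """Return paths present in both manifests whose entries differ."""
    return sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
```

Build one manifest from the backup and one from the live directory, then pass both to `changed_files`. Anything it lists that you haven't edited since the backup is worth a closer look.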

suspended_state 6 hours ago [ - ]

But metadata is data too, right? I guess the next question is: could parts of the FS metadata remain untouched long enough for the SSD data corruption process to occur?

giantrobot 2 hours ago [ - ]

A ZFS scrub (scheduled monthly by default on some distributions) will do it.