I guess you could run sata ssds with this, but I suspect more people would be running spinners... Spinners certainly have buffers to manage the difference between bus speeds and media speeds; I think most sata ssds do too.
The HBA may not have or need much buffering, as transfer to/from system memory with DMA should be pretty fast... I'd guess enough memory to fully buffer one outgoing command for each port, and one or two to buffer a response for each port as well; they may have more, but not much is necessary.
Some multi-port sata cards do terrible things though. Rather than using a proper controller for the number of ports, they have a one-port controller and then a sata port multiplier. These do result in a meaningful restriction in bandwidth, since every drive shares the single upstream link, and some multipliers require waiting for a response before communicating with another drive, which would result in poor throughput for most NAS workloads.
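To put a rough number on that restriction, here's a back-of-the-envelope sketch. It assumes the multiplier's uplink is a single SATA III link (6 Gb/s line rate, 8b/10b encoding, so about 600 MB/s before protocol overhead) and that all drives are streaming at once. The 250 MB/s drive speed is a made-up figure for a spinner, not something measured.

```python
# Rough estimate only: ignores protocol overhead and command-switching stalls.
SATA3_LINE_RATE_BITS = 6_000_000_000   # SATA III line rate, bits per second
# 8b/10b encoding: 8 payload bits per 10 line bits, then bits -> bytes -> MB
UPLINK_MB_S = SATA3_LINE_RATE_BITS * 8 // 10 // 8 // 1_000_000

def per_drive_mb_s(drives, drive_media_mb_s=250.0):
    """Best-case throughput per drive when `drives` share one uplink.

    drive_media_mb_s is a hypothetical sequential speed for one drive.
    """
    return min(drive_media_mb_s, UPLINK_MB_S / drives)

for n in (1, 2, 5):
    print(n, "drives:", per_drive_mb_s(n), "MB/s each")
```

With those assumptions, two drives still run at full speed, but five drives behind one multiplier are each capped well below what the media can do, and that's before any multiplier that serializes commands makes it worse.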
A particularly terrible idea would be a port multiplier hooked up as an m.2 sata device, rather than a pci-e sata controller. I don't know that I've seen that one (edit, found one on amazon), but I'm pretty sure I've seen the controller + port multiplier combo. It's definitely worth finding out what controller is on the board before buying to avoid surprises.
I think this latter design pattern is what the m.2 6 sata cards do. It's typically something like an ASM1166 chip (the example below) or a JMicron equivalent.
They say the bandwidth far exceeds the individual SATA port speeds. But, there's little to no visible buffer on the card.
It's not so much an HBA as a port "multiplier".
https://www.newegg.com/orico-pm2ts6-bp-pci-express-to-m-2-ca...
The ASM1166 is a controller. Check the datasheet [1]. Otoh, this thing [2] is a port multiplier with the JMB575 [3].
You don't need a lot of buffer to be an AHCI controller. You need enough space for the work queue that the driver writes. For write requests, you can get by with one sector-sized buffer: DMA to fill it, then send it to the disk. But if you have two buffers, you can max out throughput without doing anything fancy. For reads, chances are you'll be able to DMA out faster than the disk fills the buffer, so two buffers is probably enough there, too.
So I don't know how much space you need for the command queue, but NCQ max is 32; let's say each queue entry is 256 bytes, because that's probably way more than enough. For each port, then, you'd need 8k for the command queue (32 x 256 bytes), 8k for write buffers, and 8k for read buffers (two 4k sectors each). That's 24k per port, or a total of 144k of memory for a 6 port device... That's not a lot of ram to stuff into a controller ASIC.
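If you want to play with the numbers, here's the same estimate as a quick sketch. Every constant is a guess from the comment above, not something out of a datasheet: 256 bytes per queue entry, two buffers per direction, and 4k sectors (which is what makes two buffers come to 8k).

```python
# Back-of-the-envelope sizing; all constants are assumptions, not datasheet values.
NCQ_DEPTH = 32            # max outstanding NCQ commands per port
QUEUE_ENTRY_BYTES = 256   # generous guess at the size of one queue entry
SECTOR_BYTES = 4096       # assumed 4k sectors
BUFFERS_PER_DIRECTION = 2 # double buffering for writes and for reads
PORTS = 6

def minimal_per_port_bytes():
    command_queue = NCQ_DEPTH * QUEUE_ENTRY_BYTES          # 8k
    write_buffers = BUFFERS_PER_DIRECTION * SECTOR_BYTES   # 8k
    read_buffers = BUFFERS_PER_DIRECTION * SECTOR_BYTES    # 8k
    return command_queue + write_buffers + read_buffers

per_port = minimal_per_port_bytes()
total = per_port * PORTS
print(per_port // 1024, "k per port,", total // 1024, "k total")
```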
Maybe you want a little more ram, in case the driver gets behind. You don't need extra buffers for writes... if the DMA is too slow, you just lose throughput, but no big deal, DMA will only be too slow if the system memory controller is very busy, and you can let the disk idle then. You might want to be able to fully buffer all reads, even if you can't DMA them out though. Then you don't have to worry about what happens if a read comes in and you don't have anywhere to put it. In that case, you'd have 8k command buffer, 8k write buffer, 32x4k = 128k read buffer = 144k per port, 864k for a 6 port device, still not enough that you need an external ram.
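Same kind of sketch for the fully buffered read case, with the same guessed constants (256-byte queue entries, 4k sectors, two write buffers). The only change is one read buffer per outstanding NCQ command instead of two per port.

```python
# Sizing with a read buffer for every outstanding command; constants are assumptions.
def buffered_per_port_bytes(ncq_depth=32, entry_bytes=256,
                            sector_bytes=4096, write_buffers=2):
    command_queue = ncq_depth * entry_bytes     # 8k
    write_buf = write_buffers * sector_bytes    # 8k
    read_buf = ncq_depth * sector_bytes         # 32 x 4k = 128k
    return command_queue + write_buf + read_buf

ports = 6
per_port_kib = buffered_per_port_bytes() // 1024
total_kib = buffered_per_port_bytes() * ports // 1024
print(per_port_kib, "k per port,", total_kib, "k total")
```

The read buffer dominates, so the total scales almost linearly with queue depth times sector size. Even so, it stays under a megabyte for 6 ports.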
[1] https://www.asmedia.com.tw/product/45aYq54sP8Qh7WH8/58dYQ8bx...
[2] https://www.amazon.com/ChenYang-Function-Converter-Adapter-J...
[3] https://www.jmicron.com/file/download/893/JMB575.pdf