we should publish some; the high-order effect seems to be that LoRAs significantly hurt small-model performance vs FFT, with a smaller effect for large models. This may be because large models have more built-in skills, so a LoRA suffices to elicit an existing skill, whereas for small models you need to do more actual learning (holding the number of parameter updates constant). In general I think it's better to get a performant small model with FFT than a performant large model with a large LoRA, which is why we default to FFT, but I agree that we should publish more details here.
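To make the "holding the number of parameter updates constant" framing concrete, here's a rough sketch (assuming Hugging Face transformers + peft; the model name, rank, and target modules are arbitrary placeholders, not our actual configs) that just counts trainable parameters under FFT vs a LoRA:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_name = "gpt2"  # placeholder; swap in the model under discussion

# Full fine-tuning: every parameter in the base model is trainable.
fft_model = AutoModelForCausalLM.from_pretrained(model_name)
fft_trainable = sum(p.numel() for p in fft_model.parameters() if p.requires_grad)

# LoRA: base weights are frozen, only the low-rank A/B adapters train.
lora_model = AutoModelForCausalLM.from_pretrained(model_name)
lora_config = LoraConfig(
    r=64,                      # placeholder rank
    lora_alpha=128,
    target_modules=["c_attn"], # attention projections for this placeholder model
    task_type="CAUSAL_LM",
)
lora_model = get_peft_model(lora_model, lora_config)
lora_trainable = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)

print(f"FFT trainable params:  {fft_trainable:,}")
print(f"LoRA trainable params: {lora_trainable:,} "
      f"({100 * lora_trainable / fft_trainable:.2f}% of FFT)")
```

The point of the comparison is that the LoRA's trainable-parameter budget is a small fraction of the model's, so on a small model it has much less room to do new learning rather than elicit existing behavior.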
Thanks! Personally, I've found that FFT is not necessarily a strict improvement over (Q)LoRA, as it can sometimes more easily lead to instability in the model, hence the bit of extra scrutiny.
Curious to see your thoughts and results whenever you get something out.