And it's a 4B model. I worry that nontechnical users will dramatically overestimate its accuracy and underestimate hallucinations, which makes me wonder how it could really be useful for academic research.
valid point. it's more of a stepping stone towards larger models. we're figuring out the best way to do this before scaling up.
If there's very little text before the internet, what would scaling up look like?