Hacker News

cyanydeez 3 hours ago

Not at the VRAM sizes that control how much context to load; also, GPUs aren't as efficient as direct inference.

wmf 39 minutes ago

OK, B70.