Hacker News

numpad0 2 days ago

Even in tensor-parallel mode? I thought it could only work if you're fine with stalling all but n GPUs for n users at any given moment.