Hacker News

Hopefully like this (but smarter): https://chatjimmy.ai/

This is genuinely confusing to my senses. The future is going to be so strange/neat/me unemployed.

> strange/neat/me unemployed

I'm not sure if that's what you were going for, but I read it as if it were written by The Board in the game Control, and found myself with the appropriate level of existential dread.

syhol 10 hours ago [ - ]

We love/help/replace you

mh- 18 hours ago [ - ]

and I haven't played that game, so I read it in Ralph Wiggum's voice.. which also feels appropriate.

I'm in danger.

matheusmoreira 13 hours ago [ - ]

The future is totally illegible to me. I love these AI models, but I feel like I'm going to be jobless within 10 years.

Anomie is at an all-time high right now.

the_af 7 hours ago [ - ]

10 years? An optimist, I see.

razodactyl 10 hours ago [ - ]

Yeah. It keeps catching me off guard that it answered me already.

kkotak 16 hours ago [ - ]

Why isn't the insane speed of this site (13K TPS) closer to the top of AI conversations?

mike_hearn 7 hours ago [ - ]

Because there's been nothing to discuss since their announcement. Their API access closed immediately due to overwhelming demand, and they haven't fabbed any models newer than Llama 3 yet.

Probably they will make bank selling to HFT for a while.

Ey7NFZ3P0nzAe 15 hours ago [ - ]

It's pretty well known by now.

chromadon 12 hours ago [ - ]

I asked it for a block of C++ code and it hit 14,189 tok/s. I assume it cached someone else's session?

fcsp 11 hours ago [ - ]

No - it's custom silicon https://news.ycombinator.com/item?id=48693490

mlrtime 7 hours ago [ - ]

Because I just tested it and it took 3-4 clarifications before it actually gave a correct response vs gemini/google search. It's not great, but good.

I'd rather wait 3x as long.

jeingham 4 hours ago [ - ]

This gave me some sense of what blisteringly fast AI actually is. What it means for the future is a question that remains.

niyazpk 21 hours ago [ - ]

Wow.. what?! How is this so fast?! Where can I read more?

fcsp 20 hours ago [ - ]

Funnily enough, pasting your comment straight into Jimmy leads to a... Funnily suboptimal answer that does not answer the question.

As someone else already contributed, this is driven by a Canadian startup, Taalas, that basically makes chips that are LLMs, so everything is very fast but also baked into the chip. Once this kind of stuff is a commodity in like 10 years, our world will be very, very different.

hajile 15 hours ago [ - ]

Taalas HC1 AI uses Llama 3.1 8B, but takes up a massive 53B transistors and 815mm2 on TSMC N6 (nearly at the reticle limit of 858mm2). N2 is a little less than 3x as dense (110MTr/mm2 vs 313MTr/mm2).

This chip would still be 272mm2 on N2 which is an eye-watering $30k/wafer and bigger than a 9950x or Nvidia 5070.
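
The area estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that uses only the figures quoted in the comment (815mm2 on N6, peak densities of 110 and 313MTr/mm2) and assumes the design shrinks in direct proportion to peak logic density, which real designs rarely manage. The function name is made up for illustration.

```python
# Back-of-the-envelope die shrink, using only numbers quoted in the thread.
N6_AREA_MM2 = 815.0   # HC1 die area on TSMC N6 (from the comment)
N6_DENSITY = 110.0    # MTr/mm2, quoted peak density for N6
N2_DENSITY = 313.0    # MTr/mm2, quoted peak density for N2


def scaled_area(area_mm2: float, old_density: float, new_density: float) -> float:
    """Assume area shrinks in inverse proportion to transistor density."""
    return area_mm2 * old_density / new_density


ratio = N2_DENSITY / N6_DENSITY
area_exact = scaled_area(N6_AREA_MM2, N6_DENSITY, N2_DENSITY)
area_rounded = N6_AREA_MM2 / 3  # the comment's "3x" shorthand

print(f"density ratio: {ratio:.2f}x")
print(f"N2 area at quoted densities: {area_exact:.0f} mm2")
print(f"N2 area at a flat 3x: {area_rounded:.0f} mm2")
```

With the quoted densities the result comes out nearer 286mm2; the 272mm2 figure corresponds to rounding the ratio up to a flat 3x. Either way the die stays large for a leading-edge node.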

This just isn't feasible. Some of the latest-gen LLMs seem to have 5-10T parameters or about 1000x more. I don't know that taping out just one chip makes economic sense let alone the 300-1000 chips required for a cutting-edge model. Things like continuing education so your model knows about the latest NPM packages or world news is super important, but seems like it would require new chips.

There are a TON of uses for 8B-parameter models on the edge, but this is WAY too big to put on the edge of anything. Something like a 10mm2, 100M-parameter voice model might be feasible on the edge, but only for expensive devices. Most edge chips are built on TSMC 28nm (up to 29MTr/mm2) or GF 22FDX (up to 40MTr/mm2), which would grow the AI chip to the point where it would absolutely dominate the BOM.

lelanthran 14 hours ago [ - ]

> Things like continuing education so your model knows about the latest NPM packages or world news is super important, but seems like it would require new chips.

They probably have a few ideas around that. Me, personally, I'd have one main expensive chip (replaced every 10 years, or whatever), with a secondary cheap chip in front of it that gets replaced every year or so.

The secondary chip could act the way RAG does, or perhaps both chips together can act as LoRA.

Either way, 99.999% of the knowledge is static, you just need to fine-tune the weights with that remaining 0.001% knowledge, which can be done using RAG or LoRA on a much smaller (thus cheaper) disposable chip.

hajile 2 hours ago [ - ]

The better solution would be making part of the chip cluster use something like FPGA which can be reprogrammed.

Text to speech or diagnostics equipment where the core model is relatively small and never changes seems like the ideal application. You might be able to fit something in the 25-30B range in 2nm to 14A, but it would need a way to update.

Large models are simply out of the question in my opinion. If you need 400+ different chip designs, it’ll be billions of dollars to tape out before you even make the first chip.

lelanthran an hour ago [ - ]

> The better solution would be making part of the chip cluster use something like FPGA which can be reprogrammed.

I'm not sure I follow (It's late, I am tired and I haven't had my dinner yet. That's my stupid trifecta!)

The original chip has the weights, so it's literally just a bunch of on-die (read-only) memory cells. The FPGA, while you could use it for the memory cells, would be way too expensive to use as pure memory. Typically one would hook up (read-only) storage to it, so you still need that read-only chip anyway.

The FPGA is just the compute bits, but this chip has on-die weights, not just compute.

I was proposing that they have the base weights on a primary (permanent) chip, and have a secondary (replaceable) smaller chip with weights for a specific use-case, or for fine-tuning with new knowledge/updates to the model.

The matrices can be multiplied LoRA style, applying the matrix in the secondary chip to the primary chip, resulting in up-to-date weights through which the prompt is pushed.
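
A minimal sketch of the LoRA-style split described above, in plain Python. It assumes the primary chip holds a frozen weight matrix W and the secondary chip holds two small matrices A and B whose product is a low-rank correction. In standard LoRA the correction is added to the base output rather than multiplied into the base weights. All names and the toy numbers are hypothetical, and nothing here reflects how Taalas hardware actually works.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector) -> Vector:
    """Compute (W + B*A) * x without ever materialising W + B*A.

    w: frozen base weights, d_out x d_in  (the hypothetical "primary chip")
    a: down-projection, r x d_in          (the hypothetical "secondary chip")
    b: up-projection, d_out x r           (the hypothetical "secondary chip")
    """
    base = matvec(w, x)              # fixed path through the base weights
    delta = matvec(b, matvec(a, x))  # low-rank update path, rank r
    return [p + q for p, q in zip(base, delta)]


# Toy example: 2x3 base weights with a rank-1 update.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]   # 1 x 3
B = [[0.5],
     [-1.0]]            # 2 x 1
x = [1.0, 2.0, 3.0]

y = lora_forward(W, A, B, x)
print(y)
```

The update costs r * (d_in + d_out) parameters instead of d_in * d_out, which is the reason it could plausibly live on a much smaller replaceable part.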

fcsp 11 hours ago [ - ]

Yeah, they're clearly just starting out and have only shipped their very first proof of concept. But to me, their plans seem generally reasonable (https://taalas.com/the-path-to-ubiquitous-ai/), and like I wrote, if this kind of thing succeeds and becomes a cheaply producible commodity component, I think there's huge value in that. Maybe not as a frontier model replacement, but say 10 years from now you can drop a cheap Raspberry Pi-like device on your LAN and have a fast local engine for text sentiment analysis, text summarisation, voice recognition, basic vision and things like that. That would be pretty exciting to me (but maybe, as you outlined, impossible in practice).

hajile an hour ago [ - ]

There is a reasonable kernel of an idea here, but only if you dial expectations WAY back. The 10-year speculation is just wrong though. Even in 10 years, their 8B param model isn't going to be in consumer devices.

6nm is just 7nm++ and the process will be a decade old in a few months. In the decade since, we've only had a slightly less than 3x increase in transistor density and that's including EUV, BSPD, and GAAFET (which means progress is likely going to slow down even more).

Even if we hit another 3x increase, their 815mm2 design will still be a bit over 90mm2. For comparison, the entire M5 Pro/Max CPU die is just 61.7mm2.

If our current progress somehow holds (not likely), even 20 years from now the 8B model would be 30mm2. You need 30 years of dead consistent progress to get it down to an includable 10mm2.
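
The projection above is compounding arithmetic. The sketch below assumes, as the comment does, a flat 3x density gain per decade starting from the 815mm2 N6 design, and counts the first step as the move to today's N2. The helper name is hypothetical.

```python
# Compounded die shrink at an assumed 3x density gain per decade.
START_AREA_MM2 = 815.0   # HC1 on N6 (from the thread)
GAIN_PER_DECADE = 3.0    # assumption taken from the comment


def area_after(steps: int) -> float:
    """Die area after a number of 3x density steps."""
    return START_AREA_MM2 / GAIN_PER_DECADE ** steps


# Step 1 is roughly N6 -> N2 (today); later steps are future decades.
labels = ["N2 today", "+10 years", "+20 years", "+30 years"]
for step, label in enumerate(labels, start=1):
    print(f"{label:>10}: {area_after(step):6.1f} mm2")
```

Under these assumptions the numbers line up with the comment: about 90mm2 after one more decade, about 30mm2 after two, and about 10mm2 after three.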

As you can see, this doesn't make sense to invest in. As to the stuff like voice recognition or basic vision, these can often fit within 100m parameter models which would be around 10mm2 on their current 6nm design. That's doable today in custom edge computing devices.
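
The 10mm2 figure for a 100M-parameter model follows from scaling the HC1 numbers linearly. A quick sketch, assuming die area is proportional to parameter count on the same node and ignoring fixed overhead such as I/O and control logic (the function name is hypothetical):

```python
# Linear area-per-parameter estimate from the HC1 figures quoted in the thread.
HC1_AREA_MM2 = 815.0
HC1_PARAMS = 8e9   # Llama 3.1 8B


def area_for_params(params: float) -> float:
    """Assume die area scales linearly with parameter count on the same node."""
    return HC1_AREA_MM2 * params / HC1_PARAMS


small_model_area = area_for_params(100e6)
print(f"100M parameters on N6: about {small_model_area:.1f} mm2")
```

A real small chip would likely come out somewhat larger, since the fixed overhead does not shrink with the model.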

The other possible use is cheap fallback models for AI companies. Moving to N2 and shrinking chips to 600mm2 to improve yields a bit would give about 50B parameters with 3 chips plus another FPGA-ish programmable chip for continuing training and interconnects for everything. You'd need hundreds of thousands of chips produced for that exact AI model just to get costs below $100,000 per board.
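
The "about 50B parameters with 3 chips" estimate can be checked the same way. This sketch assumes parameters per mm2 scale with the rounded 3x density gain from N6 to N2 and that all 600mm2 of each die holds weights, which is optimistic. Constant names are made up for illustration.

```python
# Rough parameter capacity of three 600mm2 N2 dies, from figures in the thread.
HC1_AREA_MM2 = 815.0
HC1_PARAMS = 8e9         # Llama 3.1 8B on N6
DENSITY_GAIN_N2 = 3.0    # the comment's rounded N6 -> N2 gain (assumption)
CHIP_AREA_MM2 = 600.0
CHIPS = 3

params_per_mm2_n6 = HC1_PARAMS / HC1_AREA_MM2
params_per_chip = params_per_mm2_n6 * DENSITY_GAIN_N2 * CHIP_AREA_MM2
total_params = params_per_chip * CHIPS

print(f"per chip: {params_per_chip / 1e9:.1f}B parameters")
print(f"{CHIPS} chips: {total_params / 1e9:.1f}B parameters")
```

That gives roughly 17.7B per chip and 53B across three, consistent with the comment's figure.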

That seems like a lot of money for the AI model you are essentially giving away, but maybe it still beats the power and price of GPU server racks.

HaloZero 15 hours ago [ - ]

The Flash models have fallen in size, at least between DeepSeek models. Is there a limit to how far models can shrink?

juleiie 10 hours ago [ - ]

That’s why this stuff should be a government mega project ultimately.

It is not market viable but it is sure as heck revolutionary. Like an atomic bomb but including more… peaceful uses.

That’s exactly where government should take the reins, like with the ISS etc. However, the models are advancing too rapidly for now for it to make sense.

hajile 25 minutes ago [ - ]

The government isn't going to be making chip fabs go any faster which is the biggest limitation here.

The second big issue is that it takes months to fab chips meaning your hardware AI is months to maybe a year or more behind the times when it lands.

I do think it makes sense for something like a medical scanner where the model simply doesn't need constant updates, but that doesn't need government involvement to ship.

dmd 20 hours ago [ - ]

https://taalas.com/

ayewo 14 hours ago [ - ]

Taalas https://taalas.com/the-path-to-ubiquitous-ai/

Previous HN discussion: https://news.ycombinator.com/item?id=47103661

victorbjorklund 10 hours ago [ - ]

Damn that is crazy.

archon810 10 hours ago [ - ]

This is the reaction every time it's posted, and deservedly so.

vitorgrs 14 hours ago [ - ]

Not opening here... HN killed?

Bombthecat 11 hours ago [ - ]