You can do that in a single matmul of course.

So can you take an arbitrary transformer and somehow turn it into a compact set of low-power fast gates by some algorithm?

I think you're misunderstanding the joke.

Yes, the joke is:

    [A B]
times

    [1]
    [1]
is

    [A+B]
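In numpy terms (a minimal sketch of the joke, with arbitrary values chosen for A and B):

```python
import numpy as np

# A 1x2 matrix [A B] times the column vector [1, 1]^T
# yields the 1x1 matrix [A+B]: addition as a single matmul.
A, B = 3.0, 4.0
row = np.array([[A, B]])         # shape (1, 2)
ones = np.array([[1.0], [1.0]])  # shape (2, 1)
result = row @ ones              # shape (1, 1)
print(result[0, 0])              # 7.0
```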

From context, then, I infer that a transformer is not simply composed of matrix multiplications, because otherwise it would already be one that adds two 10-digit numbers.

A transformer tokenizes its input, then does a bunch of matmuls and ReLUs set up in a certain way. It doesn't get to see the raw number (just as you don't when you look at "1+1": you need a visual cortex etc. first).
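A minimal sketch of that pipeline (not a real transformer; the vocabulary, dimensions, and random weights here are all illustrative): the model only ever sees token ids embedded as vectors, which then flow through matmuls and ReLUs.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"1": 0, "2": 1, "+": 2}
tokens = [vocab[c] for c in "1+2"]      # the model sees ids, not a number

emb = rng.normal(size=(len(vocab), 8))  # embedding table
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 8))

x = emb[tokens]                         # (3, 8): one vector per token
h = np.maximum(x @ W1, 0.0)             # matmul + ReLU
out = h @ W2                            # another matmul
print(out.shape)                        # (3, 8)
```

Nowhere in this computation is there an integer "3"; there are only vectors that happen to encode the tokens "1", "+", "2".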

Notably, the difference is that ten digits are not the same thing as a number. One might say that turning them into a number would be the natural first step, but neural nets being what they are, they are liable to produce the correct result without bothering to form any representation purer than a list of digits.

I guess the analogy there is that a 74LS283 never really has a number either; it just manipulates a series of logic levels.
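A gate-level sketch of what a 4-bit adder like the 74LS283 effectively computes (a ripple-carry version for clarity; the real chip uses carry-lookahead internally): it only shuffles logic levels, and the "number" exists only in how we interpret the bit pattern.

```python
def full_adder(a, b, cin):
    # Pure boolean logic: XOR for the sum bit, AND/OR for the carry.
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def add4(a_bits, b_bits):
    # bits are little-endian lists of 0/1
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

# 6 (0110) + 7 (0111) = 13 (1101), written little-endian below
bits, carry = add4([0, 1, 1, 0], [1, 1, 1, 0])
print(bits, carry)  # [1, 0, 1, 1] 0
```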

So the question is, why do we tokenise it in such a way that it makes everything harder?

There is no encoding that makes everything easier. You trade off maths for general intelligence. Now we are at a point where the LLM can just choose to use a normal calculator anyway!

Possibly unrelated but something I never fully understood: while we can't create a perfect parser for natural language, why don't we optimistically parse it to extract semantics and feed that into LLMs as well?

The tokenisation needs to be general -- it needs to be able to encode any possible input. It should also be at least moderately efficient across the distribution of inputs that it will tend to see. Existing tokenisation schemes explicitly target this.
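A toy illustration of why that trade-off makes digits awkward (the vocabulary here is entirely made up; real BPE vocabularies are learned from corpus statistics): a greedy longest-match tokeniser splits the "same" digit string differently depending on which chunks happen to exist.

```python
# Hypothetical multi-digit tokens, as a stand-in for a learned BPE vocab.
VOCAB = {"12", "123", "45", "6", "7", "1", "2", "3", "4", "5"}

def tokenize(s):
    out, i = [], 0
    while i < len(s):
        for j in range(len(s), i, -1):   # try the longest match first
            if s[i:j] in VOCAB:
                out.append(s[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot encode {s[i]!r}")
    return out

print(tokenize("123456"))  # ['123', '45', '6']
print(tokenize("12345"))   # ['123', '45']
print(tokenize("1234"))    # ['123', '4']
```

Appending one digit can reshuffle the entire split, so the model never sees a stable per-digit alignment — which is part of why LLM arithmetic is harder than it looks.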
