> I still suspect that 5 kilobits should be more than enough for someone actually good at TM golfing.

Coincidentally, that is about the size of Penrose's UTM [1] which takes 5495 bits in its own encoding. I suspect that one is only linearly optimal, so it might take another few thousand bits to make it additively optimal.

> Consider a thought experiment, where you try to construct a UTM by making a TM that simply evaluates a BLC term on the tape. This TM by itself would still not be 'additively optimal' by your rules! It would also need an incremental input encoder to convert packed bits into BLC terms, as well as a lazy evaluation model to trigger it only when needed.

Yes; that's what it means to evaluate BLC programs. If the UTM does that, then it's additively optimal. It would be a TM implementation of the Universal Lambda Machine [2]. Most implementations you find there also have subroutines doing the conversion from bits to terms and back.

> for any description D, bit string s, and bit stream t, there must exist an encoding <D, s>, no longer than <D> + s, such that <D, s> + t is equivalent to <D> + s + t. (That is, adding a fixed prefix before the input cannot extend the encoding by more than the length of the prefix.)

I'm not quite understanding this. In my write-up, D is not a description but a description method, i.e. a computable mapping from binary strings (descriptions) to outputs (some set that includes binary strings). What is your D and what does <D> mean?

> a given bitstring can't be represented as efficiently inside the BLC encoding as it can outside the encoding

By "inside the BLC encoding", do you mean as part of the lambda term encoding prefix of the BLC program? Indeed, that is why we equip BLC programs with pure binary input so that k bits of data only add k bits to program length. Which is what allows additive optimality.

> where is the implementation of this input encoding within your 232-bit BLC term?

It's part of the BLC machine model, i.e. of the description method. We can agree that BLC is a computable mapping from bitstrings to outputs including bitstrings.

> Its only task is to 'interrogate' the first several input bits (as already encoded) to parse the description D.

If by D you mean an encoded lambda term, then the 232 bits definitely include the parser of that binary encoding.

[1] R. Penrose, The Emperor's New Mind, Oxford University press, 1989.