Can LLMs compress those documents into smaller files that still retain the full context?
What do you mean?
The article says the LLM has to load 15540 tokens every time. I wonder if that could be reduced while retaining the context, maybe through deduplication, removing superfluous words, using shorter expressions with the same meaning, or things like that.
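To make the idea concrete, here is a rough Python sketch of the two mechanical steps (deduplication and dropping superfluous words). Everything in it is an assumption for illustration: the `FILLER_WORDS` list is made up, the function names are hypothetical, and `rough_token_count` just counts whitespace-separated words rather than using a real tokenizer, so the numbers only hint at the savings.

```python
import re

# Hypothetical filler list, for illustration only.
FILLER_WORDS = {"very", "really", "basically", "actually", "just", "simply"}


def dedupe_paragraphs(text: str) -> str:
    """Drop paragraphs that repeat an earlier one (ignoring case and spacing)."""
    seen = set()
    kept = []
    for para in re.split(r"\n\s*\n", text):
        key = " ".join(para.lower().split())
        if key and key not in seen:
            seen.add(key)
            kept.append(para.strip())
    return "\n\n".join(kept)


def strip_fillers(text: str) -> str:
    """Remove filler words while keeping paragraph breaks."""
    paragraphs = []
    for para in text.split("\n\n"):
        words = [
            w for w in para.split()
            if w.lower().strip(".,;:!?") not in FILLER_WORDS
        ]
        paragraphs.append(" ".join(words))
    return "\n\n".join(paragraphs)


def rough_token_count(text: str) -> int:
    """Whitespace word count; a crude stand-in for a real tokenizer."""
    return len(text.split())


def compress(text: str) -> str:
    """Apply deduplication first, then filler removal."""
    return strip_fillers(dedupe_paragraphs(text))


if __name__ == "__main__":
    doc = (
        "Always run the tests.\n\n"
        "This is really very important.\n\n"
        "always run  the tests."
    )
    small = compress(doc)
    print(rough_token_count(doc), "->", rough_token_count(small))
    print(small)
```

Rewriting sentences into shorter expressions with the same meaning is not something a word list can do; that part would need the LLM itself (or a human) to paraphrase, and the result would have to be checked to make sure no context was lost.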