I had the same thought, although Voyage is 32k vs 128k for Cohere 4.

Anecdotally, benchmark scores have correlated with result quality for the data I've dealt with. I haven't spent a lot of time comparing results between models, because we were happy with the results after trying a few and tuning some settings.

Unless my dataset lines up really well with a benchmark's dataset, creating my own benchmark is probably the only way to know which model is "best".

Are people using 32k-token embeddings and no longer chunking?

It feels like embedding content that large -- especially in dense texts -- will lead to loss of fidelity/signal in the output vector.

My understanding is that long context models can create embeddings that are much better at capturing a document's overall meaning, but are less effective (without chunking) for documents that consist of short standalone sentences.

For example, "The configuration mentioned above is critical" now "knows" what configuration is being referenced, along with which project and anything else talked about in the document.
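To make the contrast concrete, here is a toy sketch of whole-document embedding vs. chunked embedding. The `embed` function is a stand-in (a hashed bag-of-words vector, not a real model like Voyage or Cohere), and the document text is invented, but it shows the mechanic: a sentence like "The configuration mentioned above is critical" only carries its referent when embedded alongside the rest of the document.

```python
import numpy as np
import zlib

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedding model: a hashed
    bag-of-words vector, L2-normalized. Illustrative only."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def chunk(text: str, max_words: int = 50) -> list[str]:
    """Naive fixed-size chunker: split on whitespace every max_words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# Hypothetical document: lots of project detail, then a back-reference.
doc = ("In project Foo we set retry_limit to 5. " * 40
       + "The configuration mentioned above is critical.")

# Long-context style: one vector for the whole document,
# so the final sentence is embedded together with its referent.
doc_vec = embed(doc)

# Chunked style: one vector per chunk; the chunk holding the final
# sentence may no longer contain the configuration it refers to.
chunk_vecs = [embed(c) for c in chunk(doc)]

query_vec = embed("retry_limit configuration in project Foo")
print("whole-doc similarity:", float(doc_vec @ query_vec))
print("best chunk similarity:",
      max(float(v @ query_vec) for v in chunk_vecs))
```

With a real model the effect is semantic rather than lexical, but the structural point is the same: the whole-document vector sees the referent, the isolated chunk may not.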

When you say long context models are less effective for documents that consist of short sentences, do you mean that embedding models with long context capabilities tend to be worse on shorter sentences, or just that _using_ their large context windows is less effective for docs made of short sentences?

It is common to use long context embedding models as a feature extractor for classification models.
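A minimal sketch of that pattern: embed each document once (the `embed_document` below is a deterministic placeholder, not a real long-context model), freeze the vectors as a feature matrix, and train a small classifier head on top. Everything here (documents, labels, dimensions) is invented for illustration.

```python
import numpy as np

def embed_document(text: str, dim: int = 64) -> np.ndarray:
    """Placeholder for a long-context embedding model: one call embeds
    the whole document, no chunking. Here it's just a fixed random
    projection of byte counts, for illustration only."""
    counts = np.zeros(128)
    for ch in text.encode("ascii", "ignore"):
        counts[ch] += 1.0
    proj = np.random.default_rng(42).standard_normal((128, dim))
    v = counts @ proj
    n = np.linalg.norm(v)
    return v / n if n else v

# Frozen embeddings become the feature matrix for the classifier.
docs = (["great product, works well"] * 10
        + ["terrible, broke immediately"] * 10)
labels = np.array([1] * 10 + [0] * 10)
X = np.stack([embed_document(d) for d in docs])

# Tiny logistic-regression head trained on top of the frozen features.
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    g = p - labels                            # gradient of log loss
    w -= 0.5 * (X.T @ g) / len(docs)
    b -= 0.5 * g.mean()

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
print("train accuracy:", float((preds == labels).mean()))
```

The appeal is that the expensive model runs once per document; the classifier on top is cheap to train and retrain as labels change.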