Building robust LLM-based applications is token-intensive. You often have to plan for the parsing and digestion of a lot of tokens for summarization or even retrieval augmented generation. Even the mere generation of marketing blogposts consumes a lot of output tokens in most cases. Not to mention that all robust cognitive architectures often rely on the generation of several samples for each prompt, custom retry logics, feedback loops, and reasoning tokens to achieve state of the art performance, all solutions powerfully token-intensive.
Luckily, the cost of intelligence is quickly dropping. GPT-4, one of OpenAI’s most capable models, is now priced at $2.5 per million input tokens and $10 per million output tokens. At its initial release in March 2023, the cost was respectively $10/1M input tokens and $30/1M for output tokens. That’s a huge $7.5/1M input tokens and $20/1M output tokens reduction in price. https://www.lycee.ai/blog/drop-o1-preview-try-this-alternati...