Awesome, this is a good idea! Having a nice wrapper to make LLM calls easier is very helpful too :)

Nice to see someone digging in on the system models. That's on my list to play with, but I haven't seen much new info on them or how they perform yet.

We’ve begun evaluating the model internally and will share our findings in more detail later. So far, we’ve found that it performs well on tasks such as summarization, writing, and data extraction, and shows particular strength in areas like history and marketing. However, it struggles with STEM topics (e.g., math and physics), often fails to follow long or complex instructions, and sometimes avoids answering certain queries. If you want us to evaluate a particular use case or vertical, please share it with us!