Hacker News

We’ve begun evaluating the model internally and will share our findings in more detail later. So far, we’ve found that it performs well on tasks such as summarization, writing, and data extraction, and shows particular strength in areas like history and marketing. However, it struggles with STEM topics (e.g., math and physics), often fails to follow long or complex instructions, and sometimes avoids answering certain queries. If you’d like us to evaluate a particular use case or vertical, please share it with us!