Yes, my findings and thoughts were pretty much identical. I actually think you can get something reasonable at 1.3B params with the right training recipe, but definitely not at this compute/token budget.

One thing I found was that the model would almost always regurgitate solutions from its training data when asked to solve problems, but it was much better at, for example, using Bash commands to explore a codebase.

The Hugging Face folks have a great post on using CAI (Constitutional AI) for vibes/character post-training rather than just harmlessness: https://huggingface.co/blog/constitutional_ai#oh-honey-lets-...
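
In case it's useful context: the core of CAI is just a critique-then-revise loop over a list of principles, after which you fine-tune on the revised outputs instead of the raw ones. A rough sketch of that loop (names like generate and PRINCIPLES are mine, not from the post; generate is any prompt-in, text-out call into your model):

    # Illustrative CAI-style critique/revision pass for building
    # fine-tuning data. The "constitution" here targets character/
    # vibes rather than harmlessness.
    PRINCIPLES = [
        "Respond in a warm, playful voice.",
        "Avoid hedging and filler phrases.",
    ]

    def cai_revise(generate, prompt, principles=PRINCIPLES):
        """Run one critique + revision round per principle and
        return a (prompt, revision) pair for supervised fine-tuning."""
        response = generate(prompt)
        for principle in principles:
            critique = generate(
                f"Critique the response below against this principle: "
                f"{principle}\nResponse: {response}"
            )
            response = generate(
                f"Rewrite the response to address the critique.\n"
                f"Critique: {critique}\nResponse: {response}"
            )
        return prompt, response

The nice part is that nothing in the loop mentions safety at all; swapping the principle list is what retargets it at character.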