The inputs were not long enough to show either of the real wins of terser formats: the reduced token count itself, or the benefit of not stuffing the context window, which can reduce accuracy. The test really needs to be conducted across multiple dimensions!
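As a rough illustration of what a multi-dimensional test could look like, here is a minimal sketch that sweeps input size against format. Everything in it is an assumption rather than the original test setup: the three formats (pretty JSON, compact JSON, CSV), the record fields, and the sizes are invented for illustration, and character count is used only as a crude stand-in for token count. A real run would swap in the target model's tokenizer and add an accuracy measurement as a further dimension.

```python
import csv
import io
import json


def make_rows(n):
    # Hypothetical records; the field names are invented for illustration.
    return [{"id": i, "name": f"item{i}", "price": i * 1.5} for i in range(n)]


def to_pretty_json(rows):
    return json.dumps(rows, indent=2)


def to_compact_json(rows):
    # Drop the optional whitespace after separators.
    return json.dumps(rows, separators=(",", ":"))


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Assumed set of formats to compare; not taken from the original test.
FORMATS = {
    "pretty_json": to_pretty_json,
    "compact_json": to_compact_json,
    "csv": to_csv,
}


def sweep(sizes=(1, 10, 100, 1000)):
    # One result per (input size, format) pair.
    # Character count is a crude proxy for token count.
    results = {}
    for n in sizes:
        rows = make_rows(n)
        for name, encode in FORMATS.items():
            results[(n, name)] = len(encode(rows))
    return results


if __name__ == "__main__":
    for (n, name), chars in sorted(sweep().items()):
        print(f"{n:>5} rows  {name:<13} {chars:>8} chars")
```

With short inputs the absolute difference between formats is small; it only becomes significant as the row count grows, which is why input length needs to be one of the axes.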