the subjective framework is exactly why it's good
prior benchmarks relied mostly on unit tests or synthetic judges, which are easily benchmaxxed, and that's why nobody trusts benchmarks
we need people manually checking the data for good code quality