I too learned this the hard way, via a supposedly concurrent priority queue that did quadratic-time work while holding a lock over the entire structure. I was told that "premature optimization is the root of all evil."
Sorry, folks, but that's just an excuse to make dumb choices. Premature _micro_optimization is the root of all evil.
EDIT: It was great training for when I started working on browser performance, though!
And if I may add a corollary: Measurement doesn't need to be held off until the end of the project! Start doing it as soon as you can!
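
To make the "measure early" point concrete, here's a rough sketch of the kind of throwaway timing harness I mean. Everything in it is made up for illustration: `LinearScanQueue`, `HeapQueue`, and `time_queue` are hypothetical names, and the slow queue is only a stand-in for the sort of thing I ran into, not the actual code. Both queues hold a single lock around every operation; the only difference is how much work happens while the lock is held.

```python
import heapq
import random
import threading
import time


class LinearScanQueue:
    """Hypothetical slow queue: O(n) work per push, all under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []  # kept in ascending order

    def push(self, item):
        with self._lock:
            # Linear scan for the insertion point, then an O(n) list insert.
            i = 0
            while i < len(self._items) and self._items[i] <= item:
                i += 1
            self._items.insert(i, item)

    def pop(self):
        with self._lock:
            return self._items.pop(0)  # O(n): shifts every remaining element


class HeapQueue:
    """Same coarse lock, but O(log n) work per operation via heapq."""

    def __init__(self):
        self._lock = threading.Lock()
        self._heap = []

    def push(self, item):
        with self._lock:
            heapq.heappush(self._heap, item)

    def pop(self):
        with self._lock:
            return heapq.heappop(self._heap)


def time_queue(queue, values):
    """Push all values, pop them all, return (elapsed seconds, popped values)."""
    start = time.perf_counter()
    for v in values:
        queue.push(v)
    drained = [queue.pop() for _ in values]
    return time.perf_counter() - start, drained


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are comparable
    values = [rng.random() for _ in range(2000)]
    for cls in (LinearScanQueue, HeapQueue):
        elapsed, _ = time_queue(cls(), values)
        print(f"{cls.__name__}: {elapsed:.4f}s")
```

This is single-threaded, so it says nothing about contention, but even a crude number like this on day one is usually enough to show that the total work grows quadratically with queue size, long before it turns into a production problem.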