I gave the feedback at one Google interview that they should send Google employees through to see how many get hired. Good to see they basically tried that.
The conclusion at the end, that bringing someone on board is the ideal method, is true I'm sure, but even that runs into the problem that employee evaluation is in even worse shape than the interview process.
You can see some managers openly panic when they're asked to provide feedback and realize they have no idea what their employees have been doing for the last 6-12 months.
Two anecdotes...
1) The worst interview I ever had (BY FAR) was at Google: disrespectful people, no respect for my time, I could go on and on. And I still went back to try again to get that money showered on me. Worth it in the long run.
2) Their new system for "performance management" is a hoax. Just like everywhere else, it "documents" what you should do so they can fire you more easily, on top of unspoken rules and all sorts of arbitrary causes. A friend literally hit EVERY pre-agreed target and still got pushed out for "not delivering".
I once missed every goal I'd agreed to a year earlier but got promoted anyway, because priorities had shifted and the director above me just really, really wanted me to keep building his reputation. (Not at Google.)
Oh, I had a terrible interview at Google too!
They told me beforehand it was all about pseudocode and how I think, then in the actual interview they were nitpicking variable names and spaces after commas, while I was supposed to come up with some clever optimisation that boiled down to: "Do you already know this obscure theorem? Cool, you pass."
Same for me. I applied for an engineering manager role and was told it was going to be a code review. I found a bunch of things that could be improved and spotted a very slow part (a quadratic loop), but couldn't come up with the best algorithm to use. It was just leetcode in disguise.
Which theorem?
I don't even know its name. I went over the question with a friend of mine who has a PhD in mathematics and teaches at a university, and he figured it out (after far longer than the interview itself lasted) :)
It is not necessarily bad if people who were hired can't make it through the process again. Reasons:
1. Standards got higher. Luckily, if you got in early and proved yourself, you are OK. But that doesn't mean you would pass the current interview.
2. A marathon runner (with rare exceptions) can't run a marathon on a random day. They train for a specific date. Same with interview prep.
If I read the article correctly, they handed the hiring committee /their own/ anonymized packets, and the committee chose to hire only a third of themselves. That means number 2 is not the case. Presumably the committee members were prepared for their interviews, since they passed their rounds.
Number 1 can still be true, and likely is. But then what's the point of using dated packets to test the hiring committee's calibration?
> I gave the feedback at one Google interview that they should send Google employees through to see how many get hired. Good to see they basically tried that.
They did, but not with the intention of doing anything about the problem.
This is a question of reliability, the conceptual 'correlation' of a measurement instrument with itself when measuring the same thing.
Reliability is one of two major concepts in psychometrics, the other being validity, the conceptual correlation between a measurement instrument and that part of reality that you're hoping to measure.
The question behind validity is "I want to know X; if I measure Y, how helpful will that be?" And the question behind reliability is "If I measure Z, how consistent will that measurement be?"
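To make "reliability" concrete, here is a minimal simulation under the textbook classical-test-theory assumption that every observed score is a candidate's true score plus independent noise. The function names and all numbers are illustrative assumptions, not anything from Google or the essay. Scoring the same hypothetical candidates twice and correlating the two rounds recovers the share of score variance that comes from real differences between candidates.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def theoretical_reliability(true_sd=1.0, noise_sd=1.0):
    """Share of observed-score variance that comes from true-score variance."""
    return true_sd ** 2 / (true_sd ** 2 + noise_sd ** 2)


def simulate_test_retest(n_candidates=100_000, true_sd=1.0, noise_sd=1.0, seed=0):
    """Score each hypothetical candidate twice, each time as true skill plus
    fresh independent noise, and return the correlation between the two rounds."""
    rng = random.Random(seed)
    true_skill = [rng.gauss(0.0, true_sd) for _ in range(n_candidates)]
    first_round = [t + rng.gauss(0.0, noise_sd) for t in true_skill]
    second_round = [t + rng.gauss(0.0, noise_sd) for t in true_skill]
    return pearson(first_round, second_round)
```

With equal true-score and noise spread (`true_sd=1.0, noise_sd=1.0`), the two rounds correlate at roughly 0.5, matching `theoretical_reliability()`; shrinking the noise pushes both toward 1.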
https://en.wikipedia.org/wiki/Reliability_(statistics)
https://en.wikipedia.org/wiki/Construct_validity
Yegge calls out both concepts explicitly, though not by name, in this essay:
>> The outcomes from interviewing are statistically terrible. Google did wave upon wave of analysis over the years, and all the results were incredibly depressing.
>> [reliability] To name just a few off the top of my head: interviewers barely agreed with each other. Put the same candidate in front of two of our sharpest people and you’d routinely get a confident “strong hire” from one and a flat “no” from the other.
>> [validity, though the 'problem' here is strongly confounded by a restriction of range issue] And once people were actually on the job, their interview scores told you next to nothing about how they’d do
>> [reliability] Hell, some of our star performers failed their Google interviews four or five times, finally got in after 2+ years...
>> [validity] ...and then outshone everyone else.
The discussion of how interviewing outcomes are statistically terrible would benefit from naming the ways in which they're statistically terrible. Knowing the problem you have is an important step toward solving it.
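The bracketed restriction-of-range caveat on the validity quote can be illustrated with a small simulation. The parameters are invented for illustration: across the whole applicant pool, interview score and job performance are standard normal with a correlation of 0.5, and only the top 20% by interview score get hired. Measuring the correlation among hires alone then understates how well the interview separates the full pool.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def restriction_of_range_demo(n=100_000, rho=0.5, hire_fraction=0.2, seed=0):
    """Return (correlation in the whole pool, correlation among hires only).

    Interview score and job performance are standard normal with correlation
    `rho`; only the top `hire_fraction` by interview score are "hired"."""
    if not 0.0 < hire_fraction <= 1.0:
        raise ValueError("hire_fraction must be in (0, 1]")
    rng = random.Random(seed)
    interview, performance = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        interview.append(x)
        # Mix in independent noise so that corr(interview, performance) == rho.
        performance.append(rho * x + math.sqrt(1.0 - rho ** 2) * noise)
    pool_r = pearson(interview, performance)
    # Hire everyone at or above the (1 - hire_fraction) quantile of interview score.
    index = min(int(n * (1.0 - hire_fraction)), n - 1)
    cutoff = sorted(interview)[index]
    hired = [(x, y) for x, y in zip(interview, performance) if x >= cutoff]
    hired_x, hired_y = zip(*hired)
    hired_r = pearson(hired_x, hired_y)
    return pool_r, hired_r
```

In this setup the pool-wide correlation stays near 0.5, while the correlation among hires drops to somewhere around 0.26, even though nothing about the interview itself changed.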
(And as a side note, the last I heard from Google, you're not allowed to interview more often than once a year. Interviewing five times in two years would seem to violate that policy.)
It is a basic theorem that the validity of any instrument is bounded above by the square root of the reliability. It isn't possible for an unreliable instrument to be tightly correlated to reality, because it is, by definition, not tightly correlated with anything. That's what it means to be unreliable.
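A short derivation of that bound, under the standard classical-test-theory assumptions: observed score X = true score T + error E; X' is a parallel re-measurement with its own error E' of equal variance; and errors are uncorrelated with the true score, with each other, and with the criterion Y you actually care about (say, job performance). The symbols are introduced here for illustration:

```latex
\begin{aligned}
X &= T + E, \qquad X' = T + E', \qquad
\operatorname{Cov}(T,E) = \operatorname{Cov}(T,E') = \operatorname{Cov}(E,E') = \operatorname{Cov}(E,Y) = 0 \\
\rho_{XX'} &= \frac{\operatorname{Cov}(X,X')}{\sigma_X \sigma_{X'}} = \frac{\sigma_T^2}{\sigma_X^2} \\
\rho_{XY} &= \frac{\operatorname{Cov}(T,Y) + \operatorname{Cov}(E,Y)}{\sigma_X \sigma_Y}
          = \frac{\operatorname{Cov}(T,Y)}{\sigma_T \sigma_Y} \cdot \frac{\sigma_T}{\sigma_X}
          = \rho_{TY} \sqrt{\rho_{XX'}} \le \sqrt{\rho_{XX'}}
\end{aligned}
```

Equality needs \(\rho_{TY} = 1\), i.e. the true score behind the instrument is exactly the thing you care about; any noise in the instrument (\(\rho_{XX'} < 1\)) caps validity below 1.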
Thus, any company that wanted its hiring process to be good would necessarily be extremely concerned with making that process consistent: you need to come to the same decision when you assess the same person. This is something that interviews cannot achieve except at extreme cost. You'd need far more than five interviews to get a reliable assessment from them, despite the claim in this essay that "any more than four interviews and you're just playin' with your food". Of course, the Google interviews aren't supposed to be reliable anyway, so in that sense the claim is probably accurate.
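One standard way to put a number on "far more than five interviews" is the Spearman-Brown prophecy formula, which gives the reliability of an average of k parallel measurements. This sketch assumes interviews behave like parallel measurements with independent errors (real interviews are not that tidy), and the per-interview reliability of 0.3 used below is a hypothetical figure, since neither the essay nor this comment gives one.

```python
def spearman_brown(rho, k):
    """Reliability of the average of k parallel measurements, each of reliability rho."""
    if not 0.0 < rho < 1.0 or k <= 0:
        raise ValueError("need 0 < rho < 1 and k > 0")
    return k * rho / (1.0 + (k - 1.0) * rho)


def interviews_needed(rho, target):
    """Spearman-Brown solved for k: how many parallel measurements of reliability
    rho must be averaged to reach `target` reliability (may be fractional)."""
    if not 0.0 < rho < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < rho < 1 and 0 < target < 1")
    return target * (1.0 - rho) / (rho * (1.0 - target))
```

Under that assumed 0.3, five interviews average out to a reliability of about 0.68 (1.5 / 2.2), and reaching 0.9 takes about 21 interviews.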
The prescription Yegge offers is valid. Multi-month work assessments will give you a strong, reliable, and valid signal. They're also very expensive.
Another thing the essay completely glosses over is that this problem has been recognized for a long time, and we already know how to do assessments that are reliable, valid, and cheap to perform. They're called standardized tests.
At least historically, Google prioritized not hiring bad candidates over hiring good candidates. So it was a priority neither for interviews to be consistent (for good candidates) nor for employees to be able to consistently pass interviews.
That certainly makes sense as a goal, given the cost of hiring someone bad and then not being able to get rid of them.
The problem is that companies like Google that have evaluated their own hiring process, by comparing candidates' "hiring scores" with their subsequent on-the-job performance, have found little correlation. So, while the goal (be more concerned about false positives than false negatives) makes sense, their process for achieving it is broken.
Big companies already have standardized tests: rotating question banks with grading rubrics. Examiners (employees) ask their favorite questions over and over to calibrate where a given candidate stands.
Serious question: what do you think of using IQ tests to hire SWEs? Should we just do that instead?
Yes, I do think it has merit. Some kind of specialized IQ test that measures aptitude could be used for screening. Of course it's not the be-all and end-all, but it should significantly reduce the 'grinding leetcode' situation.
Why not do that for all jobs? Forget resumes and work experience/accomplishments, and just hire based on a test score?
Perhaps we could administer this IQ test at age 12, so that the low-scoring individuals can go straight into the fast-food industry, and the rest can pick between the doctor/lawyer/SWE offers that will be showered upon them?
From what I understand, this isn’t so far from what Germany does with its secondary schools.
He said a standardized test, not a standard general cognitive test.
The tests that carry the particular branding "IQ" (Wechsler / Raven's / etc.) suffer from some problems in this regard: not very many questions exist, and there are very large coaching effects. (Also, psychologists will tell you that getting an accurate result requires the test to be administered by a trained psychologist. This is mostly nonsense, but to the extent you believe them, it's cost-prohibitive.)
Hiring from a test that measures IQ is a very good idea (and there is a test that's commonly used for hiring purposes, the Wonderlic†); hiring from "an IQ test" is a bit less good. Anyone who wants to subvert the Raven's test will be able to do that. High-stakes tests need more security.
The concept of "IQ" can be toxic in contemporary American politics, so there are many more tests that "happen" to test IQ than there are tests that advertise themselves as testing IQ.
† https://psycnet.apa.org/record/1982-00123-001 : "correlations between Wonderlic IQs and WAIS Full Scale IQs were [0.93] for the main group and [0.91] for the cross-validation group". Note that this test involves only basic math and takes 12 minutes of the candidate's time.
The decisions made by individual interviewers are extremely accurate if you realize they are just saying 'YES - I want this person hired' or 'NO - I don't want this person hired'. It's entirely subjective but likely very repeatable.
It's not repeatable; that's the whole point of describing how the same person gets wildly different results when they interview on multiple occasions.
Candidates are getting different outcomes on separate paths through the process because different people are interviewing them.