> “Sample bias is simply any bias in the collection of the sample of people or entities used in the sample. For example, the people we’re measuring might self-select into the sample by choosing to participate in a way that relates to the outcome. Think of a new program a teacher introduces to help students do better on tests. People self-select into the program, and what do you know, they do better than the people who don’t!”
It may be useful to distinguish Sampling Bias from Selection Bias. Saying that the sample we study is non-representative (not randomly sampled) is different from saying that individuals non-randomly select into the treatment group. Of course, they are similar problems, but Sampling Bias does not by itself bias the estimated effect for the population that was sampled, though it may constrain us to finding a local treatment effect. In the example of Bleemer & Mehta, UCSC students are not a random sample of college students, but this does not bias their estimates because Selection Bias is (probably) not present in sufficiently small intervals around the GPA cutoff. In other words: Selection Bias threatens Internal Validity, while Sampling Bias threatens External Validity.
Also, how long did it take you to write this? It’s quite good.
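A small simulation can make the distinction in this comment concrete. This is only a sketch under made-up assumptions: the data-generating process, the names `simulate` and `diff_in_means`, and every parameter value are hypothetical and come from neither the post nor Bleemer & Mehta. Scenario A draws a non-representative sample (high-ability units only) but randomizes treatment within it. Scenario B samples everyone but lets high-ability units opt into treatment more often.

```python
import random
import statistics


def diff_in_means(rows):
    """Naive estimate: mean outcome of treated minus mean outcome of untreated."""
    treated = [y for d, y in rows if d]
    control = [y for d, y in rows if not d]
    return statistics.fmean(treated) - statistics.fmean(control)


def simulate(n=200_000, seed=0):
    # Hypothetical data-generating process, chosen only for illustration.
    rng = random.Random(seed)
    sampling_rows, selection_rows = [], []
    all_effects, local_effects = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        effect = 1.0 + 0.5 * ability  # averages about 1.0 over everyone
        noise = rng.gauss(0, 1)
        all_effects.append(effect)

        # Scenario A (sampling bias only): keep high-ability units,
        # then assign treatment by coin flip inside that sample.
        if ability > 0.5:
            d = rng.random() < 0.5
            sampling_rows.append((d, ability + d * effect + noise))
            local_effects.append(effect)

        # Scenario B (selection bias): everyone is sampled, but
        # higher-ability units are more likely to opt into treatment.
        d = ability + rng.gauss(0, 1) > 0
        selection_rows.append((d, ability + d * effect + noise))

    return {
        "population_ate": statistics.fmean(all_effects),
        "local_effect": statistics.fmean(local_effects),
        "sampling_estimate": diff_in_means(sampling_rows),
        "selection_estimate": diff_in_means(selection_rows),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the Scenario A estimate should land close to the true effect among the units that were sampled (internally valid) while differing from the population average (not externally valid). The Scenario B estimate should overshoot the population average by a wide margin, because ability raises both the chance of treatment and the outcome.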
Good eye! Fixed it. As for how long it took to write this, I'm not sure. I was working on the earliest version of this series of posts months ago, but I work sporadically and at varying intensities. So definitely more than ten hours, but maybe not 100.
Love the edit.
That’s pretty impressive. It’s very thorough.
I am very impressed with the amount of effort that went into this. Maybe I'll get around to reading it once the job market forces me to finally realize theory is out and the future is applied. Till then... my whiteboard is my best friend.
Should I read this if I do econometrics for a living?
A lot of it would be redundant for you, but you might still enjoy the commentary. I'd give it a shot.
Jack - you should be selling this information!
I'd like to compile this series into a print edition when I'm done. So maybe I *will* sell this!