Sampling bias in online research
Let’s say you’re trying to get a sense of how the U.S. population is going to vote in the upcoming presidential election, so you poll 100 of your Facebook friends. You are excited to find that the candidate you intend to vote for is going to win in a landslide.
Are these results a reliable prediction of the next election? Probably not. Your Facebook friends are almost certainly not a representative sample of the U.S. population as a whole. Doing social science research online carries a similar risk: if you’re trying to generalize about a large population (like, say, all of humanity) based only on the people who have access to and interest in an online study, then you may draw incorrect conclusions.
This problem is not unique to online research. In fact, some laboratory research in fields like cognitive psychology assumes that all human minds function in essentially the same way, and many researchers study a population consisting mostly of college students, local residents, and passersby. There are many reasons to believe that this is often a bad policy. (To be fair, many researchers also do significant work to determine which demographic factors may affect their results, and to adjust their sample or the breadth of their conclusions accordingly.)
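One common way to "adjust the sample" is to reweight responses so that each demographic group counts in proportion to its share of the population you care about (often called post-stratification). The sketch below is only an illustration: the group names, the responses, and the population shares are all made-up numbers, and the function names are our own.

```python
from collections import defaultdict


def raw_mean(responses):
    """Plain average over all respondents, ignoring who they are."""
    return sum(value for _, value in responses) / len(responses)


def poststratified_mean(responses, population_shares):
    """Average each group separately, then combine the group averages
    using each group's share of the target population.

    responses: list of (group, value) pairs
    population_shares: dict mapping group -> fraction of the population
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in responses:
        totals[group] += value
        counts[group] += 1

    estimate = 0.0
    for group, share in population_shares.items():
        if counts[group] == 0:
            # Weighting cannot rescue a group nobody sampled.
            raise ValueError(f"no respondents in group {group!r}")
        estimate += share * (totals[group] / counts[group])
    return estimate


# Hypothetical poll: 1 = supports the candidate, 0 = does not.
# Students are heavily over-represented (80 of 100 respondents).
sample = (
    [("student", 1)] * 60 + [("student", 0)] * 20
    + [("non_student", 1)] * 5 + [("non_student", 0)] * 15
)

# Assumed (invented) population shares for the two groups.
shares = {"student": 0.1, "non_student": 0.9}

print(f"raw estimate:      {raw_mean(sample):.2f}")
print(f"weighted estimate: {poststratified_mean(sample, shares):.2f}")
```

With these invented numbers the raw estimate looks like a landslide (about 0.65), while the weighted estimate is closer to 0.30. Note that weighting only corrects for the factors you measured and have population figures for; it cannot account for groups that are missing from the sample entirely.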
So how does the online situation compare to the offline one? Some research has been done on the demographics of Mechanical Turk participants. This work suggests that the online population is, along some dimensions anyway, far more diverse than the local population that is easily accessible for laboratory work. However, everyone participating in online research will necessarily have access to a computer and the internet, as well as the free time, interest, and computer skills needed to participate in such tasks. This means that the online participant population will be unsuitable or insufficient for the purposes of some studies. Still, we hope that we can provide access to a population that can speed the gathering of at least a portion of the data for some studies. And for some studies that are currently run entirely on undergraduate and/or local populations, we hope to dramatically increase the diversity of the populations studied.
For a number of reasons, we don’t expect that online research will be able to completely replace field work and laboratory work… at least, not until we’re all plugged into the giant planet-wide hive mind in the future. But we think it could dramatically improve the collection of data for a lot of human subjects research. The fact that the NIH is trying to match up researchers and participants online for clinical trials and other lab-based research suggests that they also see major potential in online participants, in spite of possible sampling bias problems. But we’re curious what you think. Do you think you could move all or some of your research online? What issues would you be most worried about in terms of sampling bias? Can you think of any tools or information that would help?