Archive for December, 2009

great data analysis and visualization

So what do you do once you have your data and want to analyze it and present it in a useful, easy-to-understand way? Today we round up some of the cool blogs doing work in these areas.

In the realm of social science research, Andrew Gelman is always thoughtful and thorough in how he displays his own complex data, and on his blog he often shares interesting visualizations and visualization tools from other sources. He has also blogged for FiveThirtyEight, which has featured some other nice visualizations.

Gelman contributed to a book called Beautiful Data, along with Peter Norvig of Google (discussing natural language data), Brendan O’Connor of the AI and Social Science blog, and a number of other folks.  The bits I’ve read look pretty great — I can’t wait to read the whole thing.

On the more playful, off-the-wall side, the OKCupid blog has some really interesting data analyses. They have gathered a large amount of data from users of their dating site, and they regularly discuss on the blog how opinions and priorities vary geographically, which demographic groups date which others, what kinds of messages get a response, and so on.

There are also a number of cool websites that don’t focus on a particular topic but instead apply clever information visualization techniques to a wide array of issues. I particularly like Information is Beautiful, and I have recently discovered Flowing Data (full of useful tutorials and tools!) and Infosthetics. At Infosthetics, I would recommend checking out their visualization of Choose Your Own Adventure books, if you ever enjoyed those as a child. (I did!)

Finally, there is Strange Maps, covering visualizations of everything from the useful to the whimsical to the imaginary, and GraphJam, which is just plain silly but sometimes pretty great.


sampling bias in online research

Let’s say you’re trying to get a sense of how the U.S. population is going to vote in the upcoming presidential election. So you poll 100 of your Facebook friends. You are excited to find that the candidate you intend to vote for is going to win in a landslide.

Are these results a reliable prediction of the next election? Probably not. Your Facebook friends are almost certainly not a representative sample of the U.S. population as a whole; they are likely to resemble you in age, location, and political views. Doing social science research online carries a similar risk: if you’re trying to generalize about a large population (say, all of humanity) based on the people who have access to and interest in an online study, then you may draw incorrect conclusions.
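To make the Facebook-poll problem concrete, here is a small Python simulation. It is a sketch under made-up assumptions: suppose 48% of the whole electorate backs your candidate but 75% of your friend network does (both numbers are hypothetical). Polling only the friend network gives an estimate near 75% no matter how many people you ask, because a bigger sample shrinks random error but does nothing about a biased pool of respondents.

```python
import random
import statistics

# Hypothetical support levels for "your" candidate; not real polling data.
POPULATION_SHARE = 0.48  # share of the whole electorate
FRIENDS_SHARE = 0.75     # share among your (like-minded) Facebook friends


def run_poll(support_share: float, n: int, rng: random.Random) -> float:
    """Ask n randomly chosen people from a group in which `support_share`
    back the candidate, and return the fraction who say yes."""
    yes = sum(rng.random() < support_share for _ in range(n))
    return yes / n


def average_estimate(support_share: float, n: int, trials: int,
                     rng: random.Random) -> float:
    """Average the poll result over many repeated polls of size n."""
    return statistics.mean(run_poll(support_share, n, rng) for _ in range(trials))


if __name__ == "__main__":
    rng = random.Random(2009)  # fixed seed so runs are reproducible
    for n in (100, 10_000):
        fair = average_estimate(POPULATION_SHARE, n, trials=50, rng=rng)
        friends = average_estimate(FRIENDS_SHARE, n, trials=50, rng=rng)
        print(f"n={n:>6}: random sample ~{fair:.2f}, friends only ~{friends:.2f}")
```

Under these assumptions the random-sample estimate should sit near 0.48 and the friends-only estimate near 0.75 at both sample sizes; only the scatter around those values shrinks as n grows.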

This problem is not unique to online research. In fact, some laboratory research in fields like cognitive psychology implicitly assumes that all human beings’ minds function in essentially the same ways, and many researchers study a population consisting mostly of college students, local residents, and passersby. There are many reasons to believe that this is often a bad practice. (To be fair, many researchers also do significant work to determine which demographic factors may affect their results, and to adjust their samples or the breadth of their conclusions accordingly.)

So how does the online situation compare to the offline one? Some research has been done on the demographics of Mechanical Turk participants, and it suggests that, along some dimensions at least, the online population is far more diverse than the local population that is easily accessible for laboratory work. However, everyone participating in online research will necessarily have access to a computer and the internet, as well as the free time, interest, and computer skills needed to take part. This means that the online participant population will be unsuitable or insufficient for some studies. Still, we hope to provide access to a population that can speed up gathering at least a portion of the data for some studies. And for studies that are currently run entirely on undergraduate and/or local populations, we hope to dramatically increase the diversity of the populations studied.

We don’t expect online research to completely replace field work and laboratory work, for a number of reasons… at least, not until we’re all plugged into the giant planet-wide hive mind of the future. But we think it could dramatically improve data collection for a lot of human subjects research. The fact that the NIH is trying to match up researchers and participants online for clinical trials and other lab-based research suggests that they also see major potential in online participants, in spite of possible sampling bias problems. We’re curious what you think. Could you move all or some of your research online? What issues would worry you most in terms of sampling bias? Can you think of any tools or information that would help?


quick Mechanical Turk jobs from the command line

Here’s something for the geekier researchers in our crowd, and for those already running batches of tasks on Mechanical Turk.

Have you ever wanted to run a really quick experiment or task on Mechanical Turk, such as a quick poll, or the same question asked about each of a few different pictures? Do you enjoy using the command line to get things done efficiently?

Mechanical Turk’s own command-line API wrappers can be clunky, but Voxilate is here to help! They’ve created a Python script that provides simple command-line tools for quickly setting up common tasks on MTurk. Nifty! Google Code is hosting a few other potentially useful Mechanical Turk coding projects as well.


New survey tool

There’s a new survey site called Survs. Its main difference from existing tools appears to be that it makes it easy to collaborate on surveys and share information across multiple accounts. That seems like a good idea! From playing around with it, it seems to have good usability, and it looks like it also has some nice data analysis tools. The free account doesn’t let us use survey logic or collaboration, so we couldn’t test the more complicated features, but Survs might be worth checking out if you have surveys to run.


keeping participants honest in online research

Let’s face it: research participants sometimes lie, cheat, or just don’t pay attention. This can be a problem even in the laboratory. At a dinner party the other night, I heard some researchers discussing recent work showing that as many as 30% of psychology research participants in the laboratory are not paying attention to instructions (presumably, in many cases, because they are just trying to get paid quickly). [1]

But it seems natural to expect this problem to be magnified even further online, where people are unsupervised and anonymous. Part of the work that HeadLamp Research intends to do is to investigate how reliable the data collected on our online platform are, and to look for ways to improve reliability. A first step, though, is brainstorming why and how participants might be dishonest in the first place, so that we know what to look for.

Here are some of the things that seem like particular worries to us with online research:

  • Participants may register multiple accounts to participate in studies more than once.
  • Participants may lie about their native language, age, or other personal information to become eligible for more studies or for better-paying studies.
  • Participants may lie about personal details such as their education or health history because they are embarrassed to tell the truth.
  • Participants may ignore instructions and rush through a study as quickly as possible to maximize their pay per unit of time.
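Some of these worries translate into simple automated checks. The Python sketch below is purely illustrative and is not HeadLamp Research’s method. It assumes each response record carries a hypothetical `account_fingerprint` (for example, a hash of a payment account that duplicate registrations might share), the completion time, whether an embedded attention-check question was answered correctly, and the age given both at registration and within the study. It flags possible duplicate accounts, implausibly fast completions, failed attention checks, and inconsistent self-reports. It cannot catch someone who lies consistently, nor the embarrassment-driven misreporting in the third bullet.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # All field names are hypothetical; adapt them to what your platform records.
    participant_id: str
    account_fingerprint: str      # e.g. a hash of a payment account or device
    seconds_taken: float
    passed_attention_check: bool
    profile_age: int              # age given at registration
    study_age: int                # age given inside this study


def flag_responses(responses: list[Response], min_seconds: float) -> dict[str, list[str]]:
    """Return {participant_id: [reasons]} for every response that looks suspect."""
    first_seen: dict[str, str] = {}   # fingerprint -> first participant seen using it
    flags: dict[str, list[str]] = {}
    for r in responses:
        reasons = []
        owner = first_seen.setdefault(r.account_fingerprint, r.participant_id)
        if owner != r.participant_id:
            reasons.append(f"shares an account fingerprint with {owner}")
        if r.seconds_taken < min_seconds:
            reasons.append("finished faster than the minimum plausible time")
        if not r.passed_attention_check:
            reasons.append("failed the attention check")
        if r.profile_age != r.study_age:
            reasons.append("age in study differs from profile")
        if reasons:
            flags[r.participant_id] = reasons
    return flags


if __name__ == "__main__":
    # Made-up example records.
    responses = [
        Response("p1", "acct-A", 420.0, True, 34, 34),
        Response("p2", "acct-A", 35.0, False, 22, 22),
        Response("p3", "acct-B", 390.0, True, 51, 29),
    ]
    for pid, reasons in flag_responses(responses, min_seconds=60).items():
        print(pid, reasons)
```

Flags like these are probably best treated as prompts for a human review rather than grounds for automatic rejection, since an honest participant can still read quickly or mistype an age.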

What are we missing?

We’re not the first people to look into deception online, or even into the reliability of data from online research, so there’s a research base for us to build on. And we have some ideas of our own about how to detect liars, cheaters, and those who just aren’t paying enough attention. We’ll be talking about this more, but we’d love to hear how you deal with these problems in the lab, and what your major concerns are regarding data reliability. In some of my research, it’s been essential to have participants with particular linguistic backgrounds. What factors are most important to your research?

(And, by the way, for the research participants out there: we’ll also be talking about how to keep researchers honest. We know they can also occasionally screw up or be unfair, and participants should have a way to deal with that, too!)

[1] I don’t have a citation for this yet, unfortunately; it wasn’t clear to me whether this research had been published, but I’ll be looking it up.


social science experiments on Mechanical Turk

There’s a neat blog up called Experimental Turk whose purpose is:

reporting evidence concerning the reliability of Amazon Mechanical Turk as an online subject pool for experiments in economics, psychology, and social sciences in general.

The authors have been running classic social science experiments on Turk (e.g., some of Kahneman & Tversky’s classic work on judgment heuristics and biases) and posting the results to the blog, comparing the Mechanical Turk results to known effects found in the laboratory. This is a great way to start getting a better sense of data reliability on Turk, and they encourage other researchers to collaborate. Take a look!
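For readers who want to try this kind of comparison themselves, one common way to check whether a between-subjects effect shows up in a new sample is a two-proportion z-test. The Python sketch below assumes a framing-style experiment in which each participant sees one of two versions of a question and either picks the "sure" option or not. The counts in the example are invented for illustration; they are not results from Experimental Turk or from the original laboratory studies, and this is not necessarily the analysis the Experimental Turk authors use.

```python
import math


def two_proportion_z(yes_a: int, n_a: int, yes_b: int, n_b: int) -> float:
    """Pooled two-proportion z-statistic for H0: both groups share one rate."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both groups are all-yes or all-no; z is undefined")
    return (p_a - p_b) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Invented counts: Turk participants choosing the sure option in each frame.
    gain_yes, gain_n = 70, 100
    loss_yes, loss_n = 30, 100
    z = two_proportion_z(gain_yes, gain_n, loss_yes, loss_n)
    effect = gain_yes / gain_n - loss_yes / loss_n
    print(f"effect = {effect:+.2f}, z = {z:.2f}, p = {two_sided_p(z):.2g}")
```

A difference in the same direction as the laboratory finding, with a small p-value, suggests the effect replicates; comparing the size of the difference with the published laboratory figure then indicates whether it is of similar magnitude.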


HeadLamp at Ignite NYC: Hilary Mason’s talk

The HeadLamp Research co-founders recently attended the Ignite NYC event, a crazy, fun series of 5-minute talks in which each talk consists of 20 slides that auto-advance every 15 seconds. Both of us spoke at the event, and you can currently see Hilary Mason’s talk, “How to replace yourself with a very small shell script,” on the Ignite site! It’s fun, and it’s all about managing your email and communication data more efficiently.
