great data analysis and visualization

So what do you do once you have your data and want to analyze and present it in a useful and easy to understand way?  Today we do a roundup of some of the cool blogs out there doing work in these areas.

In the realm of social science research, Andrew Gelman is always thoughtful and thorough in the ways that he displays his own complex data, and on his blog he often shares interesting data visualizations and visualization tools from other sources.  He’s also done some blogging for FiveThirtyEight, which has had some nice visualizations of its own.

Gelman contributed to a book called Beautiful Data, along with Peter Norvig of Google (discussing natural language data), Brendan O’Connor of the AI and Social Science blog, and a number of other folks.  The bits I’ve read look pretty great — I can’t wait to read the whole thing.

On the more playful and off-the-wall side, the OKCupid blog has really interesting data analyses.  They have a large amount of data they’ve gathered from users of their dating site, and they regularly discuss on the blog how various opinions and priorities vary geographically, which demographics date which, what kinds of communications get a response, and so on.

There are a number of cool websites that don’t address a particular topic, but instead apply clever information visualization techniques to a wide array of issues.  I particularly like Information is Beautiful, and have recently discovered Flowing Data (full of useful tutorials and tools!) and Infosthetics.  On that last blog, I would recommend checking out the visualization of Choose Your Own Adventure books, if you ever enjoyed those as a child.  (I did!)

Finally, there are Strange Maps, covering visualizations of everything from the useful to the whimsical to the imaginary, and GraphJam, which is just plain silly, but sometimes pretty great.


sampling bias in online research

Let’s say you’re trying to get a sense of how the U.S. population is going to vote in the upcoming presidential election.  So you take a poll of 100 of your friends on Facebook.  You are excited to find that the candidate you intend to vote for is going to win in a landslide.

Are these results a reliable prediction of the next election?  No, they’re probably not.  Your friends who have access to Facebook are almost undoubtedly not a representative sample of the U.S. population as a whole.  Doing social science research online carries a similar risk: if you’re trying to generalize about a large population — like, say, all of humanity — based on the population of people who have access to and interest in an online study, then you may be drawing incorrect conclusions.
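To make the Facebook-poll problem concrete, here is a minimal simulation sketch.  Every number in it is an assumption invented for illustration (an evenly split electorate, and supporters of candidate A being three times as likely to be in your friend circle); none of it comes from real polling data.  It simply compares a random sample of 100 with a sample of 100 drawn only from the friend circle.

```python
import random

def make_population(n, rng):
    """Build a hypothetical electorate of (votes_a, in_circle) pairs."""
    people = []
    for _ in range(n):
        # Assumption: true support for candidate A is 50%.
        votes_a = rng.random() < 0.50
        # Assumption: A's supporters are three times as likely
        # to be in the pollster's friend circle (30% vs. 10%).
        in_circle = rng.random() < (0.30 if votes_a else 0.10)
        people.append((votes_a, in_circle))
    return people

def share_for_a(sample):
    """Fraction of a sample that votes for candidate A."""
    return sum(1 for votes_a, _ in sample if votes_a) / len(sample)

rng = random.Random(0)  # fixed seed so the run is repeatable
population = make_population(100_000, rng)

# A representative poll: 100 people drawn uniformly from everyone.
random_sample = rng.sample(population, 100)

# A convenience poll: 100 people drawn only from the friend circle.
friends = [p for p in population if p[1]]
friend_sample = rng.sample(friends, 100)

true_share = share_for_a(population)
random_estimate = share_for_a(random_sample)
friend_estimate = share_for_a(friend_sample)

print(f"true support for A:     {true_share:.2f}")
print(f"random sample of 100:   {random_estimate:.2f}")
print(f"friend sample of 100:   {friend_estimate:.2f}")
```

Under these made-up assumptions the friend-circle poll overstates support for A by a wide margin, while the random sample is off only by ordinary sampling noise.  Polling more friends would not help: the error comes from who is in the sample, not from how many people are in it.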

This problem is not unique to online research.  In fact, some laboratory research in fields like cognitive psychology assumes that all human beings’ minds function in essentially the same ways, and many researchers study a population consisting mostly of college students and/or local residents and passersby.  There are many reasons to believe that this is often a bad policy.  (To be fair, many researchers also do significant work to determine which demographic factors may affect their results, and to adjust their sample or the breadth of their conclusions accordingly.)

So how does the online situation compare to the offline one?  Some research has been done on the demographics of Mechanical Turk participants.  This work suggests that the online population will be, along some dimensions anyway, far more diverse than the local population that is easily accessible for laboratory work.  However, everyone participating in online research will necessarily have access to a computer and the internet, as well as the free time, interest, and computer skills necessary to participate in such tasks.  This means that the online participant population will be unsuitable or insufficient for the purposes of some studies.  Still, we hope that we can provide access to a population that can speed the gathering of at least a portion of the data for some studies.  And for some studies that are currently run entirely on undergraduate and/or local populations, we hope to dramatically increase the diversity of the populations studied.

We don’t expect that online research will be able to completely replace field work and laboratory work for a number of reasons… at least, not until we’re all plugged into the giant planet-wide hive mind in the future.   But we think it could dramatically improve collection of data for a lot of human subjects research.  The fact that the NIH is trying to match up researchers and participants online for clinical trials and other lab-based research indicates that they also see some major potential for online participants in spite of potential sampling bias problems. But we’re curious what you think. Do you think you could move all or some of your research online?  What issues would you be most worried about in terms of sampling bias?  Can you think of any tools or information that would help?


quick Mechanical Turk jobs from the command line

Here’s something for the geekier researchers in our crowd, and for those already running batches of tasks on Mechanical Turk.

Have you ever wanted to run a real quick experiment or task on Mechanical Turk — run a quick poll, or ask the same question about each of a few different pictures, for instance?  Do you enjoy using the command line to get things done efficiently?

Mechanical Turk’s own command line API wrappers can be clunky, but Voxilate is here to help!  They’ve created a Python script that provides command-line functionality for simply and quickly setting up common tasks on MTurk.  Nifty!  Google Code is hosting a few other possibly useful Mechanical Turk coding projects as well.


New survey tool

There’s a new survey site called Survs.  The main feature differentiating it from existing tools appears to be that it’s easy to collaborate on surveys and share information across multiple accounts.  That seems like a good idea!  From playing around with it, the usability seems good, and it looks like they also have some nice data analysis tools.  The free account won’t let us do anything with logic or with collaboration, so we can’t test the more complicated features, but these folks might be worth checking out if you have surveys to run.


keeping participants honest in online research

Let’s face it: research participants sometimes lie, cheat, or just don’t pay attention.  This can be a problem in the laboratory; I was at a dinner party the other night where some researchers were discussing recent work showing that as many as 30% of psychology research participants in the laboratory are not paying attention to instructions (and presumably are just trying to get paid quickly, in many cases). [1]

But it seems natural to expect that this problem would be magnified even further online, where people are unsupervised and anonymous.  Part of the work that HeadLamp Research intends to do is to investigate how reliable the data collected on our online platform are, and to look for ways to improve reliability.  A first step, though, is brainstorming why and how participants  might be dishonest in the first place, so that we know what to look for.

Here are some of the things that seem like particular worries to us with online research:

  • Participants may register multiple accounts to participate in studies more than once.
  • Participants may lie about their native language, age, or other personal information in order to be eligible to participate in more studies or in better paying studies.
  • Participants may lie about personal information such as their education or health background because they are embarrassed to tell the truth.
  • Participants may fail to follow instructions and simply get through a study as quickly as possible in order to maximize their pay per time.

What are we missing?

We’re not the first people to be looking into deception online, or even the data reliability of online research.  So there’s a research base for us to build on.  And we have some ideas of our own about how to detect liars, cheaters, and those who just aren’t paying enough attention.  We’ll be talking about this more, but we’d love to hear how you deal with these problems in the lab, and what your major concerns are in terms  of data reliability.  In some of my research, it’s been essential to have participants with particular linguistic backgrounds.  What factors are most important to your research?

(And, by the way, for those research participants out there — we’ll also be talking about how to keep researchers honest; we know they can also occasionally screw up or be unfair, and participants should have a way to deal with that, too!)

[1] I don’t have a citation for this yet, unfortunately; it wasn’t clear to me if this research had been published yet, but I’ll be looking it up.


social science experiments on Mechanical Turk

There’s a neat blog up called Experimental Turk whose purpose is:

reporting evidence concerning the reliability of Amazon Mechanical Turk as an online subject pool for experiments in economics, psychology, and social sciences in general.

The authors have been running classic experiments from social science on Turk (e.g., some of Kahneman & Tversky’s classic work on judgment heuristics and biases) and posting the results to the blog, comparing the Mechanical Turk results to known effects found in the laboratory.  This is a great way to start getting a better sense of data reliability on Turk, and they encourage other researchers to collaborate.  Take a look!


HeadLamp at Ignite NYC: Hilary Mason’s talk

The HeadLamp Research co-founders recently attended the Ignite NYC event, a crazy, fun series of five-minute talks, where each talk consists of 20 slides that auto-advance every 15 seconds.  Both of us spoke at the event, and you can currently see Hilary Mason’s talk, “How to replace yourself with a very small shell script,” on the Ignite site!  It’s fun, and all about how to manage your email and communication data more efficiently.


Behavioral Economics with Mechanical Turk

Eric Waller did a quick experiment to confirm a behavioral economics hypothesis with a small amount of Mechanical Turk data.  He found a paper that showed evidence that removing the minimum payment line from a credit card statement causes people to pay more (unless they typically pay the bill in full), and constructed an experiment to confirm the hypothesis.

The entire process took him only three days: Waller spent a total of three evenings setting up the experiment, recruiting and running 200 participants, and analyzing the results.  I’ll let you jump over to the article to see the results!  It’s a nice example of how quickly and cheaply a short, simple experiment can be run on Turk.  As we’ve discussed before, Turk is a good tool for this kind of experiment.

As this kind of research becomes easier, people are more likely to do research like Waller’s: confirming things that they’re pretty sure are the case, but which should really be double checked, as well as fleshing out existing results a bit more precisely.  That seems like a pretty great meta-result to us.


How are people really using crowdsourcing services?

Our mission involves recruiting large populations of internet users for research tasks, so we’re always interested in innovative crowdsourcing methods.  Crowdsourcing services allow many people to contribute to a project and be compensated in various ways.  We stumbled on “10 ways small businesses can harness big crowds” by Ross Kimbarovsky, co-founder of CrowdSPRING, a sometimes controversial marketplace for design services.

The most interesting services highlighted are software testing (uTest), customer support (the always fantastic Get Satisfaction), domain-specific scientific, materials, and technology research (InnoCentive), and prediction marketplaces (Inkling).  The most successful crowdsourcing projects seem to be those that offer a win for both the business and the community members.  As a company developing tools to help both researchers and participants, we find that this makes intuitive sense, and we’re happy to see this strategy succeeding elsewhere.


ResearchMatch.org matches up U.S. laboratory researchers and participants

The NIH has just announced a great new tool for clinical trials and other IRB-approved research!  ResearchMatch.org allows people interested in participating in research to sign up to learn about specific studies they might want to participate in.  It also lets researchers use the system to recruit participants for their research.  In contrast to our online research focus, this tool fills a gap in research done in the laboratory, where it can also be very hard for researchers to recruit study participants.

Despite being administered by the NIH, ResearchMatch.org is open to any IRB-approved research (not just clinical trials).  In that way it contrasts with the research-participant match-up site ClinicalTrials.gov.  Another difference between the sites is that ClinicalTrials.gov makes the participant do the work of finding research they’re interested in participating in, while ResearchMatch.org makes it the job of the researchers to contact participants that match their needs (the system protects potential participants’ personal information, however).

Currently, the site only allows researchers to use the system if their university or institution is a participating member.  But they’re encouraging researchers to sign up for information and express interest even if their institution is not currently participating; I don’t know if the site will eventually stop being mediated by institutions, or if they’re just hoping to get lots more institutions to sign up soon.

A side effect of the institution model is that opportunities are sparse in some parts of the U.S., and if you’re interested in participating in research in those regions, you may be out of luck for a while.  But I don’t know, maybe the network will grow fast — this tool seems like a terrific idea, and I hope lots of research institutions, researchers, and people interested in participating all sign up.

I also hope that if this system is successful, other countries will emulate it. For that matter, maybe the U.S. is lagging behind here and other countries already have such systems; does anyone know of such research matchmaking sites elsewhere?
