You are viewing a read-only archive of the Blogs.Harvard network.

Gender Bias: A Twitter Folly?

June 2nd, 2009

There’s a claim made by researchers at Harvard Business School that men are followed disproportionately on Twitter. That may be true on a straight-line basis. But there may be more (or less) here than the authors make out. The fact is, we can’t tell yet.

A first-order question is, “What is the correct denominator in this rate? What’s the expected value of the rate of male-male follow?” Then, “What’s the observed deviation from that expected value, and to what extent is it attributable to gender?” Based on the data they’ve offered, we don’t think the authors are in a position to answer that yet.

How do we sort this out? The best approach is to find a real-world use case where we can test our intuitions about what might be going on in the authors’ data, and about the claims made for it.

As we think about how we come to follow others on Twitter, there are three or four obvious vectors. To explain, we’ll choose one, and use a man named “Harry” as an example.

Here’s the vector: Harry wakes up to find that someone new is following him on Twitter. He could follow back, but doesn’t always, so he faces a decision. First, we’d want to know: when making that choice, does Harry show a bias in favor of following males over females? Second, if he does appear to favor one gender, is that due to maleness/femaleness itself, or to some underlying trait that happens to correlate with gender?

The first thing we’d want to know is the male-to-female proportion in the group of people newly following Harry; this is the universe of his potential choices. If it’s 30-70, then, all things being equal, we should expect a gender-blind Harry to follow back at that same 30-70 rate.

That gives us the expected value of Harry’s follow-back split for males vs. females: 30-70.
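To make the arithmetic concrete, here is a minimal Python sketch, assuming a hypothetical pool of 30 new male and 70 new female followers and a made-up overall follow-back rate of 40%. The function name `expected_follow_backs` is ours, not the authors’. Because a gender-blind Harry applies one rate to everyone, the male share of his follow-backs equals the male share of the pool, whatever that rate happens to be.

```python
from fractions import Fraction


def expected_follow_backs(n_male: int, n_female: int,
                          follow_back_rate: Fraction) -> tuple[Fraction, Fraction]:
    """Expected follow-back counts for a gender-blind user (hypothetical helper).

    A gender-blind user applies the same follow-back rate to every new
    follower, so each gender's expected count is its headcount times that rate.
    """
    return n_male * follow_back_rate, n_female * follow_back_rate


# Hypothetical pool: 30 new male followers, 70 new female followers,
# and Harry follows back 40% of everyone regardless of gender.
male, female = expected_follow_backs(30, 70, Fraction(2, 5))
male_share = male / (male + female)
print(male, female, male_share)  # 12 28 3/10
```

Swapping in any other rate changes the counts but leaves `male_share` at 3/10, which is the point: the 30-70 split is the baseline, not evidence of bias.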

Anything other than that is a deviation that may or may not be attributable to chance. If it isn’t chance, it may be due to other factors, including (possibly) gender bias.

But the authors of the HBS note seem to suggest that the 30-70 split (or whatever its equivalent is) is prima facie evidence of a gender-based selection bias on Harry’s part, when for a 30-70 population it’s exactly what we’d expect. So we need to know the expected value and the observed deviation from it, if any. If Harry should follow 30-70 and does, he’s perfect: zero observed bias. But if Harry ends up following 50-50, then something may be going on.
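Whether a 50-50 split is more than chance can be checked with an exact binomial test. The sketch below is a minimal, standard-library-only version, assuming hypothetical numbers: Harry follows back 40 people drawn from a pool that is 30% male, so a gender-blind Harry would be expected to follow back about 12 men. The helper names are illustrative. The two-sided p-value sums every outcome that is no more likely than the observed one, the same convention R’s `binom.test` follows.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binomial_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test.

    Sums the probability of every outcome that is no more likely than the
    observed one under the null hypothesis; the small relative tolerance
    keeps floating-point ties from being dropped.
    """
    observed = binomial_pmf(k, n, p)
    total = sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical: Harry follows back 40 people from a pool that is 30% male.
N_FOLLOW_BACKS = 40
MALE_SHARE_OF_POOL = 0.3

p_expected_split = two_sided_binomial_p(12, N_FOLLOW_BACKS, MALE_SHARE_OF_POOL)  # 30-70
p_even_split = two_sided_binomial_p(20, N_FOLLOW_BACKS, MALE_SHARE_OF_POOL)      # 50-50
print(f"30-70 split: p = {p_expected_split:.3f}")
print(f"50-50 split: p = {p_even_split:.4f}")
```

The 30-70 case is the most likely outcome, so its p-value is essentially 1; the 50-50 case falls well below 0.05. A small p-value only says the 50-50 split is unlikely under a gender-blind, 30%-male baseline. It says nothing about why Harry deviates.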

But we can’t stop there. A 50-50 split doesn’t by itself mean there’s gender bias, because when someone follows Harry, Harry actually *reads* what that person does, and Harry won’t follow people whose profile says they do Internet marketing, for example. It may turn out that the ranks of Internet marketers are disproportionately male, or female. So when Harry elects not to follow, what at first blush looks like a gender bias is actually a bias against a profession.

So Harry might have plenty of biases, but gender might not be one of them. What looks like a gender bias is first and foremost an expected value; after that, much of the deviation could be explained by non-gender factors like profession.
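The profession effect can be reproduced with a small, entirely hypothetical example. In the sketch below, the new-follower pool is 30% male, matching the scenario above, but the profession mix is invented so that Internet marketers skew female, and Harry’s rule (`follows_back`, our name) looks only at profession. The aggregate follow-backs come out 50-50, which would look like a pro-male bias, yet within each profession men and women are followed back at exactly the same rate. Splitting the counts by the suspected confounder like this (stratifying) is one standard way to tell the two explanations apart.

```python
from collections import Counter

# Hypothetical new-follower pool: (gender, profession) -> headcount.
# The 30-70 gender split matches the post; the profession mix is invented
# so that Internet marketers skew female.
POOL = {
    ("male", "internet marketing"): 5,
    ("male", "other"): 25,
    ("female", "internet marketing"): 45,
    ("female", "other"): 25,
}


def follows_back(profession: str) -> bool:
    """Hypothetical rule for Harry: skip marketers, follow everyone else.

    Gender is never consulted.
    """
    return profession != "internet marketing"


def follow_back_counts(pool):
    """Tally pool size and follow-backs by gender and by (gender, profession)."""
    pool_by_gender = Counter()
    followed_by_gender = Counter()
    strata = {}
    for (gender, profession), n in pool.items():
        followed = n if follows_back(profession) else 0
        pool_by_gender[gender] += n
        followed_by_gender[gender] += followed
        strata[(gender, profession)] = (followed, n)
    return pool_by_gender, followed_by_gender, strata


pool_sizes, followed, strata = follow_back_counts(POOL)
male_share_pool = pool_sizes["male"] / sum(pool_sizes.values())
male_share_followed = followed["male"] / sum(followed.values())
print(f"male share of pool: {male_share_pool:.0%}, of follow-backs: {male_share_followed:.0%}")

# Within each profession, men and women are followed back at the same rate.
for (gender, profession), (f, n) in sorted(strata.items()):
    print(f"{profession:>18} / {gender:<6}: {f}/{n} followed back")
```

Run as written, the first line reports a 30% male pool and 50% male follow-backs, while every stratum shows either all or none followed back regardless of gender.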

This points to the importance of understanding the vectors and dynamics of the follow/don’t-follow decision on Twitter. There are more vectors than this one use case, and plenty of dynamics. In the current analysis, the data offered could be explained by a host of factors, most of which the authors don’t address.

Until those factors are accounted for, it’s a little early to point the fickle finger of gender bias at Twitter.

We’ll save these important questions for another post:

– How does the fact that 80% of users follow, or are followed by, at least one other user test the capacity of the user base to understand the service? Do we have some expectation about the probability that a tie occurs, and if so, why?

– A large proportion of Twitter users keep their gender identification ambiguous. To what extent does this alter the authors’ conclusions? Did they adjust for it?

– As the authors assert, all well-developed online social media services have a contribution pattern that roughly follows a power-law or exponential distribution. Does the fact that Twitter falls within the extreme bounds of these distributions suggest that it is still settling into an equilibrium?

(With Andrew Conway/cross-posted to http://www.drewconway.com/zia/)