The Longest Now

Gormless statistical studies and social analysis
Monday February 02nd 2009, 8:45 pm
Filed under: Uncategorized

I haven’t posted a rant for a while, so I thought I’d share my frustration with a paper I recently read, filtered through a more distant conversation with Aaron Swartz.

I am interested in the bounty of social exploration open to those of us alive today: from creating new geography-neutral organizations and governing bodies to actively living multiple separated and separately valued lives more explicitly than ever before.  But the sociology papers I read, by humanities scholars, technophiles, or artistic lawyers, seem to be all form and no substance.

Here is a popular coffee-table study from Yee & Bailenson at Stanford’s Virtual Human Interaction Lab about online experiences influencing offline life.  Enough people I respect refer to it off-hand to support related conversation, as though its conclusions were surely true, that I read it today.  (I have seen references to it twice in the past 2 days, prompting this post.)  And publication and the attention of an article certainly lend credence to the study!

Yet on reviewing it, I find almost no aspects of it that are not misleading, naive, or methodologically suspect.  Group selection seems lacking in depth, and the notion of “repeating” a study in the sense of verifying its results is used extremely loosely.  The choice of motivation and the introduction for the experiments are artificial and described in haste.  Obvious dependences between different parts of an experiment are treated as unexpected discoveries (only after encountering results contrary to a hypothesis) rather than prepared for in advance.  The language used obfuscates what is new about the study and what is universal, statistics are used as a bludgeon and to hide unconvincing results, and the interpretation of data in terms of human psychology is exaggerated in places and unsupported in others.

Here is the original paper.

This sort of practice seems commonplace; I do not mean to single out the authors, who for all I know are models of methodology in their subfield.  Doing a study as an excuse to posit a personal opinion can effectively produce centuries of dialectic without furthering our understanding of fundamentals one whit, which may be valuable in its own right and enough for certain people.  But it is not good for exploring the boundaries of what is possible, or for discovering deeper truths, and it sure doesn’t satisfy me.

Specific failures of this and similar studies, to my lay eyes:

  • Use of citations to back up common-sense statements at the beginning… and no citations to back up more dubious claims at the end.  This implies a lack of prioritization of what needs measurement.  It also lends an aura of inevitability to the use of research to validate any statement.  Here: “Research in communication and social psychology has shown that people automatically mimic each other’s speech patterns and posture”
  • Use of insufficiently detailed studies to describe a general phenomenon.  Studying “whether wearing black uniforms causes athletes to behave more aggressively” takes up 2 pages of the background for the paper.  Apparently the only corollary studies have been about black v. white uniforms and nuns’ uniforms v. Ku Klux Klan robes.  A delightfully unconvincing spectrum of how the color or type of one’s clothes implicitly affects how one behaves.  (Now if you took a dozen colors and patterns of uniform, tried to come up with some mapping of uniform style to statistical behavior on the field from one large set of data, and applied it with uniform predictive effect to a second large set of data, that might be interesting.)
  • Carrying out of experiments as “find data that prove my idea” exercises.  When every hypothesis you make is supported by your data, and any contrary evidence is described as suspect or indicative of a potential source of error, that’s a bad sign.  “these two unexpected findings show that a negotiation task performed in series does not provide a clean measure for the intended effects” (oh no: unexpected findings!)
  • The use of dichotomies rather than nuanced metrics, particularly in interpreting ambiguous results to create an overall qualitative story when there is no significant quantitative one. 
    Here: “tall v. short”, “aggressive v. passive”, “attractive v. ugly”
  • The glossing over of potential sources of error, and total lack of any numerical error analysis.  In more extreme cases, the glossing over of the existence of dependent variables, treating them as independent in some places (even while noting elsewhere that they do depend on one another).
    The point where I stopped reading this paper the first time through (I came back when posting this to read it completely) was where it explains that it was too difficult to assess the actual attractiveness of each avatar in WoW, so they used race as a proxy for attractiveness.  A few paragraphs earlier, the paper notes that each race has a uniform height.  So the two major ‘independent’ variables, height and attractiveness, are both determined uniquely by race – and there are only 8 races in the game, with very definite differences in traits, abilities, and characteristics.  Such a paper could only have been published by an editorial board that had never played such games…
  • The use of detailed measurement and statistics for values that are irrelevant to the main thrust of the study.  This implies a lack of prioritization or recognition of where measurement matters.  Here: measuring the height of each race by pixel, the use of 4 significant digits in reporting data.
  • General lack of sanity checks.  In the WoW experiment, the level of a character was chosen to indicate how successful the character was, and by association how confident the player felt while playing it.  However, players can have as many characters as they like.  The first-order effect of a statistically attractive character is that the only person who ever sees the character as more than a flash on the screen (the player) is more likely to want to play it more often, and hence to play with that character for longer.  So, since time spent playing the game rather than in real life is the main determinant of what level a character reaches, you might expect that the maximally attractive characters will have high levels.  Indeed, since every player can choose to optimize all character traits, while you can imagine players who really like playing very pretty or very homely characters, there may be fewer who want to play the most homely pretty-race character or the prettiest homely-race character: a possible explanation for the inverse stat in the study that the characters of attractive short races tended to be lower level than those of uglier short races.  (There, I can pilpul with the worst.)
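
The confounding complaint in the list above (height and attractiveness both fixed by race) can be made concrete with a toy simulation.  This is only a sketch under invented assumptions: the eight races, their tall/attractive labels, and the hidden per-race trait are all hypothetical and are not data from the paper.  Character level is generated from the hidden race trait alone, so any tall-versus-short gap that shows up is produced by race, not by height.

```python
from statistics import mean

# Hypothetical races: (name, tall?, attractive?, hidden race-level trait).
# The hidden trait stands in for abilities or play style; all values are invented.
RACES = [
    ("race1", True,  True,  5),
    ("race2", True,  True,  3),
    ("race3", True,  False, 4),
    ("race4", True,  False, 6),
    ("race5", False, True,  1),
    ("race6", False, True,  2),
    ("race7", False, False, 3),
    ("race8", False, False, 2),
]


def make_characters(per_race=100, base_level=30):
    """Build characters whose level depends ONLY on the hidden race trait."""
    chars = []
    for name, tall, attractive, trait in RACES:
        for _ in range(per_race):
            chars.append({
                "race": name,
                "tall": tall,
                "attractive": attractive,
                "level": base_level + 5 * trait,
            })
    return chars


def mean_level(chars, key, value):
    """Average level over the characters whose `key` equals `value`."""
    return mean(c["level"] for c in chars if c[key] == value)


def varies_within_race(chars, key):
    """True if `key` takes more than one value inside any single race."""
    seen = {}
    for c in chars:
        seen.setdefault(c["race"], set()).add(c[key])
    return any(len(values) > 1 for values in seen.values())


chars = make_characters()
height_gap = mean_level(chars, "tall", True) - mean_level(chars, "tall", False)

print("tall minus short mean level:", height_gap)
print("height varies within a race:", varies_within_race(chars, "tall"))
print("attractiveness varies within a race:", varies_within_race(chars, "attractive"))
```

In this toy data the tall-versus-short gap is nonzero even though height played no part in generating the levels, and neither label varies within a race, so nothing in such a data set could separate "height" or "attractiveness" from everything else that comes with choosing a race.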

I always get peeved enough when people use “real life” to refer to “not online” that I find it hard to read the rest with a fair eye. (Real life? What do you mean, this isn’t real?)

Comment by Kat 02.02.09 @ 9:27 pm
