
Weekender


Modified the script that calculates the Cramér’s V values for Figure 1 to catch any characters that have only one valid state, something that had escaped my attention earlier. (Of course, this offers a way of automating the process I had been doing manually and, evidently, inefficiently.) Found only one, character X47. Removed the offender, and (yawn) waited for the distance matrix calculation to run again.
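The check itself is simple enough to sketch. This is my own toy reconstruction in Python (the actual script is in R, and the data structure here is an assumption): characters as columns of a nominal-state matrix, with “?”/“-” marking missing entries, and a column flagged if it has fewer than two distinct valid states.

```python
# Sketch (assumed setup, not the author's script): characters are columns of a
# matrix of nominal states; "?" and "-" mark missing/inapplicable entries.
# A character with only one valid state is invariant, which degenerates the
# contingency table behind Cramér's V, so it should be dropped up front.

def invariant_characters(matrix, missing=("?", "-")):
    """Return column indices with fewer than two distinct valid states."""
    n_cols = len(matrix[0])
    bad = []
    for j in range(n_cols):
        states = {row[j] for row in matrix if row[j] not in missing}
        if len(states) < 2:
            bad.append(j)
    return bad

# Toy example: column 1 is all "0" apart from missing data, so it is flagged.
taxa = [
    ["0", "0", "1"],
    ["1", "0", "2"],
    ["0", "?", "1"],
    ["1", "0", "0"],
]
print(invariant_characters(taxa))  # → [1]
```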

This allowed me to move ahead and try to remake Figure 1. It turned out to be much, much harder than I would like… I had to faff endlessly to get the plot to accommodate all the extra rows and columns, which in turn fucked everything else up, and the text size… in any case. Once I had something resembling a plot, it became immediately clear that the pattern of correlation is quite different from the previous exercise: different characters are responsible for the axis loadings, and they are much, much more evenly spread across all the axes of the morphospace. Looking at the marginal histograms of the plot (sums of Cramér values), I noticed the distribution across PCOs was much more even (not great), so I wondered whether the column sums of p-values would paint a prettier picture—I assumed that the axes with the highest Cramér values would generally also have the smallest p-values. This did not seem to be the case, though: the sums of p-values are also greatest where the sums of Cramér values are greatest. So those axes with the biggest character contributions are also the least significant, statistically speaking. Jeez. This makes my head hurt. But maybe this is just because the columns with big Cramér sums also have the largest number of items with p < 0.05. If I plot Cramér column sums against average p-value…
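For reference, the pairwise statistic behind the figure can be sketched as follows. This is my reconstruction in Python, not the author’s R code: Cramér’s V for two nominal characters is √(χ²/(n·(k−1))), where k is the smaller of the two state counts; the p-values discussed above would come from the same χ² statistic under the usual chi-square test of independence.

```python
import math
from collections import Counter

# Sketch (my reconstruction, not the author's script): Cramér's V for a pair
# of nominal characters, computed from the chi-square statistic of their
# contingency table. V = sqrt(chi2 / (n * (k - 1))), k = min(#states_x, #states_y).

def cramers_v(x, y):
    pairs = list(zip(x, y))
    n = len(pairs)
    obs = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (obs.get((a, b), 0) - expected) ** 2 / expected
    k = min(len(row), len(col))
    return math.sqrt(chi2 / (n * (k - 1)))

# A character is perfectly associated with itself, so V = 1.
x = ["0", "0", "1", "1", "2", "2"]
print(round(cramers_v(x, x), 3))  # → 1.0
```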

That is much more reassuring. So, on average, the significance of the bigger Cramér values is better. I decided to code up one final variation on Figure 1 to reflect this—coloring the bars of the marginals to reflect the average p-value of the column or row they represent. This done, an inordinate amount of graphical faffing followed to get the final plot into decent shape:
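The sanity check amounts to a simple aggregation. A minimal Python sketch, with hypothetical matrices standing in for the real ones (a characters-by-axes matrix of Cramér’s V values and a matching matrix of p-values): sum V down each column and average p down each column, then compare.

```python
# Sketch with made-up data: given a characters-by-axes matrix of Cramér's V
# values and a matching matrix of p-values, compute each axis's summed V and
# mean p. The reassuring pattern described above is that columns with large
# V sums also tend to have small average p-values.

def column_summary(v_matrix, p_matrix):
    n_rows = len(v_matrix)
    n_cols = len(v_matrix[0])
    v_sums = [sum(v_matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    p_means = [sum(p_matrix[i][j] for i in range(n_rows)) / n_rows
               for j in range(n_cols)]
    return v_sums, p_means

v = [[0.9, 0.1],
     [0.8, 0.2]]
p = [[0.01, 0.6],
     [0.02, 0.5]]
v_sums, p_means = column_summary(v, p)
print([round(s, 3) for s in v_sums])   # → [1.7, 0.3]
print([round(m, 3) for m in p_means])  # → [0.015, 0.55]
```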

So, from the sheer R prowess perspective, this makes me a little bit proud, but it’s going to be a bit harder to explain and justify what’s actually going on here. Gone is the (sort of) lovely story told by the last version of this plot where all the big circles and dark colors are on the left—here the picture is, basically, you can’t trust PCO axes 1 and 2 (or 1 and 2 and 3, for that matter) to show you much of anything beyond a very general summary of what’s going on. There’s a much more complex story in the whole shebang of characters. Good luck describing that shit. (In other words: rewriting everything I just wrote about this plot last week. Hooray.)

I also realized, while on a walk with Kati in Lincoln on Saturday, that I could use the makeNumeric() function I’d written last week in order to see if I can use the biplot.pcoa() method supplied by the {ape} package’s equivalent of cmdscale(), which is called pcoa(). I tried it and, much as I had expected, supplying the function with a numeric matrix (rather than a character matrix or even a data frame) as the Y argument alongside the results of the PCO makes it “work”—in quotation marks because I have now convinced myself that, because my characters are measured on a strictly nominal scale (i.e. they are not just categorical, but also unordered), attempting to project loadings in this way is quite nonsensical.

Aside from the unintelligible cloud of names, what this plot shows in red is the “projection” of these original high-dimensional axes (defined by each character) into the lower-dimensional space. X1, X5, and X8 feature prominently—but isn’t this just because they have many character states (so their “values” range up to 4, beyond the 0–1 of most of the characters) and because they have quite a high variance (or “variance”, since they’re really nominal, and I’m not sure what the variance of a nominal variable even means)? I think this is enough thinking about this bit, so I can finally leave behind the thought that had been haunting me for a while—”what if biplot.pcoa() does what I’ve been trying to do, but simply and elegantly?” The answer, I am fairly confident at this point, is “it does not”.
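The suspicion about state counts is easy to demonstrate with toy numbers. This is my own illustration, not the author’s data: integer-coding a nominal character turns arbitrary label order into magnitude, so a character coded 0–4 mechanically gets a larger variance than a binary 0/1 one, regardless of any biological signal.

```python
# Toy illustration (not the real character matrix): integer-coded nominal
# characters with more states span a wider numeric range, so they get a
# mechanically larger variance and dominate any variance-based "loadings".

def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

binary = [0, 1, 0, 1, 0, 1]       # two states, coded 0-1
five_state = [0, 1, 2, 3, 4, 0]   # five states, coded 0-4

print(variance(binary))      # → 0.25
print(variance(five_state))  # larger, purely because the codes span 0-4
```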

So, now to describe what my new Figure 1 (second figure above) shows. That means a whole lot of erasing what I’d already written, and starting over. Yech.

