
Pickin’ Up Mo’ Mentum


Have missed a couple of days of note-taking, mostly due to fierce productivity. Finally cracked the % variance explained problem using a two-pronged approach: first, by the ratio of each eigenvalue to the sum of all eigenvalues (or to the trace of the Gower-transformed matrix, which is supposed to be the same thing); second, by the sum of the r-squared values of the correlation between the squared distances in the original data matrix and the squared (Euclidean) distances in the PCO-space of the first n principal coordinates.
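A minimal sketch of the first prong, written in plain Python rather than the R used for the actual analysis. It Gower-centers a distance matrix, gets the eigenvalues with a small Jacobi rotation routine, and divides each eigenvalue by the trace. The function names and the four toy points are made up for illustration; it also assumes Euclidean distances, so no negative eigenvalues appear.

```python
import math

def gower_center(d):
    """Double-center the squared distances: B = -1/2 * J * D^2 * J."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # d is symmetric, so the column means equal the row means.
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def jacobi_eigenvalues(a, max_sweeps=50, tol=1e-22):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Toy data (hypothetical): corners of a 2 x 1 rectangle.
points = [(0, 0), (2, 0), (0, 1), (2, 1)]
dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]

b = gower_center(dist)
eig = jacobi_eigenvalues(b)
trace = sum(b[i][i] for i in range(len(b)))
props = [ev / trace for ev in eig]  # proportion of variance per coordinate
```

With non-Euclidean dissimilarities some eigenvalues can come out negative, in which case the eigenvalue sum still matches the trace but the proportions need more careful handling than this sketch gives them.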

With this fantastic success under my belt, I spent a couple of fairly agonizing days trying to understand the PCO analogue to PCA axis “loadings” that Foote describes in both his ’95 and ’99 crinoid papers. Eventually tracked down the reference he cites for the statistics he uses to calculate coefficients of association between the categorical characters in the original matrix (on a “nominal scale”) and the PCO scores on each axis (a continuous character, or one on an “interval scale”, as Siegel & Castellan call it). The PCO scores need to be discretized first, which is very easily done with R’s cut() function.
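To make the discretization step concrete, here is a rough Python analogue of equal-width binning. It is only a sketch: the function name and the sample scores are invented, and it does not reproduce cut() exactly (R's default intervals are closed on the right and the range is padded slightly, whereas this version uses left-closed bins and puts the maximum in the last bin).

```python
def cut_equal_width(scores, n_bins):
    """Assign each score to one of n_bins equal-width bins (0-based index)."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins
    if width == 0:
        # All scores identical: everything lands in a single bin.
        return [0] * len(scores)
    # The maximum would fall just past the last bin, so clamp it.
    return [min(int((s - lo) / width), n_bins - 1) for s in scores]

# Hypothetical PCO scores on one axis, split into four classes.
axis_scores = [-1.0, -0.2, 0.1, 0.9, 1.0]
bins = cut_equal_width(axis_scores, 4)
```

The number of bins is a free choice here; too many bins leaves sparse cells in the contingency table that the association statistic is computed from.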

Anyway… As of Wednesday morning I felt that I was on the brink of figuring this shit out, so I popped into Andy’s office first thing and let him know he wasn’t going to be getting my draft outline yet, because I was on a roll and wanted to see if I could crack this beast.

After an incredibly frustrating but focused day, I was able to implement the Cramér coefficient, write code that calculates it and its p-value for each combination of character and PCO axis, and (most difficult of all) generate a plot that summarizes the results:

The color scale shows the significance of each combination—black is a p-value of 0 (very significant), white is a p-value of 0.05 (not as significant), and everything above 0.05 has been thrown out. The size of the circles shows the degree of association, bigger circles implying a stronger association (larger Cramér coefficient).
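For reference, the two quantities in the plot can be sketched in a few lines of Python. This is an illustration under assumptions, not the code behind the figure: it takes the Cramér coefficient as sqrt(chi² / (N(L − 1))), with L the smaller of the number of rows and columns in the contingency table, and gets the p-value from the chi-squared distribution using a series expansion of the incomplete gamma function. The function names and the tiny example are made up.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper tail of the chi-squared distribution (series for the lower gamma)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    log_lower = a * math.log(half_x) - half_x - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_lower))

def cramers_v(x, y):
    """Cramér coefficient and chi-squared p-value for two categorical vectors."""
    n = len(x)
    counts = Counter(zip(x, y))
    row_tot, col_tot = Counter(x), Counter(y)
    k = min(len(row_tot), len(col_tot))
    if k < 2:
        return 0.0, 1.0  # a constant variable carries no association
    chi2 = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (counts[(r, c)] - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return math.sqrt(chi2 / (n * (k - 1))), chi2_sf(chi2, df)

# Hypothetical character states against binned PCO scores.
character = ["a", "a", "b", "b"]
binned_axis = [0, 0, 1, 1]
v, p = cramers_v(character, binned_axis)
```

In this toy case the character predicts the bin perfectly, so the coefficient is 1, but with only four specimens the p-value is still only just under 0.05. That is a reminder that circle size and color in a plot like this carry different information.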

I need to add a legend to this, and maybe fix the outline color of the circles, but I’m done for today. Tomorrow I will try to sort the association pairs by Cramér coefficient and make a table of the, say, 20 largest associations and what PCO axes they’re on, to get a sense for what determines the axes most. But for today, this has been a pretty huge accomplishment, and today is the first anniversary of our official city hall wedding, so I’m off!
