Negative Eigenvalues, Negative Eigenconfidence

Had another slow start. I’m a bit more optimistic about getting started earlier next week, since that’s when the pool will go back to opening at its usual, earlier time. That should give me some motivation to get out early, exercise, and settle down to work at a more reasonable hour than, say, the 10:30 start I managed today. In support of that, once I had sat down at my desk this morning I spent a bit of time mapping out the week ahead, juggling time between exercise, research, and the many career events coming up. This feels a bit better.

Added some more writing to the methods section of the morphospace write-up I had started on Tuesday, which felt good. Sentence by sentence, the thesis will come together. Once I got stuck—on what justification to present for not including resting stages in my analysis, at which point I realized I don’t really know anything about resting stages in diatoms—I decided to abandon ship and switch to something more productive. Fortunately, R was open and waiting for me.

The first task I set myself was to calculate the percentage of the total variance in the morphospace data that is captured by the two axes chosen to represent it. It’s not entirely clear what a “good” percentage is, but in some PCA cases I’ve seen people like having 80% or so; in Boyce’s thesis it’s just over 50%, I think. Finding out how to do this has not turned out to be easy. As far as I had understood before, the magnitudes of the eigenvalues were an indication of variance, such that the sum of the eigenvalues was equal to (or related through a constant to) the total variance. Supposedly the cmdscale() function in R supplies a ready-made value (called GOF, for goodness of fit), i.e. the eigenvalues of the two axes as a proportion of the sum of all of the eigenvalues. However, a little bit of digging in the R documentation and the R forums suggests that things aren’t so simple. Apparently when using non-Euclidean distance measures (like the one I’m using), you can run into negative eigenvalues. This may or may not ruin the calculation of the GOF values, depending on how the negative eigenvalues are treated in the sum of the eigenvalues. It is entirely unclear how the R version I am using handles this, and therefore whether the values I’m getting are valid or not. Of course, I don’t understand at all what negative eigenvalues mean, or why the sum of the eigenvalues should equal the variance… but never mind.

Even if they are correct, the values provided by the cmdscale() function for my data set are absolutely abysmal. The values returned, presumably for axes 1 and 2, are 13% and 17%. That adds up to 30%, which gives basically no confidence that the arrangement of the taxa we’re seeing in the plot resembles their higher-dimensional arrangement. Yech! Disappointment.
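
On whether the negative eigenvalues get subtracted out: the documentation for cmdscale() in current versions of R (which may not match the version used here) describes GOF as a vector of two numbers, both for the k retained axes taken together. The first divides the sum of the top k eigenvalues by the sum of the absolute values of all the eigenvalues; the second divides it by the sum of the positive eigenvalues only. If that description applies, the two numbers would be two versions of the same fraction rather than one value per axis, and they should not be added together. Below is a minimal sketch of that arithmetic, written in Python rather than R just to keep it self-contained. The function name gof and the eigenvalues are made up for illustration and are not from the morphospace data.

```python
def gof(eigenvalues, k=2):
    """Return (g1, g2) for the k largest eigenvalues.

    g1 divides by the sum of the absolute values of all eigenvalues.
    g2 divides by the sum of the positive eigenvalues only.
    (Hypothetical helper; mirrors the documented GOF definition, not R's code.)
    """
    ev = sorted(eigenvalues, reverse=True)      # largest eigenvalue first
    top = sum(ev[:k])                           # eigenvalues of the retained axes
    g1 = top / sum(abs(v) for v in ev)          # negatives counted by magnitude
    g2 = top / sum(max(v, 0.0) for v in ev)     # negatives dropped
    return g1, g2


# Made-up eigenvalues with two negative values (assumption, not real data).
example = [4.0, 3.0, 2.0, 1.0, 0.0, -1.0, -1.0]

g1, g2 = gof(example, k=2)

# The "naive" fraction: divide by the plain sum, so negatives shrink the total.
naive = sum(sorted(example, reverse=True)[:2]) / sum(example)

print(f"g1 (absolute values): {g1:.3f}")
print(f"g2 (positive only):   {g2:.3f}")
print(f"naive (plain sum):    {naive:.3f}")
```

With these made-up eigenvalues, dividing by the plain sum (negative values included) gives the largest fraction of the three, which is why a goodness of fit computed that way would look better than it should.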

 

 
