
The (Morphospace) Plot Thickens


Here, at long last, my very first visualization of the morphospace I’ve labored so long and hard to put together…

After having gotten well and truly stuck trying to understand what Boyce meant by transforming the dissimilarity matrix so the centroid was at zero, and where that statement came from in the Gower paper, I decided that it was all just slowing me down too much. So I re-ran the dissimilarity matrix algorithm so it would produce a full matrix, which I had found out from reading the R function documentation was a required input to the cmdscale() function I’m using. (That was faster than writing up new code to reflect the existing half-matrix, and less liable to have me making a mistake there!) Sent my dissimilarity matrix to the function, plotted up the resulting list of points, added the taxon names as labels, and hey presto! A product! Woot!
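For readers wondering what the two sidestepped steps might look like, here is a minimal sketch, written in plain Python rather than the R used for the analysis. It assumes the half-matrix is stored as a lower triangle with the diagonal included, and it assumes the "centroid at zero" remark refers to the usual double-centering step of principal coordinates analysis (squared dissimilarities scaled by -1/2, then row and column means removed). The function names and the three-taxon toy values are made up for illustration.

```python
def mirror_half_matrix(lower):
    """Expand a lower-triangular half-matrix into a full symmetric matrix.

    `lower[i]` is assumed to hold the i+1 entries of row i, diagonal included.
    """
    n = len(lower)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, d in enumerate(row):
            full[i][j] = d
            full[j][i] = d  # reflect across the diagonal
    return full


def double_center(dist):
    """Return B = -1/2 * D^2 with row and column means removed.

    Every row and column of the result sums to zero, which is what places
    the centroid of the recovered points at the origin.
    """
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    col_mean = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [a[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


# Toy half-matrix for three hypothetical taxa (values are invented).
half = [[0.0], [3.0, 0.0], [4.0, 5.0, 0.0]]
full = mirror_half_matrix(half)
centered = double_center(full)
```

As far as I know, cmdscale() performs this centering internally, so passing it the full dissimilarity matrix directly should be enough.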

Now, figuring out what this means is going to be another task altogether. Right now this just looks like a random scattering of points with one bizarre outlier, Pseudorutilaria, and I’m not at all sure why it’s so far away from everything else. I’d like to label all these taxa as centrics or pennates, then further split them into radial and multipolar centrics, araphids and raphids, to see if the major groups fall out separately on the space.

The weird outlier may have something to do with data quality along rows—i.e. it’s probably worth doing the analysis I showed in the last post, but for genera (i.e. rather than seeing how complete each character is in terms of valid coding, seeing how complete each genus is in terms of valid characters). That way perhaps I can rerun the morphospace analysis with a few “bad apples” out, if that’s what Pseudorutilaria is. It may of course be an indication that because of the garbagey quality of my character coding, the whole morphospace represents nothing. Garbage in, garbage out.

This is what the genus data quality looks like:

No genera have more than 80% valid characters. More than half have over 60% valid characters. So not fantastic. If I widen the net and also count “v” character states as invalid, it should look even worse.

And indeed, things are worse, but not substantially different. What this suggests, in any case, is that a) the quality is pretty bad, and b) for a few genera, it’s really bad: less than half the characters have valid states. These may be worth getting rid of. Or at least I should re-run the analysis without them to see if it affects the results substantially.
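The row-wise completeness check can be sketched roughly as follows, again in plain Python rather than R. The toy matrix is invented, the use of "?" as the missing-data code is an assumption, and the function names are hypothetical; only the "v" state and the one-half cutoff come from the text above.

```python
def valid_fraction(row, invalid):
    """Fraction of character states in one genus row that are not in `invalid`."""
    return sum(1 for state in row if state not in invalid) / len(row)


def flag_sparse_genera(matrix, invalid, threshold=0.5):
    """Return 1-based row numbers whose valid fraction is below `threshold`."""
    return [
        i
        for i, row in enumerate(matrix, start=1)
        if valid_fraction(row, invalid) < threshold
    ]


# Invented toy matrix: rows are genera, columns are characters.
# "?" as the missing-data code is an assumption, not the real coding scheme.
toy = [
    ["0", "1", "1", "0"],
    ["?", "v", "v", "0"],
    ["?", "?", "?", "1"],
]

lenient = flag_sparse_genera(toy, invalid={"?"})      # only "?" counts as invalid
strict = flag_sparse_genera(toy, invalid={"?", "v"})  # "v" also counts as invalid
```

On this toy input the lenient pass flags only row 3, while the strict pass flags rows 2 and 3, which mirrors the "worse, but not substantially different" pattern in miniature.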

The genera are (numbered from 1):

36  97  28  29  39 100  33  42  38  96 102 123  25  75  17

Cussia, Pseudoeunotia, Cladogramma, Clavicula, Cymatogonia, Pseudostictodiscus, Cosmiodiscus, Cymatotheca, Cymatodiscus, Pseudodimerogramma, Pyrgupyxis, Stephanogonia, Cestodiscus, Lisitzinia, Baxteriopsis.

Interestingly, Pseudorutilaria is not on the list. How complete is its character list? It’s row 99 in the matrix, and it has 72% valid characters, so it must actually be a legit outlier.

Of the other outlier-ish taxa on the plot above, Pseudoeunotia is on the list, but that’s about it. Hmmmm.

 

 
