You are viewing a read-only archive of the Blogs.Harvard network.

Aping Some More

Started the day trying to figure out how to plot lines to demarcate clades (for color coding) on the phylogeny. Not as easy as one would hope in ape. Eventually figured it out (ooh, hello, it’s lunch time already! shit!). It looks like this (i.e., promising):

Now the next task is to get these taxa colored up all nice and good so the points in the PCO plot can be colored correspondingly. This took quite a while, owing to some very non-intuitive complexities in the way objects of class phylo store the order of tips on the phylogeny (not, as one might expect, the order in which they are displayed!), and the way R codes for colors, which you’d think I knew by now… In any case, figured it out eventually:
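The tip-order gotcha is easy to illustrate outside R. What follows is a minimal Python sketch, not the ape code used for the figure: the genus names, clade assignments, and colors are all hypothetical placeholders. The idea is to build the color vector by looking each tip up by name, in the order the tree object stores its tips, instead of assuming the stored order matches the order drawn on the page.

```python
# Hypothetical tip labels in the order the tree object stores them,
# which need not match the top-to-bottom order of the drawn tree.
stored_tips = ["Navicula", "Coscinodiscus", "Fragilaria", "Thalassiosira"]

# Hypothetical clade assignments and palette (illustrative only).
clade_of = {
    "Coscinodiscus": "radial centric",
    "Thalassiosira": "multipolar centric",
    "Fragilaria": "araphid pennate",
    "Navicula": "raphid pennate",
}
palette = {
    "radial centric": "red",
    "multipolar centric": "orange",
    "araphid pennate": "blue",
    "raphid pennate": "navy",
}


def colors_in_stored_order(tips, clade_of, palette, default="black"):
    """Return one color per tip, aligned with the stored tip order.

    Tips with no known clade (or clades with no palette entry) get `default`.
    """
    return [palette.get(clade_of.get(tip), default) for tip in tips]


tip_colors = colors_in_stored_order(stored_tips, clade_of, palette)
print(list(zip(stored_tips, tip_colors)))
```

Because the lookup goes by name, the same dictionaries can be reused to color the matching points in the PCO plot, whatever order those points come in.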

Was feeling quite ready to roll on and start making the matching PCO plots, which will use the same colors to link phylogeny with morphology, but alas, this is going to have to serve as a temporary stopping point: since I was rudely shoehorned into co-presenting at the idiotic Geobiology meeting on Friday, I need to invest at least the next half an hour, before Justin comes by to discuss what we’ll say, actually reading the damn paper we’re going to present. At least I’m parked on a success/downhill.

Well, Justin showed up early and we ended up talking about the paper for a bit longer than I had anticipated, so it wasn’t until after dinner that I got back to working on the figure. In any case, I was able to get the rest of it banged out pretty quickly, and the result is at least aesthetically satisfying (although I don’t think it shows anything particularly fascinating that isn’t already obvious from the other plots I’ve made):

First off, though pennates and centrics occupy different parts of the 2D morphospace (which is obvious from the main PCO plot), their subdivisions don’t really occupy distinct areas. Radial versus bi-/multipolar centrics pretty much sit in the same part of morphospace, and araphids and raphids have a lot of overlap, too. What’s more, the clades within each of those groups (which I’ve plotted in similar colors: reds, blues, and dark greys) don’t seem to plot together, either. So in this view, it seems that there is some very high-level relationship between morphological and phylogenetic proximity, but it isn’t a particularly close one.

This is kinda disappointing (I think), but it sets up rather nicely the next plot, which shows pretty well the same thing, but from a slightly different perspective.

Monkeying with ape

After a week of relatively low efficiency, exhaustion, and feeling sorry for myself, enlisted the help of SelfControl this morning to get myself back on track. At this point I’ve reached the “morphospace vs. phylogeny” section, which requires plotting a phylogeny side-by-side with a PCO plot, with matching color codes. This requires quite a bit of R learning, since it isn’t something I’ve done before.

I first wanted to visualize what the plot should look like. I printed out tree plots of both trees I’d gotten, the one from Sörhannus and the one from Medlin. The latter I had not actually even downloaded from my emails yet. I did this, and had a good look at it, but was not impressed by the data she had sent me—the tree plotted up OK, but all the species names were abbreviated, and the tree had far fewer taxa in it than the one from Sörhannus. So, I decided to go with the Sörhannus one, whatever the repercussions might be.

I printed out his tree and the list of genus names found in the morphospace as well as on the tree (there are 41). Then I went through and manually highlighted all species on the tree belonging to those genera. It’s a pretty good, broad spread. What I want to do is to produce a pruned version of the tree with only one tip for each of those genera. Had to spend a lot of time in the ape package (which stands for “Analyses of Phylogenetics and Evolution”) to get it to go from looking like this (the original phylogeny):

To this (the version I ended up with last night at about 7:30):

This involved removing a lot of species, shortening the names, and rotating a good many nodes to put them in an order at least broadly comparable to the Sörhannus paper (also the order, roughly, of the Medlin and Kooistra phylogenies). The motivation-sapping thing was to see that this phylogeny is topologically quite different from the one Sörhannus published. I had just assumed that the tree he sent me was the same one that was published in the paper, and it was impossible to tell (given the image above), with the dense tangle of branches, that it was actually different. Now that I’ve spent a whole day monkeying with it, though, I don’t feel like it makes sense to abandon it and try all over. I suppose I will just have to acknowledge that it’s topologically different, and maybe email Sörhannus and ask him what the deal is. Maybe add a line or two in the paper about why, if I can figure it out.
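The bookkeeping behind "one tip per genus" can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code actually run in ape: it assumes tip labels look like `Genus_species`, keeps the first species encountered for each wanted genus (an arbitrary rule; the real choice may well have been made by hand), and uses made-up labels. The resulting drop list is the sort of thing one would then hand to a pruning function such as ape's `drop.tip()`.

```python
def pick_one_tip_per_genus(tip_labels, wanted_genera):
    """Split tip labels into tips to keep (one per wanted genus) and tips to drop.

    Assumes labels of the form "Genus_species" (or "Genus species").
    Keeps the first label seen for each wanted genus; everything else is dropped.
    Returns (keep, drop): keep maps genus -> full label, drop is a list of labels.
    """
    keep = {}
    for label in tip_labels:
        genus = label.replace(" ", "_").split("_")[0]
        if genus in wanted_genera and genus not in keep:
            keep[genus] = label
    kept_labels = set(keep.values())
    drop = [label for label in tip_labels if label not in kept_labels]
    return keep, drop


# Hypothetical labels and genus list, for illustration only.
tips = [
    "Navicula_sp1",
    "Navicula_sp2",
    "Fragilaria_sp1",
    "Bolidomonas_sp1",  # not in the morphospace list, so it gets dropped
    "Fragilaria_sp2",
]
genera_in_morphospace = {"Navicula", "Fragilaria"}

keep, drop = pick_one_tip_per_genus(tips, genera_in_morphospace)
print(keep)  # genus -> retained tip label (also handy for shortening names)
print(drop)  # labels to prune from the tree
```

The `keep` mapping doubles as a rename table, since shortening each retained label to its genus name is just a matter of swapping values for keys.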

The alternatives are to use Kooistra’s phylogeny or Medlin’s. But Kooistra was spectacularly unhelpful when I emailed him with my original character list for review, so I don’t really feel like engaging with him, and Medlin sent me her phylogeny, but it’s even messier than the one from Sörhannus because all the names are abbreviated and it would probably take me at least a day just to decipher what species the abbreviations actually stand for. So no really good alternatives.

Best to press ahead with this. I’ve got to get this thing done, after all, whether it’s perfect or not. And it won’t be.

Interview Done, Here Cometh the List

Tuesday and Wednesday ended up being a write-off, in terms of research. First, I had my annual paragraph of letdowns to write for PlanktonTech, which I ended up doing successfully, though it cost some time and emotional investment, reading back over the lofty aspirations and lost time of the past year. And on Wednesday morning, the interview—which went OK, though it left me feeling pretty well exhausted, and after a careless mistake too many I decided to call it quits for the day and went home (and actually fell asleep in the middle of the afternoon).

Anyway. It’s been a bit of an uphill struggle regaining momentum after that break of focus, short though it was. Maybe it’s the midnight meowing of our feline houseguest, or the morning runs I am still getting used to, but I’m remarkably exhausted…

In any case. I emailed Sörhannus again and asked if he would send the complete tree file, so that I could calculate patristic distances. This involves figuring out how to read this file into R, and how to monkey with it once it’s there. It’s a .nwk file, which it turns out stands for Newick, a standard file format for trees. It’s taxon (tip) names nested hierarchically in parentheses, with numbers denoting branch lengths. Seems straightforward enough.

I managed to do this, calculate the patristic distance (thanks to some more help from Allison) using the cophenetic() function in {ape}, and plot up the result (it’s slightly less well correlated, interestingly, than the direct distance, but not by much):
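For the record, patristic distance is just the sum of branch lengths along the path connecting two tips on the tree. The sketch below is a bare-bones Python illustration of that idea, not what `cophenetic()` does internally: it parses a toy Newick string (hypothetical tips A, B, C with made-up branch lengths), ignores internal node labels, and does not handle quoted names or comments.

```python
def parse_newick(text):
    """Parse a simple Newick string.

    Returns (parent, tips), where parent maps each non-root node to
    (parent node, branch length) and tips lists tip names in file order.
    Internal nodes get generated ids like "#1"; their labels are ignored.
    """
    s = "".join(text.split()).rstrip(";")  # drop whitespace and trailing ";"
    parent, tips = {}, []
    pos = 0
    n_internal = 0

    def read_label_and_length():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        name, _, length = s[start:pos].partition(":")
        return name, float(length) if length else 0.0

    def read_subtree(par):
        nonlocal pos, n_internal
        if s[pos] == "(":
            n_internal += 1
            node = f"#{n_internal}"
            pos += 1  # skip "("
            while True:
                read_subtree(node)
                if s[pos] == ",":
                    pos += 1  # another child follows
                    continue
                break
            pos += 1  # skip ")"
            _, length = read_label_and_length()
        else:
            node, length = read_label_and_length()
            tips.append(node)
        if par is not None:
            parent[node] = (par, length)
        return node

    read_subtree(None)
    return parent, tips


def patristic(a, b, parent):
    """Sum of branch lengths on the path between tips a and b."""
    # Distance from a to each of its ancestors (and to itself).
    up_a, node, dist = {a: 0.0}, a, 0.0
    while node in parent:
        node, length = parent[node]
        dist += length
        up_a[node] = dist
    # Climb from b until we hit one of a's ancestors (their common ancestor).
    node, dist_b = b, 0.0
    while node not in up_a:
        node, length = parent[node]
        dist_b += length
    return up_a[node] + dist_b


parent, tips = parse_newick("((A:1,B:2):3,C:4);")  # toy tree, made-up lengths
for i, a in enumerate(tips):
    for b in tips[i + 1:]:
        print(a, b, patristic(a, b, parent))
```

On the toy tree, A and B are joined through their shared parent (1 + 2), while either of them reaches C only by going through the root, picking up the internal branch of length 3 along the way.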

Since this plot looks similar to the last one I posted, it’s no surprise that the correlation between patristic and “direct” raw distance among sequences is high (r-squared of 0.79):
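The r-squared quoted here is the squared Pearson correlation between the two sets of pairwise distances. A minimal Python sketch of that calculation, on made-up numbers (the vectors below are placeholders, not the actual sequence distances):

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Placeholder values: one entry per pair of taxa, same pair order in both lists.
patristic_dist = [1.0, 2.0, 3.0]
direct_dist = [1.0, 3.0, 2.0]

print(r_squared(patristic_dist, direct_dist))
```

Each pair of taxa should appear once (the upper triangle of the distance matrix), and in the same order in both lists, so that the two vectors line up.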