Thinking Evolution, At Darwin’s
After a late start (thanks to a late night at Pierre and Nicole’s yesterday), and part of the morning helping Beau figure out his very own PCO problem, spent the afternoon at Darwin’s East. Reading the Erwin paper on disparity brought up a handful of points:
- There are a few other disparity measures I might consider—total variance, total range, number of unique pairwise character combinations, participation ratio—that I’m not too familiar with. He cites a paper by Ciampaglio, 2001, that I should check out.
- Is there some way to assess whether the filling of morphospace is more rapid (or less) than would be expected if diversification filled the morphospace under some null model (say, randomly)? Can that be simulated (sort of a bootstrap/p-value for morphospace occupation)? Erwin references Foote, 1996b (in the book Evolutionary Paleobiology by Jablonski, Erwin, and Lipps); Gavrilets, 1999 (“Dynamics of clade diversification…”); and Pie and Weitz, 2006 (“A null model of morphospace occupation”). The last one seems particularly relevant. Yikes! Lots of reading to do.
- Can clades be separated out from the morphospace and their patterns of occupancy-through-time be examined individually? (My guess is that this will break down due to the degree of uncertainty of the tree topology).
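To get a feel for two of the disparity measures mentioned in the first point, here is a minimal sketch of total variance and total range as I understand them: the per-axis variances (or ranges) summed over all morphospace axes. The function names and the toy coordinates are made up for illustration and are not from my actual code or from the Erwin paper.

```python
from statistics import variance


def total_variance(points):
    """Sum of the sample variances along each morphospace axis.

    `points` is a list of equal-length coordinate tuples, one per taxon.
    Returns NaN when there are fewer than two taxa.
    """
    if len(points) < 2:
        return float("nan")
    axes = list(zip(*points))  # transpose: one tuple of values per axis
    return sum(variance(axis) for axis in axes)


def total_range(points):
    """Sum of (max - min) along each morphospace axis; NaN for an empty bin."""
    if not points:
        return float("nan")
    axes = list(zip(*points))
    return sum(max(axis) - min(axis) for axis in axes)


# Hypothetical toy data: four taxa at the corners of a 2 x 4 rectangle.
example = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (2.0, 4.0)]
print(total_variance(example), total_range(example))
```

Both measures are sensitive to different things (range to outliers and sample size, variance to the spread of the bulk), which is presumably why it is worth comparing them.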
Well, these are cool and interesting things to think about. Need to actually produce something now, though. I think setting up my code to produce disparity/diversity figures for different taxon sampling methods is next. I’m a little confused about exactly how I’m going to go about this, since I have two separate Neptune data files now, the original one I was working with on the diversity project, and the modified one for the morphospace project, which includes the crucial “Genus” column (with the genus name only). But, it has about 2,000 fewer entries than the full one, and I’m not quite sure why. It probably doesn’t matter though, since it’s only a third of a percent or so of all the entries in the database. Maybe I can just ignore it. Maybe it’s just the zero-age-value occurrences I took out. (Quick check: doesn’t look like it). I’m going to ignore it for the time being. [Later note: the difference is those occurrences of taxa that are in the Neptune database but not in the morphospace—i.e. genera that I didn’t code for. It’s actually quite reassuring that they only account for a third of a percent of all the occurrences.]
Another minor niggle I noticed: the genus diversity from the morphospace increases in the first few time bins of the Paleocene, while the species diversity seems to be zero until about 60 myr. That does not make sense—something’s fishy there. Need to follow up on that. Result: probably down to setting the time bins so they only start at 60 myr. Resetting the time bins to go back to 64 myr ought to fix that.
The problem I’m grappling with at the moment is that the code to calculate convex hull volumes crashes when I run it under the “in-bin” sampling model. It works fine under “range-through” sampling. I had problems with the mean pairwise distance function initially, too, because the in-bin sampling mode supplies some time bins with zero-length taxon lists (i.e. no occurrences). I fixed that, and the mpwd works fine now, but the convex hull volumes still don’t. What gives? Reducing the range of dimensions used for convex hull volume calculation to 3D and 4D only (rather than 3D through 10D) makes it run OK, which suggests there’s some sort of problem in the higher dimensions. I’ll just have to keep adding dimensions until it breaks. It works fine to 5D and 6D. It crashes when I try to run it in 7D. Huh?
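For the record, the kind of guard that sorts out the empty-bin problem for the mean pairwise distance looks roughly like this. It is a sketch, not my actual function: the name `mean_pairwise_distance`, the NaN convention for bins with fewer than two taxa, and the toy bins are all assumptions made for illustration.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of taxa in one time bin.

    Returns NaN for bins with fewer than two taxa (including empty bins),
    so a plotting routine can simply leave a gap there.
    """
    if len(points) < 2:
        return math.nan
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return sum(dists) / len(dists)


# Hypothetical bins: one with two taxa, one with no occurrences at all.
bins = {"bin_a": [(0.0, 0.0), (3.0, 4.0)], "bin_b": []}
mpwd_by_bin = {name: mean_pairwise_distance(pts) for name, pts in bins.items()}
print(mpwd_by_bin)
```

Returning NaN rather than zero seems like the safer choice, since a disparity of zero would wrongly suggest that the taxa in that bin were identical.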
One observation that’s a possible lead is that, under in-bin sampling, the least diverse time bin has only 7 taxa. Is it possible that at least n+1 vertices are needed to calculate the hypervolume of a shape in n-space? It sort of makes sense in low dimensions: you need at least 3 points to define an area in 2-space, and you need at least 4 points in 3-space to define a volume. Maybe you need at least 8 vertices to calculate a hypervolume in 7-space. If this is true, then since the range-through run worked all the way up to 10D, no time bin under range-through sampling should have fewer than 11 genera in it. This is, in fact, true: the least diverse bin has 13 species in it. I think I’ve figured that one out!
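The n+1 rule can be checked before calling the hull routine at all. A hull only has nonzero volume in n dimensions if the points do not all lie in a lower-dimensional flat, which means at least n+1 points whose difference vectors have rank n. The sketch below does that check with plain Gaussian elimination; `affine_rank` and `can_have_hull_volume` are hypothetical helper names, not part of my code or of any library.

```python
def affine_rank(points, tol=1e-9):
    """Dimension of the smallest flat (point, line, plane, ...) containing the points."""
    if len(points) < 2:
        return 0
    origin = points[0]
    # Difference vectors from the first point; their rank is the affine rank.
    rows = [[x - o for x, o in zip(p, origin)] for p in points[1:]]
    rank = 0
    for col in range(len(origin)):
        # Partial pivoting: pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
        if rank == len(rows):
            break  # every difference vector is accounted for
    return rank


def can_have_hull_volume(points, n_dims):
    """True if the first n_dims coordinates of the points span all n_dims dimensions."""
    if len(points) < n_dims + 1:
        return False
    return affine_rank([p[:n_dims] for p in points]) == n_dims


# Hypothetical bin with 7 taxa, placed on the 7 unit axes of a 7-D morphospace.
seven_taxa = [tuple(1.0 if i == j else 0.0 for j in range(7)) for i in range(7)]
print(can_have_hull_volume(seven_taxa, 6), can_have_hull_volume(seven_taxa, 7))
```

With 7 taxa the check passes in 6D and fails in 7D, which matches where the crash showed up. A bin that fails the check could just be assigned NaN for that dimensionality instead of being handed to the hull code.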
Here, then, is the same figure as in yesterday’s post, but using the in-bin taxon sampling rather than range-through:
It looks quite a bit more variable and messy, but otherwise it’s pretty similar overall. I’m not quite sure why the alpha volumes with values >0.11 did not plot—maybe that algorithm crashed, too?! No, the results are there. Why aren’t they plotting, though? OK, problem solved—silly indexing mistake that crept in since I copy-pasted the plotting code for the alpha shape results from the code that plots the convex hull volumes. Never mind. Here’s the corrected version:
Well. Now that I’m not distracted by those missing lines, a few differences between the two sets of panels (SIB = sampled in-bin, RT = range-through):
- Mean pairwise distance more variable using SIB
- Oligocene is peak convex hull volume in RT, while in SIB, Miocene is peak and Oligocene doesn’t stand out
- In alpha shape volume, the SIB curve is messier/noisier, but the pattern of increase with peak in the Miocene is the same
- The differences in the species diversity curves are hardly noticeable. This is a bit surprising (?): I thought the RT curve differed more from the SIB one, based on my term paper with Charles all those many moons ago
OK. So now that this works sorta-kinda well, on to implementing the other taxon sampling methods. I already have routines in place for rarefaction and by-list, unweighted (“UW”) subsampling in my code from the diversity project. I should try and see if I can run those routines on the morphospace-modified Neptune database. Ah. But here’s a pretty major complication I hadn’t thought of. The subsampling algorithms all work by taking subsamples many times, calculating the diversity for each subsample, and then taking an average. Obviously, that average (which is what would be plotted in the bottom panel—no problem there, in theory) doesn’t have an associated set of taxon lists for each time bin. Rather, to do this properly I’d have to write a routine that generates a subsample, calculates the associated mean pairwise distances, the convex hull volumes, and the alpha volumes, and then does that a bunch of times and calculates the averages. It’s not that this is insurmountable, it’s just going to be a fair bit of work.
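The loop I have in mind for a single time bin would look something like the sketch below: draw a subsample of occurrences, reduce it to a genus list, compute the disparity metric on that list, repeat, and average. This is a simple fixed-quota rarefaction of occurrences used as a stand-in, not the by-list or any other routine from my diversity code. The names `subsampled_disparity`, `quota` and `n_trials`, and the toy data, are assumptions for illustration, and only mean pairwise distance is shown (the hull and alpha volumes would slot in the same way).

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of points; NaN for fewer than two."""
    if len(points) < 2:
        return math.nan
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return sum(dists) / len(dists)


def subsampled_disparity(occurrences, coords, quota, n_trials=100, seed=0):
    """Average disparity over repeated occurrence subsamples of one time bin.

    occurrences: list of genus names, one entry per occurrence in the bin
    coords:      dict mapping genus name -> morphospace coordinates
    quota:       number of occurrences drawn (without replacement) per trial
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    values = []
    for _ in range(n_trials):
        draw = rng.sample(occurrences, min(quota, len(occurrences)))
        genera = sorted(set(draw))  # taxon list implied by this subsample
        value = mean_pairwise_distance([coords[g] for g in genera])
        if not math.isnan(value):  # skip trials that drew fewer than two genera
            values.append(value)
    return sum(values) / len(values) if values else math.nan


# Hypothetical bin: two genera with two occurrences each.
toy_coords = {"A": (0.0, 0.0), "B": (3.0, 4.0)}
toy_occurrences = ["A", "A", "B", "B"]
print(subsampled_disparity(toy_occurrences, toy_coords, quota=4, n_trials=10))
```

One design question this makes obvious: trials that draw fewer than two genera have no defined disparity, and skipping them (as done here) biases the average upward in sparse bins, so that choice would need some thought.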
It also raises the question of how this subsampling should be done. Should the subsampling routine for the morphospace be run at the species level, and then the resulting pattern be shown at the genus level in the morphospace, or should the subsampling routine itself be run at the genus level? The latter would be more work, but perhaps the more “correct” thing to do? Phew, this is more complicated than I anticipated. Much, much more complicated. Since this is going to be a pretty major effort, I should probably do it only for one subsampling method, at least at first, and so since I will want to use SQ (shareholder quorum) subsampling in the diversity paper, that’s probably what I ought to use here. But, I haven’t implemented the SQ algorithm for my dataset yet… Yikes…




