{"id":1847,"date":"2011-09-08T11:05:04","date_gmt":"2011-09-08T15:05:04","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/kotrc\/?p=1847"},"modified":"2011-09-08T21:40:05","modified_gmt":"2011-09-09T01:40:05","slug":"no-rain-no-gain","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/kotrc\/2011\/09\/08\/no-rain-no-gain\/","title":{"rendered":"No Rain, No Gain"},"content":{"rendered":"<p>It is still bucketing down today, which Bawb Oauwkes explains is thanks to the remnants of Lee, the next-next hurricane to pass through after Irene. Hence my best-laid manly plans to rise early and swim were dashed as I lay in bed in musine paralysis&#8230;<\/p>\n<p>Anyway, I eventually made it to Starbucks and am overlooking constant, perpendicular flows of umbrellas across and water down Mass Ave. It&#8217;s a pretty picture, really. After being frustrated with the outcome of the time-resolved\/Neptune-connected morphospace plots, I decided last night to re-run the analysis so far with a reduced number of taxa and characters. I culled the dataset to leave only characters with valid states for at least 50% of the genera, and genera with valid character states for at least 50% of the characters (both calculated from the original, full data matrix\u2014but with duplicate entries for heterovalvate genera removed). This matrix now had 120 genera and 77 characters.<\/p>\n<p>I generated a new dissimilarity matrix from this and ran the cmdscale() PCO algorithm, which yielded somewhat improved goodness-of-fit scores of 0.17 and 0.22; I&#8217;m interpreting this to mean that PC1 and PC2 account for 39% of total variance in the dissimilarity matrix. I don&#8217;t yet, however, understand how exactly the GOF is calculated, since it doesn&#8217;t match the first and second eigenvalues as a proportion of the sum of all eigenvalues (however calculated). 
At some point I need to track down a stats textbook that has this information in it and educate myself on how PCO works, since the R documentation of the function is not particularly illuminating. The equivalent function in Matlab, incidentally, has a much better explanation in its documentation (perhaps [?] unsurprisingly considering it&#8217;s an expensive proprietary, rather than free open-source, piece of software). That version doesn&#8217;t seem to calculate a goodness of fit, but it explains that if there are only two large eigenvalues, the spatial relationship among the points can be represented in just two dimensions. And, for their example,<\/p>\n<blockquote><p>The two negative eigenvalues indicate that the genetic distances are not Euclidean, that is, no configuration of points can reproduce D exactly. Fortunately, the negative eigenvalues are small relative to the largest positive ones, and the reduction to the first two columns of Y should be fairly accurate.<\/p><\/blockquote>\n<p>This isn&#8217;t the case for my dataset. The first (i.e. largest-magnitude) 16 eigenvalues are all of the same order of magnitude (between 0.1 and 0.6), and the 40 largest-magnitude negative eigenvalues are just one order of magnitude smaller than those largest positive eigenvalues. This means that there&#8217;s no Euclidean representation of the relative positions of the genera in morphospace. This might not come as a great surprise, since the matrix is still quite sparse and very high-dimensional. 
I&#8217;m not trying to represent geographic locations of points on a globe on a map\u2014the sort of exercise in the Matlab example\u2014but points in a much, much higher-dimensional space&#8230;<\/p>\n<p>Here&#8217;s a plot of the eigenvalues:<\/p>\n<p><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/09\/EigenvaluesCulled50.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-1851\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/09\/EigenvaluesCulled50-300x300.png\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/EigenvaluesCulled50-300x300.png 300w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/EigenvaluesCulled50-150x150.png 150w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/EigenvaluesCulled50.png 480w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>This is actually not too bad. The first two eigenvalues don&#8217;t necessarily explain a huge hog of the <em>total <\/em>variance, but they&#8217;re definitely substantially bigger than any of the others, so that&#8217;s good. It&#8217;s at least clear that the third doesn&#8217;t explain much more than the fourth or fifth, so it doesn&#8217;t necessarily make sense to start going down the list and including more dimensions. 
That&#8217;s a good thing, I think.<\/p>\n<p>So what does the plot for the morphospace look like for this new culled dataset?<\/p>\n<p><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotPennCenCulled50.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-1855\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotPennCenCulled50-300x300.png\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotPennCenCulled50-300x300.png 300w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotPennCenCulled50-150x150.png 150w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotPennCenCulled50.png 480w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Interestingly, the groups seem to be better separated now\u2014less overlap. Is the same true if we look at the subgroups, which were completely overlapped when using the full set of characters and taxa? 
Well, there&#8217;s still a lot of overlap, but at least there are some areas that seem to be differentiated.<\/p>\n<p><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotSubgroupsCulled50.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-1857\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotSubgroupsCulled50-300x300.png\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotSubgroupsCulled50-300x300.png 300w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotSubgroupsCulled50-150x150.png 150w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/FullSpacePlotSubgroupsCulled50.png 480w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Now, the (kind of) crucial question\u2014is there better separation of occupied morphospace area with this reduced dataset? Basically the story is a little bit better than before\u2014at least there&#8217;s some expansion of morphospace to be seen. Still, the Paleocene and Eocene (red and orange) completely overlap, and already cover most of the morphospace ever to be explored. 
The more interesting story may be told only when pre-Cenozoic data are included, which is going to make a whole hell of a lot of extra work.<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/09\/PlotByEpochInBinCulled502.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-1863\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/09\/PlotByEpochInBinCulled502-300x300.png\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/PlotByEpochInBinCulled502-300x300.png 300w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/PlotByEpochInBinCulled502-150x150.png 150w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/09\/PlotByEpochInBinCulled502.png 480w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Well, this is something of a minor success, and since it&#8217;s coming up to 10pm I&#8217;m going to call it quits for the day and pick up again here tomorrow.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It is still bucketing down today, which Bawb Oauwkes explains is thanks to the remnants of Lee, the next-next hurricane to pass through after Irene. 
Hence my best-laid manly plans to rise early and swim were dashed as I lay in bed in musine paralysis&#8230; Anyway, I eventually made it to Starbucks and am overlooking [&hellip;]<\/p>\n","protected":false},"author":2222,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14607,13584],"tags":[16233],"class_list":["post-1847","post","type-post","status-publish","format-standard","hentry","category-research-journal","category-timekeeping","tag-morphospace"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/1847","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/users\/2222"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/comments?post=1847"}],"version-history":[{"count":12,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/1847\/revisions"}],"predecessor-version":[{"id":1866,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/1847\/revisions\/1866"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/media?parent=1847"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/categories?post=1847"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/tags?post=1847"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}