{"id":1170,"date":"2010-09-14T17:38:37","date_gmt":"2010-09-14T21:38:37","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/kotrc\/?p=1170"},"modified":"2010-09-14T20:06:52","modified_gmt":"2010-09-15T00:06:52","slug":"three-day-push-three-it-starts-with-a-shove","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/kotrc\/2010\/09\/14\/three-day-push-three-it-starts-with-a-shove\/","title":{"rendered":"Three Day Push Three: It Starts With a Shove"},"content":{"rendered":"<p>It&#8217;s been a productive morning, even though I&#8217;ve put off starting on my three-day push thus far. It has been time very well spent, though: a <a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2010\/09\/DSA_091410.rtf\">first transatlantic DSA<\/a> session which, in spite of temporarily clogging tubes, was as helpful as ever. And it was good to share with Beau some of the feelings of progress and the positive outlook that have been creeping up through the productivity of the past week.<\/p>\n<p>After DSA, I spent the rest of the morning finishing the photocopying of the Farlow books, and returned them just in the nick of time, as it turned out! Judy was just leaving the building and said she had decided she&#8217;d give me until noon (and this was seconds before noon)&#8230;<\/p>\n<p>After a short lunch break, it was finally time to settle down to the task at hand: the SQ subsampling algorithm. It took me quite a while to shift my brain back into gear for thinking about this. Once I did, I figured calculating a coverage estimator was the first step. This estimator is supposed to measure the extent to which the total diversity in a time interval is captured by the diversity of the sample for that interval. 
The most commonly used, according to Alroy&#8217;s manuscript, is Good&#8217;s <em>u<\/em>, which is 1 \u2212 o1\/O, where o1 is the number of taxa that occur only once in the sample, and O is the total number of occurrences in the sample. Alroy alters this by replacing o1 with p1, the number of single-<em>publication<\/em> taxa. The justification is that people are most likely to publish on new things (taxonomic groups, environments, times, places) rather than publish yet another random occurrence of an already well-described phenomenon (like a certain taxon). This, in turn, is likely to distort <em>u<\/em> as an estimator of taxonomic coverage. I&#8217;m not sure I understand exactly why, but I guess the more publications you have, the larger your O gets. But your o1 would also increase, and if it grew proportionally faster than O, you would actually start to increase o1\/O, which would decrease <em>u<\/em> even though coverage should be getting better. This is, I think, what Alroy means.<\/p>\n<p>What&#8217;s confusing, though, is his suggested solution: instead of o1, single-occurrence taxa, he suggests counting p1, single-publication taxa. I&#8217;m not sure how this is any different. When would a single publication have multiple occurrences of a taxon? Well, I suppose the publication could describe a stratigraphic section with multiple formations, and the taxon could occur in any number of those. OK, so in counting single-publication taxa, we&#8217;re including a greater number of taxa in the numerator of that ratio, which helps&#8230; make <em>u<\/em> even smaller?<\/p>\n<p>In any case, although I don&#8217;t quite understand the justification, I feel like the reasons Alroy lists for this substitution don&#8217;t really apply to the Neptune data. In some sense, the Neptune data are much closer to the &#8220;random sampling&#8221; that Alroy (correctly) suggests most paleontological collections are not. 
In those, he writes, &#8220;the point of publishing is not&#8230;to list further random samples of what might already be well-known times, places, environments, and taxonomic groups&#8221;. While this criticism applies to the choice of DSDP\/ODP borehole locations, I think that in some ways the Neptune data do record, quite systematically, all the taxa found at a location, not just those that are new and interesting.<\/p>\n<p>Spent a lot of time (most of the afternoon, sadly) scratching my head about this. Perhaps I should just go ahead and code Good&#8217;s <em>u<\/em> in its simplest, original formulation, <em>u<\/em> = 1 \u2212 o1\/O. Once I actually sat down to do this, it was surprisingly (almost shockingly!) quick. First hurdle down. Next, I wanted to get a sense of what this statistic looks like over the course of the Cenozoic for the diatom data in Neptune, so I quickly bashed out a for loop to do that:<\/p>\n<p style=\"text-align: center\"><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-1181\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u-300x171.png\" alt=\"\" width=\"300\" height=\"171\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u-300x171.png 300w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u-1024x585.png 1024w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u.png 1050w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p style=\"text-align: left\">Interestingly, much like the &#8220;preservation&#8221; indicators I was looking at before, this statistic doesn&#8217;t change a whole lot. There&#8217;s one sample in the early Eocene that has a very low <em>u<\/em>, but I&#8217;m willing to bet that&#8217;s because it&#8217;s just a very small sample. 
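<\/p>
<p>For reference, here is a minimal Python sketch of Good&#8217;s <em>u<\/em> in the simple form <em>u<\/em> = 1 \u2212 o1\/O, computed per time interval. It is not a reconstruction of the loop behind these plots: the flat list of (interval, taxon) occurrence pairs and the function names <code>goods_u<\/code> and <code>goods_u_by_interval<\/code> are hypothetical, and it counts single-occurrence taxa (o1) rather than Alroy&#8217;s single-publication taxa (p1), which would need publication identifiers that this sketch doesn&#8217;t model. Note that O is the number of occurrences, <code>len(taxa)<\/code>, not the number of distinct taxa, <code>len(set(taxa))<\/code>.<\/p>

```python
from collections import Counter, defaultdict


def goods_u(taxa):
    # Good's u for a single sample: u = 1 - o1 / O.
    # taxa holds one taxon name per occurrence, so repeats are expected.
    O = len(taxa)  # total occurrences, NOT len(set(taxa)) (distinct taxa)
    if O == 0:
        raise ValueError('cannot compute coverage for an empty sample')
    counts = Counter(taxa)
    # o1: number of taxa that occur exactly once in the sample
    o1 = sum(1 for n in counts.values() if n == 1)
    return 1 - o1 / O


def goods_u_by_interval(occurrences):
    # occurrences: iterable of (interval, taxon) pairs, one per occurrence.
    # This flat layout is a hypothetical stand-in for the Neptune records.
    by_interval = defaultdict(list)
    for interval, taxon in occurrences:
        by_interval[interval].append(taxon)
    return {interval: goods_u(taxa) for interval, taxa in by_interval.items()}


if __name__ == '__main__':
    # Made-up toy records, for illustration only (not Neptune data).
    records = [
        ('Oligocene', 'taxon A'), ('Oligocene', 'taxon A'),
        ('Oligocene', 'taxon B'), ('Oligocene', 'taxon C'),
        ('Oligocene', 'taxon C'),
        ('early Eocene', 'taxon D'), ('early Eocene', 'taxon E'),
    ]
    for interval, u in goods_u_by_interval(records).items():
        print(f'{interval}: u = {u:.2f}')
```

<p>With these made-up records, the Oligocene gets <em>u<\/em> = 0.80 (one singleton among five occurrences), while the two-occurrence early Eocene sample gets <em>u<\/em> = 0.00 because both of its taxa are singletons, which shows how a very small sample can drag <em>u<\/em> down on its own.<\/p>
<p style=\"text-align: left\">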
Apart from that one early Eocene sample, <em>u<\/em> stays quite firmly between about 60% and 80%. At a first pass, this suggests to me that the subsampling-corrected diversity curve from this approach will look pretty similar in shape to the raw data. If anything, the Oligocene looks like a coverage optimum here, and the late Eocene like a coverage minimum. We&#8217;ll see what that does to the final shape of the curve.<\/p>\n<p style=\"text-align: left\">Hold on! I think I just realized I made a mistake in my programming of this &#8220;really easy&#8221; algorithm&#8230; I calculated O as the total diversity, rather than the number of occurrences. Re-do:<\/p>\n<p style=\"text-align: center\"><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-1183\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u1-300x171.png\" alt=\"\" width=\"300\" height=\"171\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u1-300x171.png 300w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u1-1024x585.png 1024w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2010\/09\/Goods-u1.png 1050w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p style=\"text-align: left\">OK, so that&#8217;s not much different. The bottom line: remarkably constant over time. So my struck-out thought above stands: this method will probably recover a corrected diversity curve that&#8217;s similar to the raw data.<\/p>\n<p style=\"text-align: left\">Lost a substantial amount of steam at 8pm and decided to pack it in for the day. Not the most productive big-push day so far. 
Hard to keep my mind focused on a single goal when there are so many other urgent things that need to get done, too&#8230; Hopefully tomorrow will be better.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s been a productive morning, even though I&#8217;ve put off starting on my three-day push thus far. It has been time very well spent, though\u2014a first transatlantic DSA session which, in spite of temporarily clogging tubes, was as helpful as ever. And it was good to share with Beau some of the feelings of progress [&hellip;]<\/p>\n","protected":false},"author":2222,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13584],"tags":[],"class_list":["post-1170","post","type-post","status-publish","format-standard","hentry","category-timekeeping"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/1170","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/users\/2222"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/comments?post=1170"}],"version-history":[{"count":11,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/1170\/revisions"}],"predecessor-version":[{"id":1178,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/1170\/revisions\/1178"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/media?parent=1170"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/categories?post=1170"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/tags?post=1170"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}