{"id":1791,"date":"2011-08-19T12:12:01","date_gmt":"2011-08-19T16:12:01","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/kotrc\/?p=1791"},"modified":"2011-08-24T12:46:55","modified_gmt":"2011-08-24T16:46:55","slug":"let-the-analysis-begin","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/kotrc\/2011\/08\/19\/let-the-analysis-begin\/","title":{"rendered":"Let the Analysis Begin"},"content":{"rendered":"<p>Spent the first part of the day plotting up the results of yesterday&#8217;s first look at the data. Needed to re-learn a whole bunch of stuff I already knew how to do; it&#8217;s amazing how quickly you forget things when you don&#8217;t use them every day. Yikes. Anyway, eventually I was able to plot up and save (!) some graphs showing how many of the 127 characters are &#8220;well-used&#8221; by the genera, that is, how badly each character is affected by missing data or inapplicability.<br \/>\n<a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1792\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality1.png\" alt=\"\" width=\"480\" height=\"480\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality1.png 480w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality1-150x150.png 150w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality1-300x300.png 300w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><\/a>This first plot shows that a little less than half of the characters apply to 80% or more of the genera in the dataset. The rest have valid character states in anywhere from a few percent to 80% of the genera, spread fairly evenly across that range. 
This suggests, I think, an interesting exercise still waiting to be done: comparing an analysis that uses only the widely applicable characters with one that uses the full character set. I&#8217;d predict that they give a similar answer, but it&#8217;ll be interesting to see whether that&#8217;s the case.<\/p>\n<p>To make sure that the character state &#8220;v&#8221; isn&#8217;t a big issue, I made the same plot as above, this time counting &#8220;v&#8221; among the invalid character states. (&#8220;v&#8221; stands for &#8220;variable&#8221; and means that the character takes multiple values within a single genus, e.g. some species in the genus have spines and others don&#8217;t.) The following plot shows that it doesn&#8217;t have a big impact:<\/p>\n<p><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1793\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality2.png\" alt=\"\" width=\"480\" height=\"480\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality2.png 480w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality2-150x150.png 150w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2011\/08\/CharacterQuality2-300x300.png 300w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><\/a><\/p>\n<p>Next I moved right on to the most important part of the analysis, the dimensionality reduction (principal coordinates analysis, or PCO). I looked back over Kevin Boyce&#8217;s paper, as well as Lupia (1999) and Gower (1966), on which he based his method. The basic order of steps I need to implement seems to be:<\/p>\n<ol>\n<li>Calculate a dissimilarity matrix, i.e. the pairwise distance between every pair of genera under some metric. 
Lupia and Boyce both use &#8220;the sum of all character state mismatches, each scored as one unit distance, divided by the number of possible matches (i.e., all characters minus inapplicable and missing characters).&#8221; This seems sensible. How to handle my &#8220;v&#8221; coding for variable character states is less clear. Boyce included a separate character state for variable, so that the variability itself becomes a character state. That doesn&#8217;t make much sense given the way my taxa were coded (in some genera I sampled only one species, in others many). My options are to treat &#8220;v&#8221; as missing\/inapplicable, score it as a match, or score it as a half-unit mismatch. Perhaps I&#8217;ll try all three and see what happens. [Oh, and I am treating the characters as unordered, as Boyce did, since in none of my characters is state 0 any closer to state 1 than to states 2 or 3.]<\/li>\n<li>Boyce then says this matrix &#8220;was transformed to move the centroid of the dissimilarity distribution to zero (Gower 1966).&#8221; This is where I lose the plot: frankly, I&#8217;m not sure what moving the centroid of the dissimilarity distribution even means, nor where in Gower&#8217;s paper this step takes place. Nor do I have any clue whether the R command for PCO does this transformation or not.<\/li>\n<li>Boyce describes the final step as calculating the eigenvalues and eigenvectors of the transformed dissimilarity matrix.<\/li>\n<\/ol>\n<p>Got kind of stuck here over the weekend and into the start of the next week&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Spent the first part of the day plotting up the results of yesterday&#8217;s first look at the data. Needed to re-learn a whole bunch of stuff I already knew how to do; it&#8217;s amazing how quickly you forget things when you don&#8217;t use them every day. Yikes. 
Anyway, eventually I was able to plot up [&hellip;]<\/p>\n","protected":false},"author":2222,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14607,13584],"tags":[16233],"class_list":["post-1791","post","type-post","status-publish","format-standard","hentry","category-research-journal","category-timekeeping","tag-morphospace"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/1791","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/users\/2222"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/comments?post=1791"}],"version-history":[{"count":6,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/1791\/revisions"}],"predecessor-version":[{"id":1799,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/1791\/revisions\/1799"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/media?parent=1791"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/categories?post=1791"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/tags?post=1791"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}