{"id":2081,"date":"2012-01-15T16:14:32","date_gmt":"2012-01-15T21:14:32","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/kotrc\/?p=2081"},"modified":"2012-01-15T17:59:28","modified_gmt":"2012-01-15T22:59:28","slug":"2081","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/kotrc\/2012\/01\/15\/2081\/","title":{"rendered":"Day of Wor(k)ship"},"content":{"rendered":"<p style=\"text-align: justify\">Tried to confirm that I was calculating molecular distances correctly in R, so I downloaded MEGA (which runs on OS X through a cool Wine compatibility layer) and ran a distance calculation on the alignments from Ulf Sorhannus with the default settings (which were even more complicated than those in the <em>ape<\/em> package, and which I understood even less). To get the result into R, I had to export the matrix to Excel, add zeros on the diagonal and in the upper triangle, and import it as a .csv file; this let me make a crossplot against the morphological distances:<\/p>\n<p style=\"text-align: center\"><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2012\/01\/Rplot01.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-2078\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2012\/01\/Rplot01-300x300.png\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2012\/01\/Rplot01-300x300.png 300w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2012\/01\/Rplot01-150x150.png 150w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2012\/01\/Rplot01.png 900w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p style=\"text-align: left\">The correlation looks quite a bit better, and the r-squared is 0.1 (about double the value from the <em>ape<\/em> methods), but it is still pretty lousy. 
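<\/p>\n<p style=\"text-align: left\">Here is a sketch of how the Excel step might be skipped. It assumes the MEGA output is saved as a CSV file that holds only the lower triangle of the distance matrix (blank cells on and above the diagonal), with taxon names in the first column. The file name in the comment and the toy numbers are hypothetical. Base R&#8217;s <em>upper.tri()<\/em> and <em>lower.tri()<\/em> can fill in the matrix. The r-squared of a simple linear fit between the two sets of pairwise distances equals the squared Pearson correlation, computed here with each pair of taxa counted once:<\/p>\n
```r
# Toy stand-in for the MEGA export: a lower-triangular distance matrix with NA
# on and above the diagonal. With a real file one might instead use
# mega = as.matrix(read.csv('mega_export.csv', row.names = 1))  # hypothetical file name
mega = matrix(c(NA,   NA,   NA,   NA,
                0.02, NA,   NA,   NA,
                0.05, 0.04, NA,   NA,
                0.14, 0.12, 0.09, NA),
              nrow = 4, byrow = TRUE)

# Put zeros on the diagonal and copy each lower-triangle value into its
# mirror-image position in the upper triangle (the step done by hand in Excel).
symmetrize_lower = function(m) {
  diag(m) = 0
  m[upper.tri(m)] = t(m)[upper.tri(m)]
  m
}
mol = symmetrize_lower(mega)

# Hypothetical morphological distances for the same four taxa, in the same order.
morph = as.matrix(dist(c(0.1, 0.3, 0.2, 0.9)))

# Take each pair once (lower triangle) so the diagonal zeros stay out of the fit.
x = mol[lower.tri(mol)]
y = morph[lower.tri(morph)]
r_squared = cor(x, y)^2
plot(x, y, xlab = 'molecular distance', ylab = 'morphological distance')
```
<p style=\"text-align: left\">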
I&#8217;m not sure what MEGA is doing differently&#8230; But in any case, it&#8217;s still not a stellar correlation, and what needs explaining is why the correlation is so poor, not why it&#8217;s so good. I tried another run (this time using a distance model that I think has an equivalent in <em>ape<\/em>, namely JC, the Jukes-Cantor model). The result was similar, again with an r-squared of 0.1, though the plot looks different (the points are less &#8220;binned&#8221; along the molecular distance axis). Maybe the <em>dist.dna()<\/em> function in <em>ape<\/em> has a setting for how many decimal places it returns, and it is set too coarse, making the r-squared lower than it would otherwise be?<\/p>\n<p style=\"text-align: center\"><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2012\/01\/Rplot02.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-2082\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2012\/01\/Rplot02-300x276.png\" alt=\"\" width=\"300\" height=\"276\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2012\/01\/Rplot02-300x276.png 300w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2012\/01\/Rplot02.png 900w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p style=\"text-align: left\">For comparison, here is the same plot produced using <em>ape<\/em> with the JC69 model:<\/p>\n<p style=\"text-align: center\"><a href=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2012\/01\/Rplot03.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium wp-image-2083\" src=\"http:\/\/blogs.law.harvard.edu\/kotrc\/files\/2012\/01\/Rplot03-300x276.png\" alt=\"\" width=\"300\" height=\"276\" srcset=\"https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2012\/01\/Rplot03-300x276.png 300w, https:\/\/archive.blogs.harvard.edu\/kotrc\/files\/2012\/01\/Rplot03.png 900w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p 
style=\"text-align: left\">Hmm. This is actually strange: the range of molecular distances is also much narrower, reaching only 0.05 in the <em>ape<\/em> plot versus 0.14 in the <em>MEGA<\/em> plot. Something seems fishy. It can&#8217;t be that the <em>MEGA<\/em> distance matrix is somehow bigger. If its size didn&#8217;t match that of the morphological distance matrix, the <em>plot()<\/em> function would throw an error, which clearly isn&#8217;t happening. Perhaps I am not reading the data into R correctly with the <em>read.dna()<\/em> function; after all, I don&#8217;t really know what the hell alignments or alignment files are supposed to look like.<\/p>\n<p style=\"text-align: left\">I noticed that when printing the summary of the DNA data, I get nonsensical base compositions that suggest R might not be reading in the uracil data at all&#8230; though this seems so silly as to be really unlikely.<\/p>\n<pre>Base composition:\n    a     c     g     t\n0.407 0.322 0.271 0.000<\/pre>\n<p>Could this be the problem? Surely not&#8230; Well. After a lot of searching (<em>ape<\/em> seems to be even less well documented than all the other packages I&#8217;ve used!), I figured out the (in retrospect) obvious way of inspecting the sequence alignments directly:<\/p>\n<pre>as.character(dna[1:5, 1:20])  # first 5 sequences, first 20 alignment columns<\/pre>\n<p>&#8230;which reveals that the <em>read.dna()<\/em> function is not reading in the &#8220;U&#8221; nucleotide at all. Wow. That is stunningly awful. I spent a good hour trawling for help and found none. In the end I decided to email Maude, who I think might conceivably work on this sort of thing, to ask for help. I&#8217;m giving up here. 
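<\/p>\n<p style=\"text-align: left\">If the U really is the culprit, one workaround worth trying is to rewrite U as T in a copy of the alignment before calling <em>read.dna()<\/em>. The reason is that <em>ape<\/em>&#8217;s DNAbin class is built around the DNA alphabet. Below is a minimal base-R sketch. It assumes a FASTA-format alignment whose header lines start with &gt;. The helper names, file names and toy sequences are hypothetical. The composition helper recomputes A, C, G and T frequencies as a quick check that the T column is no longer zero:<\/p>\n
```r
# Hypothetical helper: rewrite uracil (U/u) as thymine (T/t) on sequence lines,
# leaving FASTA header lines (those starting with '>') untouched.
rna_to_dna = function(lines) {
  is_header = startsWith(lines, '>')
  lines[!is_header] = chartr('Uu', 'Tt', lines[!is_header])
  lines
}

# Hypothetical helper: relative frequencies of A, C, G and T on sequence lines;
# gaps and any other symbols are dropped before counting.
base_composition = function(lines) {
  seq_lines = lines[!startsWith(lines, '>')]
  bases = strsplit(toupper(paste(seq_lines, collapse = '')), '')[[1]]
  counts = table(factor(bases, levels = c('A', 'C', 'G', 'T')))
  counts / sum(counts)
}

# Toy alignment standing in for the real file. With a real file one might use
# lines = readLines('alignment.fasta'), then
# writeLines(rna_to_dna(lines), 'alignment_dna.fasta') and finally
# read.dna('alignment_dna.fasta', format = 'fasta') from ape.
aln = c('>taxon1', 'AUGGC-UA', '>taxon2', 'AUGGCAUU')
fixed = rna_to_dna(aln)
print(base_composition(fixed))
```
<p style=\"text-align: left\">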
Know thy limits.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Tried to confirm that I was doing the right thing calculating molecular distances in R, so I downloaded MEGA (which runs on OS X through a cool Wine compatibility layer), and ran a distance calculation on the alignments from Ulf Sorhannus with the default settings (which were even more complicated than in the ape package, and [&hellip;]<\/p>\n","protected":false},"author":2222,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14607,13584],"tags":[16233,19979],"class_list":["post-2081","post","type-post","status-publish","format-standard","hentry","category-research-journal","category-timekeeping","tag-morphospace","tag-r-trickery"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/2081","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/users\/2222"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/comments?post=2081"}],"version-history":[{"count":6,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/2081\/revisions"}],"predecessor-version":[{"id":2086,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/posts\/2081\/revisions\/2086"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/media?parent=2081"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\/wp-json\/wp\/v2\/categories?post=2081"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/kotrc\
/wp-json\/wp\/v2\/tags?post=2081"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}