
Figured out the GOF in cmdscale()!


After a whole bunch of reading around, I finally figured out how the GOF (“goodness of fit”) statistic reported by the R function cmdscale() works. It reports two numbers. The first is the sum of the first k eigenvalues divided by the sum of the absolute values of all the eigenvalues (at most n - 1 of which can be non-zero), where k is the number of dimensions you request from the function and n is the number of points in the distance matrix you supply. The second is the same thing, but divided by the sum of the positive eigenvalues only (so, the bigger, and better-looking, number).
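The two numbers can be recomputed by hand from the eigenvalues that cmdscale() returns, which makes a nice sanity check (a toy sketch on random data, assuming the standard stats::cmdscale):

```r
set.seed(1)
d <- dist(matrix(rnorm(40), nrow = 8))   # distances between 8 random points
k <- 2
fit <- cmdscale(d, k = k, eig = TRUE)    # eig = TRUE returns all eigenvalues
ev <- fit$eig
gof1 <- sum(ev[1:k]) / sum(abs(ev))      # first number: over sum of |eigenvalues|
gof2 <- sum(ev[1:k]) / sum(pmax(ev, 0))  # second number: positive eigenvalues only
all.equal(c(gof1, gof2), fit$GOF)        # should be TRUE
```

With a Euclidean input like this one the negative eigenvalues are negligible, so the two numbers come out nearly identical; they diverge for non-Euclidean distance matrices.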

Although I still don’t understand exactly how the PCO algorithm works, in spite of trying rather hard to work through the original Gower paper from 1966, it still feels like a victory to at least have figured out what the R function I’m using is doing. I also figured out why Mike Foote, in his morphospace papers, kept referring to the ratio of the eigenvalues to the trace of the distance matrix as his measure of variance explained: the trace he means must be that of the double-centered matrix PCO actually decomposes, which equals the sum of all the eigenvalues, since the trace (i.e. the sum of the diagonal) of the raw distance matrix is, by definition, zero.
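The trace point is easy to verify numerically (a sketch on random data, not Foote’s actual code):

```r
set.seed(2)
D <- as.matrix(dist(matrix(rnorm(30), nrow = 6)))  # 6-point distance matrix
n <- nrow(D)
J <- diag(n) - matrix(1 / n, n, n)                 # centering matrix
B <- -0.5 * J %*% D^2 %*% J                        # Gower's double centering
sum(diag(D))                                       # raw trace: always exactly 0
all.equal(sum(diag(B)),                            # centered trace equals the
          sum(eigen(B, symmetric = TRUE)$values))  # sum of the eigenvalues
```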

Anyway. Having done a lot of reading and google-trawling on the subject of PCO, I started feeling rather grotty as the sun went down, and decided I might not be doing myself a favor in the long run pushing myself too hard when I’m sick. So, I decided to call it and try again tomorrow, when I will hopefully be feeling better. It’s been another good day—something new understood that I didn’t understand before, another paragraph written: progress made, momentum gained. Forwards!

Rewriting R before GRD


Got very frustrated at the end of the day yesterday—I was trying to write up the code to get the stacked, 3D plot of characters related to linking mechanisms working. The code started to get very messy, because I needed to use the culled version of the matrix to construct the basic morphospace plot by PCO, but I needed access to the full matrix to show all of the relevant characters—because some of them fell victim to the culling process. But since I’ve been referring to the matrix as “m” throughout the code, and variously loading the full or the culled matrix into that variable as needed, this was getting ugly when I needed access to both. I was wasting time and getting lost going back and forth in my code, trying to execute bits and fragments of it to do what I wanted to do.

In short, I realized that it was no longer feasible to store this code as a long script broken into various sections dealing with the various aspects of analysis. The analysis will be so much easier and cleaner if I just repackage this into functions that I can call from a much shorter script, or directly from the command line.
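The shape of one such function might look like this; the function and variable names here are my placeholders, not the actual code:

```r
# Build the morphospace from the culled matrix, but fetch character data
# from the full matrix, so both are explicit arguments rather than one "m".
plot_linkage_chars <- function(m_full, m_culled, k = 3) {
  pco <- cmdscale(dist(m_culled), k = k, eig = TRUE)   # ordination: culled only
  chars <- m_full[rownames(m_culled), , drop = FALSE]  # characters: full matrix
  list(points = pco$points, characters = chars)        # caller handles plotting
}
```

Making both matrices explicit parameters removes the ambiguity of swapping different things in and out of a single global variable.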

So, this meant a bit of extra unanticipated work, and I was a little slow getting started in the morning. I think it’s just a little daunting, because the code I’ve written for the analysis now tallies up to an impressive 28 pages, so it’s a fair bit of effort. But I just don’t trust myself to be able to add another 7 or 8 analysis steps (as I outlined yesterday) without derailing myself in the chaos of going back and forth in a completely linear, procedural script. Things will get screwed up unless I tidy this up.

Spent most of the day working on this, and got about halfway done. Didn’t get to my stated goal (as of yesterday) of spending the afternoon thinking/reading/writing… Bit by bit. I’ll get to it, tomorrow.

Tying the Interface Together


Completely neglected to post yesterday—it was a slow, distracted day but I still got a few things done—mostly assembling bits and pieces of the R routine to read the ImageJ “pipe” file into R and create the SQL INSERT statements.
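The core of that routine is simple enough to sketch; the column layout, file name, and table/column names below are assumptions on my part, not the real schema:

```r
# Stand-in pipe file: image filename | measurement-type ID | value
writeLines(c("img001.jpg|2|14.5",
             "img001.jpg|3|9.25"), "measurements.pipe")

rows <- read.table("measurements.pipe", sep = "|",
                   col.names = c("image", "type_id", "value"))

# One INSERT statement per measurement row
inserts <- sprintf(
  "INSERT INTO Measurements (image, type_id, value) VALUES ('%s', %d, %g);",
  rows$image, rows$type_id, rows$value
)
```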

In the course of thinking that through, made a minor design change in the database. Rather than storing the complete name of each measurement type in the relevant column of the “Measurements” table, e.g. “Length from top of shell to base of ante-/postcephalic chamber”, I thought it might be prudent to store just an integer ID related 1-to-1 to a measurement name. Wrote a quick look-up function to return the full name for each integer ID so that the original design can be easily recovered with a function call. The rationale was that once I have finished collecting data and am at the data analysis stage, it might be very clumsy to formulate the mathematical models in terms of SQL queries including those long labels, and that the volume of a sphere, for example, would be more neatly described as “π x (1)^2” than “π x ‘Width of outer medullary shell’^2”. This might be daft and overkill, but I had a horror vision of running into problems with that and decided to do it this way instead. I think, given that SQLite is so forgiving in terms of data types, it would also be quite easy to revert to my original design and replace the measurement type IDs with the full strings quickly.
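The look-up itself is about as small as functions get (a minimal sketch; the labels and ID ordering here are illustrative, not the real measurement list):

```r
# Measurement-type labels, indexed by their integer ID
measurement_names <- c(
  "Length from top of shell to base of ante-/postcephalic chamber",  # ID 1
  "Width of outer medullary shell"                                   # ID 2
)
measurement_name <- function(id) measurement_names[id]
measurement_name(2)  # "Width of outer medullary shell"
```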

Anyhow, that was yesterday, and today is today. For the morning I’d like to get the bits and pieces together and get the interface up and running.

Shuttling Shit Between ImageJ and R


Incredibly slow moving day. Monkeyed around with getting measurements written to file from ImageJ, which you’d think would be easy. And it is. Don’t know why it’s taking me this long; lack of focus, I guess. Got part of the way through doing this and figured out that I’ll need just three ImageJ macros, plus some counting trickery, to do it all:

  1. Set magnification—a macro or set of macros to set the pixel:µm scale for each of the objectives I’ll be using to make my measurements, the first step after taking a picture down the scope.
  2. Write linear results to file—a macro to extract the values from the “Length” column of the results table, pair them with the image filename, and append them to the pipe file. This will do for all of the length, width, and shell thickness measurements.
  3. Write area results to file—a macro to subtract the 2nd through nth values from the “Area” column of the results table from the 1st such value, then append that value to the pipe file along with the image filename. This will calculate the pore area percentage value from the raw measurements of total area and pore areas.
  4. Number trickery—haven’t figured out exactly how this will work yet, but since I’ll be making measurements on multiple images with separate results windows, I’ll need to keep track of which measurements I’ve made and which number I’m on.
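The arithmetic behind macro 3 is straightforward; sketched here in R for clarity (the real version would read ImageJ’s “Area” Results column, and the numbers below are made up):

```r
areas <- c(5200, 310, 280, 150)          # 1st value = total area, rest = pores
solid_area <- areas[1] - sum(areas[-1])  # subtract 2nd..nth values from the 1st
pore_pct <- 100 * sum(areas[-1]) / areas[1]  # pore area as % of total area
```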

Aborted the day fairly early due to DSA followed by (yet another) job talk. This is the penultimate one, thank goodness.

Literature Drudgery


Continued working on the interface this morning, pushing ahead with the tricky “new individual” function. I sorted out the plotting, and figured out how to get the program to pause at that point for the user to switch over to the EOS Utility and ImageJ to make the measurements for the new individual.
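The pause mechanism can be as simple as a readline() wrapper (a minimal sketch; the function name and prompt are mine, not the actual code):

```r
# Block until the user presses Enter, then let the script carry on
pause_for_measurements <- function() {
  invisible(readline("Measure in EOS Utility / ImageJ, then press <Enter>: "))
}
```

In a non-interactive session readline() returns immediately, so this only works as a pause when R is run interactively.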

The next tricky part, as alluded to, is assembling taxonomic literature on each of the species I’ll be working with, so that I can adequately identify them, and developing the mathematical models for each lineage. This is going to be a fairly sizeable chunk of work.

I downloaded what I could find from radiolaria.org, which covers about half of the species I’m looking at.

The Interface Begins to Form


Worked long and hard today on the first steps of the interface to the RSQLite database. Got the structure and logic of the main menu working, and have a fully functional first menu item: “New slide”. “New individual” is next, and more difficult. Spent quite a long time struggling with how to get raster images to plot in R, but eventually figured out a way, using the pixmap package (which needed to be installed). Not fast, not elegant, but it’ll do the job for my purposes.

library(pixmap)
data <- read.pnm("piccie.pnm")
plot(data)

As ever with R—it’s three lines of code, and three hours to find it.

I also now need to confront the fact that I don’t really know the taxonomy of the lineages I’m setting out to study. So, in the process of preparing the “here’s what to measure” images for the interface, I’ll not only need to make the mathematical models I’ll use to calculate silica volume, but also assemble the taxonomic literature I’ll need to be familiar with in order to be able to identify the buggers.

A long list, and I’m not as far along as I hoped I’d be, but I’m making good progress, and enjoying myself. That said, I’ve spent a full 9-5 day (9-5:23, actually) staring at the computer screen today, and I’m pooped.