
When Will the Snow Stop?


You’d think that the snow wouldn’t be an obstacle. Especially at this stage—even when I can’t travel to the office, I should be able to work just fine at home. You’d think. Somehow, yesterday went down the toilet, work-wise… I felt incredibly tired and lazy in the morning, no doubt aided by severely dark and snowy weather, and sacrificed the afternoon for the long-anticipated and much-postponed thesis defense of Mark Romanowsky. By the time the evening rolled around, I had lost all desire to start work…

Well, a new day, a new start.

I finally got some feedback from Jacques on my morphospace characters. He astutely observed that, in spite of my having attempted to standardize terminology and characters so that they would apply to the widest possible range of taxa, there are still many characters, like those describing the shape and other attributes of the raphe, that will only apply to a subset of all taxa. The problem, he points out, is that the morphospace will then essentially have different dimensionality for different taxa.

He’s right, of course. My first instinctual response is to point out that there are two basic options: either include those characters not universal to all forms, and deal with the consequences, or leave them out. The latter would force me to exclude some really quite important morphological features of many diatoms. One solution, of course, is to code all of the characters, but then run the analysis twice, once with those characters included, and once without. Simple enough.

He mentioned that morphometrics must have well-established ways of dealing with this problem. Kevin Boyce, on whose thesis I’m closely basing my morphospace project, doesn’t seem to explicitly address how he dealt with missing or inapplicable characters, though he appears to have coded everything that’s not a character state as “?” universally. He refers to a paper by Lupia (1999), which applies the dimensionality reduction method (PCO, or principal coordinates analysis) to fossil pollen. His description of the method states he uses a “simple matching coefficient” to calculate pairwise distances between taxa in the analysis; this simple matching is:

…the sum of all character state mismatches, divided by the number of possible matches (i.e., all characters minus inapplicable and missing characters).
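The quoted definition can be sketched in a few lines of Python. This is only an illustration of the coefficient as described above, not Lupia's or Boyce's actual code; the function name, the use of "?" for both missing and inapplicable states, and the two example taxa are all assumptions made for the sketch.

```python
MISSING = "?"  # assumed code for both missing and inapplicable states


def simple_matching_distance(a, b, missing=MISSING):
    """Mismatches divided by the number of characters comparable in both taxa."""
    if len(a) != len(b):
        raise ValueError("both taxa must be coded for the same character list")
    # Keep only characters that are scored (not "?") in both taxa.
    comparable = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not comparable:
        return None  # nothing to compare, so the distance is undefined
    mismatches = sum(1 for x, y in comparable if x != y)
    return mismatches / len(comparable)


# Two hypothetical taxa coded for five characters.
taxon_a = ["0", "1", "2", "?", "1"]
taxon_b = ["0", "0", "2", "1", "?"]
d = simple_matching_distance(taxon_a, taxon_b)  # 1 mismatch over 3 comparable characters
```

Because the denominator shrinks with every "?", each pair of taxa is compared only over the characters they share, which is how this coefficient sidesteps the differing-dimensionality problem.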

This seems happy enough as an answer for me, and I summed this up for Jacques in an email that ate up the first part of the day.

Then, time to move on to the main task of the moment—getting my SQLite database set up in R, ready to collect data for the radiolarian lineage project. I left off on Tuesday at a point of readiness to start setting up a trial version of the database design I laid out over the preceding days… so, here goes!

Realized that INTEGER would not be the ideal data type for the hole_id column, since the holes have descriptors like 0699A, 0701C, 1041, etc. So perhaps a CHAR type would be better (of course, SQLite doesn’t actually care, since it lets you put any data type into a field, regardless of how it was defined). Seems to work fine.
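The declared type does make one practical difference in SQLite, through its type affinity rules. The sketch below uses Python's built-in sqlite3 module rather than R, purely for illustration; the table names are made up, and "0700" is a hypothetical all-digit hole ID added to show what happens to leading zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holes_int (hole_id INTEGER)")  # INTEGER affinity
conn.execute("CREATE TABLE holes_chr (hole_id CHAR(5))")  # TEXT affinity

# "0700" is hypothetical; the other IDs are the examples from the post.
for hole in ("0699A", "0701C", "1041", "0700"):
    conn.execute("INSERT INTO holes_int VALUES (?)", (hole,))
    conn.execute("INSERT INTO holes_chr VALUES (?)", (hole,))

int_rows = [r[0] for r in conn.execute("SELECT hole_id FROM holes_int ORDER BY rowid")]
chr_rows = [r[0] for r in conn.execute("SELECT hole_id FROM holes_chr ORDER BY rowid")]
conn.close()
```

In the INTEGER column, values that look like integers are converted to numbers, so "0700" loses its leading zero and the column ends up holding a mix of text and integers. The CHAR column keeps every ID as text, exactly as entered.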

Also realized that the primary key for the Slides table will actually be a composite key, consisting of the hole_id and slide_id values. The unique designator of the slides contains the Hole ID, and while the non-Hole ID part of the designator is unlikely to repeat (two slides from different holes would have to have the same core, section, interval-top and interval-bottom values), it is still possible, so it’s probably smart to designate the primary key as the full combination of Hole ID and Slide ID.
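A composite key like this can be declared with a table-level PRIMARY KEY clause. The following is a minimal sketch using Python's sqlite3 module; the column types and the slide designator format ("12-3-45-47") are assumptions for illustration, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE slides (
        hole_id  CHAR(5)  NOT NULL,
        slide_id CHAR(20) NOT NULL,
        PRIMARY KEY (hole_id, slide_id)  -- uniqueness applies to the pair
    )
""")

# The same slide_id in two different holes is allowed.
conn.execute("INSERT INTO slides VALUES ('0699A', '12-3-45-47')")
conn.execute("INSERT INTO slides VALUES ('0701C', '12-3-45-47')")

# Repeating the full (hole_id, slide_id) pair is not.
try:
    conn.execute("INSERT INTO slides VALUES ('0699A', '12-3-45-47')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM slides").fetchone()[0]
conn.close()
```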

I eventually got stuck when I was tinkering around with the basic set-up (all of the tables except the measurements one). I tried inserting a row in the Slides table that contained a value for the foreign key field (“Hole ID”) not found in the parent table primary key—which should not work. It inserted just fine. I did some digging and found indications that SQLite has foreign key constraints deactivated by default, and I couldn’t get the activation commands to execute in R—they are supposed to be set at compile time, whatever the hell that means in my case. Confusingly, the foreign key commands (x REFERENCES y) seem to parse fine (at least they are not throwing an exception), which they’re not supposed to do if the foreign key constraints are disabled. So maybe they are enabled at compile time (in the RSQLite package), but I’m doing something wrong? Or expecting a behavior I’m not supposed to expect.
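For reference, the SQLite documentation describes foreign key enforcement as a per-connection setting that is off by default and switched on at runtime with `PRAGMA foreign_keys = ON`, provided the library was not compiled with foreign key support omitted. REFERENCES clauses are normally parsed either way, which would explain why they raise no error. The sketch below shows the difference using Python's sqlite3 module; it assumes the bundled SQLite build includes foreign key support, and the table layout and the orphan hole ID "9999Z" are hypothetical. Whether the same pragma can be issued through RSQLite in the setup described above is not verified here.

```python
import sqlite3


def try_orphan_insert(enable_fk):
    """Insert a slide whose hole_id has no parent row; report what happens."""
    conn = sqlite3.connect(":memory:")
    try:
        if enable_fk:
            # Must be issued on each new connection, outside a transaction.
            conn.execute("PRAGMA foreign_keys = ON")
        conn.execute("CREATE TABLE holes (hole_id CHAR(5) PRIMARY KEY)")
        conn.execute("""
            CREATE TABLE slides (
                hole_id  CHAR(5) REFERENCES holes(hole_id),
                slide_id CHAR(20),
                PRIMARY KEY (hole_id, slide_id)
            )
        """)
        conn.execute("INSERT INTO holes VALUES ('0699A')")
        try:
            conn.execute("INSERT INTO slides VALUES ('9999Z', '12-3-45-47')")
            return "inserted"
        except sqlite3.IntegrityError:
            return "rejected"
    finally:
        conn.close()


without_pragma = try_orphan_insert(enable_fk=False)
with_pragma = try_orphan_insert(enable_fk=True)
```

Without the pragma the orphan row goes in silently, matching the behavior described above; with it, the insert fails with an integrity error.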
