Captain Silica’s Log, Stardate 110409
Wednesday
Very productive phone call with Zoe this morning. The upshot is that she is willing to give me the cleaned ODP samples she has in exchange for co-authorship on any papers arising. She had a lot of thoughts, ideas, and feedback on my proposed projects, too, which was very helpful. In summary:
- Based on a purely visual, non-quantitative inspection, she suspects there is no change in silicification, at least through the Cenozoic: the two Cretaceous samples are clearly more silicified, but the Cenozoic samples may show no trend. This is a concern, because the measurements are going to be a lot of work, and if no pattern emerges, I need to have thought about whether the exercise would still have been worth doing… preferably before I start. She also recommended having back-up measurements to make on these samples in case the “silicification” measurements don’t pan out.
- For example, she noticed nano-scale silica spherules in some of the samples, but not in others. Are these a product of the dissolution of a surficial “smooth” layer that was deposited on the frustule and subsequently eroded during partial dissolution? Does the size of these spherules change through time? There might be functional consequences (in terms of fracture area/energy relationships for mechanical strength).
- Another idea was to use AFM (atomic force microscopy) to assess the mechanical properties of individual diatom frustules over time. (My concern here is separating the effects of diagenesis from those of evolution.)
- For many of these samples, Zoe thinks she has both the <38 µm and the >38 µm fractions. The larger fraction contains rads (where present), the smaller one diatoms. Comparing radiolarian and diatom silicification in the same sample could test the idea that silicification results from competitive interaction between the two groups. (My thought in this regard: this might also allow a comparison of the silicon isotopic composition of the two groups, and thus a stab at separating surface-water and deep-water silicon isotope composition, and hence silica concentration, through time.)
- The age determinations in the samples she will provide are very coarse and uncertain. Proper age models will need to be made for each sample eventually.
- Quantifying preservation is really hard in these samples.
- The samples she is sending are all from the North Atlantic. Eventually I should include tropical and/or southern hemisphere samples to get more biogeographic coverage, if I can find suitably well-preserved samples.
Zoe also had some very insightful comments and advice on making progress… in essence, stop being such a perfectionist. “Don’t let perfection get in your way. It’s never perfect.” She encouraged me to go after the major answers, rather than trying to get any project completely right. She is, of course, completely right… It was nice to hear this from a strictly professional colleague in a very supportive and understanding way. And encouraging—I feel better about actually being able to do something satisfactory to these “only the big picture matters” standards.
Aside from time spent DSA blogging, resumed where I had left off walking through Rabosky’s R code. Found that once I culled my complete download of the Neptune database to exclude samples with zero age, the dataset still contains 72,960 entries, which is more than Rabosky’s file. What else is he excluding? According to his email, he also excluded occurrences with ‘questionable ID’. I’m not sure how this would be denoted; perhaps he means specifically indeterminate occurrences (e.g. Nitzschia sp.)? If I remove all those from the database… wait. This is more difficult than I thought; I can’t quite see how to do that in Excel. Maybe I need to do this straight in R. Oh, yuck. This is complicated! There is a lot more to learn in R than I thought. This is going to take a while… Ended the day trying to figure out how to exclude lines of the database with “sp.” in the taxon name, to remove taxonomically questionable entries.
Thursday
Started the day late (is it getting boring to read this yet?) on account of being unable to get out of bed, once again. Spent the morning making headway on the literature search for Cambrian radiolarians from carbonate concretions. Found none: so few Cambrian radiolarians are known at all, and none of those reported appear to have been preserved in concretions. Downloaded literature on Cambrian radiolarians preserved in other ways instead. Checked my ongoing radiolarian dissolutions. The Ohio samples were sufficiently buffered that I pipetted a bit of each sludgepot onto a petri dish for drying. I should be able to pop them under the microscope tomorrow to see what I can see. At the moment they just look like brown muddy sludge.
In the afternoon, took care of some e-mail traffic, and pressed ahead with the radiolarian literature review, now for the Ordovician. Found very few papers reporting radiolarians from concretions, but a fair number from other carbonate settings. Decided not even to bother downloading or looking at papers describing radiolarites, since these will be severely recrystallized and the process of extraction is horrible (involves bone-dissolving HF).
In the late afternoon, turned to the diversity project, and back to R. After much searching, figured out that the pattern-matching function for strings in R is grep(), along with its logical-value-returning sibling, grepl(). Awesome! Now I have the programming tools to do some basic data culling within R. Starting with the full Neptune database (73,422 lines), I can cull out the entries with age zero using
v.nonzero <- v[v$Sample.Age != 0, ]
which gives me 72,960 lines of data. This matches the number I got from doing this manually in Excel, so it’s working. Sweet! Now if I trim out the specifically indeterminate occurrences, i.e. anything that contains “sp.”, using
v.nonindet <- v.nonzero[!grepl("sp.", v.nonzero$Taxon.Name), ]
I get 66,946 occurrences. This is pretty damn close to the file I got from Rabosky, which had 66,941 occurrences in it, only five fewer. OK, we’re starting to get somewhere.
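One caveat about the grepl("sp.", ...) filter: grepl() treats its pattern as a regular expression by default, and in a regex "." matches any single character. So "sp." also catches any name with "sp" followed by some other letter, such as the epithet "splendens". I haven't checked how many real Neptune names this catches, or whether it has anything to do with the five-occurrence gap. The sketch below runs a handful of made-up names through three versions of the match: the default regex, a literal match with fixed = TRUE, and an escaped pattern with a word boundary (\\b), which also skips "subsp.".

```r
# Made-up taxon names for illustration only (not pulled from Neptune).
taxa <- c("Nitzschia sp.",
          "Thalassiosira sp. 1",
          "Actinoptychus splendens",     # "spl" satisfies the regex "sp."
          "Coscinodiscus marginatus",
          "Genus species subsp. alpha")  # hypothetical subspecies name

# Default (regex) matching: "." is a wildcard, so "sp." means
# "s", then "p", then ANY single character.
regex_hit <- grepl("sp.", taxa)

# fixed = TRUE: match the literal characters "s", "p", "." anywhere,
# which still catches the "sp." inside "subsp.".
fixed_hit <- grepl("sp.", taxa, fixed = TRUE)

# Escaped dot plus a leading word boundary: only a standalone "sp." token.
# "subsp." is skipped because there is no word boundary between "b" and "s".
bounded_hit <- grepl("\\bsp\\.", taxa)

print(data.frame(taxa, regex_hit, fixed_hit, bounded_hit))
```

Switching the real filter to the stricter pattern would change the 66,946 figure above, so the count would need re-checking against Rabosky's file. Neither version catches other uncertain-ID markers such as "spp." or "cf."; whether Rabosky dropped those is unknown.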
Friday
Worked from home this morning: my bike has broken spokes that need fixing, and I decided to take a lesson in bike repair so I can do this sort of thing myself in the future (a little soulcraft, I guess). The only time they had available was during the day. Decided to look over the SEM images of Zoe’s samples to see whether I agree with her assessment that there probably isn’t any change in silicification through the Cenozoic. To do this, I had to download the images from her lab server. This took far too long (there are many hundreds of images), so I aborted that mission.
After lunch, had Phoebe show me how she prepares samples for microscopy. Learned how to filter samples, which is easy but strikes me as potentially destructive to delicate samples—maybe this is why most of what I’ve heard about radiolarian/diatom sample processing involves sieving, not filtering. Learned how to use a hotplate, PVA as a dispersal agent, and epoxy to mount a sample for microscopy. The sample I looked at (M-1, the Ohio concretion I was able to get to dissolve somewhat) seemed to consist of bits of crap with an admixture of bubbles. Great. No radiolarians discernible.
Later in the afternoon, returned to R to begin my second iteration of walking through Rabosky’s code.
Monday
Once again, hadn’t read my Darwin chapters as I should have, and spent the morning catching up on that. In the afternoon, resumed the valiant push ahead in Rabosky’s code. His dataset has two additional columns that are absent from the raw Neptune download. They seem to be his own additions (maybe; their headers are in lowercase, while all the others are capitalized). The first, “sp”, appears to be a cleaned-up copy of the “Taxon.Name” column, though I’m not sure what was changed. Presumably something was, because the list of unique taxa is longer in my culled dataset (1,175 taxa) than in Rabosky’s (1,143 taxa). The second, “list”, is a useful one, which I will also need to create. It concatenates the leg, hole, and age data for each occurrence to identify which taxa in a time bin came from the same sample (i.e. were on the same list in the paleontological sense).
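To find out what changed between Taxon.Name and Rabosky's "sp" column, one approach is to compare the two sets of unique names with setdiff(). A minimal sketch, with two tiny made-up data frames standing in for my culled table and his file (the data frame names mine and his, and all the taxon names, are placeholders):

```r
# Placeholder data frames: 'mine' stands in for my culled Neptune table,
# 'his' for Rabosky's file. Taxon names are made up.
mine <- data.frame(Taxon.Name = c("Taxon A", "Taxon A", "Taxon B",
                                  "Taxon C", "Taxon C var. x"),
                   stringsAsFactors = FALSE)
his <- data.frame(sp = c("Taxon A", "Taxon B", "Taxon C"),
                  stringsAsFactors = FALSE)

my_taxa  <- unique(mine$Taxon.Name)   # 4 unique names
his_taxa <- unique(his$sp)            # 3 unique names

# Names in my list that are missing from his: candidates for whatever
# cleaning produced the "sp" column (merged varieties, fixed spellings, ...).
only_mine <- setdiff(my_taxa, his_taxa)
# Names in his list that are missing from mine (e.g. respelled names).
only_his  <- setdiff(his_taxa, my_taxa)

print(only_mine)  # "Taxon C var. x"
print(only_his)   # character(0)
```

On the real data, only_mine should contain at least 32 names, since 1,175 - 1,143 = 32, and only_his would show any names he respelled rather than merged.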
Later in the afternoon, continued literature search for radiolarian concretions. In a heroic act of true boredom, waded through pages upon pages of Web of Science search results to find papers on Silurian radiolarians preserved in concretions. There ain’t many.
Tuesday
Arrived early (for once!), and promptly got drawn into helping Tais with the animal phyla card game he’s been developing, which led to a discussion of the computational limits of phylogenetic tree building and biology teaching aims. Then decided to use the morning to pursue deliverable #6, a literature search on relative abundances of diatom and radiolarian chert. After a lengthy search, downloaded about a dozen papers. Leafed through them and felt the tidal wave of confusion and ignorance wash over me. Then I remembered—this is why I stopped reading papers…
In the afternoon: back to R. Figured out how to make the concatenated “list” column for the dataset:
N$List <- paste(N$Leg, N$Site, N$Hole, N$Sample.Age, sep = "x")
This allows occurrences to be grouped together by list, i.e. which species were found in the same sample (the same slide). This also confirms Dan’s suspicion that by “strat.sections” he means “lists”. I’m going to dispense with that name, because it’s just confusing. He means a taxonomic list from one sample; no sense calling this a stratigraphic section. One line that I found a little confusing in the code is where he calculates the occurrences-squared in the getSamplingIntensity function: they are calculated as a sum of the squared occurrences in the whole time bin, not by list. Is that correct?
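A toy example of the List column at work, plus the by-bin versus by-list question. I haven't confirmed what quantity Rabosky's occurrences-squared term is meant to be, so the two readings below (each taxon's occurrence count over the whole bin, squared and summed, versus each list's occurrence count, squared and summed) are my own hypothetical interpretations, computed on a made-up six-row table with an assumed 5-Myr bin width. The point is only that they give different numbers.

```r
# Toy occurrence table (values made up); column names follow the Neptune download.
N <- data.frame(
  Leg        = c(113, 113, 113, 120, 113, 113),
  Site       = c(689, 689, 689, 747, 689, 689),
  Hole       = c("B", "B", "B", "A", "B", "B"),
  Sample.Age = c(10.2, 10.2, 10.2, 10.2, 15.5, 15.5),
  Taxon.Name = c("Taxon A", "Taxon B", "Taxon C", "Taxon A", "Taxon A", "Taxon C"),
  stringsAsFactors = FALSE
)

# One ID per sample: leg, site, hole and age pasted together.
N$List <- paste(N$Leg, N$Site, N$Hole, N$Sample.Age, sep = "x")

# Number of occurrences (taxa) on each list.
occ_per_list <- table(N$List)
print(occ_per_list)

# Hypothetical 5-Myr time bins, indexed by floor(age / width).
bin_width <- 5
N$Bin <- floor(N$Sample.Age / bin_width)

# Reading 1: count each taxon's occurrences across the whole bin,
# square those counts, and sum them.
sq_by_bin <- tapply(N$Taxon.Name, N$Bin, function(tx) sum(table(tx)^2))

# Reading 2: count the occurrences on each list within the bin,
# square those counts, and sum them.
sq_by_list <- tapply(N$List, N$Bin, function(ids) sum(table(ids)^2))

print(sq_by_bin)   # bin 2: 6, bin 3: 2
print(sq_by_list)  # bin 2: 10, bin 3: 4
```

With this toy table, bin 2 gives 6 under the first reading and 10 under the second, so which one the function computes does matter.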
OK. Managed to comment through Dan’s first function (getSamplingIntensity), and (of course!) it doesn’t work… Let the debugging begin. Grrr. Ah! Simple mistake: I was calculating the edges of the time bins wrong. Now it works a treat. Huzzah. Learned that you can save a plot to a file using any number of filetype drivers with these two commands:
dev.copy(png, "myplot.png")
dev.off()
So here’s the plot comparing my getSamplingIntensity function with Rabosky’s:

The red dots are my number of occurrences per time bin; the black circles are his.
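Since the bug was in the bin edges, here is a small sketch of one way to make them explicit: build the edges with seq(), assign ages with cut(), count occurrences per bin, and draw a comparison in the style of the plot above (red filled dots versus black open circles). The ages, bin width and file name are made up, and this is not Rabosky's binning code. With right = FALSE the bins are half-open, [0,5) and [5,10), so an age of exactly 5.0 goes into the second bin; with cut()'s default right = TRUE they would be (0,5] and (5,10], so 5.0 would land in the first bin and an age of exactly 0 would fall outside every bin. The plot is written by opening a file device directly; dev.copy() as above is the alternative when the plot is already on screen.

```r
# Made-up occurrence ages (Ma) standing in for two versions of the dataset.
ages_mine <- c(0.5, 1.2, 3.8, 4.9, 5.0, 7.3, 9.99)
ages_his  <- c(0.5, 1.2, 3.8, 5.0, 7.3, 9.99)

bin_width <- 5                        # assumed bin width, Myr
edges <- seq(0, 10, by = bin_width)   # bin edges: 0, 5, 10

# right = FALSE makes the bins half-open, [0,5) and [5,10),
# so the age 5.0 is counted in the second bin.
bins_mine <- cut(ages_mine, breaks = edges, right = FALSE)
bins_his  <- cut(ages_his,  breaks = edges, right = FALSE)

n_mine <- as.vector(table(bins_mine))      # occurrences per bin: 4, 3
n_his  <- as.vector(table(bins_his))       # occurrences per bin: 3, 3
mids   <- head(edges, -1) + bin_width / 2  # bin midpoints: 2.5, 7.5

# Open a file device directly (pdf here; png() works the same way where
# available), draw the plot, then close the device to write the file.
out_file <- file.path(tempdir(), "sampling_comparison.pdf")
pdf(out_file, width = 6, height = 4)
plot(mids, n_his, pch = 1, col = "black",
     xlab = "Age (Ma)", ylab = "Occurrences per bin",
     ylim = range(0, n_mine, n_his))
points(mids, n_mine, pch = 16, col = "red")
invisible(dev.off())
```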


Beau
November 6, 2009 @ 2:21 pm
Wow, you’ve been busy. Can’t pretend that I have any clue what all Zoe’s comments were about, but it does sound like you have a very productive relationship going.
Ah, the grep. Ain’t it nice when a plan comes together, though? While 6,000 is a problem, 5 is a rounding error – nice work.
Anyhoo, glad to read that the Starship Silica continues to boldly go where no graduate student has been before! Best wishes from a fantastically uncomfortable floor in DIA.
kotrc
November 6, 2009 @ 2:27 pm
Hey Beau! Look at that, we’re on the blog at the same time. Thanks for your comments—hope you’re excited to come back to Boston “The Icebox”, MA. Looking forward to some bikerideage this weekend. And if a wheel comes untrue, no worries—I’ve got it covered…