You are viewing a read-only archive of the Blogs.Harvard network.

Confidence Blow, Confidence Boost (It’s All in a Day’s Work)


Weird day. Started by finally reading the paper for lab meeting this afternoon. Then got into a conversation with Jc about the failures of my projects, why they had all failed, and why I wasn’t—as Dave Johnston had apparently commiserated last night—working on a project that was worthy of my intellect. While I was flattered by the compliment, this brought up all sorts of unnecessary feelings of frustration about all the things that haven’t worked out, all the ways in which I feel I’ve been misadvised, dropped, ignored, and just plain unlucky. I went on for way too long about it, but maybe I needed to vent. It’s been an overwhelming few days, with thoughts about life after graduation/job hunting, planning the wedding, and trying to move the wretched thesis along.

Then spent the promised hour researching the companies and organizations represented at the energy & environment jobs fair. Some reasonably interesting companies there—it’s actually kind of surprising to me how many companies seem to be doing something along the lines of what Beau’s company does, albeit in a more specialized fashion… Consultants seem to be everywhere. Everywhere. After about an hour, it all started blurring together—sure, here or there a company had a circular job posting that I vaguely felt I might be able to press a square application into, but who the hell knows? Ultimately I still have no idea whether there’s a position for someone with my background and skills at any of these places. And even less of an idea whether I’d enjoy it.

After lunch, headed off to a very surprising careers fair—surprisingly positive. Met some really rather friendly and fun people involved in what seem to be pretty interesting lines of business, particularly the smaller ones. Walked away with a sense of having genuinely enjoyed chatting to people about what their companies do, and feeling tremendously more optimistic about the task of finding a job. I really didn’t expect the conversations to come as easily as they did. I think I’m really a lot better at it than I give myself credit for: there almost always seemed to be an interesting question, a relevant connection or a humorous detail that sprang to mind at the right moment and kept the momentum of each conversation up, the mood light, and the awkward silences at bay. It really helped boost confidence that two of the people I talked to were (very) recent Harvard grads, both of whom had gotten their jobs through this very career fair last year or the year before! With this many contacts made in just an hour and a half (about as much as I had in me before I was well and truly exhausted), I feel quite confident that something good will come my way in the next year and a half!

I feel like I should record my contact details and interactions somewhere, to keep a record of details, impressions, and leads—perhaps this blog is as good a place as any, since I can search through it by post tags and categories. Memory is already beginning to fade, so here I go:

  • I was the very first person through the gates, having arrived about ten minutes before the official start of the event, and so I enquired whether I would be awarded a lollipop for having won the fair. I was rewarded with some chuckles and a Rolo by Brittany Lin, Manager at PowerAdvocate, as well as a promise that if I mentioned the lollipop she’d remember who I was. Clearly something to follow up on. I, on the other hand, am having trouble remembering much about the company. They sell software and “intelligence” to utility companies, and I seem to remember they had developers who wrote the software, who seemed to be fairly stationary in the Boston office, and then people in “Client Services” (where Brittany works) who travel a lot to the clients and help them implement the software. She seemed to emphasize it was quite a small, entrepreneurial sort of company, where she could quickly feed back her experiences with clients to the developers of the next version of the software. She had done an internship at some large, corporate firm (in the financial sector, I think—I can’t remember) and hated the rigid, cubicle-bound experience. She went to PowerAdvocate because she got the sense it was the exact opposite sort of company, and seemed to have found this to be true in her experience thus far. There did seem to be a lot of traveling involved for her, and her sister (who works on the financial side of the company, I think) was up late into the night working during crunch time when new products come out (or something—don’t remember the details), but on the whole she seemed satisfied that “work-life balance” was something the company realistically respected, unlike most of her experiences in the corporate world. In the year and a half she’s been there, she’s only worked weekends once or twice.
  • Next, I spoke to Julia Palatine, who works for Apex Green Roofs. The chances of getting a job with these guys are electron-microscopical, as they’re a tiny business and currently only hiring interns and a project manager (someone with construction experience—i.e. Stuart, not me), but what a cool company. They build and maintain green roofs—planting cool shit on top of new buildings, and going around to weed and maintain them. What a cool thing to do. She did say I should send my resume, and that you never know—and I will do that, because you really never know what may come of it over the next eighteen months. Anyway, I grilled her for a good long while about how their business works, how the roof gardens are built, and how they weather the winters… just because I thought it was really, really cool.
  • My next stop was at the booth for Genscape. Sarah Knight—who works in their HR department—gave me a much clearer idea of what it is these guys do (or maybe I was just starting to hit my stride at this point). In any case, they sell information on energy (mostly electrical grid stuff)—capacities, flow, utilization—using all sorts of fancy magnetic sensors and their own software. They are hiring for a few different positions, but the one I talked to Sarah about was as a Power Market Analyst: these guys show up at work at 6, and have about 6 hours of super stressful work—they have to collect and analyze the data from overnight, and prepare a daily report for their clients—that gets sent out a couple of hours later, and then they spend the rest of the morning making calls to their clients to follow up on the reports. Sounds kind of intense, but weirdly interesting. The best part—by 2:30, they get to go home, and on Fridays they’re off well before that. How cool is that?! Their offices are on Huntington Ave in Boston. She also gave me the name of Mark Doolin, a graduate of the Anthropology department at Harvard, just down the hall. An undergrad, admittedly, but at least someone I could talk to who’d have a bit of a sense of where I’m coming from—and a way to start the conversation, at least.
  • Next I stopped by the eye-catching display of MaxLite, a company that makes LED lights and is hoping to conquer various sectors of the market with this super-efficient technology. I talked to Charlie Andersen, a really young and enthusiastic guy, who also turned out to be a geology grad from Amherst (so we had both a common background and a common acquaintance, Whitey Hagadorn, to talk about). He seemed pretty jazzed about his job because it’s a fairly small company: he’s able to turn his ideas into results quickly, and is both able and called upon to perform a lot of different tasks—not a boring, do-the-same-thing-every-day sort of job. He’s also coming back to Harvard to do an MBA, so he may be a useful contact to have in any case, since the company is located in NJ, which is probably not where I’d want to work.
  • My last stop was at a company called Harvest Power, who are conveniently based in Waltham, and who turn organic waste into energy and profit (by operating a distributed network of biomass gasification, anaerobic digestion, and composting plants). It’s an interesting business model because not only does it make something useful out of waste that might otherwise be landfilled, but it also generates profit at both ends of the process—collecting revenue in the form of tipping fees paid by the waste producers at the input, as well as from the sale of the output (energy and compost soil). Again a good connection: Molly Bales, the woman I talked to, was a Harvard undergrad who minored in the EPS department, so again we had some common ground to chat about. There wasn’t too much in the way of direct practical application of her skills on the job, though she mentioned that the “wedge paper” (I assume this one), which she read in a Schrag class, had come up and turned out to be quite handy. She seemed to very much enjoy the job, and gave me the sense that it’s a quickly growing company. When she joined they considered themselves a start-up, which they no longer do; they aim to go public within two years and foresee a lot of growth in the meantime. They’re not hiring for specific positions (beyond internships—yech) at the moment, but she did suggest I send her a resume since these things can change rather quickly. Working in Waltham didn’t sound appealing, but Molly said she loves it—she lives in Porter, near the commuter rail, which drops off at the station in Waltham just a short walk from the office, so she’s selling her car as she has no more use for it. Not bad!

As a final hurrah for the week, and very much against my better judgment, I sent an email to the OEB administrator asking to be put on the schedule for Mike Foote’s visit the Tuesday after next. Mike is the Chicago paleobiology heavyweight, and Jerry Mitrovica was (rightly) very impressed by him—and has been urging very firmly that I meet with him while he’s here to see if I can impress him into offering me a post-doc in Chicago. Now, a post-doc is of course the last thing I want (never mind that he might not be in Chicago much longer, since he’s here because OEB is trying to lure him to Harvard), but it can never hurt to make connections, especially with someone like Mike. He’ll probably think I’m an idiot (I have a good track record in that regard with Chicago folks—Mark Webster basically told me my project and the reasons for doing it were crap when he came through on EHAP, and of course there are the years of history with Charles… well, Gene Hunt at least was really nice), but whatever. Who cares what he thinks? Perhaps he’ll have something helpful to say.

I’m Sorry Dave, I Can’t Do That


That’s kind of what I felt like telling him, but unlike HAL 9000, I do still need Dave. So I finally buckled down this morning to step through the rest of his manuscript, hunting down typos and noting what constructive things I can say about each section.

I had left off last time at the section “Preservation at the species level”. In his table of % preservation of extant species, he gives diatoms a score of only <50%, but gives the total extant diversity as 1,500. While I have no doubt that there are at least that many diatom species today, I’m not convinced that anywhere near that many are commonly found as pelagic, planktonic species. Much of the extant diatom diversity (which far exceeds 1,500) is terrestrial/lacustrine or benthic/epiphytic marine, and is sampled in the Sournia survey as tychoplankton. I chased up the Sournia (1991) species count to see where that number comes from. Indeed, between 37 and 44% (depending on whether you take the low or high end of their range estimate) of the reported diatom diversity consists of pennates, which are mostly benthic. I think if you were to include only those species that are, like radiolarians, exclusively marine and planktonic, the % preservation would be far higher. It would also be nice to have the number of preserved fossil species in the table (not just the percentage), with a source reference for that number cited in the table legend. Well, I just checked the Kooistra chapter in Falkowski & Knoll, and he cites numbers between 5,000 and 10,000 for the diversity of strictly marine planktonic diatoms. Oops. Well, scratch that, then.

Much of the rest of the section is OK, although there’s a paragraph on page 7 I have a bit of a hard time with. I don’t think I’ll comment on it, but Dave makes the point that even though preservation is so good, it’s possible that at times radiolarian species with very poor preservation potential evolved and left no record. Sure, it’s possible, but so what—it’s also possible that the Earth was invaded time and again over its history by purple alien cloud-people who left no footprints and no traces of their influence. Sure, it’s possible.

He also makes the point that there are ‘significant gaps’ because some regions, such as the gyres, don’t leave much of a microfossil record at all. This is true, but again it seems like peanuts compared with the far worse preservation typical of invertebrate fossils on the continental shelves.

When Dave talks about hiatuses, again, he makes the deep-sea record seem worse than the shelf record in a way I’m not sure is fair, because he suggests there are no changes in lithology to indicate that a hiatus has taken place. My sense is that changes in lithology don’t necessarily help you in shelf sections either—sometimes they might represent very little time, and sometimes lithology can stay consistent over many millions of years. I’m not sure the problems of recognizing how time is partitioned in rocks are really meaningfully different in deep and shallow sediments.

Fortunately Dave ends the section on quite a positive note; perhaps I didn’t quite appreciate this on first reading. He says that in spite of all this, the record is really good: species-level evolution for entire clades, for most of the biogeographic provinces, over the past 100 million years.

Unfortunately, he then launches into a diatribe about how poorly this record is recovered and documented. This is the next section, “Recovery of Deep-sea Fossil Material”. It begins with a description of piston coring, with its fantastic coverage but the admittedly damning limitation of short time spans. Moving on to deep-sea drilling material, he duly acknowledges the staggering number of fossils already available for study (at least 10^15 specimens, a million times more than all the world’s natural history museums combined), and the fact that most of these come with coeval paleoenvironmental data. Surprisingly (for me, given what I remembered from my first reading of the paper) he also ends this section on a positive note, namely that the record is nearly complete at the species level, given that the MRC holds more than 100 samples per million years for most of the Cenozoic.

In the next section, “IRAT—Imperfections in the Existing Dataset”, he explains why the data generated from these samples is less than complete, and why it’s a problem to use them for paleobio research. “Incomplete Data” outlines a problem with how species are recorded on a slide. Rather than the ideal model, in which the paleontologist records the taxonomic identity of a certain number of specimens, and then moves on to the next slide, thereby obtaining a random and unbiased subsample of the sample in hand, the situation is usually as follows. The paleontologist has a list of taxa that is as short as it can usefully be, and he records presence/absence (or abundance) of those taxa in order to determine the age of the sample.

But on top of this, the paleontologist often records some additional taxa, which do conform to the random sampling. Crucially, he states that “the differences in the average reported diversity per sample/study simply reflect the average practical size of a taxonomic list, and do not have a necessary relationship to actual real sample diversity”. Now this seems to be the key sentence. Does this mean each study has a different taxonomic list, and that’s what determines list length, more so than underlying diversity? If so, this should be easy to test (and I think Dave should do this if he wants to back up his argument): what publication a sample is from should be a better predictor of list length than what time it’s from. So, if the Neptune database has publication information (which I hope it does), you should be able to parse the data by time bins vs. by publication, and see if the variability is better described by what time bin samples are in, or by what publication they’re from. This could be compared by Akaike weights, for example.
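A minimal sketch of the test I have in mind, assuming each Neptune sample can be tagged with its source publication and a time bin (I don’t know yet whether the database actually carries publication info in that form). Each candidate predictor is fitted as a simple group-means Gaussian model of list length, and the two AIC values are turned into Akaike weights. The records, the names `group_mean_aic` and `akaike_weights`, and the model form are all my own hypothetical choices, not Dave’s.

```python
import math
from collections import defaultdict

def group_mean_aic(values, groups):
    """AIC of a Gaussian model in which each group has its own mean (a one-way ANOVA)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    rss = sum((v - means[g]) ** 2 for v, g in zip(values, groups))
    n = len(values)
    sigma2 = rss / n  # maximum-likelihood estimate of the residual variance
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    k = len(means) + 1  # one mean per group, plus the shared variance
    return 2 * k - 2 * log_lik

def akaike_weights(aics):
    """Convert a dict of model AICs into Akaike weights that sum to 1."""
    best = min(aics.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical samples: (publication, time bin, length of reported taxon list)
records = [
    ("pubA", "bin1", 20), ("pubA", "bin2", 22), ("pubA", "bin3", 21),
    ("pubB", "bin1", 45), ("pubB", "bin2", 43), ("pubB", "bin3", 44),
    ("pubC", "bin1", 30), ("pubC", "bin2", 31), ("pubC", "bin3", 29),
]
lengths = [r[2] for r in records]
aics = {
    "publication": group_mean_aic(lengths, [r[0] for r in records]),
    "time_bin": group_mean_aic(lengths, [r[1] for r in records]),
}
weights = akaike_weights(aics)
print(weights)
```

In this made-up data, list length tracks the publication almost perfectly, so nearly all the weight goes to the publication model; if Dave’s reading is right, real Neptune data should look like that. This is plain AIC; with as few samples as some publications contribute, the small-sample AICc would probably be the more careful choice.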

The next paragraph—on page 11—is really quite confusing, and lies at the heart of the part of this paper that affects what I’ve been thinking about and doing with diatom diversity. Dave states that most data are collected by his “model C”, that is, the model where paleontologists record the presence/absence of taxa on a list, plus whatever other taxa they fancy. He states that this leads to a correlation between sample availability and total diversity, but not for the reason we might think (i.e., going up a collector’s curve); rather, it is because sample availability is correlated with taxonomic effort. I think what he means here is that sections with more samples available have longer “model B” taxonomic lists than sections with few samples. It seems to me, though, that this reduces to the same thing as a collector’s curve, albeit via the detour of constructing a list: the more diverse-seeming assemblages seem so because they have longer “model B” taxonomic lists, not because they’ve had more random samples taken; but the reason they have longer taxonomic lists is that there is more “sample availability”, as Dave puts it, which I think means… they have been more extensively—randomly—sampled.
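For my own reference, here’s a quick sketch of a sample-based collector’s curve, the thing I think “sample availability” boils down to. It assumes each sample is just the set of taxa reported from it; the sample contents are invented, and `accumulation_curve` is a hypothetical helper, not anything from Neptune.

```python
import random

def accumulation_curve(samples, n_perm=200, seed=0):
    """Sample-based collector's curve: mean number of distinct taxa seen after
    pooling 1, 2, ..., N samples, averaged over random orderings of the samples."""
    rng = random.Random(seed)
    totals = [0] * len(samples)
    for _ in range(n_perm):
        order = list(samples)
        rng.shuffle(order)
        seen = set()
        for i, taxa in enumerate(order):
            seen |= taxa  # pool this sample's taxa with everything seen so far
            totals[i] += len(seen)
    return [t / n_perm for t in totals]

# Hypothetical samples, each the set of taxa reported from it
samples = [
    {"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"},
    {"b", "f"}, {"a", "g"}, {"c", "h", "i"},
]
curve = accumulation_curve(samples)
print([round(x, 2) for x in curve])
```

The curve can only climb as samples are added, which is the whole point: a section with more samples will look more diverse, whether we describe that as “more taxonomic effort” or as being further up the curve.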

In the next paragraph, he rallies support from a figure (figure 8) that I just don’t understand. The point he’s trying to make is, I think, that species are reported more rarely than they should be; what he shows is a histogram of the number of samples from which each radiolarian taxon is reported: 100+ taxa show up in only 1–5 samples, and 40+% of taxa show up in 25 samples or fewer. Besides the fact that the plot is confusing (it’s not clear what the inset shows as opposed to the main plot, nor what the total number of samples is) and the calculation in the figure caption is impossible to follow, I’m not sure this addresses the same point as the preceding paragraph. That paragraph was trying to say that subsampling exercises wouldn’t work because many, or most, of the taxa in the database will come from “model B” lists of stratigraphic marker species. Apart from the fact that this might not be true (see below), the point explored with figure 8 is a different one.
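To check that I’m at least reading figure 8 the right way, here’s how I’d compute that kind of histogram from raw occurrence records. The 5-sample bin width and the 25-sample cutoff are my reading of the figure; the records and the helper names are hypothetical.

```python
from collections import Counter

def samples_per_taxon(occurrences):
    """Number of distinct samples each taxon is reported from.
    `occurrences` is an iterable of (sample_id, taxon) pairs; repeat reports
    of the same taxon in the same sample are counted once."""
    return Counter(taxon for _sample, taxon in set(occurrences))

def occurrence_histogram(counts, bin_width=5):
    """Number of taxa reported from 1-5, 6-10, ... samples (bins of `bin_width`)."""
    hist = Counter((n - 1) // bin_width for n in counts.values())
    return {(b * bin_width + 1, (b + 1) * bin_width): hist[b] for b in sorted(hist)}

# Hypothetical occurrence records
occurrences = (
    [(f"s{i}", "A") for i in range(1, 4)]
    + [(f"s{i}", "B") for i in range(1, 8)]
    + [("s1", "C"), ("s1", "A")]  # the second A record duplicates one above
    + [(f"s{i}", "D") for i in range(1, 31)]
)
counts = samples_per_taxon(occurrences)
hist = occurrence_histogram(counts)
frac_rare = sum(1 for n in counts.values() if n <= 25) / len(counts)
print(hist, frac_rare)
```

Note that the “fraction of taxa in 25 samples or fewer” means very little without the total number of samples next to it, which is exactly what the figure doesn’t make clear.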

Is it true that “model C” makes subsampling impossible? I think Dave might have his answer backwards, actually. If the “model B” list is consistent over time (and I’m not sure what Dave’s stance is on that—he seems to want it both ways at the beginning of this section), then you might actually be making a much fairer comparison by subsampling lists, because each list you pull will be comparing apples to apples in its “model B” component. In addition, each list contributes its “model A” component, which should have the same random-sampling properties that Dave endorses at the beginning of the section, so it should behave well under subsampling too. So, aren’t we actually improving things this way?
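Here’s roughly what I mean by subsampling by lists, as a minimal sketch in the spirit of Alroy-style by-list subsampling (not Dave’s method, and not necessarily what anyone actually runs on Neptune): draw a fixed quota of lists per time bin without replacement and count the distinct taxa. In the invented data every list carries the same “model B” marker set plus one “model A” taxon of its own, and `by_list_subsample` is a hypothetical helper.

```python
import random

def by_list_subsample(lists_by_bin, quota, trials=100, seed=1):
    """Mean taxon richness per time bin after drawing `quota` lists per bin
    without replacement; bins with fewer than `quota` lists are dropped."""
    rng = random.Random(seed)
    richness = {}
    for bin_name, lists in lists_by_bin.items():
        if len(lists) < quota:
            continue  # not enough lists to fill the quota
        total = 0
        for _ in range(trials):
            drawn = rng.sample(lists, quota)
            total += len(set().union(*drawn))
        richness[bin_name] = total / trials
    return richness

# Hypothetical lists: every list shares the same "model B" markers,
# plus one extra "model A" taxon of its own
markers = {"m1", "m2", "m3"}
lists_by_bin = {
    "early": [markers | {f"x{i}"} for i in range(10)],
    "late": [markers | {f"y{i}"} for i in range(3)],
}
result = by_list_subsample(lists_by_bin, quota=4)
print(result)
```

Because the marker component is identical in every draw, it adds the same amount to every bin, and the differences between bins come only from the randomly sampled extras. The cost, as usual with subsampling, is that bins that can’t fill the quota drop out.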

Of course, if lists are different depending on what time interval we’re looking at, then I think the “model C” argument just breaks down to a “model A” scenario, more or less.

The main point here, though, is that this does not distinguish the microfossil record in any way from the rest of the fossil record—dominated by shelf invertebrates—as recorded in the PBDB. That record is also a combination of biostratigraphic occurrence data for a limited, commonly reported set of stratigraphically informative species, and a more or less random sampling of other taxa. How does that make the microfossil record any worse?

Moving on, the section “Reworking” opens with the claim that reworking affects only the microfossil record, a claim I think can hardly be considered true. I don’t have any great references at hand, but can offer a few (the first sight unseen, thanks to a lapsed subscription to Lethaia, cheers Harvard): Fürsich, F.T., 1978, The influence of faunal condensation and mixing on the preservation of fossil benthic communities: Lethaia, v. 11, no. 3, p. 243–250. Kidwell, S.M., 1998, Time-averaging in the marine fossil record: overview of strategies and uncertainties: Geobios, v. 30, p. 977–995. Kidwell, S.M., and Bosence, D.W.J., 1991, Taphonomy and time averaging of marine shelly faunas, in Allison, P.A., and Briggs, D.E.G., eds., Taphonomy: Releasing the Data Locked in the Fossil Record: New York, Plenum Press, p. 115–209. A recent study: DeFrancesco, C., and Hassan, G.S., 2008, PALAIOS, v. 23, no. 1, p. 14–23.

And again, these issues are all the same issues that befall the macrofossil record, too—I don’t think an obviously reworked specimen will be reported by a trilobite worker as occurring in the formation in which it was found.

The first sentence of the “Age Model Problems” section really says it all—age models are way better for the marine microfossil record than for any other record we have. And that should be the focus of the paper, not all the things that are wrong with it! In diversity studies, a 1-my error is not a problem if we use 2-my bins. Also, this error is unbiased—and this is a critical point—so for macroevolutionary studies, it really shouldn’t matter. As long as it affects everything equally, and more or less evenly through time, we should be golden, provided the signal we’re trying to see is strong enough.
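A quick way to check the “unbiased error” point, under assumptions that are mine and not Dave’s: true ages spread evenly over 20 my, age-model error uniform within ±1 my, and 2-my bins. The sketch simulates the error and counts how many events get pushed into an older bin versus a younger one; `bin_shift_counts` and all its parameters are hypothetical.

```python
import math
import random

def bin_shift_counts(n_events=20000, span=20.0, bin_width=2.0, max_error=1.0, seed=42):
    """Simulate age-model error and count events pushed into an older or younger bin.

    Assumptions: true ages are uniform over `span` my and the age-model error
    is uniform on [-max_error, +max_error] my, i.e. unbiased."""
    rng = random.Random(seed)
    older = younger = 0
    for _ in range(n_events):
        true_age = rng.uniform(0.0, span)
        est_age = true_age + rng.uniform(-max_error, max_error)
        true_bin = math.floor(true_age / bin_width)
        est_bin = math.floor(est_age / bin_width)
        if est_bin > true_bin:
            older += 1  # ages in Ma count back from the present, so a larger bin index is older
        elif est_bin < true_bin:
            younger += 1
    return older, younger

older, younger = bin_shift_counts()
print(older, younger, (older + younger) / 20000)
```

Under these assumptions a fair number of events do land in a neighbouring bin, but the moves in the two directions roughly cancel, so per-bin tallies aren’t pushed systematically one way. Whether that holds for real Neptune age models depends on the error actually being unbiased.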

The same “but it’s even worse in the rest of the fossil record” argument can be brought against the “Taxonomy” section, which argues that convergence and some overlap of morphospecies through time are problems; but this has got to be a pretty minor issue, and should only extend ranges very slightly.

“Reworking, Age Model Errors and Macroevolutionary Metrics”. Dave shows his calculations (or at least their results), which suggest that 5% of radiolarian LADs in Neptune are off, as are 3% of FADs. He adds these up to a total error of 8% of all occurrences lying outside the true range of the species. This leaves the apparent ranges of many taxa extended beyond their true ranges, which is a big difference from most of the rest of paleontology, where the opposite is the case (this is actually a really good point, I think). But because taxa are rare, Dave proposes using range-through counts; the problem then becomes the artificial range extension and how to deal with it.

I think it might be worth piping up here and putting in a word for Alroy and his distaste for range-through, because of the ugly edge effects it causes. But to illustrate the downsides of range-through, Dave does something really sketchy here that bothers me a lot. He takes a 1-my time bin for forams in Neptune, and compares what’s found there to what’s supposed to be there based on the biostratigraphic framework.
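To make the two range-through problems concrete, a toy sketch with invented taxa and bins (nothing from Neptune). First, a single reworked record of taxon A stretches its range-through count across three bins where it was never found. Second, five taxa that are really present in every bin, but each sampled in only two bins, come out at full count in the interior and undercounted at the edges, which is the kind of edge effect I understand Alroy to be objecting to. `range_through_counts` is a hypothetical helper.

```python
def range_through_counts(occurrences, n_bins):
    """Range-through richness per bin: a taxon counts in every bin from its
    first to its last recorded occurrence, whether or not it was found there."""
    first, last = {}, {}
    for taxon, b in occurrences:
        first[taxon] = min(first.get(taxon, b), b)
        last[taxon] = max(last.get(taxon, b), b)
    counts = [0] * n_bins
    for taxon in first:
        for b in range(first[taxon], last[taxon] + 1):
            counts[b] += 1
    return counts

# 1. One reworked occurrence stretches a range (hypothetical taxa; bins 0-7 in time order)
clean = [("A", 1), ("A", 2), ("A", 3), ("B", 2), ("B", 3)]
reworked = clean + [("A", 6)]  # a stray, out-of-range record of A
print(range_through_counts(clean, 8))     # [0, 1, 2, 2, 0, 0, 0, 0]
print(range_through_counts(reworked, 8))  # [0, 1, 2, 2, 1, 1, 1, 0]

# 2. Edge effect: five taxa truly present in all seven bins, each sampled in only two
patchy = [("T1", 1), ("T1", 5), ("T2", 0), ("T2", 3), ("T3", 2), ("T3", 6),
          ("T4", 1), ("T4", 4), ("T5", 0), ("T5", 6)]
print(range_through_counts(patchy, 7))    # [2, 4, 5, 5, 4, 3, 2]
```

The true count in the second case is 5 in every bin; range-through recovers it only in the middle, because gaps can be filled between two records but never beyond the outermost ones.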

The Day I Emptied out the Farlow Library


A day for miscellany. Started by tackling email, and got a message from Dave reassuring me that I did not tread on any toes with my Neptune database request, but that the database was not in fact openly available in full. Wrote most of my regular mini-progress-report for Andy, and spent the rest of the morning at the Farlow library getting my hands on the papers I needed. I got very nearly everything, with the exception of one reference. Weighed down with two huge, bulging bags of books, I now have a monumental amount of photocopying (or scanning, I haven’t decided yet) to do this afternoon.

After lunch, finished writing up my report for Andy. Also found Wil and asked him about dry ice—I wasn’t sure where to get it, how long it would last, what kind of container it should go into, etc., and he was tremendously helpful. In about three minutes I learned everything I needed to know, which would have been impossible to find out any other way (of course, this is the sort of thing everybody who does microbiology/enzyme extractions/DNA work knows, but nobody bothers to write down). I now have an appropriately sized cooler, and know where to buy dry ice, how much to get, what it will cost, and how long it’ll stay cold. Done!

Had my meeting with Andy at 2:30, which followed a now almost completely predictable format—I hand Andy my report, we go through it, he tells me things are looking good, and to keep working. He didn’t have a whole lot in the way of feedback, although he did agree with me that Annika’s concern about the stratigraphic coverage of her Lophocyrtis was less of an issue than she suggested. He also asked a good question about the various preservation statistics I showed him (for the diatom diversity/e-o project), which made me realize it would be helpful to also plot diversity on those graphs, to see how—if at all—diversity can be explained by preservation. He also raised the point (though indirectly, and I’m not sure if it was intentional) that it’s ever so slightly sketchy to test explanations of Neptune diversity patterns using statistics calculated from the same underlying diversity data.

After the meeting, and an extended trip to cookies with our new postdoc Walton as a reward, it was time to head down to the photocopiers and start working through the ginormous pile of literature I brought back from the Farlow this morning.