You are viewing a read-only archive of the Blogs.Harvard network. Learn more.

465 Days

1

Or, one year and one hundred days to go, exactly.

Before the fear strikes deep, let me throw myself back into the fray. I left off reading through Beau’s recommended SQL book, which has been terrific. I’m going to plow ahead with that until I get to a point where I can start implementing simple SQL commands through RSQLite. Hopefully before lunch!

Chapter 2 of Beau’s suggested text, “The Practical SQL Handbook”, recommends going through “normalization”, i.e. making sure the database design conforms to the four design standards known as normal forms.

  1. The first normal form requires each row-and-column intersection to be one and only one atomic value. Since I have separated the measurements table out from the individuals table, I am OK here, I think. If I had kept measurements as a list in the individuals table, I would be in violation of this form. The other tables are all OK too.
  2. The second normal form applies only to tables where the primary key is a combination of two or more columns; I have no such tables, so I’m OK.
  3. The third normal form requires every non-key column in a table to be an attribute of the primary key (not an attribute of another column); as the book says, “Every non-key column must describe the key, the whole key, and nothing but the key”. I’m pretty sure this is true of all of my tables.
  4. The fourth normal forms forbids independent many-to-one relationships between the key and non-key columns. Since every non-key column in my tables is a unique attribute of the key column (e.g. every individual has only one species attribute; every measurement has only one measurement type and measurement value and associated image file; every slide has only one age and one preservation attribute), I think I’m OK here too.

The book recommends setting up the tables in SQL and testing them with some dummy data before going ahead with full implementation. This seems like a sensible idea, so off we go with Chapter 3, which will hopefully teach me how to do that (probably with some help from the RSQLite documentation). Hit a bit of a bump here—still can’t find any decent basic documentation for RSQLite that tells me the basics, like how to create a new database, send a query, etc. The official CRAN documentation is an obtuse mess of technical-speak.

After reading half of the archives of the extremely unfriendly R-sig-DB mailing list/support forum, I discovered a hint that the documentation I should be looking for is not RSQLite, the SQLite implementation for R, but rather the documentation for DBI, the R package that handles interfacing for databases generically (for SQLite as well as MySQL, postgreSQL, etc). Took a while to get to this point.

This latter mentioned document finally gave some of that basic information I’d been hunting for. Like the heavens opened up, and a beam of knowledge shone down upon me, the angelic chorus singing hallelujah!

SQL queries can be sent by either dbSendQuery or dbGetQuery. dbGetquery sends the query and retrieves the results as a data frame. dbSendQuery sends the query and returns an object of class inheriting from “DBIResult” which can be used to retrieve the results, and subsequently used in a call to dbClearResult to remove the result.

Finally something I can use. Yay! Now I finally know how to connect to SQLite, I can jump back to reading the book, which will tell me more about the actual SQL commands to use to set up my first table.

Success! Flying high. Made an actual table, even if it’s just a toy. Reading about indexes, but unsure if I need one (at least at this stage). Also unsure about whether to include images stored within the database (this would be as a BLOB data type, apparently), or simply store the system path to the image file, and keep the images in a folder. The latter might offer a bit more stability/security in case the database goes to shit for some reason, and presumably would help performance by keeping the .sqlite file smaller. It might—but this is a real uncertainty—make data entry a little harder. As I think about it, though, I imagine it’ll be as difficult to read the image into a BLOB field as it would be to enter/copy-paste/read the image path or filename into a text field.

previous:
Break for Lurgee, Now SQL
next:
Another Day, Another Snowstorm, More SQL

1 Comment

  1. Beau

    February 1, 2011 @ 4:16 am

    1

    Very excited to read of your progress! I’m impressed with your dedication to the normal forms; most DB designers starting out completely disregard these forms – in fact, according to George Kocur, a lot of databases at Verizon and AT&T don’t obey all the normal forms! So nice work. Your database will treat you well, I’m sure!