{"id":1538,"date":"2012-10-16T08:05:27","date_gmt":"2012-10-16T12:05:27","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/pamphlet\/?p=1538"},"modified":"2012-10-19T14:56:43","modified_gmt":"2012-10-19T18:56:43","slug":"for-ada-lovelace-day-2012-karen-sparck-jones","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2012\/10\/16\/for-ada-lovelace-day-2012-karen-sparck-jones\/","title":{"rendered":"For Ada Lovelace Day 2012: Karen Sp\u00e4rck Jones"},"content":{"rendered":"<table width=\"140\" align=\"right\" bgcolor=\"#F7EFE5\">\n<tbody>\n<tr>\n<td align=\"center\"><a href=\"http:\/\/en.wikipedia.org\/wiki\/Karen_Sp%C3%A4rck_Jones\"><img decoding=\"async\" src=\"http:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/a\/af\/Karen_Sp%C3%A4rck.jpg\/180px-Karen_Sp%C3%A4rck.jpg\" alt=\"\" \/><\/a><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center\"><span style=\"color: #999999\">Karen Sp\u00e4rck Jones, 1935-2007<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In honor of <a href=\"http:\/\/findingada.com\/\">Ada Lovelace Day<\/a>\u00a02012, I write about the only female winner of the <a href=\"http:\/\/www.bcs.org\/category\/5932\">Lovelace Medal<\/a> awarded by the <a class=\"zem_slink\" title=\"British Computer Society\" href=\"http:\/\/www.bcs.org\/\" rel=\"homepage\" target=\"_blank\">British Computer Society<\/a> for &#8220;individuals who have made an outstanding contribution to the understanding or advancement of Computing&#8221;. 
<a class=\"zem_slink\" title=\"Karen Sp\u00e4rck Jones\" href=\"http:\/\/en.wikipedia.org\/wiki\/Karen_Sp%C3%A4rck_Jones\" rel=\"wikipedia\" target=\"_blank\">Karen Sp\u00e4rck Jones<\/a> was the 2007 winner of the medal, awarded shortly before <a href=\"http:\/\/dx.doi.org\/10.1162\/coli.2007.33.3.289\">her death<\/a>.\u00a0She also happened to be a leader in my own field of computational linguistics, a past president of the <a class=\"zem_slink\" title=\"Association for Computational Linguistics\" href=\"http:\/\/www.aclweb.org\/\" rel=\"homepage\" target=\"_blank\">Association for\u00a0Computational\u00a0Linguistics<\/a>. Because we shared a research field, I had the honor of knowing Karen and the pleasure of meeting her on many occasions at ACL meetings.<\/p>\n<p>One of her most notable contributions to the field of information retrieval was the idea of inverse document frequency. Well before search engines were a &#8220;thing&#8221;, Karen was among the leaders in figuring out how such systems should work. Already in the 1960s there had arisen the idea of keyword searching within sets of documents, and the notion that the more &#8220;hits&#8221; a document receives, the higher it should be ranked. Karen noted in her seminal 1972 paper &#8220;<a href=\"http:\/\/scholar.google.com\/scholar?cluster=2767385655895369762&amp;hl=en&amp;as_sdt=0,22\">A statistical interpretation of term specificity\u00a0and its application in retrieval<\/a>&#8221; that not all hits should be weighted equally. For terms that are broadly distributed throughout the corpus, their occurrence in a particular document is less telling than the occurrence of terms that appear in few documents. She proposed weighting each term by its &#8220;inverse document frequency&#8221; (IDF), which she defined as log(<em>N<\/em>\/(<em>n<\/em> + 1)) where <em>N<\/em> is the number of documents and <em>n<\/em> the number of documents containing the keyword under consideration. 
When the keyword occurs in all documents, IDF approaches 0 for large <em>N<\/em>, but as the keyword occurs in fewer and fewer documents (making it a more specific and presumably more important keyword), IDF rises. The two notions of weighting (frequency of occurrence of the keyword together with its specificity as measured by inverse document frequency) are combined multiplicatively in the by now standard <a href=\"http:\/\/en.wikipedia.org\/wiki\/Tf%E2%80%93idf\">tf*idf metric<\/a>; tf*idf or its successors underlie essentially all information retrieval systems in use today.<\/p>\n<p>In <a href=\"http:\/\/www.bcs.org\/content\/ConWebDoc\/10791\">Karen&#8217;s interview for the Lovelace Medal<\/a>, she opined that &#8220;Computing is too important to be left to men.&#8221; Ada Lovelace would have agreed.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Karen Sp\u00e4rck Jones, 1935-2007 In honor of Ada Lovelace Day\u00a02012, I write about the only female winner of the Lovelace Medal awarded by the British Computer Society for &#8220;individuals who have made an outstanding contribution to the understanding or advancement of Computing&#8221;. 
Karen Sp\u00e4rck Jones was the 2007 winner of the medal, awarded shortly before [&hellip;]<\/p>\n","protected":false},"author":2110,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[6028,380,6027],"tags":[],"class_list":["post-1538","post","type-post","status-publish","format-standard","hentry","category-computational-linguistics","category-computer-science","category-linguistics"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5pLfN-oO","jetpack-related-posts":[{"id":410,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2010\/03\/24\/happy-ada-lovelace-day\/","url_meta":{"origin":1538,"position":0},"title":"Happy Ada Lovelace Day","author":"Stuart Shieber","date":"Wednesday, March 24, 2010","format":false,"excerpt":"Fragment of Charles Babbage's first difference engine, from the collection of the Harvard University Collection of Historical Scientific Instruments. In honor of Ada Lovelace Day, here is a fragment of Charles Babbage's difference engine, from the Collection of Historical Instruments at Harvard University. 
Babbage went on to design a programmable\u2026","rel":"","context":"In &quot;computer science&quot;","block_context":{"text":"computer science","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/computer-science\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1829,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2013\/10\/28\/can-gerrymandering-be-solved-with-cut-and-choose\/","url_meta":{"origin":1538,"position":1},"title":"Can gerrymandering be solved with cut-and-choose?","author":"Stuart Shieber","date":"Monday, October 28, 2013","format":false,"excerpt":"Update\u00a0March 25, 2019:\u00a0Wesley Pegden, Ariel D. Procaccia, and Dingli Yu have an elegant working out of the proposal below that they call \"I cut, you freeze.\" Pegden and Procaccia describe it in a Washington Post opinion piece. \u2026how to split a cupcake\u2026 \u201cHalves\u201d image by flickr user Julie Remizova. Why\u2026","rel":"","context":"In &quot;computer science&quot;","block_context":{"text":"computer science","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/computer-science\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1376,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2012\/05\/29\/shieber\/","url_meta":{"origin":1538,"position":2},"title":"Processing special collections: An archivist&#8217;s workstation","author":"Stuart Shieber","date":"Tuesday, May 29, 2012","format":false,"excerpt":"John Tenniel, c. 1864. Study for illustration to Alice's adventures in wonderland.\u00a0Harcourt Amory collection of Lewis Carroll, Houghton Library, Harvard University. 
We've just completed spring semester during which I taught a system design course jointly in Engineering Sciences and Computer Science.\u00a0The aim of ES96\/CS96 is to help the students learn\u2026","rel":"","context":"In &quot;computer science&quot;","block_context":{"text":"computer science","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/computer-science\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1203,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2012\/03\/06\/an-efficient-journal\/","url_meta":{"origin":1538,"position":3},"title":"An efficient journal","author":"Stuart Shieber","date":"Tuesday, March 6, 2012","format":false,"excerpt":"\u201cYou seem to believe in fairies.\u201d Photo of the Cottingley Fairies, 1917, by Elsie Wright via Wikipedia. Aficionados of open access should know about the Journal of Machine Learning Research (JMLR), an open-access journal in my own research field of artificial intelligence, a subfield of computer science concerned with the\u2026","rel":"","context":"In &quot;computer science&quot;","block_context":{"text":"computer science","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/computer-science\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1348,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2012\/05\/21\/open-letter-on-the-access2research-white-house-petition\/","url_meta":{"origin":1538,"position":4},"title":"Open letter on the Access2Research White House petition","author":"Stuart Shieber","date":"Monday, May 21, 2012","format":false,"excerpt":"I just sent the email below to my friends and family. Feel free to send a similar letter to yours. You know me. I don't send around chain letters, much less start them. 
So you know that if I'm sending you an email and asking you to tell your friends,\u2026","rel":"","context":"In &quot;open access&quot;","block_context":{"text":"open access","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/scholarly-communication\/open-access\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1561,"url":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/2012\/11\/06\/how-not-to-entice-an-author\/","url_meta":{"origin":1538,"position":5},"title":"How not to entice an author","author":"Stuart Shieber","date":"Tuesday, November 6, 2012","format":false,"excerpt":"...There's a \"tree\" in it... \"Fall New England\" image by flickr user BrtinBoston. Used by permission. I received the attached email, inviting a contribution to a journal called\u00a0Advances in Forestry Letter. Yes, that's \"Letter\" in the singular, which is even still optimistic given the number of papers they've published so\u2026","rel":"","context":"In &quot;open access&quot;","block_context":{"text":"open 
access","link":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/category\/scholarly-communication\/open-access\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/posts\/1538","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/users\/2110"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/comments?post=1538"}],"version-history":[{"count":13,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/posts\/1538\/revisions"}],"predecessor-version":[{"id":1549,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/posts\/1538\/revisions\/1549"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/media?parent=1538"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/categories?post=1538"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/pamphlet\/wp-json\/wp\/v2\/tags?post=1538"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}