
Google Privacy Videos

I recently ran across a series of videos produced by Google to explain the data collection of its search engine. Ms. Ohye, an attractive, professional, and terribly reassuring support engineer, explains what sorts of data Google collects and, implicitly, why users should not be overly concerned about the data collection:

To improve our search results as well as maintain security and prevent fraud we remember some basic information about searches. Without this information, our search engine wouldn’t work as well as it does or be as secure. … We’re able to do that [replace ‘carss’ with ‘cars’] because we’ve analyzed search queries in our logs and found that when people type in ‘carss’ they really mean ‘cars’. … Only your provider can directly match you with your IP address. … What a cookie doesn’t tell Google is personal stuff about you like where you live and what your phone number is. … In the same way that a store keeps a receipt of your purchases, Google keeps a kind of receipt of your visit called a log. … As you can see, logs don’t contain any truly personal information about you.

All of this is true in a narrowly technical sense. What’s missing is the recognition that the importance of data is determined by the larger world in which it lives, that is, by the other data that it connects to. So when Ms. Ohye asserts that a cookie doesn’t tell Google “personal stuff about you like where you live,” that’s true only in the sense that your driver’s license number doesn’t tell the police where you live. As with the driver’s license number, even though the cookie itself is just a random string of gibberish letters, it can indeed be used to look up personal information “like where you live.”

For example, the cookie connects, reasonably well, all searches performed by a single person. Many, many people search for their own names at some point and for their own addresses at some point (if for no other reason than to see their houses in Google Maps). The cookie connects those two searches to the same (otherwise anonymous) person, thus potentially identifying the name and address of the person behind the random gibberish of a particular cookie. This method of identification is not perfect, but researchers have repeatedly managed to identify individual users in data collections of this kind, which carry anonymous but individually unique identifiers, as AOL and Netflix learned after releasing such data.
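
To make the linking step concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the log format, the cookie IDs, the queries, and the `looks_like_address` heuristic are my assumptions, not a description of how Google stores or processes its logs. It only shows how little work it takes, in principle, to go from an "anonymous" identifier to a profile.

```python
from collections import defaultdict

# Hypothetical log entries as (cookie_id, query) pairs. All values are made up.
LOG = [
    ("c-7f3a", "jane q. example"),
    ("c-91bd", "cheap flights boston"),
    ("c-7f3a", "12 example street springfield"),
    ("c-7f3a", "knee pain treatment"),
    ("c-91bd", "weather tomorrow"),
]


def group_by_cookie(log):
    """Collect every query issued under the same cookie ID."""
    sessions = defaultdict(list)
    for cookie_id, query in log:
        sessions[cookie_id].append(query)
    return dict(sessions)


def looks_like_address(query):
    """Crude stand-in heuristic (an assumption): the query starts with a street number."""
    words = query.split()
    return bool(words) and words[0].isdigit()


def profiles_with_address(log):
    """Return the full query history of every cookie that ever searched for an address."""
    return {
        cookie_id: queries
        for cookie_id, queries in group_by_cookie(log).items()
        if any(looks_like_address(q) for q in queries)
    }


profiles = profiles_with_address(LOG)
for cookie_id, queries in profiles.items():
    print(cookie_id, queries)
```

The cookie ID itself never says anything personal. The exposure comes from the grouping: once one query under a cookie looks like a home address, every other query under that cookie is attached to it.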

In fact, it’s likely that this collection of search terms, IP addresses, and cookies represents perhaps the largest, most sensitive single collection of data extant, on- or offline. Google may or may not choose to do the relatively easy work necessary to translate its collection of search data into a database of personally identifiable data, but it does have the data and the ability to query personal data out of the collection at any time if it chooses (or is made to choose by a government, intruder, disgruntled worker, etc.).
