{"id":10,"date":"2010-12-15T23:31:06","date_gmt":"2010-12-16T03:31:06","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/markshead\/?p=10"},"modified":"2016-04-26T19:44:55","modified_gmt":"2016-04-26T23:44:55","slug":"how-to-keep-from-giving-away-passwords-like-gawker","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/markshead\/how-to-keep-from-giving-away-passwords-like-gawker\/","title":{"rendered":"Keep From Giving Away Passwords Like Gawker"},"content":{"rendered":"<p>Some time ago I made a comment on Lifehacker.com. In order to comment I had to create a login. \u00a0I used an email account that I mainly use for junk email and a simple password. \u00a0In the past week, some hackers got into Gawker&#8217;s servers and downloaded among other things the user database.<\/p>\n<p>When you store a password in a database, you are supposed to hash it. \u00a0In theory this turns your password into a random string so no one who has access to the database, can actually see the password. \u00a0To verify your login, the server takes the password you give it and runs it through the same hash software. \u00a0If it matches what is stored in the database then you are given access.<\/p>\n<p>The hash function is one way. \u00a0You can easily create the hashed value from the password, but it is difficult or impossible to create the password from the hashed value. \u00a0However, if you have a list of the hashes from a bunch of passwords, it is easy to search for a hash and then find its\u00a0corresponding\u00a0password.<\/p>\n<p>To prove how great they were, the hackers did this to nearly 200,000 accounts and posted it online. \u00a0These accounts were listed with their username, password and email address. \u00a0Mine was one of them. \u00a0Fortunately, the password is something I haven&#8217;t used for several years and I haven&#8217;t used it for anything important for about a decade. 
\u00a0Still it is a bit unnerving to see my password out there for everyone to read.<\/p>\n<p>I traced down a few places where the password was still being used&#8211;on a bookmark site and another\u00a0comment\u00a0account and changed those. \u00a0Fortunately, I have been using 1Password for a few years so it was easy to do a search and find the accounts that were still using this password.<\/p>\n<p>More recently I have been using 1Password to create a random password for each site I visit. \u00a0This is ideal because a long random password is going to be much more difficult to search for unless they have an enumerated list of every possible combination of keyboard characters of any length. Further, even if someone was able to find the random password that\u00a0corresponds\u00a0to the hash, it would only give them access to the account on one website&#8211;which they probably had to have access to in order to get to the database in the first place.<\/p>\n<h3>Salting Passwords<\/h3>\n<p>One thing Gawker\/Lifehacker should have done is &#8220;salted&#8221; the passwords. \u00a0This would simply be adding some characters to the beginning or end of each password before running it through the hash function. \u00a0When you go to login again, the server simply takes the password you give and adds the same characters to it before running the hash on it again. \u00a0This makes the hashes much more difficult to find because it helps make sure that they aren&#8217;t common words. \u00a0If the hackers don&#8217;t know what the salt is or how it is applied, this should make it pretty much impossible to figure out the passwords.<\/p>\n<p>However, in this case, the hackers got access to the Gawker source code as well, so they would have known exactly how any salt was applied before making the hash. 
The hackers could go through and create hashes for a bunch of common words with the salt characters added to each one, but that means they can&#8217;t rely on existing databases of hash to password mappings&#8211;they would have to calculate everything from scratch. \u00a0Still this is possible and not unreasonable&#8211;particularly because they would only need to run through a bunch of words once in order to see if <strong>any<\/strong> user had used that as a password.<\/p>\n<p>So if you have 1,000,000 users, running such a process against 1,000,000 common passwords will probably give you access to a number of the accounts. \u00a0That would require 1,000,000 hash operations using the dictionary file. I did a test and found that 1,000,000 MD5 hashes take about 25 minutes on my i7 MacBook Pro. \u00a0This is just using the command line and a shell script. \u00a0I&#8217;m guessing that it could be done much faster using a different method, running it on a bigger machine or dividing it up among different computers.<\/p>\n<p>Things can be made much more secure by setting things up so the password to hash mappings can&#8217;t be used to search the whole dataset. You want the hackers to need to recalculate the hashes for the entire dictionary file for each user. \u00a0This can be done if each user&#8217;s password is hashed using a different salt value.<\/p>\n<h3>Different Salt Values<\/h3>\n<p>This could be as simple as merging the username and password together and then creating the password hash from that. This means that compromising the same number of accounts as could be done with a single shared salt value would now require 1,000,000 X 1,000,000 hash operations. \u00a0This would mean\u00a0significantly\u00a0more time and\/or more machines to run it against all of the passwords. It would take my computer somewhere in the 50 year range to do this. 
\u00a0Still, if you had a cluster of 100 computers that were all 10x faster than my laptop, it could be done in around 400 hours. \u00a0So it is definitely doable, but it is starting to get much more expensive. Amazon&#8217;s high end EC2 instances cost $0.50 to $2.10 per hour so we are looking at costs in the $20,000 or higher range.<\/p>\n<p>Still this isn&#8217;t prohibitive&#8211;particularly if the hackers are willing to use a smaller dictionary of passwords. \u00a0With a 100,000-word dictionary of potential passwords the cost on Amazon falls into the $2,000 or higher range.<\/p>\n<h3>Hashing the Username or Email Address<\/h3>\n<p>Another idea would be to hash the username or email address. Usernames are going to be a bit more difficult to attack using a dictionary because they tend to be more distinctive. \u00a0In fact, if you were to combine the username and the password into the same hash it would be significantly more difficult because you couldn&#8217;t try to break the username or password by itself. \u00a0So something like:<\/p>\n<pre>hash(username + password)<\/pre>\n<p>Of course this would only work if you didn&#8217;t need the username anywhere else. \u00a0For example, you couldn&#8217;t use it to show the name of the person who left a comment. \u00a0This might work out ok if your application uses a username to log in and a &#8220;screen name&#8221; to show to other users.<\/p>\n<p>The problem with this is that you couldn&#8217;t look up a user without having their correct username and password. \u00a0This could be problematic for doing any type of password recovery. You might be able to work around this by using a third field that can uniquely identify the user&#8211;like an email address.<\/p>\n<p>If your system doesn&#8217;t need to send emails, hashing the email address might be a better solution because these tend to be much longer and are automatically unique. 
\u00a0You could still send emails for password recovery because when given an email address you could find the row with a matching hash value.<\/p>\n<p>If the average email address is 20 characters and can contain A-Z, @ and a period, then you have 28 possible symbols in each of 20 positions, so you&#8217;d be looking at 28^20 different combinations for a dictionary attack. \u00a0Actually the numbers would be\u00a0significantly\u00a0reduced if you took into consideration the fact that most email addresses end in .com, there is only one @ and other common patterns. \u00a0This is partially offset by the fact that some email addresses are longer than 20 characters and the fact that many people have username portions of their email that are more complicated than their passwords.<\/p>\n<p>Something like the hash shown below would be impossible to break on any large scale, but wouldn&#8217;t allow traditional password recovery using email.<\/p>\n<pre>hash(email address + password)<\/pre>\n<p>If someone forgot their password, it would be impossible to look up their account via email using this approach. \u00a0Still, in some situations that might be acceptable&#8211;particularly if an alternative form of password recovery could be done in person, by SMS or by some other method.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Some time ago I made a comment on Lifehacker.com. In order to comment I had to create a login. \u00a0I used an email account that I mainly use for junk email and a simple password. \u00a0In the past week, some hackers got into Gawker&#8217;s servers and downloaded among other things the user database. 
When you &hellip; <a href=\"https:\/\/archive.blogs.harvard.edu\/markshead\/how-to-keep-from-giving-away-passwords-like-gawker\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Keep From Giving Away Passwords Like Gawker<\/span><\/a><\/p>\n","protected":false},"author":718,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[142],"tags":[],"class_list":["post-10","post","type-post","status-publish","format-standard","hentry","category-technology"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p70xgu-a","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/posts\/10","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/users\/718"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/comments?post=10"}],"version-history":[{"count":4,"href":"https:\/\/archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/posts\/10\/revisions"}],"predecessor-version":[{"id":98,"href":"https:\/\/
archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/posts\/10\/revisions\/98"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/media?parent=10"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/categories?post=10"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/markshead\/wp-json\/wp\/v2\/tags?post=10"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}