{"id":456,"date":"2014-10-07T10:13:45","date_gmt":"2014-10-07T14:13:45","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/dpsi\/?p=456"},"modified":"2014-10-24T10:08:19","modified_gmt":"2014-10-24T14:08:19","slug":"developing-big-data-analysis-tools-2-0","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/dpsi\/2014\/10\/07\/developing-big-data-analysis-tools-2-0\/","title":{"rendered":"Developing big data analysis tools 2.0"},"content":{"rendered":"<p dir=\"ltr\" style=\"color: #222222\"><strong><span style=\"font-style: italic;color: #000000\">What we worked on<\/span><\/strong><\/p>\n<p dir=\"ltr\" style=\"color: #222222\">The Big Data team spent the past few weeks introducing the group&#8217;s work so far and setting concrete goals for this semester. We are excited to welcome our highly qualified and enthusiastic new team members!<\/p>\n<p dir=\"ltr\" style=\"color: #222222\">During our first meeting, the team joined the Privacy Tools group of the Center for Research in Computation and Society to discuss the results of the group&#8217;s work last year and the CACM paper that followed.<\/p>\n<p style=\"color: #222222\">Next, we met to learn about each member&#8217;s interests and goals, and discussed how we might work together to define and explore the key questions that interest the group.<\/p>\n<p style=\"color: #222222\"><strong><i>Our Work Plan for the Semester<\/i><\/strong><\/p>\n<p style=\"color: #222222\"><span style=\"text-decoration: underline\">Throughout the semester, we will be asking three key questions:<\/span><\/p>\n<p style=\"color: #222222\"><strong><em>(1) Can current anonymization techniques for large datasets keep the data and its properties reliable and complete?<\/em><\/strong><\/p>\n<p style=\"color: #222222\">Can we use anonymized datasets in research?<\/p>\n<p style=\"color: #222222\">Can robust insights be generated from such anonymized datasets?<\/p>\n<p style=\"color: #222222\">To answer these questions, we will analyze samples of such datasets and test whether analyzing the original and the anonymized versions produces the same results.<\/p>\n<p style=\"color: #222222\"><strong><i>(2) If current methods do not maintain the data&#8217;s key properties, is there an anonymization method that can do so?<\/i><\/strong><\/p>\n<p style=\"color: #222222\">We will experiment with different ways of anonymizing data to find out which method, if any, generates robust results that preserve the qualities of the original data without compromising users&#8217; privacy.<\/p>\n<p style=\"color: #222222\"><strong><em>(3) Finally, if data cannot be anonymized in a way that maintains its original properties, we will research and brainstorm new concepts of privacy.<\/em><\/strong><\/p>\n<p style=\"color: #222222\">Can privacy exist without anonymity?<\/p>\n<p style=\"color: #222222\">This is a huge undertaking, and a question many have considered before. We will spend the semester researching different notions of privacy, trying to understand what lies at their core and whether some form of privacy can exist without anonymity. While we may not succeed, we think this issue is worth the time.<\/p>\n<p dir=\"ltr\" style=\"color: #222222\"><em><strong>What has gone well so far<\/strong><\/em><\/p>\n<p dir=\"ltr\" style=\"color: #222222\">Everyone seems genuinely fascinated by the problem and eager to get their hands dirty doing meaningful work on it. We are all thrilled to be working on a subject that few others have explored.<\/p>\n<p style=\"color: #222222\"><em><strong>What was challenging<\/strong><\/em><\/p>\n<p style=\"color: #222222\">Since we are walking a path few have taken before, we will have to figure things out as we go. This may be challenging at times, but we will work to build a supportive community that keeps our work productive.<\/p>\n<p dir=\"ltr\" style=\"color: #222222\"><strong><span style=\"font-style: italic;color: #000000\">What\u2019s up next<\/span><\/strong><\/p>\n<p dir=\"ltr\" style=\"color: #222222\">We have been building a reading list to get everyone up to speed on data anonymization, de-identification methods, legal requirements, and related notions of privacy. In the next week or so, we will discuss the readings and the larger themes around them. We are also working on securing access to some large datasets so that we can begin preliminary analysis and visualization.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What we worked on The Big Data team spent the past few weeks introducing the group&#8217;s work so far and setting concrete goals for this semester. We are excited to welcome our highly qualified and enthusiastic new team members! During our first meeting, the team joined the Privacy Tools group of the Center for Research in &hellip; <a href=\"https:\/\/archive.blogs.harvard.edu\/dpsi\/2014\/10\/07\/developing-big-data-analysis-tools-2-0\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Developing big data analysis tools 2.0<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":7075,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[43435,127507],"tags":[],"class_list":["post-456","post","type-post","status-publish","format-standard","hentry","category-big-data","category-big-data-team-2013"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/posts\/456","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/users\/7075"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/comments?post=456"}],"version-history":[{"count":1,"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/posts\/456\/revisions"}],"predecessor-version":[{"id":458,"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/posts\/456\/revisions\/458"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/media?parent=456"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/categories?post=456"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/dpsi\/wp-json\/wp\/v2\/tags?post=456"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}