{"id":66,"date":"2012-08-03T01:13:39","date_gmt":"2012-08-03T01:13:39","guid":{"rendered":"http:\/\/blogs.law.harvard.edu\/signal\/?p=66"},"modified":"2012-08-07T19:11:13","modified_gmt":"2012-08-07T19:11:13","slug":"my-data-science-roadmap","status":"publish","type":"post","link":"https:\/\/archive.blogs.harvard.edu\/signal\/2012\/08\/03\/my-data-science-roadmap\/","title":{"rendered":"My Data Science Roadmap"},"content":{"rendered":"<p><a href=\"http:\/\/blogs.law.harvard.edu\/signal\/files\/2012\/08\/data-brain.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-thumbnail wp-image-78\" style=\"margin: 5px 10px;\" title=\"data-brain\" src=\"http:\/\/blogs.law.harvard.edu\/signal\/files\/2012\/08\/data-brain-150x150.jpg\" alt=\"\" width=\"150\" height=\"150\" \/><\/a>I have set a goal to learn Data Analytics and began this journey a while back. \u00a0One way I am learning Data Science is through EMC&#8217;s Data Science Training. \u00a0It succinctly outlines the skills I am looking to master to build a practical foundation in analytics:<\/p>\n<table width=\"372\" border=\"1\" cellspacing=\"0\" cellpadding=\"0\">\n<colgroup>\n<col width=\"132\" \/>\n<col width=\"128\" \/>\n<col width=\"112\" \/> <\/colgroup>\n<tbody>\n<tr>\n<td width=\"132\" height=\"15\"><strong>Problem<\/strong><\/td>\n<td width=\"128\"><strong>Category of Techniques<\/strong><\/td>\n<td width=\"112\"><strong>Methods to Learn<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"132\" height=\"60\">Group items by similarity; find structure and commonalities in the data<\/td>\n<td>Clustering<\/td>\n<td width=\"112\">K-means clustering<\/td>\n<\/tr>\n<tr>\n<td width=\"132\" height=\"30\">Discover relationships between actions or items<\/td>\n<td>Association Rules<\/td>\n<td width=\"112\">Apriori<\/td>\n<\/tr>\n<tr>\n<td width=\"132\" height=\"45\">Discover relationships between the outcome and input variables<\/td>\n<td>Regression<\/td>\n<td width=\"112\">Linear Regression, Logistic 
Regression<\/td>\n<\/tr>\n<tr>\n<td width=\"132\" height=\"30\">Assign (known) labels to objects<\/td>\n<td>Classification<\/td>\n<td width=\"112\">Na\u00efve Bayes, Decision Trees<\/td>\n<\/tr>\n<tr>\n<td width=\"132\" height=\"60\">Find the structure in a temporal process; forecast the behavior of a temporal process<\/td>\n<td>Time Series Analysis<\/td>\n<td width=\"112\">ACF, PACF, ARIMA<\/td>\n<\/tr>\n<tr>\n<td width=\"132\" height=\"60\">Analyze text data<\/td>\n<td>Text Analysis<\/td>\n<td width=\"112\">Regular Expressions, Document representation (Bag of Words), TF-IDF<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>In addition to the above, I plan to build foundational knowledge in Mathematics, Computer Science, Machine Learning, Artificial Intelligence, Predictive Analytics and Life Science. \u00a0Some of this will come through my degree program at Harvard; however, the program I am in, Information Technology, offers only some courses that are useful in Data Science. \u00a0Other knowledge will come from additional courses I will take outside of my degree program, from books, and possibly even from the pursuit of another graduate degree specific to Data Analytics.<\/p>\n<p>A few degree programs that look very attractive are listed below. \u00a0The prerequisites are what prevent me from pursuing one of these programs at this time. \u00a0I have a significant amount of work to do to build up my Mathematics and Life Sciences foundations before I could be admitted. 
\u00a0My background is in technology and computer science, which is very useful to Data Science, but it is only one part of a much larger domain of knowledge.<\/p>\n<p><a href=\"http:\/\/advanced.jhu.edu\/academic\/biotechnology\/ms-in-bioinformatics\/\">Master of Science in Bioinformatics &#8211; Johns Hopkins University<\/a><\/p>\n<p><a href=\"http:\/\/analytics.ncsu.edu\">Master of Science in Analytics &#8211; North Carolina State University<\/a><\/p>\n<p><a href=\"http:\/\/www.analytics.northwestern.edu\">Master of Science in Analytics &#8211; Northwestern University<\/a><\/p>\n<p><a href=\"http:\/\/www.scs.northwestern.edu\/grad\/mspa\/\">Master of Science in Predictive Analytics &#8211; Northwestern University<\/a><\/p>\n<p><a href=\"http:\/\/scpd.stanford.edu\/public\/category\/courseCategoryCertificateProfile.do?method=load&amp;certificateId=10555807\">Mining Massive Data Sets Graduate Certificate &#8211; Stanford University<\/a><\/p>\n<p><a href=\"http:\/\/www.csml.ucl.ac.uk\/courses\/msc_ml\/?q=node\/143\">MSc Machine Learning &#8211; University College London<\/a><\/p>\n<p><a href=\"http:\/\/web.ccsu.edu\/datamining\/master.html\">Master of Science in Data Mining &#8211; Central Connecticut State University<\/a><\/p>\n<p><a href=\"http:\/\/scpd.stanford.edu\/public\/category\/courseCategoryCertificateProfile.do?method=load&amp;certificateId=1240186#searchResults\">Master of Science Biomedical Informatics<\/a><\/p>\n<div>It would likely be three years or more before I would be able to pursue a program such as those above. 
\u00a0In the meantime, I plan to build up my knowledge in the various domains.<\/div>\n<p>College courses I will take outside of Harvard (all of these have co-requisite labs as well):<\/p>\n<p>Biology I<br \/>\nBiology II<br \/>\nChemistry I<br \/>\nChemistry II<br \/>\nOrganic Chemistry I<br \/>\nOrganic Chemistry II<\/p>\n<p>Courses I am taking or have taken at Harvard that will help in Data Science:<\/p>\n<p><a href=\"http:\/\/www.extension.harvard.edu\/courses\/introduction-statistics\">Introduction to Statistics<\/a><br \/>\n<a href=\"http:\/\/www.extension.harvard.edu\/courses\/java-distributed-computing\">Java for Distributed Computing<\/a><br \/>\n<a href=\"http:\/\/www.extension.harvard.edu\/courses\/oracle-database-administration\">Oracle Database Administration<\/a><br \/>\n<a href=\"http:\/\/www.extension.harvard.edu\/courses\/visualization\">Visualization<\/a><br \/>\n<a href=\"http:\/\/www.extension.harvard.edu\/courses\/computing-foundations-computational-science\">Computing Foundations for Computational Science<\/a><\/p>\n<p>Books I will be working through:<\/p>\n<p><strong>R<\/strong><\/p>\n<p><a href=\"http:\/\/www.amazon.com\/Data-Mining-Learning-Knowledge-Discovery\/dp\/1439810184\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I3QNEG41QO8SGY\">Data Mining with R: Learning with Case Studies (Chapman &amp; Hall\/CRC Data Mining and Knowledge Discovery Series)<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/The-Book-Michael-J-Crawley\/dp\/0470510242\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=ITK47H7ST08NB\">The R Book<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Data-Mashups-R-Jeremy-Leipzig\/dp\/1449303536\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I1HUXFSD1NPIUT\">Data Mashups in R<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Nutshell-Desktop-Quick-Reference\/dp\/059680170X\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I1X3Y9SZWJUFCK\">R in a 
Nutshell: A Desktop Quick Reference<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Cookbook-OReilly-Cookbooks-Paul-Teetor\/dp\/0596809158\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=IYXY30B77UL9V\">R Cookbook (O&#8217;Reilly Cookbooks)<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Getting-Started-RStudio-John-Verzani\/dp\/1449309038\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I1GER2I1502S40\">Getting Started with RStudio<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Parallel-R-Q-Ethan-McCallum\/dp\/1449309925\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I2X7O5RJFVZQXZ\">Parallel R<\/a><\/p>\n<p><strong>Statistics<\/strong><\/p>\n<p><a href=\"http:\/\/www.amazon.com\/Data-Mining-Practical-Techniques-Management\/dp\/0123748569\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I17V3LJ9XFV7R2\">Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/All-Statistics-Statistical-Inference-Springer\/dp\/1441923225\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I2ISNLI82SQ9VU\">All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics)<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Think-Stats-Allen-Downey-B\/dp\/1449307116\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I2P80HVOC0K4ZF\">Think Stats<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Statistics-Nutshell-Desktop-Reference-OReilly\/dp\/0596510497\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I1EUCSB0P6T6VQ\">Statistics in a Nutshell: A Desktop Quick Reference (In a Nutshell (O&#8217;Reilly))<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Statistics-Hacks-Tools-Measuring-Beating\/dp\/0596101643\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I22KHO4KAX6B4M\">Statistics Hacks: Tips &amp; 
Tools for Measuring the World and Beating the Odds<\/a><\/p>\n<p><strong>Linear Algebra<\/strong><\/p>\n<p><a href=\"http:\/\/www.amazon.com\/Introduction-Linear-Algebra-Fourth-Edition\/dp\/0980232716\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I2CWEOHZ4HCY1B\">Introduction to Linear Algebra, Fourth Edition<\/a><\/p>\n<p><strong>Machine Learning<\/strong><\/p>\n<p><a href=\"http:\/\/www.amazon.com\/Machine-Learning-Action-Peter-Harrington\/dp\/1617290181\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I16U20NFZOLZN8\">Machine Learning in Action<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Machine-Learning-Hackers-Drew-Conway\/dp\/1449303714\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I2WNDBJEGUEVP9\">Machine Learning for Hackers<\/a><\/p>\n<p><strong>Data Mining<\/strong><\/p>\n<p><a href=\"http:\/\/www.amazon.com\/Mining-Social-Web-Analyzing-Facebook\/dp\/1449388345\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I3I6UXKOQK7JO8\">Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Recipes-Mining-Twitter-Matthew-Russell\/dp\/1449303161\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I3SHS0DTV3XSDB\">21 Recipes for Mining Twitter<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Big-Data-Glossary-Pete-Warden\/dp\/1449314597\/ref=wl_it_dp_o_pC_S_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I2XPI5DWUQJMO9\">Big Data Glossary<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Data-Analysis-Open-Source-Tools\/dp\/0596802358\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I1DZS8ESTWUHU2\">Data Analysis with Open Source Tools<\/a><\/p>\n<p><strong>Visualization<\/strong><\/p>\n<p><a 
href=\"http:\/\/www.amazon.com\/Designing-Data-Visualizations-Noah-Iliinsky\/dp\/1449312284\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I2ACOTYDALEJIV\">Designing Data Visualizations<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Now-You-See-Visualization-Quantitative\/dp\/0970601980\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I378XEVTPGQ0NG\">Now You See It: Simple Visualization Techniques for Quantitative Analysis<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Beautiful-Visualization-Looking-through-Practice\/dp\/1449379869\/ref=wl_it_dp_o_pC_S_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=IIL7QZKKLUJ2W\">Beautiful Visualization: Looking at Data through the Eyes of Experts (Theory in Practice)<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Visualize-This-FlowingData-Visualization-Statistics\/dp\/0470944889\/ref=wl_it_dp_o_piT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I3GR8OJMPHF8XQ\">Visualize This: The FlowingData Guide to Design, Visualization, and Statistics<\/a><\/p>\n<p><strong>Hadoop<\/strong><\/p>\n<p><a href=\"http:\/\/www.amazon.com\/Hadoop-Definitive-Guide-Tom-White\/dp\/1449311520\/ref=wl_it_dp_o_pdT1_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I2OHM708L9LBKB\">Hadoop: The Definitive Guide<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/HBase-Definitive-Guide-Lars-George\/dp\/1449396100\/ref=wl_it_dp_o_pC_S_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I16U6DCJU12H3F\">HBase: The Definitive Guide<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Programming-Pig-Alan-Gates\/dp\/1449302645\/ref=wl_it_dp_o_pC_nS_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=I2IMNTVS7UNAUU\">Programming Pig<br \/>\n<\/a><a href=\"http:\/\/www.amazon.com\/Cassandra-Definitive-Guide-Eben-Hewitt\/dp\/1449390412\/ref=wl_it_dp_o_piT1_S_nC?ie=UTF8&amp;colid=DE01BWE4H3VT&amp;coliid=IDLACLUPEMIR5\">Cassandra: The Definitive Guide<\/a><\/p>\n<p>There is much I have left out, I am sure, and if anyone has any good books to 
recommend, please do. \u00a0I have found the <a href=\"http:\/\/www.quora.com\">Quora<\/a> forums to be particularly helpful in networking with others about Data Science.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have set a goal to learn Data Analytics and began this journey a while back. \u00a0One way I am learning Data Science is through EMC&#8217;s Data Science Training. \u00a0It succinctly outlines the skills I am looking to master to build a practical foundation in analytics: Problem Category of Techniques Methods to Learn Group [&hellip;]<\/p>\n","protected":false},"author":4602,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-66","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/archive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/posts\/66","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/users\/4602"}],"replies":[{"embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/comments?post=66"}],"version-history":[{"count":12,"href":"https:\/\/archive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/posts\/66\/revisions"}],"predecessor-version":[{"id":69,"href":"https:\/\/archive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/posts\/66\/revisions\/69"}],"wp:attachment":[{"href":"https:\/\/archive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/media?parent=66"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/a
rchive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/categories?post=66"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive.blogs.harvard.edu\/signal\/wp-json\/wp\/v2\/tags?post=66"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}