Archive for April, 2012

Many Eyes project by IBM

Monday, April 16th, 2012

Today I had a quick look at the Many Eyes project. It looks very interesting.

 

You can make your own visualization using their nice collection of data sets.

Look at this example:

http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/food-group-consumption-by-state

 

Data sources

Monday, April 16th, 2012

Permanently under development:

Useful data sources

 

Harvard IQSS Dataverse

All sorts of data, including Mostly Harmless Econometrics

dvn.iq.harvard.edu

 

The National Digital Forecast Database

Weather forecast

http://ndfd.weather.gov/technical.htm

Wind patterns in the United States

Monday, April 16th, 2012

See how the wind is blowing in the United States right now:

http://hint.fm/wind/

 

Using PlotVectors from R

 

(HT: Revolutions)

Ocean currents video by NASA

Saturday, April 14th, 2012

This time NASA made a video of ocean currents. NASA has similar videos for the Gulf Stream and for the Mediterranean Sea.

 

www.youtube.com/watch?v=WEe1bVjORN4

 

(HT: Revolutions)

Edge-bundled, timeline-enhanced visualizations

Saturday, April 14th, 2012

This company provides solutions that bring the human element into pattern detection. Judging by the videos, it looks very useful for almost any fraud detection application, such as money laundering or medical fraud.

Check out their videos: http://synerscope.com/resources

(HT: LG)

FAS webmail, gmail, forwarding and POP3

Wednesday, April 11th, 2012

As you might be aware, webmail at FAS is not forwarding properly: it drops emails once in a while. On the other hand, it is still the best solution thanks to its speed, spam filter and overall integration, and it can be enhanced with POP3. The following is my solution.

Log into webmail.fas.harvard.edu and in Settings go to mail forwarding. Insert your gmail address.

In your gmail mailbox, you can filter messages that were sent to your harvard address (“Filter messages like this”). You can skip the Inbox, apply the label “harvard,” or do as you wish. Then in Settings go to Accounts and Import, Check mail from other accounts (using POP3), and add a new account. It sets up almost everything automatically. I download everything to my gmail account (and use IMAP to get it to my computer with Mozilla Thunderbird and GPG).

You get your emails immediately by forwarding, but if Harvard’s webmail fails to forward one (at least twice a month in my case), you still get it through POP3.

Btw, you can send emails from gmail as if they were sent from fas.harvard.edu. THIS WILL BE EXPANDED: In Settings, Accounts, Add Account, change SMTP to smtp.fas.harvard.edu…

Ocean Shipping Visualized

Wednesday, April 11th, 2012

Ben Schmidt has made a very nice visualization of ocean shipping in R. Enjoy:

(HT: Revolutions)

R, parallelization and large datasets

Tuesday, April 10th, 2012

When a task would take a long time, we can usually think about parallelization. In this post I will show how to deal with a large shared data set (though not so big that you would need MapReduce).

Let’s first start with how to set up a cluster in R:

Cluster set-up using doSNOW
Revolution Analytics has withdrawn doMC; therefore, I am using doSNOW.


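library(foreach)
library(doSNOW)


numberofcores <- 4

# Keep the cluster object so it can be registered and stopped later
cl <- makeCluster(numberofcores)
registerDoSNOW(cl)

# foo_with and bigdata stand for your own function and data set
foreach (ind=1:1000) %dopar% foo_with(bigdata)

stopCluster(cl)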

There are two issues here. This code gives us an error message about the function foo_with, which is not available on the workers, and you are transporting a lot of data, which causes a slowdown.

Solution for both problems
Push the data to your cluster once (note that clusterExport takes the name of the object as a string):

clusterExport(cl, "bigdata")
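To make the export step concrete, here is a minimal sketch. It assumes the parallel package that ships with R (it provides the same snow-style functions) rather than doSNOW, a two-worker cluster, and a small matrix as a hypothetical stand-in for bigdata.

```r
library(parallel)  # ships with R; provides snow-style cluster functions

cl <- makeCluster(2)  # assumption: two local workers are enough for a demo

# Hypothetical stand-in for the large shared data set
bigdata <- matrix(seq_len(20), nrow = 5)

# clusterExport takes the *names* of the objects as a character vector
# and copies them once into the global environment of every worker
clusterExport(cl, "bigdata", envir = environment())

# Ask each worker for the dimensions of its own copy
worker_dims <- clusterEvalQ(cl, dim(bigdata))

stopCluster(cl)
```

After this, every worker holds its own copy of bigdata, so later calls only need to send small arguments such as an index.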

The function can either be pushed with clusterExport as well, or passed directly to clusterApply or clusterApplyLB:

clusterApplyLB(cl, array, foo_with_rewritten,...)
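Putting the pieces together could look roughly like the sketch below. It again assumes the parallel package that ships with R instead of doSNOW; the data frame and the body of foo_with_rewritten are hypothetical and only illustrate the pattern of exporting the data once and sending just an index with each task.

```r
library(parallel)  # ships with R; provides snow-style cluster functions

cl <- makeCluster(2)  # assumption: two local workers

# Hypothetical stand-in for the large shared data set
bigdata <- data.frame(group = rep(1:4, each = 25), value = seq_len(100))

# Export the data once instead of shipping it with every task
clusterExport(cl, "bigdata", envir = environment())

# Hypothetical worker function: it receives only a small index and
# looks up bigdata in the environment of the worker it runs on
foo_with_rewritten <- function(ind) {
  sum(bigdata$value[bigdata$group == ind])
}

# Load-balanced apply: each free worker picks up the next index;
# the function itself is sent along by clusterApplyLB
results <- clusterApplyLB(cl, 1:4, foo_with_rewritten)

stopCluster(cl)

print(unlist(results))
```

The load-balanced variant helps when individual tasks take different amounts of time; with uniform tasks, clusterApply should behave similarly.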

This blog post shows a solution for the middle ground, using a SNOW (or different) cluster: problems too big for simple multicore computing (MC or similar) but not big enough to need MapReduce.