Archive for December, 2011

Determining the trustworthiness of what we read online is important.

Wednesday, December 21st, 2011

Yesterday I was informed that Senator Tom Coburn published a report entitled “Wastebook, A Guide to Some of the Most Wasteful and Low Priority Government Spending of 2011”, which included my NSF grant “Trails of Trustworthiness: Understanding and Supporting Quality of Information in Real-Time Streams” as example number 34.

I was not familiar with Senator Coburn’s publication and was surprised and curious to see what could characterize my project as “wasteful”. My colleagues and I have been working on the problem of information reliability for the last five years, and we have published more than a dozen papers in refereed conferences and journals, three of which have received the “Best Paper” distinction. What was it that Senator Coburn found so unacceptable that reviewers and scientific audiences overlooked? After reading the relevant sections of his publication, I was even more confused as to what the Senator deemed objectionable. Everything he mentions regarding my project seems positive:

Do you trust your twitter feed? The National Science Foundation is providing $492,055 in taxpayer dollars to researchers at Wellesley College to answer that question.

Researchers cite “the tremendous growth of the so-called Social Web” as a factor that will “put new stress to human abilities to act under time pressure in making decisions and determine the quality of information received.” Their work will analyze the “trails of trustworthiness” that people leave on information channels like Twitter. They will study how users mark certain messages as trustworthy and how they interact with others whose “trust values” are known.

The NSF grant also includes funding for an online course to study “what critical thinking means in our highly interconnected world,” in which we might be “interacting regularly with people we may never meet”.

However, the report’s entry on my proposal is condescendingly titled “To Trust or Not to Trust Tweets, That is the Question.” This suggests that the author of the report may think that trust in online communication is not worth studying, or that Twitter is unworthy of mention in a scientific proposal. But to anyone who has actually read the details of the proposal, this is a superficial criticism. What we are proposing to do is create semi-automatic methods for helping people determine the credibility of the information they receive online. From recent events in the Arab world, Russia, and Mexico, for example, we know that people look to online media for information they can trust, while oppressive governments and drug cartels try to confuse them by spreading misinformation. Even in the US, the cost of misinformation is high; investors have lost millions to untrustworthy online information, and little-known groups are trying to influence our elections by spreading lies. Being able to determine what information can be trusted has always been important, and it will be critical in the future.

It’s unlikely that Senator Coburn himself actually read thousands of NSF grant descriptions to determine which ones appear wasteful. Furthermore, such proposals are written for a scientific audience and require specific expertise to evaluate. And I am sure that the Senator does not believe that critical thinking education and technologies for supporting trust and credibility are “wasteful”. So how did this proposal end up in his report?

 

On the Senator’s “Wastebook” web page, there is a link next to a picture of Uncle Sam inviting readers to “Submit a tip about Government Waste”. By clicking on it, one can suggest examples of wasteful spending to the Senator. I wouldn’t be surprised if someone with only a cursory understanding of our proposal recommended it as wasteful. In this case — and perhaps in many others — a provider of online information has misled Senator Coburn. Therefore, this report itself is proof that determining the trustworthiness of what we read online is important.

 

Predict the Future (and Tell the World about it!)

Saturday, December 10th, 2011

In my previous posting (Predict the Future!) I discussed both the benefits and the risks of making predictions using data gathered from social media. I will take this opportunity to mention a Call for Papers that I am involved in. The online journal “Internet Research”, famous for having published the original article by Tim Berners-Lee on the creation of the WWW, is preparing a special issue on “The Power of Prediction with Social Media”, to be published in 2012. Below are the details. If you have any questions, please contact me or any of the other guest editors.

 


Special issue call for papers on
“The Power of Prediction with Social Media”
from Internet Research, ISSN: 1066-2243

Editor in Chief: Jim Jansen

Overview

Social media today provide an impressive amount of data about users and their societal interactions, thereby offering computer scientists, social scientists, economists, and statisticians many new opportunities for research exploration. Arguably one of the most interesting lines of work is that of forecasting future events and developments based on social media data, as we have recently seen in the areas of politics, finance, entertainment, market demands, health, etc.

But what can successfully be predicted, and why? Because the first algorithms and techniques emerged only recently, little is known about their overall potential, their limitations, and their general applicability to different domains.

Better understanding the predictive power and limitations of social media is therefore of utmost importance, in order, for example, to avoid false expectations, misinformation, or unintended consequences. Current methods and techniques are far from well understood, and it is mostly unclear to what extent, or under what conditions, the various methods for prediction can be applied to social media. While there is a respectable and growing body of literature in this area, current work is fragmented and characterized by a lack of common evaluation approaches. Yet this research seems to have reached a sufficient level of interest and relevance to justify a dedicated special issue.

This special issue aims to shape a vision of important questions to be addressed in this field and fill the gaps in current research by soliciting presentations of early research on algorithms, techniques, methods and empirical studies aimed at the prediction of future or present events based on user generated content in social media.

Topics

To address this guiding theme, the special issue will be organized around, but not limited to, the following topics:

  1. Politics, branding, and public opinion mining (e.g., electoral, market or stock market prediction).
  2. Health, mood, and threats (e.g., epidemic outbreaks, social movements).
  3. Methodological aspects (e.g., data collection, data sampling, privacy and data de-identification).
  4. Success and failure case studies (e.g., reproducibility of previous research or selection of baselines).

Schedule

  • Manuscript due date: June 1, 2012
  • Decisions due: August 1, 2012
  • Revised paper due: September 15, 2012
  • Notification of acceptance: October 1, 2012
  • Submission of final manuscript: October 31, 2012
  • Publication date: late 2012 / early 2013 (tentative)

Submission

All submitted manuscripts should be original contributions and not be under consideration in any other venue.

Publication of an enhanced version of a previously published conference paper is possible if the review process determines that the revision contains significant enhancements, amplification or clarification of the original material. Any prior appearance of a substantial amount of a submission should be noted in the submission letter and on the title page.

Submissions must adhere to the journal’s “Author Guidelines”.

Detailed instructions will be announced later this year.

Guest editors

 

Predict the Future!

Friday, December 9th, 2011

The title may seem redundant. Of course, if you are going to predict, you should predict the future — what else, predict the past? But when it comes to social media data, the title may not be that redundant. In recent years there has been an increase in research that uses social media data to predict the future, to predict the present, and even to predict the past using knowledge acquired afterwards.

Why is predicting important? Predicting is equivalent to intelligence, with an important qualification: we admire the intelligence of someone who can predict what is going to happen, but only when they can also explain why they are able to do so. If someone (say, an octopus) predicts correctly without an explanation, we tend to dismiss it as coincidence.

Earlier today, the Pew Research Center published an analysis entitled “Twitter and the Campaign”. They present a detailed study of millions of tweets and blog posts, examining what people say on social media about the candidates in the 2012 elections. (Not too many nice things, it turns out, except about Ron Paul who, at the same time, is trailing in the polls.)

So, what does this mean for the predictive power of Twitter? Is he going to win because tweets have good things to say about him, or will he lose because tweets have good things to say about him? (Hint: The answer is “yes”.)

Shepard Fairey meets Angry Birds: Poster of our 2011 ICWSM submission "Limits of Electoral Predictions using Twitter"

Earlier this year, my colleagues Eni Mustafaraj and Dani Gayo-Avello, our student Catherine Lui, and I studied this question: can one predict the outcome of the US congressional elections by analyzing social media data? We did not find encouraging results in either the Google Trends data or the Twitter data — hence the ingenious poster above, which Dani designed.

When it comes to something as important as elections, social media will be manipulated, because the stakes are too high. One should keep that in mind as we get closer to election time, when “news articles” will start appearing that argue someone will win or lose based on the number of friends or followers a candidate has. If an author gets it right, he will make sure to remind us of it later. If he gets it wrong, he will be the first to forget it.

Today's mentally flexible tweet. Why is this important? What is special about the last 24 hours? Who is missing?

This does not mean that nothing can be predicted using social media. Movie sales can be predicted, as Bernardo Huberman and his colleagues showed. Flu outbreaks and periodic sales can be predicted, too. But not elections, at least not without some sophisticated filtering that makes the data representative enough to compete with professional pollsters.
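To make the pitfall concrete, here is a toy sketch, with entirely hypothetical numbers, of why raw mention counts mislead: the naive “share of mentions” heuristic that such news articles rely on over-weights the most active users, and a crude correction for per-supporter activity can flip the predicted winner. None of this is the actual method from our study; it is only an illustration of the representativeness problem.

```python
# Toy illustration (all numbers hypothetical): why raw tweet volume
# is a poor election predictor when one candidate's supporters are
# simply louder than the other's.

def mention_share(counts):
    """Naive predicted vote share: each candidate's fraction of raw mentions."""
    total = sum(counts.values())
    return {cand: n / total for cand, n in counts.items()}

def debiased_share(counts, tweets_per_supporter):
    """Crude correction: divide mentions by an assumed per-supporter
    tweet rate to estimate the number of supporters, then normalize."""
    supporters = {c: counts[c] / tweets_per_supporter[c] for c in counts}
    total = sum(supporters.values())
    return {c: v / total for c, v in supporters.items()}

# Hypothetical data: candidate B gets more mentions, but only because
# B's supporters each tweet three times as often as A's.
mentions = {"A": 4000, "B": 6000}
activity = {"A": 2.0, "B": 6.0}  # assumed tweets per supporter

naive = mention_share(mentions)        # B leads with 60% of mentions
corrected = debiased_share(mentions, activity)
# corrected: A ~ 0.67, B ~ 0.33 -- the naive signal pointed the wrong way
```

The point of the sketch is that the correction requires knowing the activity bias, which in practice is exactly the hard part; without that kind of calibration, mention counts are a poll of the loudest, not of the electorate.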