~ Archive for PageRank ~

Defending your Domain Name

metaxas - July 14, 2014 @ 5:23 pm · Filed under PageRank, Propaganda, Web Spam, Ελληνικά

I recently had the uninvited opportunity to defend my domain name, metaxa.net, and I am writing this post because it may be useful to others who find themselves in such a position. The good news: You do not need to be a lawyer to do it yourself and the arbitration system works reasonably well. The bad news: You need to do a bit of reading and writing.

I have owned metaxa.net since I registered it in 1999. Back then it was essential to own a domain name since there were no services to upload and share your photos, no easy ways to have email addresses for you or your family members, and the only clouds around were still up in the sky, unable to store any files you might need while away from your office.

As Web services expanded to cover everything under the sun, the above uses became secondary, and I started utilizing the domain name for other reasons. In 2004 I started a line of research to discover how Web Spammers succeeded in gaming search engines and placing their bogus postings in the top-10 results of relevant searches. Back then it was thought that the “PageRank” algorithm was like “42”: the answer to everything (related to the Web). We now know that search engines can be gamed, and that they spend considerable resources to avoid Web Spam.

[Side note: My research led me to discover why search engines can be gamed: pretty much for the same reasons we humans can be fooled. Web Spammers were using techniques very similar to the propagandistic techniques that politicians, advertisers and financial criminals use to persuade us to vote for something, buy something or invest in something. I presented my work initially at AIRWeb 2005 (Web Spam, Propaganda and Trust), but if you are interested in reading more you should check the journal version (Web Spam, Social Propaganda and the Evolution of Search Engine Rankings).]

Anyway, to implement a technique for discovering Web Spam sites I needed a method to evaluate similarity between Web site contents. It helped to have a Web site containing text that is unbiased towards a particular theme or product. Since there was nothing available online, I uploaded onto my own site a large collection of Associated Press news articles that were used by the TREC community. If you visited the top directory of metaxa.net you would be surprised to find huge files containing old news. But it was very unlikely that you would visit it. I did not include metaxa.net in any search engine listing, so it would never appear in your search results. Though I never planned to use metaxa.net as a text repository, over the years it proved very useful. Many of my students used it to run their research projects, and after graduation some of them ended up working at Microsoft Bing and Google Search, fighting Web Spam on a regular basis.
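
To make "similarity between Web site contents" concrete, here is a minimal sketch of one common measure, cosine similarity over word counts. This is an assumption chosen for illustration, not necessarily the method used in the research described above; the function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    a, b = term_counts(text_a), term_counts(text_b)
    # Only words present in both texts contribute to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample texts: one neutral news sentence, two spam-like pages.
news = "the senate passed the budget bill after a long debate"
spam_a = "cheap pills buy cheap pills online best cheap pills"
spam_b = "buy cheap pills online cheap pills best price"

print(cosine_similarity(spam_a, spam_b))
print(cosine_similarity(spam_a, news))
```

Under this measure the two spam-like pages score much closer to each other than either does to the neutral news text, which is the role a theme-neutral reference collection plays in such a comparison.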

I continued to own the domain name, paying the fees on time, so it was a complete surprise when I received a letter in early April about a dispute filed with the World Intellectual Property Organization (WIPO) by Remy-Cointreau Luxemburg, the well-known liqueur-producing company. They wanted to take over my domain name! Remy-Cointreau had bought a well-known Greek Spirits company, METAXA, and over the years they had started buying every domain name that contained the string “metaxa” (including “metaxaswineestate”!). Given my research, I knew exactly why they were doing that: They wanted to “persuade” the search engines that any search for the term “metaxa” should lead to their official site only! They knew how to fool PageRank and were doing it legally too. Now they wanted to fool WIPO’s arbitrators. In fact, as they claimed in the Complaint they served me,

“Indeed, the term METAXA® is only known in relation to the Complainant. It has no meaning whatsoever in English or in any other language. A Google search on the term METAXA® displays several results, all of them being related to the Complainant”

Wow. Three lies in three sentences. Anyone who knows just a bit of modern Greek history or checks Wikipedia knows that the name Metaxas is not that rare. (In Greek, the female version of a name, or a reference to the family name itself, does not include the final “s”.) One can use Google Translate on metaxa to see that it means “raw silk” or “silk trader” in Greek, depending on the intonation. And the first page of Google search results does not constitute proof of unique association. (Yet, even their own submitted screenshot included other METAXA references!) And these were not the only “inaccurate claims” or logical fallacies in the Complaint. There were about a dozen of them. You can see a more detailed list (though not exhaustive) in my Response to their Complaint.

I was doubly stunned. Making such claims amounted to ridiculing WIPO, the very body they were asking to support them. How could they be so arrogant as to insult the intelligence of WIPO’s arbitrators? Wouldn’t they expect that the Respondent would point out their lies?

Probably not. My guess is that they expected that nobody would respond to their Complaint and they would win an uncontested case. You see, when we changed the domain record hosting company a couple of years ago, the WHOIS information was not updated correctly, and it showed that the administrator was to be reached at … nocontactsfound@secureserver.net. It seems reasonable to expect that whoever owned the domain would not be reached, and so they could snatch it without contest. It would cost them a few thousand dollars, but for a company with deep pockets, that would not be a problem.

Unfortunately for them, due to a billing inquiry, I did get informed. Legalese never having been one of my languages, I turned to the wonderful Berkman community for advice. And the advice poured in immediately. Several Berkmanites, primarily Prof. Jacques de Werra of the University of Geneva, a Faculty Associate at the Berkman Center this academic year, suggested literature, gave me references to other relevant cases, recommended experts in the field, and offered advice on my options. They even pointed out other unfair activities that the company had been involved in in the past. In particular, a song written for the company’s ad campaign was stolen from Berkmanite musician Erin McKeown; you can read all about it in the TechDirt case study “A Perspective On The Complexities Of Copyright And Creativity From A Victim Of Infringement”.

On to the technical part. The complainant has to prove all three of the following elements, so one can defeat a Complaint by convincing the arbitrators that any one of them is not present:

your domain name is identical or confusingly similar to a trademark or service mark in which the complainant has rights;
you have no rights or legitimate interests in respect of the domain name; and
your domain name has been registered and is being used in bad faith.

The detailed policy and rules (called the UDRP, Uniform Domain-Name Dispute-Resolution Policy) are listed here:
http://www.icann.org/en/help/dndr/udrp

In my case, the first element could not be countered: There were good reasons why my domain name was identical to their trademark. But I could convince the arbitrators that the Complaint was wrong on both the second and the third element. Showing that I have legitimate rights was straightforward: I just had to point out the lies within their claims. But you need to provide a complete counterargument. It is not enough to point out a lie and expect that the arbitrators will go looking for evidence of it. You have to provide it yourself, with screenshots, excerpts, and clear arguments. And you had better be exhaustive in your arguments because you may only get one shot. For example, even though I was quite sure that I could nullify the second element, I had better nullify the third one too, just in case. For the third element, I had to go digging for references and receipts showing that I was always the owner of the account and had used it in good faith. Showing “good faith” was important, as this is a main reason why the UDRP exists: to curb the efforts of cyber-squatters who buy domain names only to sell them to the highest bidder among competing companies.

While I was at it, I wanted to point out that the Complaint itself was not filed in good faith. Remy-Cointreau really did not have a case. Only through the dozen lies in the filing could they put a case together. I would have loved to get the arbitrators to acknowledge that the Company filed in bad faith. There are no penalties associated with such an acknowledgement, but future arbitrators may take it into account.

In the end the arbitrators denied the Remy Cointreau – Metaxa Complaint, stopping at the fact that the second element was not proven by the company. In addition to losing the case and a few thousand dollars, the company lost the opportunity to persuade search engines about the unique association of their trademark. Now that this is an officially recorded WIPO UDRP case, it may help reduce the number of future frivolous Complaints.

PS. I recently found a good guide on How to Choose the Right Domain Name; I hope you will find it useful.

Misinformation and Propaganda in Cyberspace

metaxas - February 26, 2012 @ 4:40 pm · Filed under Critical Thinking, Elections, PageRank, Propaganda, Social Media, Trustworthiness, Twitter

Dear Readers,

The following is a blog post that I wrote recently for a conference on “Truthiness in Digital Media” that the Berkman Center is organizing in March. It summarizes some of the research findings that have shaped my approach to the serious challenges that misinformation propagation poses in Cyberspace.

Do you have examples of misinformation or propaganda that you have seen on the Web or on Social Media? I would love to hear from you.

Takis Metaxas

Misinformation and Propaganda in Cyberspace

Since the early days of the discipline, Computer Scientists have always been interested in developing environments that exhibit well-understood and predictable behavior. If a computer system were to behave unpredictably, then we would look into the specifications and we would, in theory, be able to detect what went wrong, fix it, and move on. Accordingly, the World Wide Web, created by Tim Berners-Lee, was not expected to evolve into a system with unpredictable behavior. After all, the creation of the WWW was enabled by three simple ideas: the URL, a globally addressable system of files; HTTP, a very simple communication protocol that allowed a computer to request and receive a file from another computer; and HTML, a document-description language to simplify the development of documents that are easily readable by non-experts. Why, then, in a few years did we start to see the development of technical papers that included terms such as “propaganda” and “trust”?

Soon after its creation the Web began to grow exponentially because anyone could add to it. Anyone could be an author, without any guarantee of quality. The exponential growth of the Web necessitated the development of search engines (SEs) that gave us the opportunity to locate information fast. They grew so successful that they became the main providers of answers to any question one may have. It does not matter that several million documents may all contain the keywords we include in our query: a good search engine will give us all the important ones in its top-10 results. We have developed a deep trust in these search results because we have so often found them to be valuable (or, when they are not, we might not notice it).

As SEs became popular and successful, Web spammers appeared. These are entities (people, organizations, businesses) who realized that they could exploit the trust that Web users placed in search engines. They would game the search engines by manipulating the quality and relevance metrics so as to force their own content into the ever-important top-10 of a relevant search. The Search Engines noticed this and a battle with the web spammers ensued: For every good idea that search engines introduced to better index and retrieve web documents, the spammers would come up with a trick to exploit the new situation. When the SEs introduced keyword frequency for ranking, the spammers came up with keyword stuffing (lots of repeating keywords to give the impression of high relevance); for web site popularity, they responded with link farms (lots of interlinked sites belonging to the same spammer); in response to the descriptive nature of anchor text they detonated Google bombs (using irrelevant keywords as anchor text to target a web site); and for the famous PageRank, they introduced mutual admiration societies (collaborating spammers exchanging links to increase everyone’s PageRank). In fact, one can describe the evolution of search results ranking technology as a response to Web spamming tricks. And since for each spamming technique there is a corresponding propagandistic one, Web spammers became the propagandists of cyberspace.
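
To illustrate why a mutual admiration society pays off, here is a minimal sketch of the textbook PageRank iteration on a tiny graph. It is a simplification for illustration only, not what any search engine runs in production; the page names, the graph, and the damping value of 0.85 are assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n for page in pages}
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical Web: three ordinary pages and two spam pages nobody links to.
honest_web = {
    "news": ["blog"],
    "blog": ["news"],
    "wiki": ["news", "blog"],
    "spam1": ["wiki"],
    "spam2": ["wiki"],
}

# Same Web, except the two spammers now also link to each other.
colluding_web = dict(honest_web, spam1=["wiki", "spam2"], spam2=["wiki", "spam1"])

before = pagerank(honest_web)
after = pagerank(colluding_web)
print("spam1 without link exchange:", round(before["spam1"], 4))
print("spam1 with link exchange:   ", round(after["spam1"], 4))
```

In this toy graph the spam pages start with only the baseline score every page receives. Once they exchange links, each passes part of its score to the other, so both rise without any outside page ever endorsing them.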

Around 2004, the first elements of misinformation around elections started to appear, and political advisers recognized that, even though the Web was not a major component of electoral campaigns at the time, it would soon become one. If they could just “persuade” search engines to rank positive articles about their candidates highly, along with negative articles about their opponents, they could convince a few casual Web users that their message was more valid and get their votes. Elections in the US, after all, often depend on a small number of closely contested races.

Search Engines have certainly tried hard to limit the success of spammers, who are seen as exploiting this technology to achieve their goals. Search results were adjusted to be less easily spammable, even if this meant that some results were hand-picked rather than algorithmically produced. In fact, during the 2008 and the 2010 elections, searching the Web for electoral candidates would yield results that contained official entries first: The candidate’s campaign sites, the office sites, and Wikipedia entries topped the results, well above even well-respected news organizations. The embarrassment of being gamed, as with the infamous “miserable failure” Google bomb, would not be tolerated.

Around the same time we saw the development of the Social Web, networks that allow people to connect, exchange ideas, air opinions, and keep up with their friends. The Social Web created opportunities for spreading not only political (and other) messages but also misinformation through spamming. In our research we have seen several examples of propagation of politically-motivated misinformation. During the important 2010 Special Senatorial election in MA, spammers used Twitter to create a Google bomb that would bring their own messages to the third position of the top-10 results by frequently repeating the same tweet. They also created the first Twitter bomb targeting individuals interested in the MASEN elections with misinformation about one of the candidates, and created a pre-fab Tweet factory imitating a grass-roots campaign, attacking news organizations and reporters (a technique known as “astroturfing”).

Like propaganda in society, spam will stay with us in cyberspace. And as we increasingly look to the Web for information, it is important that we are able to detect misinformation. Arguably, now is the most important time for such detection, since we do not currently have a system of trusted editors in cyberspace like that which has served us well in the past (newspapers, publishers, institutions). What can we do?

Figure: Retweeting reveals communities of like-minded people: There are 2 larger groups that naturally arise when one considers the retweeting patterns of those tweeting during the 2010 MA special election. Sampling reveals that the smaller contains liberals and the larger conservatives. The larger one appears to consist of 3 different subgroups.

Some promising research in social media has shown potential in using technology to detect astroturfing. In particular, the following rules hold true most (though not all) of the time:

The credibility of the information you receive is related to the trust you have towards the original sender and to those who retweeted it.
Not only do Twitter friends (those that you follow) reveal a similarly-minded community, their retweeting patterns make these communities stronger and more visible.
While both truths and lies propagate in cyberspace, lies have shorter life-spans and are questioned more often.

While we might prefer an automatic way of detecting misinformation with the use of algorithms, this will not happen. Citizens of cyberspace must become educated about how to detect misinformation, be provided with tools that will help them question and verify information, and draw on the strengths of crowdsourcing through their own groups of trusted editors. This Berkman conference will help us push in this direction.

When Computation met Society
