You are viewing a read-only archive of the Blogs.Harvard network.

Archive for the 'search' Category

D.H. on AOL and Basic Characteristics of Information


Daniel Haeusermann, Berkman intern and FIR-HSG researcher, has a great post on his brand-new blog about AOL’s publication of search queries, viewed from a (European) information law perspective. Stay tuned; Dan will have many more interesting things to say.

Power of Search Engines: Some Highlights of Berlin Workshop


I’ve spent the past two days here in Berlin, attending an expert workshop on the rising power of search engines organized by Professor Marcel Machill and hosted by the Friedrich Ebert Stiftung, and a public conference on the same topic.

I much enjoyed yesterday’s presentations by a terrific group of scholars and practitioners from various countries and with different backgrounds, ranging from informatics, journalism, economics, and education to law and policy. The extended abstracts of the presentations are available here. I presented my recent paper on search engine law and policy. Among the workshop’s highlights (small selection only):

* Wolfgang Schulz and Thomas Held (Hans Bredow Institute, Univ. of Hamburg) discussed the differences between search-based filtering in China versus search engine content regulation in Germany. In essence, Schulz and Held argued that procedural safeguards (including independent review), transparency, and the requirement that legal filtering presupposes that the respective piece of content is “immediately and directly harmful” make the German system radically different from the Chinese censorship regime.

* Dag Elgesem (Univ. of Bergen, Department of Information Science) made an interesting argument about how we (as scholars) perceive users in their role as online searchers. While the shift from passive consumers to active users has mostly been debated in the context of the creation/production of information, knowledge, and entertainment (one of my favorite topics, as many of you know), Dag argues that online searchers, too, have become “active users” in Benkler’s sense. In contrast, Dag argued, much of our search engine policy discussion has assumed a rather passive user who just types in a search term and uses whatever comes back in response to the query. Evidently, the underlying conception of users in their role as online searchers is important because it bears on whether regulatory interventions are necessary (e.g. with regard to transparency, market power, and “Meinungsmacht” of search engines).

* Boris Rotenberg (DG Joint Research Center, European Commission, Sevilla) linked, in an interesting way, the search engine user’s privacy – as an expression of informational autonomy – to the user’s freedom of expression and information. He argues, in essence, that search engine operators’ increasing use of personal data in their attempts to personalize search might have a negative impact on freedom of information in at least three regards. First, extensive use of personal data may lead to user-side filtering. Second, it might produce chilling effects by restricting “curious searches”. Third, personalization tends to create strong ties to a particular (personalized) search engine, hindering users from switching to alternative engines (the “stickiness” argument).

* Benjamin Peters (Columbia University) used the Mohammed cartoon controversy to explore three questions: (1) To what extent do search engines eliminate the role of traditional editors? (2) Do algorithms have any sort of in-built ethics? (Benjamin’s answer, based on David Weinberger’s notion of links as acts of generosity: yes, they do.) (3) What are the elements of a “search engine democracy”?

* Dirk Lewandowski (Department of Information Science, Heinrich-Heine Univ.) provided a framework for assessing a search engine’s quality. He argues that the traditional measure “precision” – a component of retrieval quality – is not a particularly useful criterion for evaluating and comparing search engines’ quality, because the major search engines produce almost identical scores on the precision scale (as Dirk empirically demonstrated). Dirk’s current empirical research focuses on index quality, including elements such as reach (e.g. geographic reach), the size of the index, and the freshness/frequency of updates.
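To make the precision argument concrete, here is a minimal sketch of a precision-at-k comparison, the kind of measurement such evaluations rest on. The function name and the relevance judgments are made up for illustration; this is not Dirk’s actual methodology or data.

```python
def precision_at_k(results, relevant, k=10):
    """Fraction of the top-k results that human assessors judged relevant."""
    top_k = results[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)

# Hypothetical relevance judgments for one query, compared across two engines
relevant = {"d1", "d3", "d4"}
engine_a = ["d1", "d2", "d3", "d4", "d5"]
engine_b = ["d2", "d1", "d5", "d3", "d4"]

print(precision_at_k(engine_a, relevant, k=5))  # 0.6
print(precision_at_k(engine_b, relevant, k=5))  # 0.6
```

The two (hypothetical) engines rank documents quite differently yet score identically, which illustrates why near-equal precision scores make the metric a weak discriminator between major engines.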

* Nadine Schmidt-Maenz (Univ. of Karlsruhe, Institute for Decision Theory and Management Science) presented the tentative results of a long-term empirical study on search queries. Nadine and her team have automatically observed and analyzed the live tickers of three different search engines and clustered over 29 million search terms. The results are fascinating, and the idea of topic detection, tracking, and – even more interestingly – topic prediction (!) is highly relevant for the search engine industry, from both a technological and a business perspective. From a different angle, we also discussed the potential impact of reliable topic forecasting on agenda-setting and journalism.
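As a toy illustration of topic detection on query streams (my own sketch, not the clustering method Nadine’s team used), one can flag terms whose frequency jumps sharply between two observation windows; the thresholds and sample queries below are invented:

```python
from collections import Counter

def emerging_terms(window_now, window_before, min_count=3, growth=2.0):
    """Flag query terms whose frequency grew sharply between two time windows."""
    now = Counter(window_now)
    before = Counter(window_before)
    flagged = []
    for term, count in now.items():
        # Require an absolute minimum and a multiplicative jump over the
        # earlier window (treating unseen terms as frequency 1 to avoid
        # division-by-zero-style artifacts).
        if count >= min_count and count >= growth * max(before[term], 1):
            flagged.append(term)
    return sorted(flagged)

yesterday = ["weather", "news", "weather", "football"]
today = ["weather", "eclipse", "eclipse", "eclipse", "news"]
print(emerging_terms(today, yesterday))  # ['eclipse']
```

Real systems cluster semantically related queries before counting, but even this crude frequency comparison conveys why live query tickers are such a rich source for topic tracking and forecasting.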

* Ben Edelman (Department of Economics, Harvard Univ.) empirically demonstrated that search engines are at least in part responsible for the wide spread of spyware, viruses, pop-up ads, and spam, but have taken only limited steps to avoid sending users to hostile websites. He also offered potential solutions, including safety labeling of individual search results by search engine providers and changes to the legal framework (liability rules) to create the right incentive structure for search engine operators to contribute to overall web safety.

Lots of food for thought. What I’d like to explore in greater detail is Dag’s argument that users as online searchers, too, have become highly (inter-)active, probably not only in the sense of active information retrievers, but increasingly also as active producers of information while engaged in search activities (e.g. by reporting on search experiences, contributing to social search networks, etc.).

YJoLT-Paper on Search Engine Regulation


The Yale Journal of Law and Technology just published my article on search engine regulation. Here’s the extended abstract:

The use of search engines has become almost as important as e-mail as a primary online activity. Arguably, search engines are among the most important gatekeepers in today’s digitally networked environment. Thus, it does not come as a surprise that the evolution of search technology and the diffusion of search engines have been accompanied by a series of conflicts among stakeholders such as search operators, content creators, consumers/users, activists, and governments. This paper outlines the history of the technological evolution of search engines and explores the responses of the U.S. legal system to the search engine phenomenon in terms of both litigation and legislative action. The analysis reveals an emerging “law of search engines.” As the various conflicts over online search intensify, heterogeneous policy debates have arisen concerning what forms this emerging law should ultimately take. This paper offers a typology of the respective policy debates, sets out a number of challenges facing policy-makers in formulating search engine regulation, and concludes by offering a series of normative principles which should guide policy-makers in this endeavor.

As always, comments are welcome.

In the same volume, see also Eric Goldman’s Search Engine Bias and the Demise of Search Engine Utopianism.

Global Online Freedom Act of 2006: Evil is in the Details


I’ve just read Rep. Chris Smith’s discussion draft of a “Global Online Freedom Act of 2006,” which has been made available online on Rebecca MacKinnon’s blog. Rebecca nicely summarizes the key points of the draft. From the legal scholar’s rather than the activist’s viewpoint, however, some of the draft bill’s nitty-gritty details are equally interesting. Among the important definitions is certainly the term “legitimate foreign law enforcement purposes,” which appears, for instance, in the definition of substantial restrictions on Internet freedom and in sec. 206 on the integrity of user identifying information. According to the draft bill, the term “legitimate foreign law enforcement purposes” means

“for purposes of enforcement, investigation, or prosecution by a foreign official based on a publicly promulgated law of reasonable specificity that proximately relates to the protection or promotion of health, safety, or morals of the citizens of that jurisdiction.”

And the next paragraph clarifies that

“the control, suppression, or punishment of peaceful expression of political or religious opinion does not constitute a legitimate foreign law enforcement purpose.” [Emphasis added.]

While the first part of the definition makes a lot of sense, the second part is more problematic to the extent that it suggests, at least at first glance, a de facto export of U.S. free speech standards to the rest of the world. Although recent Internet rulings by U.S. courts have suggested an expansion of the standard under which U.S. courts will assert jurisdiction over free speech disputes that arise in foreign jurisdictions, it has been my impression, and that of others, that U.S. courts are (still?) reluctant to export free speech protections globally (see, e.g., the 9th Circuit Court of Appeals’ recent Yahoo! ruling).

Indeed, it would be interesting to see how the above-mentioned definition would relate to French legislation prohibiting certain forms of hate speech, or to German regulations banning certain forms of expression – blacklists, by the way, that are also implemented by European subsidiaries of U.S.-based search engines and content hosting services.

While the intention of the draft bill is certainly legitimate and some of its provisions (e.g. on international fora, a code of conduct, etc.) deserve support, the evil – as usual – is in the details. Given its vague definitions, the draft bill (should it become law) may well produce spillover effects by restricting the business practices of U.S. Internet intermediaries even in democratic countries that happen (for legitimate, often historical reasons) not to share the U.S.’s expansive free speech values.

Addendum: Some comments on the draft bill from the investor’s perspective here. Note, however, that the draft bill also covers foreign subsidiaries of U.S. businesses to the extent that the latter control the voting shares or other equities of the foreign subsidiary, or authorize, direct, control, or participate in acts carried out by the subsidiary that are prohibited by the Act.

Information Ethics: U.S. Hearing, but Global Responsibility


Today, the U.S. House of Representatives’ Subcommittee on Africa, Global Human Rights and International Operations and its Subcommittee on Asia and the Pacific are holding an open hearing on the question of whether the Internet in China is a tool for freedom or suppression. My colleague Professor John Palfrey, among the foremost Internet law & policy experts, has prepared an excellent written testimony. In it, John summarizes the basic ethical dilemmas for U.S. corporations such as Google, Microsoft, Yahoo and others that have decided to do business in countries like China with extensive filtering and surveillance regimes. John also asks to what extent a code of conduct for Internet intermediaries could guide these businesses and give them a base of support for resisting abusive surveillance and filtering requests, and what role academia could play in developing such a set of principles.

I’m delighted that our Research Center at the University of St. Gallen in Switzerland is part of the research initiative mentioned in John’s testimony that is aimed at contributing to the development of ethical standards for Internet intermediaries. Over the past few years, a team of our researchers has explored the emergence, functionality, and enforcement of standards that seek to regulate the behavior of information intermediaries. It is my hope that this research, in one way or another, can contribute to the initiative announced today. Although the ethical issues in cyberspace are in several regards structurally different from those emerging in offline settings, I argue that we can benefit from prior experiences with and research on ethics for international businesses in general and information ethics in particular.

So far, the heated debate about the ethics of globally operating Internet intermediaries has been a debate about the practices of large and influential U.S. companies. On this side of the Atlantic, however, we should not make the mistake of thinking that the hard questions Palfrey and other experts will be discussing today before the above-mentioned subcommittees are “U.S.-made” problems. Rather, the concern, challenge, and project – designing business activities that respect and foster human rights in a globalized economy with local laws and policies, including restrictive or even repressive regulatory regimes – are truly international in nature, especially in today’s information society. Viewed from that angle, it is almost surprising that we haven’t seen more constructive European contributions to this discourse. We should not forget that European Internet & IT companies, too, face tough ethical challenges in countries such as China. In that sense, the difficult but open and transparent conversations in the U.S. are, in my view, an excellent model for Europe with its long-standing human rights tradition.

Update: Rebecca MacKinnon does a great, fast job of summarizing the written and oral testimonies. See especially her summary of and comments on the statements by Cisco, Yahoo!, Google, and Microsoft.

Google’s Alan Davidson on Areas of Special Concern


Alan Davidson, Washington Policy Counsel and head of Google’s new Washington, D.C. government affairs office, made several interesting remarks in his panel statement. Among them, he identified the following two areas of special concern to search engine providers:

(1) Conceptual shift in speech regulation

  • Old approach (offline media): focused on publishers, readers
  • New & emerging generation of speech regulation: focus on deliverers – intermediaries are supposed to police the networks. Examples where this approach is currently up for discussion in D.C.: access to pharmaceutical products, blocking of gaming websites
  • Assessment: It’s not a good idea to target intermediaries. Due process/procedural problem: an intermediary, e.g., can’t tell whether or not a particular site featuring copyrighted content is a fair use; by going after the intermediary you take the publisher out of the equation, who then can’t go to court to argue the case
  • Also misguided because search engines are only in the business of indexing existing content; they’re not editors (and can’t be, given the scale)

(2) Government access to information

  • Increasing pressures to provide personalized information (search history, etc.) to third parties
  • The best privacy policy doesn’t help if the government wants information for national security reasons; the standards are really low; plus, search engines are not allowed to inform users that their information has been passed on to third parties.

Ed Felten on the Search Space


Here are some keywords of Ed Felten’s presentation here at Yale’s search conference:

  • Talks about what search is
  • Search is broader than we think that it is
  • Three steps, processes or elements: (1) observe, (2) analyze/learn, (3) serve users
  • Observation of information, whether crawled on the web, in the university library, or in the real world
  • Put the information in a database so that it’s available in electronic/digital form. Analyze, index, learn from, and model that information; put some sort of value on top of it
  • The index, model, etc. built from it allows you to serve users, answering queries and questions
  • Broad definition: it’s not only search engines such as Yahoo; it also includes Google Print, fixture sites (e.g. baseball statistics), and attributes of P2P file-sharing systems; it also applies to consumer reporting organizations, e.g. ChoicePoint
  • Interesting issue among others: whether search is internal or external to the world it is studying. E.g., eBay has an internal search engine for objects/auctions, a component of the service it provides. Bidder’s Edge tried to build an external search; eBay wasn’t happy. Other interesting case: Grokster, e.g., has/had an internal search engine. BitTorrent didn’t provide a search engine, provided only transfer, and got a significant legal advantage.
  • Other interesting aspect: analyzing and learning is where the value is added, e.g. Google’s PageRank. The analysis step is where the heavy thinking happens and value is created. Interesting, because legal challenges have not targeted the analysis element but rather the crawling/observation step (e.g. eBay v. Bidder’s Edge).
  • Decentralization and P2P design: a complex issue. If analyzing and learning are key, but the “observation” element of the search process is the target of law, it’s likely that designers will try to decentralize the observation part.
  • In sum: search is broad, and we’re very early in the development of this technology.
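Felten’s point that the analysis step is where value is created can be made concrete with a minimal PageRank-style power iteration. This is my own sketch of the well-known published algorithm, not Google’s actual implementation; the three-page “web” is invented for illustration:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Minimal PageRank by power iteration over an adjacency dict."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:  # ...and passes the rest along its outgoing links
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c (it receives links from both a and b)
```

Note that the crawling ("observation") step that litigation has targeted is absent here entirely: the analysis runs on an already-collected link graph, which is exactly why decentralizing observation while keeping centralized analysis is a plausible design response to legal pressure.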

Update: More on the other panels here.

Regulating Search? Call for a Second Look


Here is my second position paper (find the first one here) in preparation of the upcoming Regulating Search? conference at ISP Yale. It provides a rough arc of a paper I will write together with my friend and colleague Ivan Reidel. The Yale conference on search has led to great discussions on this side of the Atlantic. Thanks to the FIR team, esp. Herbert Burkert and James Thurman, Mike McGuire, and to Sacha Wunsch-Vincent for continuing debate.

Regulating Search? Call for a Second Look

1. The use of search engines has become almost as important as email as a primary online activity on any given day, according to a recent Pew survey. According to another survey, 87% of search engine users report successful search experiences most of the time, while 68% say that search engines are a fair and unbiased source of information. This data, combined with the fact that the Internet, among very experienced users, ranks even higher than TV, radio, and newspapers as an important source of information, illustrates the enormous importance of search engines from a demand-side perspective, both in terms of actual information practices and in terms of users’ psychological acceptance.

2. The data also suggests that the transition from an analog/offline to a digital/online information environment has been accompanied by the emergence of new intermediaries. While traditional intermediaries between senders and receivers of information—most of them related to the production and dissemination of information (e.g. editorial boards, TV production centers, etc.)—have diminished, new ones such as search engines have entered the arena. Arguably, search engines have become the primary gatekeepers in the digitally networked environment. In fact, they can effectively control access to information by deciding whether to list any given website in their search results. But search engines not only shape the flow of digital information by controlling access; they also, at least indirectly, engage in the construction of messages and meaning by shaping the categories and concepts users use to search the Internet. In other words, search engines have the power to influence agenda setting.

3. The power of search engines in the digitally networked environment, with its corresponding misuse scenarios, is likely to attract increasing attention from policy- and lawmakers. However, it is important to note that search engines are not unregulated under the current regime. Markets regulate search engines’ behavior, although the regulatory effects of competition might be relatively weak because the search engine market is rather concentrated and centralized; a recent global user survey suggests that Google’s global usage share has reached 57.2%. In addition, not all search engines use their own technology; some rely on other search providers for their listings. Search engines are also regulated by existing laws and regulations, including consumer protection law, copyright law, unfair competition law, and—at the intersection of market-based and law-based regulation—antitrust law or (in European terminology) competition law.

4. Against this backdrop, the initial question for policymakers must concern the extent to which existing laws and regulations can feasibly address potential regulatory problems that emerge from search engines in the online environment. Only where existing legislation and regulation fails—due to inadequacy, enforcement issues, or the like—should new, specific, and narrowly tailored regulation be considered. In order to analyze existing laws and regulations with regard to their ability to manage problems associated with search engines, one would be well-advised to take a case-by-case approach, looking at each concrete problem or emerging regulatory issue (“scenario”) on the one hand, and at the incumbent legal/regulatory mechanisms aimed at addressing conflicts of that sort on the other.

5. Antitrust law might serve as an illustration of such an approach. While the case law on unilateral refusals to deal is still one of the most problematic and contested areas of current antitrust analysis, the emergence of litigation applying this analytical framework to search engines seems very likely. Although most firms’ unilateral refusals to deal with other firms are generally regarded as legal, a firm’s refusal to deal with competitors can give rise to antitrust liability if that firm possesses monopoly power and the refusal is part of a scheme designed to maintain or achieve further monopoly power. In the past, successful competitors like Aspen Skiing Co. and, more recently, Microsoft have been forced to collaborate with competitors and punished for actions that smaller companies could probably have gotten away with. In this sense, search engines might be the next arena in which antitrust rules on unilateral refusals to deal are tested. In addition to the scenario just described, the question arises whether search engines could be held liable for refusing to include particular businesses in their listings. Where a market giant such as Google has a “don’t be evil” policy and declines to feature certain sites in its results because it deems those sites “evil,” there is an issue of whether Google is essentially shutting that site provider out of the online market through the exercise of its own position in the market for information. Likewise, the refusal to include certain books in the Google Print project would present troubling censorship-like issues. It is also important to note that Google’s editorial discretion with regard to its PageRank results was deemed protected by the First Amendment in the SearchKing case.

6. In conclusion, this paper suggests a cautious approach to rapid legislation and regulation of search engines. It is one of the lessons learned that one should not overestimate the need for new law to deal with apparently new phenomena emerging from new technologies. Rather, policy- and lawmakers would be well-advised to carefully evaluate the extent to which general and existing laws may address regulatory problems related to search and which issues exactly call for additional, specific legislation.

German Search Engines: Compliance With Own Code of Conduct?


Earlier this year, we reported that all major search engines in Germany (Google, Lycos Europe, MSN Deutschland, AOL Deutschland, Yahoo, T-Online, and t-info) had reached an agreement to filter harmful-to-minors content. Recently, Marcel Machill tested the search engines’ compliance with their own code of conduct. Find a summary of the results here.

Search Engine Filtering Agreement (Germany)


EDRI-gram, a bi-weekly newsletter about digital civil rights in Europe, draws our attention to an earlier report by the German online newsletter Heise, which reported a couple of days ago that all major search engines in Germany (Google, Lycos Europe, MSN Deutschland, AOL Deutschland, Yahoo, T-Online, and t-info) have reached an agreement to filter harmful-to-minors content, which will make it much more difficult for German users to access such content. For this purpose, the search engines agreed to establish and run a self-regulatory organization that will block websites considered harmful based on a list of URLs provided by a government agency in charge of media content classification. According to the Heise report, the search engines are taking these steps because they fear that European legislators might become active if the harmful-to-minors problem isn’t addressed by the industry itself.
Among many interesting details: (1) The search engines are not allowed to make public which sites are filtered. (2) It seems unclear how content considered to be harmful to minors can be searched and accessed by adults under the regime. Again, clash of cultures. For a much earlier (2002) analysis of Google content filtering in Germany, see this report by Professor Jonathan Zittrain and former Berkmaniac Ben Edelman.
