It's About Search, Stupid

06/07/2010 05:12 am ET | Updated May 25, 2011

If websites, databases and other content are the landscape of the virtual world, then search engines are the maps. Without search engines, the landscape is confusing and getting lost is a certainty. With them, finding one's way through the dense forest of information is possible, if occasionally made difficult by unexpected detours and dead ends.

Disappearing from the results of dominant search engines leads to invisibility. And if one has a website, a blog, an ecommerce site, or a database that no one knows exists, it is useless. Given how critical maps are to successful navigation, having accurate, affordable maps that meet the varied needs of a diverse population is key. So, how would we all feel about giving one for-profit company the exclusive right to map, say, New Jersey or Mexico? If no one else could produce a map of New Jersey, there would be no market incentive to produce the best maps for all of the population's needs (the shortest route from New York to Delaware, a coffee shop with baby-changing stations). And if the mapmaker wanted to direct traffic to its own stores in Toms River, there would be no incentive to produce maps showing the most direct route to Delaware instead of a detour through Toms River.

Yet giving just such exclusive rights to some important internet territory is one of the key issues involved in a proposed settlement between Google and all the book publishers and authors in most of the English-speaking world.

Briefly -- Google undertook a project to digitize millions of books in the libraries of several major universities, such as the University of Michigan and Stanford. Google copied, in their entirety, books in the public domain as well as books still in copyright. A handful of US publishers and the Authors Guild, a not-for-profit organization representing US book authors, sued Google for copyright infringement. Just a few weeks ago, in federal court in downtown Manhattan, the judge heard a day's worth of objections to, and support for, a proposed settlement agreement that runs to more than 300 pages.

This complex agreement accomplishes several things that would benefit the public, authors, and the scholarly community. Under it, digitized books in Google's database would be made available as snippets in search results, and, unless the publisher or author objected, entire books could be offered through paid library subscriptions or as various kinds of ebooks. Previously buried and obscure works would suddenly see the light of day. And, because Google would support text-to-speech functions for this database, all of these books, some 17 million of them (Google has given varying estimates of the number digitized), would become available to people with visual disabilities.

Why would Google spend all that money -- millions to digitize, more millions to litigate the case it had to know would come, and more millions to settle that case -- for what will amount to a library lending and ebook business? Keep in mind that Google's revenue last year was $23.6 billion. That is more than half of the $40.3 billion in total revenue generated in the United States by more than 100,000 publishers. And not one dollar of Google's revenue came from publishing books. It came from the enormous ad revenues generated by Google's search and AdSense businesses. With a profit margin of approximately 25%, search in 2010 is far more profitable than publishing.

If the settlement is approved by the Court, Google will be the only search engine able to serve up search results that include the contents of some 5-10 million books -- the books whose authors, publishers, or other copyright holders can't be found or don't want to be found. Because of the way the proposed settlement weaves together copyright and class action law, no one else will be able to do that. What does that mean for Google? It means that a Google search, as opposed to a search on any other engine, will deliver richer results and a richer experience. It means that Google's ability to refine its search algorithms and its analysis of consumer behavior, interests, and needs will have a depth and range that no one else can match.

A recent article in Ars Technica described Google's current practice of keeping consumer data for 9 months, much longer than any other major search engine, because it uses the data for a variety of important (and profitable) business needs: "Search data is mined . . . by watching how users correct their own spelling mistakes, how they write in their native language, and what sites they visit after searches. That information has been crucial to Google's famously algorithm-driven approach to problems like spell check, machine language translation, and improving its main search engine."

Google's exclusive ability to map these books, and to observe how consumers interact with that map and the content these books represent, would give Google a significant competitive advantage in the most profitable internet-related market, one in which it is already dominant. Not surprisingly, the Department of Justice has announced that it is investigating.

Google has publicly proclaimed that without this settlement these out-of-print books will remain buried in libraries, where most people have no way to find them. But is that necessarily true? If it is indeed a public good for these books to be accessible, then shouldn't public institutions, perhaps with private cooperation and funding where appropriate, be the ones to accomplish that result?

Couldn't the Library of Congress start to assemble a digital database that would be used (perhaps for a fee) by all search engines? After all, US copyright law currently requires that two copies of every work registered be deposited with the Library of Congress, unless exempted by regulation. Why not have one of them be digital with appropriate safeguards? Couldn't (and shouldn't) Congress finally enact some kind of safe harbor or compulsory license scheme so that digital copies of past work are made available for limited uses such as search with compensation to rightsholders where appropriate?

After all, if the goal is to create a library for the benefit of the public, then a private database won't cut it. If this settlement is approved and actually starts to operate, Google's insuperable advantage may well prevent all the other possible players, both public and private, from helping to create something truly public and accessible to all.