Google Book-Scanning Debate Heats Up
Already facing a legal challenge for alleged copyright infringement, Google
Inc.'s crusade to build a digital library has triggered a philosophical debate
with an alternative project promising better online access to the world's books,
art and historical documents.
The latest tensions revolve around Google's insistence on chaining the digital
content to its Internet-leading search engine and the nine major libraries that
have aligned themselves with the Mountain View-based company.
A splinter group called the Open Content Alliance favors a less restrictive
approach to prevent mankind's accumulated knowledge from being controlled by
a commercial entity, even if it's a company like Google that has embraced "Don't
Be Evil" as its creed.
"You are talking about the fruits of our civilization and culture. You
want to keep it open and certainly don't want any company to enclose it,"
said Doron Weber, program director of public understanding of science and technology
for the Alfred P. Sloan Foundation.
The New York-based foundation on Wednesday will announce a $1 million grant
to the Internet Archive, a leader in the Open Content Alliance, to help pay
for digital copies of collections owned by the Boston Public Library, the Getty
Research Institute and the Metropolitan Museum of Art.
The works to be scanned include the personal library of John Adams, the nation's
second president, and thousands of images from the Metropolitan Museum.
The Sloan grant also will be used to scan a collection of anti-slavery material
provided by the Johns Hopkins University Libraries and documents about the Gold
Rush from a library at the University of California at Berkeley.
The deal represents a coup for Internet Archive founder Brewster Kahle, a strident
critic of the controls that Google has imposed on its book-scanning initiative.
"They don't want the books to appear in anyone else's search engine but
their own, which is a little peculiar for a company that says its mission is
to make information universally accessible," Kahle said.
Google's restrictions on its digital book copies stem in part from the company's
decision to scan copyrighted material without explicit permission. Google wants
to ensure only small excerpts from the copyrighted material appear online -- snippets that the company believes fall under "fair use" protections
of U.S. law.
A group of authors and publishers nevertheless has sued Google for copyright
infringement in a year-old case that is slowly wending its way through federal
court.
In contrast, the Open Content Alliance won't scan copyrighted content unless
it receives the permission of the copyright owner. Most of the roughly 100,000
books that the alliance has scanned so far are works whose copyrights have expired.
Google hasn't said how many digital copies it has made since announcing its
ambitious project two years ago. The company will only acknowledge that it is
scanning more than 3,000 books per day, a rate that translates into more than
1 million annually. Google also is footing a bill expected to exceed $100 million
to make the digital copies, a commitment that appeals to many libraries.
The non-copyrighted material in Google's search engine can be downloaded and
printed out, a feature that the company believes mirrors the goals of the Open
Content Alliance.
Although the Open Content Alliance depends on the Internet Archive to host
its digital copies, other search engines are being encouraged to index the material.
Both Yahoo Inc. and Microsoft Corp., which run the two largest search engines
behind Google, belong to the alliance. The group has more than 60 members, consisting
mostly of libraries and universities.
None of Google's contracts prevent participating libraries from making separate
scanning arrangements with other organizations, said company spokeswoman Megan
Lamb.
"We encourage the digitization of more books by more organizations,"
Lamb said. "It's good for readers, publishers, authors and libraries."
The motives behind Google's own book-scanning initiative aren't entirely altruistic.
The company wants to stock its search engine with unique material to give people
more reasons to visit its Web site, the hub of an advertising network that generated
most of its $2 billion profit through the first nine months of this year.
Despite its ongoing support for the Open Content Alliance, Microsoft earlier
this month launched a book-scanning project to compete with Google. Like Google,
Microsoft won't allow its digital copies to be indexed by other search engines.
While Kahle says he was disappointed by Microsoft's recent move, he remains
more worried about Google's book-scanning initiative because it has gathered
so much attention and support.
All but one of the libraries contributing content to Google so far are part
of universities. They are: Harvard, Stanford, Michigan, Oxford, California,
Virginia, Wisconsin-Madison, and Complutense of Madrid. The New York Public
Library also is relying on Google to scan some of its books.
The University of California, which also belongs to the Open Content Alliance,
has no regrets about allowing Google to scan at least 2.5 million of the books
in its libraries. "We felt like we could get more from being a partner
with Google than by not being a partner," said university spokeswoman Jennifer
But some of the participating libraries may have second thoughts if Google's
system isn't set up to recognize some of their digital copies, said Gregory
Crane, a Tufts University professor who is currently studying the difficulty
of accessing some digital content.
For instance, Tufts worries Google's optical reader won't recognize some books
written in classical Greek. If the same problem were to crop up with a digital
book in the Open Content Alliance, Crane thinks it would be more easily addressed
because the group is allowing outside access to the material.
Google "may end up aiming for the lowest common denominator and not be
able to do anything really deep" with the digital books, Crane said.