Google Book-Scanning Debate Heats Up -- Redmondmag.com

Google Book-Scanning Debate Heats Up

By The Associated Press
12/20/2006

Already facing a legal challenge for alleged copyright infringement, Google Inc.'s crusade to build a digital library has triggered a philosophical debate with an alternative project promising better online access to the world's books, art and historical documents.

The latest tensions revolve around Google's insistence on chaining the digital content to its Internet-leading search engine and the nine major libraries that have aligned themselves with the Mountain View-based company.

A splinter group called the Open Content Alliance favors a less restrictive approach to prevent mankind's accumulated knowledge from being controlled by a commercial entity, even if it's a company like Google that has embraced "Don't Be Evil" as its creed.

"You are talking about the fruits of our civilization and culture. You want to keep it open and certainly don't want any company to enclose it," said Doron Weber, program director of public understanding of science and technology for the Alfred P. Sloan Foundation.

The New York-based foundation on Wednesday will announce a $1 million grant to the Internet Archive, a leader in the Open Content Alliance, to help pay for digital copies of collections owned by the Boston Public Library, the Getty Research Institute and the Metropolitan Museum of Art.

The works to be scanned include the personal library of John Adams, the nation's second president, and thousands of images from the Metropolitan Museum.

The Sloan grant also will be used to scan a collection of anti-slavery material provided by the Johns Hopkins University Libraries and documents about the Gold Rush from a library at the University of California at Berkeley.

The deal represents a coup for Internet Archive founder Brewster Kahle, a strident critic of the controls that Google has imposed on its book-scanning initiative.

"They don't want the books to appear in anyone else's search engine but their own, which is a little peculiar for a company that says its mission is to make information universally accessible," Kahle said.

Google's restrictions on its digital book copies stem in part from the company's decision to scan copyrighted material without explicit permission. Google wants to ensure only small excerpts from the copyrighted material appear online -- snippets that the company believes fall under "fair use" protections of U.S. law.

A group of authors and publishers nevertheless has sued Google for copyright infringement in a year-old case that is slowly wending its way through federal court.

In contrast, the Open Content Alliance won't scan copyrighted content unless it receives the permission of the copyright owner. Most of the roughly 100,000 books that the alliance has scanned so far are works whose copyrights have expired.

Google hasn't said how many digital copies it has made since announcing its ambitious project two years ago. The company will acknowledge only that it is scanning more than 3,000 books per day -- a rate that translates into more than 1 million annually. Google also is footing a bill expected to exceed $100 million to make the digital copies -- a commitment that appeals to many libraries.

The non-copyrighted material in Google's search engine can be downloaded and printed out -- a feature that the company believes mirrors the goals of the Open Content Alliance.

Although the Open Content Alliance depends on the Internet Archive to host its digital copies, other search engines are being encouraged to index the material too.

Both Yahoo Inc. and Microsoft Corp., which run the two largest search engines behind Google, belong to the alliance. The group has more than 60 members, consisting mostly of libraries and universities.

None of Google's contracts prevent participating libraries from making separate scanning arrangements with other organizations, said company spokeswoman Megan Lamb.

"We encourage the digitization of more books by more organizations," Lamb said. "It's good for readers, publishers, authors and libraries."

The motives behind Google's own book-scanning initiative aren't entirely altruistic. The company wants to stock its search engine with unique material to give people more reasons to visit its Web site, the hub of an advertising network that generated most of its $2 billion profit through the first nine months of this year.

Despite its ongoing support for the Open Content Alliance, Microsoft earlier this month launched a book-scanning project to compete with Google. Like Google, Microsoft won't allow its digital copies to be indexed by other search engines.

While Kahle says he was disappointed by Microsoft's recent move, he remains more worried about Google's book-scanning initiative because it has gathered so much attention and support.

All but one of the libraries contributing content to Google so far are part of universities. They are: Harvard, Stanford, Michigan, Oxford, California, Virginia, Wisconsin-Madison, and Complutense of Madrid. The New York Public Library also is relying on Google to scan some of its books.

The University of California, which also belongs to the Open Content Alliance, has no regrets about allowing Google to scan at least 2.5 million of the books in its libraries. "We felt like we could get more from being a partner with Google than by not being a partner," said university spokeswoman Jennifer Colvin.

But some of the participating libraries may have second thoughts if Google's system isn't set up to recognize some of their digital copies, said Gregory Crane, a Tufts University professor who is currently studying the difficulty of accessing some digital content.

For instance, Tufts worries that Google's optical reader won't recognize some books written in classical Greek. If the same problem were to crop up with a digital book in the Open Content Alliance, Crane thinks it would be more easily addressed because the group is allowing outside access to the material.

Google "may end up aiming for the lowest common denominator and not be able to do anything really deep" with the digital books, Crane said.
