Could Google Monopolize Human Knowledge?
As Microsoft Backs Away From Digitizing Old Texts, Some Worry One
Source Could Privatize It All
By GREGORY M. LAMB
CAMBRIDGE, Mass.
July 12, 2008
Should a single company be left in charge of putting all of the
world's books online?
An impressive list of world-class libraries and book publishers doesn't
seem to mind. In 2004, they signed on as partners with Google, the
Internet search and advertising colossus based in Mountain View,
Calif.
Yet some observers have strong concerns about Google Book Search and
how the collected thinking of human history will be accessed in the
future.
Those anxieties rose late last month when Microsoft announced that it
was withdrawing from a rival book-scanning project headed by the
nonprofit Internet Archive (archive.org).
About 750,000 books and 80 million journal articles scanned by
Microsoft were removed from its servers, but many remain accessible
elsewhere, including on servers maintained by the Internet Archive,
which has about 440,000 books online.
Microsoft, which said it still intends to give publishers digital
copies of their scanned books, may have made a rational business
decision from its perspective.
But the sudden shift also showed how vulnerable a digitizing project
is when it relies on a for-profit company, says Brewster Kahle,
executive director of the Internet Archive. Nothing would stop Google
from also suddenly shutting down its online book effort or limiting
access to it, he says.
If money gets tight, "there's a meeting behind closed doors, and
there's a notice put on the Web site that it's shut down," he says.
"That's what happens."
Internet access to books is becoming more important, some observers
say, as portable book readers, such as Amazon's Kindle, become more
common and as more people expect to find all their reading needs
online.
"I wouldn't say Google is 100 percent of the digital book world, but
it's getting near 90 percent," says Siva Vaidhyanathan, a cultural
historian and media scholar at the University of Virginia, who writes
a blog called "The Googlization of Everything."
The Internet Archive has funds to scan 1,000 books per day through the
end of the year, Kahle says, including books at the Library of Congress.
He's exploring new partnerships that would allow the project to
continue into 2009 and beyond.
"It's not the end," he says, but he concedes that now would be a great
time for the next Andrew Carnegie -- the 19th-century industrialist
turned library-building philanthropist -- to step forward and leave
his or her own legacy by financing an open, nonprofit, worldwide
digital library.
"The best works of humankind are not on the Net yet," he says.
Google has partnered with more than two dozen libraries, including
those at Harvard, Stanford, Oxford and Princeton universities and the
New York Public Library. The company uses what amounts to a VIP
library card -- taking books on loan, scanning them, and then
returning them to the library unharmed, says Jon Orwant, engineering
manager of Google Book Search. The digitization costs the libraries
nothing.
In a separate deal with book publishers, Google scans new books with a
less gentle approach. The spines are chopped off and the pages fed
through an optical scanner.
Google won't say how many books it has scanned so far, but it's
certainly in the millions. The company estimates there may be more
than 100 million book titles in the world today.
So far, Google isn't aggressively trying to make money off its book
pages, though a few ads and links to buy hard copies from the
publisher do appear. Keeping users inside Google's online "universe"
seems to be the company's long-term motive.
Books published before 1923 are out of copyright in the United States
and can be freely scanned, downloaded, or printed. Google obtains permission from
publishers regarding how much of a new book it can display. Though
only short "snippets" of these books usually can be viewed, the whole
text is still searchable, helping readers decide if it contains
information that is useful to them.
Another controversial aspect of Google's stewardship involves the
quality of the digitization. After books are scanned, a process called
optical character recognition (OCR) converts each page image into text
that a computer can read, which makes the book searchable.
Computer programs do a good job with OCR on new titles, but older
books with yellowed pages, faded print, or graffiti can prove to be a
problem. Google's final product is "less than 100 percent" accurate,
Orwant concedes.
"Google is doing a very, very poor job. ... Their OCR is very
inaccurate, the image quality is very poor," says Lotfi Belkhir, CEO
of Kirtas Technologies.
The company, in Victor, N.Y., bills itself as the world's leader in
converting books into digital form.
"You find cut-off text," Belkhir says. "You find dirty text. You find
incomplete pages."
He predicts that much of what Google has digitized so far will need to
be rescanned someday to bring it up to acceptable quality.
Mr. Belkhir is contacting libraries that had been working with
Microsoft and says they are receptive to letting Kirtas pick up where
Microsoft left off.
Google's Orwant defends his project.
"We certainly believe we're doing the world a very good service," he
says. "We're digitizing all this content. We're making it as open as
the laws allow."
Google always gives a digital copy back to its partners, Orwant says.
"We're never the only people with a copy."
And because Google's contracts with the libraries are nonexclusive,
the libraries are free to work with others to scan their collections
as well.
But that's not enough for critics.
"I don't blame the company, but the question is, 'What do we as
citizens want out of our information system?' " says Vaidhyanathan at
the University of Virginia.
"If we assume that a healthy, diverse, and accessible body of
information is essential to science, politics, creativity,
literature," he says, "then we really have to step back and say, 'Do
we really want to put this one company in the position of being the
filter for the world's information?' "
Copyright © 2008 ABC News Internet Ventures
http://abcnews.go.com/Business/CSM/story?id=5357748&page=1
Citizen Jimserac - 13 Jul 2008 14:41 GMT
Excuse me, but there are medical, Chinese-language, and other books that
I would never have had a chance to read
had Google not made the risky, history-making move
to make them available.
This continues the breaking of the information
monopoly once enjoyed by the corporatists
before the coming of the Internet.
A variety of sulking, phony "innovation" companies,
like Muckrosoft (makers of the failed "Vapidsta" (sic)
oopsperating system), are envious of Google and
can't come close to its innovations.
Citizen Jimserac