[A2k] online project to build 'library' of every book in every language launched
Michelle Childs
michelle.childs@cptech.org
Wed Aug 1 11:22:04 2007
--
http://news.bbc.co.uk/1/hi/magazine/6924022.stm
A library bigger than any building
By Giles Turnbull
An ambitious project to create an online catalogue of every book in
every language ever published is underway. Public goodwill is not in
doubt, but some libraries remain to be convinced.
A few years ago, the idea of getting random people around the world
to write their own encyclopaedia would have been madness - but that
didn't stop the founders of Wikipedia doing just that, and it has
turned out to be one of the most successful web projects of recent
years.
With that in mind, does it sound mad to want to try and build an
online catalogue of every book ever published, anywhere in the world?
The Open Library, newly launched in the USA but global in scope, is
designed to make that happen.
In the words of its creators, the idea is to build a virtual library
that stores details of not just "every book on sale, or every
important book, or even every book in English; but simply every book."
But what's the Open Library really for? Aaron Swartz, leader of the
technical team working on Open Library, suggests that every book ever
published needs a single authoritative page on the internet, a bit
like a personal homepage.
"Right now, if you want to link to a book on the web, the main place
people go is Amazon. It's kind of a bad idea for one commercial site
to be the definitive source for book information on the internet, so
we want to have a site that brings together information from
commercial publishers, reviewers, users, libraries, everywhere.
"This site will become the place where you can find interesting books
and information about them, whether they're in print, out of print,
out of copyright or whatever."
Such a library has to be virtual. No building would ever be large
enough to house all books; no single group or government could afford
to build it, or employ the necessary staff. If the Open Library is to
succeed, it has to be a virtual space, and open to everyone,
Wikipedia-style.
"There are tons of books out there and tons of information about
those books. There's no way even a large group of librarians is going
to be able to collect it all. We think of it as an analogue to
Wikipedia. There are some great encyclopaedias written by small
groups of experts, but to get something as wide-ranging and varied as
Wikipedia, you need to let everyone in."
To start things off, the Open Library is calling on other libraries
to donate their catalogues. This alone presents huge technical
challenges, since the data sets come in different formats and
different languages, and each set comes with its own quirks,
repetitions and errors.
What's important is keeping the data in a structured form, so that
the database working behind the scenes knows the difference between
an author, a title and a publisher.
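As a rough illustration of what "structured form" could mean here, the
sketch below keeps author, title and publisher as separate fields and
folds two donated catalogues into one list. It is not the Open
Library's actual code: the BookRecord type, the two source layouts
(from_library_a, from_library_b) and the merge rule are all
assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookRecord:
    """One catalogue entry, with each field stored separately."""
    title: str
    author: str
    publisher: str


def from_library_a(row: dict) -> BookRecord:
    # Hypothetical source A: one dict per book, values may carry stray spaces.
    return BookRecord(
        title=row["title"].strip(),
        author=row["author"].strip(),
        publisher=row["publisher"].strip(),
    )


def from_library_b(line: str) -> BookRecord:
    # Hypothetical source B: a single "author|title|publisher" line.
    author, title, publisher = (part.strip() for part in line.split("|"))
    return BookRecord(title=title, author=author, publisher=publisher)


def merge(records):
    """Drop repetitions, comparing title and author case-insensitively.

    The first record seen for a given (title, author) pair is kept.
    """
    seen = {}
    for rec in records:
        key = (rec.title.casefold(), rec.author.casefold())
        seen.setdefault(key, rec)
    return list(seen.values())
```

Because every record ends up in the same shape, a search for an author
never has to guess which part of a free-text line is the name. Real
catalogue data would need far more careful matching than the simple
case-insensitive comparison used here.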
"We had to build this new type of wiki software which was an exciting
challenge, because you had to set it up so that instead of just
having one kind of page people can edit, we have lots of different
kinds.
Google similarities
"People can edit authors, they can edit books, they can edit text
pages, and so on. So there's a lot of new stuff we had to build. And
that's just the infrastructure - there were also lots of things to
import, and book data to merge and make searchable."
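One way to picture a wiki with several kinds of editable page is
sketched below. It is only an assumption-laden toy, not the software
the team built: the PAGE_TYPES table, the Page class and the field
names are invented for illustration.

```python
# Hypothetical page types and the fields each one accepts.
PAGE_TYPES = {
    "author": {"name"},
    "book": {"title", "author", "publisher"},
    "text": {"body"},
}


class Page:
    """An editable page whose allowed fields depend on its type."""

    def __init__(self, page_type, **fields):
        if page_type not in PAGE_TYPES:
            raise ValueError(f"unknown page type: {page_type}")
        self.page_type = page_type
        self.fields = {}
        self.history = []  # snapshot of the fields taken before each edit
        self.edit(**fields)

    def edit(self, **changes):
        # Reject fields that do not belong to this kind of page.
        unknown = set(changes) - PAGE_TYPES[self.page_type]
        if unknown:
            raise ValueError(f"fields not allowed on {self.page_type}: {sorted(unknown)}")
        self.history.append(dict(self.fields))
        self.fields.update(changes)
```

Keeping a snapshot before each edit means an unwanted change can be
compared against, or rolled back to, an earlier version, much as on an
ordinary wiki.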
An Open Library page is meant to be as comprehensive as possible.
There are data fields for every possible bit of information that
could exist about each published work. If copyright allows, there
will be a copy of the book to download, or links to copies of it
elsewhere (such as Project Gutenberg, which digitises cultural works).
For the time being, funding comes from the Internet Archive, another
non-profit project that has the simple aim of keeping copies of the
internet for the benefit of generations to come. But in future, the
Open Library will depend on donations and taking a cut of any book
sales it hands over to the big online booksellers.
Income will matter more in the face of commercial competition. The
Google Books Library Project, part of the larger Google Book Search
service, has broadly similar aims.
The Google Book Search Library Project sets out "to work with
publishers and libraries to create a comprehensive, searchable,
virtual card catalogue of all books in all languages that helps users
discover new books and publishers discover new readers."
Naturally, Google has its own commercial interests to protect and
invest in. The Open Library's approach is the opposite, committed as
it is to the ultimate in freedom of information acts: not only can
anyone browse, search, and read the books in its catalogue - they can
re-write the catalogue itself as they go.
Malicious alterations?
But while the rise of Wikipedia proves there is no shortage of
enthusiasm among the public to build informative sites for general
consumption, not all libraries are signed up to the Open Library
ethos, including the British Library.
Stephen Bury, head of European and American Collections at the
British Library in London, has some reservations about contributing
to the Open Library project.
"In the short term, I don't think we will send them a copy of our
catalogue. We only have limited resources and we need them to
concentrate their efforts on our own digitisation projects," he says.
"We have always supported digitisation, and the more the merrier. But
there's some scepticism as to whether one day the Open Library might
become a commercial site with adverts and so on."
Mr Bury was not keen on the idea of allowing ordinary people to edit
library catalogues themselves.
"I think there's a need for balance and some degree of control. You
might get people maliciously changing things."
Michelle Childs
Knowledge Ecology International
michelle.childs@cptech.org