[Ecommerce] BBC- Digital archives losses and open standards
Michelle Childs
michelle.childs@cptech.org
Thu Jul 12 17:37:01 2007
--
http://news.bbc.co.uk/1/hi/technology/6265976.stm
Warning of data ticking time bomb
The growing problem of accessing old digital file formats is a
"ticking time bomb", the chief executive of the UK National Archives
has warned.
Natalie Ceeney said society faced the possibility of "losing years of
critical knowledge" because modern PCs could not always open old file
formats.
She was speaking at the launch of a partnership with Microsoft to
ensure the Archives could read old formats.
Microsoft's UK head Gordon Frazer warned of a looming "digital dark
age".
He added: "Unless more work is done to ensure legacy file formats can
be read and edited in the future, we face a digital dark hole."
Research by the British Library suggests Europe loses 3bn euros each
year in business value because of issues around digital preservation.
The National Archives, which holds 900 years of written material, has
more than 580 terabytes of data - the equivalent of 580,000
encyclopaedias - in older file formats that are no longer
commercially available.
Ms Ceeney said: "If you put paper on shelves, it's pretty certain it
is going to be there in a hundred years.
"If you stored something on a floppy disc just three or four years
ago, you'd have a hard time finding a modern computer capable of
opening it."
"Digital information is in fact inherently far more ephemeral than
paper," warned Ms Ceeney.
She added: "The pace of software and hardware developments means we
are living in the world of a ticking time bomb when it comes to
digital preservation.
"We cannot afford to let digital assets being created today
disappear. We need to make information created in the digital age to
be as resilient as paper."
But Ms Ceeney said some digital documents held by the National
Archives had already been lost forever because the programs which
could read them no longer existed.
"We are starting to find an awful lot of cases of what has been lost.
What we have got to make sure is that it doesn't get any worse."
The root cause of the problem is the range of proprietary file
formats which proliferated during the early digital revolution.
Technology companies, such as Microsoft, used file formats which were
incompatible not only with software from rival firms, but also across
different versions of the same program.
Mr Frazer said Microsoft had shifted its position on file formats.
"Historically within the IT industry, the prevailing trend was for
proprietary file formats. We have worked very hard to embrace open
standards, specifically in the area of file formats."
Microsoft has developed a new document file format, called Open XML,
which is used to save files from programs such as Word, Excel and
PowerPoint.
Mr Frazer said: "It's an open international standard under
independent control. These are no longer under control of Microsoft
and are free for access by all."
But some critics question Microsoft's approach and ask why the firm
has created its own new standard, rather than adopting a rival
system, called the Open Document Format.
Instead, Microsoft has released a tool which can translate between
the two formats.
Ben Laurie, director of the Open Rights Group, said: "This is a well-
known, standard Microsoft move.
"Microsoft likes lock-ins. Typically what happens is that you end up
with two or three standards."
The agreement between the National Archives and Microsoft centres on
the use of virtualisation.
The archive will be able to read older files in the format in which
they were originally saved by running emulated versions of the older
Windows operating systems on modern PCs.
For example, if a Word document was saved using Office 97 under
Windows 95, then the National Archives will be able to open that
document by emulating the older operating system and software on a
modern machine.
Ms Ceeney said the issue of older file formats was a bigger problem
than reading outdated forms of media, such as floppy discs of various
sizes and punch cards.
"The media it is stored in is not relevant. Back-up is important, but
back-up is not preservation."
Adam Farquhar, head of e-architecture at the British Library, praised
Microsoft for its adoption of more open standards.
He said: "Microsoft has taken tremendous strides forward in
addressing this problem. There has been a sea change in attitude."
He warned that the issue of digital preservation did not just affect
National Archives and libraries.
"It's everybody - from small businesses to university research groups
and authors and scientists.
"It's a huge challenge for anyone who keeps digital information for
more than 15 years because you are talking about five different
technology generations."
The British Library and National Archives are members of the Planets
project, which brings together European national libraries and
archives and technology companies to address the issue of digital
preservation.
Mr Farquhar said that open file formats were an important step but
there was still work to be done.
"Automation is a key area to work on. We need to be able to convert
hundreds and even thousands of documents at a time," he said.
Michelle Childs
Head of European Affairs
Knowledge Ecology International
michelle.childs@cptech.org