BBC

Loss is not where you find it

Richard Wright | 18:00 UK time, Wednesday, 2 December 2009

My laptop up and died two weeks ago: it came to a dead halt, every time, in the boot procedure. ROM diagnostics said "HDD read error". As I'm supposed to be a BBC R&D and archive expert in digital preservation, this situation is embarrassing. And as I've been using computers since 1965 (really) I should know what I'm doing, unless my approaching geriatric status has caused me to lose the plot.


Blue Screen of Death - the result of data loss? CC image from Flickr user Justin Marty

There is a plot: system complexity conspiring to make data inaccessible. It was no coincidence, I am sure, that my first complete disc failure in 17 years came within two months of the conversion of my laptop's hard drive to full encryption. Lost laptops and compromised personal details are a national problem. The contents of my own laptop would bore anyone else silly, but I'm sure there are all sorts of laptops carrying private and confidential details that deserve full protection.


I just hope that the new encryption systems that now sit on millions of UK hard drives do indeed give protection, because compromise of data is only one risk. Few of us have data whose loss would compromise national security or even embarrass our employers, but all of us have data that we would hate to lose - as I found out at Denver airport when I was about to start two weeks' work in the USA, and had nothing to work with.


I also hope the UK's major (or not so major) IT departments are collecting statistics on computer failures where encryption is implicated. I know in my case they did not, as my efforts to find a way to diagnose the problem while 8000 miles from base led to various changes, and so my dead laptop is logged as 'system rebuild required', not as 'death by encryption'. It is only through statistics that we can understand the incidence of failures and their types, and thereby understand the real risks posed by digital technology. Without knowledge of the risks, we can only speculate about where to place our collective efforts - and budgets - that fall in the general area of 'digital preservation'.


Of course, my dead laptop is but one data point, as am I, for that matter. But I have more: several times recently I've come across major examples of 'loss of data' - and as with my hard drive, it wasn't the data itself that was lost, it was the complexity above the data that got its knickers twisted and so ceased to function.
  1. The has a panel, , that is looking at asset management systems. A senior engineer of one of the first such systems to specialise in broadcasting came to Geneva in July, to talk about how a broadcaster 'going tapeless' should go about moving into digital asset management. He mentioned entire collections of online content disappearing, owing to corruption of the database - because an asset management system is 'just' a lot of files, and a database with information about those files. He'd personally experienced that situation twice, and in each case there were backups to revert to, to rebuild the collection and get back into business - after up to two weeks of travail.
  2. Our BBC collaborative project had a workshop at the beginning of October, where we asked about examples of loss (because I'm trying to collect evidence in order to establish risk - that's what I do). Again, a fully competent IT company working with a major Spanish broadcaster described a database corruption, of an asset management system, and in that case 80% of the material was recovered from backups (taking several days) - and 20% had to be re-ingested, which took several weeks (part time).
  3. All of which should remind us of the BBC's online picture store Elvis, which crashed some years ago, and again it was essentially database corruption, compounded by backups that largely failed to work. There was one very effective backup - Lisa, daughter of Elvis - but that held the video-quality scans and not the full-quality scans (as needed by Radio Times and all the other BBC print publications that Elvis also supports). Something like 100k high-resolution images were lost.


There is a common thread so far - the bits still exist, unaltered, on storage media - but the complexity sitting between the user and the bits has 'ceased to be' in some fashion, and so the whole thing is a dead parrot (and called a storage failure, though it is anything but).


This thread leads to an even greater problem: systems that haven't crashed, but still won't find things, because they are in some way inadequate. We're all now aware of metadata and its purposes, but - just as with data itself - there has to be effective technology using the metadata, or again the result is a 'digital dead parrot'.


You may not know that I'm a prize-winning poet. I was somewhat surprised to learn this myself, but indeed my entry into a competition in Ariel, the BBC's in-house paper, won second prize, and they were so pleased that they asked me to make a podcast. As with many print publications, this one also has an online version, where it sticks extras, like my podcast. The problem is, there is no search engine on the online version, or indeed ANY other search technology. There's a list of recent or popular pages, but once content falls off that list, it falls away completely, and becomes as inaccessible as the data on my dead hard drive. As with the hard drive, the data is still there, but inaccessible. The PDFs of the print version are indexed by a BBC search engine, but the online pages are not. The result is an inaccessible poem: even the person who posted my podcast can no longer find it!


An internal BBC publication is a tiny issue compared to bbc.co.uk itself, the BBC's world-class media website. BBC policy is to hold the text from bbc.co.uk in a sort-of archive, but reasons of space/budget/complexity mean that the audio and video content on bbc.co.uk is not archived. The justification is: all that audio and video goes out on radio and TV, and so gets archived separately. Has the validity of that statement been checked? How much audiovisual content is NOT also broadcast? I wish I knew! The business case to build a real archive (something with comprehensive capture, and access) was chopped and chopped until it was reduced entirely to a 90-day legal requirements system, with just a couple of access points. Meanwhile, anybody who does want to see BBC content that has been taken down from bbc.co.uk has to go to , where they do monthly (or thereabouts) scans of the entire internet, and make it available to all through their .


So there are a half-dozen examples, ranging from my laptop to bbc.co.uk, where data can no longer be found because, essentially, of failure or inadequacy of the system sitting between the user and the data. The robust solution to failure is to simplify that technology layer - and unfortunately IT systems are moving in the opposite direction. I fully expect an epidemic of data loss, in direct consequence of the mass installation of encryption on company hard drives. I hope I'm wrong.

Comments

  • Comment number 1.

    Richard Wright:

    That is sad news, that your computer failed you... At least... your loss is not in vain...

    ~Dennis Junior~

  • Comment number 2.

    “How much audiovisual content is NOT also broadcast? I wish I knew!”

    Interesting question. It’s a figure that’s only going to increase in time as “multiplatform” efforts increase!

    It doesn’t help, of course, that unlike with “lost episodes” of TV series which people may have VHS tapes (or similar) lying around for, the BBC has a policy† of not permitting downloads of any of the content served via EMP - so if the BBC doesn’t archive it, nobody else can either.


    † Yes, I know why. Doesn’t mean it’s sensible in real terms.

  • Comment number 3.

    Do you have a feel for whether the issues lie in hardware failure (e.g. bit drop out) for persistently-stored structures associated with the intermediate layers that lead to the actual content, or whether the problem lies in the implementation of the intermediate layers leading to corrupted structures being stored in the first place?

  • Comment number 4.

    Jerry - I think none of the cases I mention are hardware - they're either software getting too complex for its own good, or bad human decisions (policies). That really was the point - that we tend to focus on storage and storage management reliability, but all the cases I've recently encountered were so-called higher level:
    - failure of data encryption, locking up the PC; probably failure at the level of interaction between the encryption and the OS; nothing to do with storage, but it made the disc unusable!
    - 3 sorts of database corruption; I guess this counts as 'intermediate layers', though I think of it as just big applications that are imperfect
    - storing content where search doesn't go, and not saving the URL
    - not saving material in the first place: updating websites, meaning previous versions just get lost.
