Monday 25 July 2011

Forensic workstation pt 1

A key part of dealing with born-digital archives is the ability to receive and process material without changing the underlying metadata, such as date created and date accessed: data that researchers will be looking to use and rely on. As archivists we place considerable emphasis on our role as custodians, and with digital material it is important that we treat the material carefully and appropriately. Fortunately there are tools that help us demonstrate the authenticity of born-digital files, the most obvious of which is the checksum.
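To make the checksum idea concrete, here is a minimal sketch in Python. It is an illustration only, not a tool we use at Hull: the function names file_checksum and is_unchanged are made up for this example, and SHA-256 is simply an assumed choice of algorithm.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks in read-only mode."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read a chunk at a time so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum, algorithm="sha256"):
    """Compare a file's current checksum with one recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum.lower()
```

The idea is to record the checksum when the material is first captured and to recompute it later: if the two values match, the file content has not changed. A checksum covers the content only, not the file system dates mentioned above.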

An important legacy of the AIMS project for us at Hull is working towards the ability to take born-digital material from depositors as a normal part of our work. A key component of this is a forensic workstation, by which I mean a PC (or two) through which material can be safely captured following a clear process, in effect replicating the isolation room for receiving paper material. This will allow us to undertake a forensic examination: to check that the material is what we expected or agreed to take and that it does not include viruses etc., and to generate a manifest of the material to send to the depositor.
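As a rough idea of what generating a manifest could involve, the sketch below walks a folder and lists each file with its size, last-modified date and a checksum. It is an assumption-laden illustration rather than our actual process: the function names build_manifest and write_manifest, the CSV layout and the choice of SHA-256 are all hypothetical.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone

FIELDS = ["path", "size_bytes", "modified_utc", "sha256"]


def build_manifest(root):
    """Return one row per file found under root, in a stable sorted order."""
    rows = []
    for folder, subfolders, files in os.walk(root):
        subfolders.sort()  # visit subfolders in a predictable order
        for name in sorted(files):
            full_path = os.path.join(folder, name)
            info = os.stat(full_path)
            # Simplification: reads the whole file into memory at once.
            with open(full_path, "rb") as handle:
                checksum = hashlib.sha256(handle.read()).hexdigest()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "path": os.path.relpath(full_path, root),
                "size_bytes": info.st_size,
                "modified_utc": modified.isoformat(),
                "sha256": checksum,
            })
    return rows


def write_manifest(rows, destination):
    """Write the manifest rows to a CSV file that could be sent to a depositor."""
    with open(destination, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

In practice this would be run against a captured copy of the material rather than the original media, with the manifest written somewhere outside the folder being listed.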

There seem to be two main routes. The first is to purchase FRED, which stands for Forensic Recovery of Evidence Device (other digital forensic workstation solutions are available). The second, more organic route, and the one we intend to adopt at Hull, is to start with a new PC and add appropriate hardware and software to provide the equivalent functionality. At the moment we are pondering a name for this machine, with current suggestions including:
- Hal - Hull Archives Laboratory
- Harold - Hull Archives Recovery Of Legacy Data
- Hilary - Hull Investigator for Library and Archives RecoverY
- Dawn - Digital Archives WorkstatioN
but we are open to other suggestions until the machine is installed and formally named!

We don’t want to become a computer museum with an extensive range of hardware, software and operating system environments for every possible eventuality. We do want a core ability to handle material we reasonably expect to receive, including material on 3.5” floppy disks, zip disks, hard drives etc. We intend to develop and extend our capacity as need dictates: if we get material in a format we cannot currently handle, we will consider whether we need to support it ourselves or whether a suitable third party is more appropriate.

Central to this is the need for write blockers, which prevent you from writing to or updating the files on the original media. Having read countless websites, I felt I knew what they were supposed to do but had a nagging doubt that my knowledge was incomplete.

A tour of the British Library eMss Labs, courtesy of Jeremy Leighton John (as featured on the BBC Radio 4 programme 'Tales from the Digital Archives', broadcast in May but still available online), confirmed the simplicity of the theory and the fragility of the media. Just having the hardware isn’t enough: you also need some luck that you have the correct drivers to read the specific version of the media. In the next few weeks I hope to place our order for the various bits and pieces and will update you on this exciting journey!

Monday 4 July 2011

Curator’s Workbench workshop

I was fortunate enough to attend the Curator’s Workbench workshop at the British Library last week. It was a chance to see, play with and discuss the tool with its developers, Greg Jansen and Erin O’Meara from the University of North Carolina. The tool is designed to aid with the accession, arrangement, description and staging of material prior to ingest into a digital repository. Essentially, the tool has an interface designed for archivists to use.

The session featured a walk-through and a chance to have a play, with experts on hand if you had a problem. This was only necessary because we had the latest ‘unstable’ release, which includes the latest enhancements to functionality and the GUI. Stable versions are available for download via GitHub. I am especially smitten with the crosswalk feature, which provides a drag’n’drop interface for mapping the metadata to METS. There is also the date recogniser, which allows you to map the date format to the ISO standard, though there could be issues if the data is in a variety of formats, e.g. 1984 would be transformed to 01/01/1984.
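The date issue is easier to see with a small example. The sketch below is my own illustration in Python and not how Curator’s Workbench works internally: the function name to_iso and the list of accepted formats are assumptions. It converts a few common date formats to ISO 8601 but leaves a year-only value as a year, rather than inventing a day and month that were never recorded.

```python
import re
from datetime import datetime

# Assumed input formats for this illustration: 25/07/2011, 25 July 2011, 2011-07-25.
KNOWN_FORMATS = ["%d/%m/%Y", "%d %B %Y", "%Y-%m-%d"]


def to_iso(text):
    """Return an ISO 8601 date string, a bare year, or None if not recognised."""
    text = text.strip()
    if re.fullmatch(r"\d{4}", text):
        # Year only: ISO 8601 allows a reduced-precision value such as "1984",
        # so keep it as it is instead of padding it out to 1 January.
        return text
    for pattern in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, pattern).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None
```

With these assumptions, to_iso("25/07/2011") gives "2011-07-25", while to_iso("1984") stays as "1984" and anything unrecognised comes back as None so that it can be checked by hand.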

It takes a different view of where arrangement and description occur in the workflow from that intended for Hypatia in the AIMS workflow, but it does raise some interesting questions that I hope to explore in more detail over the next few months.

It was also interesting to hear about the features and functionality on their wish-list, including disc images, multiple users, recording processing notes and PREMIS, and so the list goes on!
The discussion that followed was really enlightening as it highlighted the different approaches that archives are currently adopting to the preservation of born-digital archives.

I picked up some useful pointers to software and tools I haven’t used before, such as Bulk Extractor and Google Refine, and came away determined to throw more stuff at Curator’s Workbench, to join the users’ discussion list (done) and to figure out some of the aspects we have avoided so far, such as PREMIS and METS!