Showing posts with label DROID. Show all posts

Wednesday, 24 October 2012

Practical First Steps

Last week I helped organise a training day on born-digital archives for the East of England Regional Archive Council. I was joined by Chris Hilton from the Wellcome Library, Ellie Robinson from LSE and Grant Young from Cambridge University Library. The day followed a similar pattern to an event hosted in Hull last November. There were four main elements to the day:

Institutional Overview
The four of us gave a brief overview of the development of digital preservation in our respective institutions, including Chris’s now legendary simplification of OAIS to "Get Stuff - Put stuff somewhere - Keep stuff safe & Show stuff to people". Ellie talked through developments at LSE, from using a risk analysis to secure institutional backing through to actually doing it - the latter sentiment being one of the mantras for the day. Grant talked about his work with digital content, much of it digitised rather than born-digital, but now occupying an eye-watering 67TB (both LSE and Hull have about 120GB of born-digital material).

Practical First Steps
The four of us then each gave a short presentation offering some practical tips. I looked at conducting a survey to identify material already held in the archives, and how this often revealed that the media had been accessioned but not the contents! Chris shared the experiences at Wellcome of 'Dealing with depositors', Ellie looked at 'Handling born-digital material', including accessioning, virus checking and other stages at LSE, and Grant talked about 'Issues around File Formats', highlighting a number of challenges and suggesting strategies that could be adopted.

Questions and Answers

The day also included two question and answer sessions designed to get delegates talking about the particular aspects and issues of concern to them. Questions touched on a range of topics including depositors, DRAMBORA, how to approach hybrid collections and depositor agreements. We also heard of work being conducted in a number of local authority archives and hopefully they will share their work and experiences with colleagues in the near future.


Demonstrations
Delegates were split into four groups and given demonstrations of Karen's Directory Printer, of DROID, and of FTK Imager used with a write-blocker to read a PC hard drive (from my garage). The fourth diversion was a look at two different born-digital scenarios, with delegates asked to consider how they might respond.

There was common agreement on the need to do something, and widespread acknowledgement that there wasn't a single solution or approach. Wellcome, LSE and Hull were all looking at the issue of bulk-ingest into repositories whilst retaining the relationships between files as represented through an often complex series of folders. It so happens that at Hull one of our developers is looking at this very issue so I hope to have an update on this in the next few weeks.

A key theme of the day was collaborating with and helping colleagues, and in this spirit all of the presentations are now available on the Hull History Centre born-digital archive pages - thanks to all of the speakers for making this an interesting and informative day.

Friday, 12 October 2012

Not a typical week

At the end of the AIMS project I returned to my post as Senior Archivist with digital archives added to my to-do list alongside public searchroom duty, working with paper collections, responsibilities for maintaining our website and online catalogue, managing staff and volunteers, etc.

This week has not been typical.

Monday
Accessioned two recent deposits, including a small set of floppy disks created between 1995 and 1999 using a Psion (I think, judging by some of the data visible using FTK Imager). The other item was a CD with minutes created in the last couple of years by a charity, so nothing to worry about in terms of formatting, but it did highlight issues around filename consistency. I contacted the depositor and they were happy to receive suggestions about future naming conventions, which will be a great help. I was also able to ask about material that reflected the complete range of activities of the charity and hope that further material will be forthcoming.

Tuesday
One of the outcomes following the publication of the AIMS White Paper has been to share experiences with colleagues in other institutions. On Tuesday our guests were Nancy McGovern and Kari Smith from MIT and it was a great opportunity to share experiences and discuss aspects surrounding processes, workflows and tools. As always I came away with a list of other tools to try and research papers to look out for! We were joined by my colleague Chris Awre who talked about the work at Hull using Fedora for our institutional repository and in particular Hydra and the opportunity this offered for sharing development work.

Wednesday
Spent some of Wednesday preparing for a one day workshop at Cambridge about born-digital archives next week. The day is designed to encourage colleagues to take the first steps and will include colleagues from LSE and the Wellcome Library and will feature demonstrations of write-blocker hardware and tools including Karen’s Directory Printer and DROID.

Thursday
Received an email out of the blue from a colleague working in Vancouver, which was really nice. They had been following the AIMS Blog and wanted to ask some questions, and I was happy to clarify a few aspects that had been mentioned. In replying I also sought more information about their own experiences and whether we had tackled email. Whilst we haven’t tackled this explicitly (yet), I have had a play with the MUSE tool, which gives a unique perspective on the stuff within an 'mbox' file and offers a sentiment graph that instantly grabs you.

Friday
What better for a Friday afternoon than a quick spell of taking photographs of the floppy disks I accessioned on Monday? It took longer than it should have done due to lack of practice and the need to find something to prop up the disk so we could capture the information written on the edge of the disk. Our conservator Christine found a small clear display stand that is ideal, and this has been requisitioned for future photographic needs.

This hasn't been a typical week – I have probably done more in the last five days than the preceding two months - but then things rarely are in archives – and for many working in the profession the range and variety is one of the best parts of the job.

Monday, 19 March 2012

Archives and Society

Two weeks ago I spoke at the Archives and Society series of talks held at the Institute of Historical Research about the progress and work at Hull as a result of the AIMS project. Whilst highlighting the AIMS White Paper, the bulk of the talk was about the practical steps we had taken at Hull with born-digital archives, starting with a simple survey of collections, followed by photography of media and creating a forensic workstation (a tale told in multiple parts: see part 1, part 2 and part 3).

I sought to encourage those present to download software like Karen's Directory Printer and DROID and to have a go - using a few test files will help increase your familiarity with many of the issues associated with digital preservation.

I managed to stop in time for questions. These touched on the fact that the issues I raised were not "new" and whether we would still be making the same case in five years' time (I hope not), the need for automated tools to help us cope with the sheer volume of material (an obvious need), and the associated risk of releasing material that you haven't explicitly checked because of the sheer volume of files.

A PDF version of the slides is available - the talk was also recorded and I will add a link to the podcast when it is available.

Friday, 4 March 2011

File type categories with PRONOM and DROID

In order to assess a born digital accession, the AIMS digital archivists expressed a need for a report on the count of files grouped by type. The compact listing gives the archivist an overview that is difficult to visualize from a long listing. The category report supplements the full list of all files, and helps with a quick assessment after creation of a SIP via Rubymatica. (In a later post I’ll point out some reasons why pre-SIP assessment is often not practical with born digital.)

At the moment we have six categories. Below is a small example ingest:

Category summary for accession ingested files

data             3
moving image     1
other            2
sound            2
still image     26
textual         12
Total           46
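For readers who want to try a similar summary themselves, here is a minimal sketch in Python of how a count-by-category report could be produced. It is not the Rubymatica code (which is written in Ruby); the function names and the example file paths are hypothetical, and it assumes each file has already been assigned one of the categories.

```python
from collections import Counter


def category_summary(file_categories):
    """Count files per category.

    file_categories maps a file path to its category name.
    Returns (rows, total), where rows is a list of (category, count)
    tuples sorted alphabetically by category.
    """
    counts = Counter(file_categories.values())
    rows = sorted(counts.items())
    return rows, sum(counts.values())


def format_summary(rows, total):
    """Render the rows and a final Total line as aligned plain text."""
    width = max(len(name) for name, _ in rows + [("Total", total)])
    lines = [f"{name:<{width}}  {count:>4}" for name, count in rows]
    lines.append(f"{'Total':<{width}}  {total:>4}")
    return "\n".join(lines)


# Hypothetical ingest: these paths and categories are made up for illustration.
example = {
    "letters/draft1.txt": "textual",
    "letters/draft2.txt": "textual",
    "photos/beach.jpg": "still image",
    "audio/interview.mp3": "sound",
    "misc/unknown.bin": "other",
}

rows, total = category_summary(example)
print(format_summary(rows, total))
```

Sorting the categories alphabetically keeps the report stable from one ingest to the next, which makes two accessions easier to compare at a glance.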


Some time ago we decided to use DROID exclusively as our file identification software. It works well to identify a broad variety of files, and is constantly being improved. We were initially using file identities from FITS, but the particular identity was highly variable. FITS gives a “best” identity based on metadata returned by several utility programs. We wanted a consistent identification, as opposed to some files being identified by DROID, some by the “file” utility and some by JHOVE. We are currently using the DROID identification by pulling the DROID information out of the FITS XML for each file. This was easy and required very little change to Rubymatica.
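As a rough illustration of pulling the DROID information out of FITS output, the Python sketch below reads the PUID from an externalIdentifier element attributed to DROID. The sample XML is a simplified, hand-written stand-in and an assumption on my part: real FITS output varies between versions, so check the element names, attributes and namespace against your own files. The function name droid_identity is hypothetical and this is not how Rubymatica itself is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for FITS output; verify against your own FITS files.
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Simplified, hand-written sample for illustration only.
SAMPLE_FITS = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="JPEG File Interchange Format" mimetype="image/jpeg">
      <tool toolname="Jhove" toolversion="1.5"/>
      <tool toolname="Droid" toolversion="3.0"/>
      <externalIdentifier toolname="Droid" toolversion="3.0" type="puid">fmt/44</externalIdentifier>
    </identity>
  </identification>
</fits>
"""


def droid_identity(fits_xml):
    """Return {'puid': ..., 'format': ...} for the first DROID PUID found,
    or None if no identity carries a DROID-supplied PUID."""
    root = ET.fromstring(fits_xml)
    for identity in root.findall("fits:identification/fits:identity", FITS_NS):
        for ext in identity.findall("fits:externalIdentifier", FITS_NS):
            from_droid = ext.get("toolname", "").lower() == "droid"
            if from_droid and ext.get("type") == "puid":
                return {
                    "puid": (ext.text or "").strip(),
                    "format": identity.get("format"),
                }
    return None


print(droid_identity(SAMPLE_FITS))
```

Returning None when DROID offered no identification leaves it to the caller to decide how to treat such files, for example by placing them in an "other" category.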

PRONOM has the ability to have “classifications” via the XML element FormatTypes. However, there are a couple of issues. The first problem is that the PRONOM team is focused primarily on building new signatures (file identification configurations) and doesn’t have time to focus on low priority tasks such as categories. Second, the categories will almost certainly be somewhat different at each institution.

Happily I was able to create an easy-to-use web page to manage DROID categories. It only took one day to create this handy tool, and the tool is built into Rubymatica. The Rubymatica file listing report now has three sections: 1) an overview using the categories; 2) a list of donor files in the ingest with the PRONOM PUID and human-readable format name; 3) the full list of all files (technical and donor) in the SIP.
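The idea of locally managed categories can be sketched in a few lines of Python: a lookup from PUID to category, a fallback for unknown formats, and room for institution-specific overrides. The table contents, the categorise function and the override mechanism are all assumptions for illustration and do not reflect how the Rubymatica web page stores its data; the PUID assignments shown are examples only.

```python
# Fallback for any PUID (or missing identification) not in the table.
DEFAULT_CATEGORY = "other"

# Illustrative PUID -> category table; a real table would be far longer
# and maintained locally, since categories differ between institutions.
PUID_CATEGORIES = {
    "fmt/44": "still image",
    "fmt/18": "textual",
    "x-fmt/111": "textual",
    "fmt/134": "sound",
}


def categorise(puid, local_overrides=None):
    """Return the category for a PUID.

    local_overrides is an optional dict of PUID -> category that takes
    precedence over the shared table, so one institution can file a
    format differently without editing the shared defaults.
    """
    if local_overrides and puid in local_overrides:
        return local_overrides[puid]
    return PUID_CATEGORIES.get(puid, DEFAULT_CATEGORY)


print(categorise("fmt/44"))                      # looked up in the shared table
print(categorise("fmt/999"))                     # unknown PUID uses the fallback
print(categorise("fmt/18", {"fmt/18": "data"}))  # local override wins
```

Keeping overrides separate from the shared table is one possible design; it means the defaults can be refreshed when PRONOM data changes without losing local decisions.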

This simple report seems anticlimactic, but processing born digital materials consists of many small details, which collectively can be a huge burden if not properly managed and automated. Adding this category feature to Rubymatica was a pleasant process, largely because the PRONOM data is open source, readily available, and delivered in a standard format (XML). My thanks and gratitude to the PRONOM people for their continuing work.

http://www.nationalarchives.gov.uk/PRONOM/Default.aspx

http://droid.sourceforge.net/

As I write this I notice that DROID v6 has just been released! The new version certainly includes a greatly expanded set of signatures (technical data for file identifications). We look forward to exploring all the new features.