Showing posts with label ingest tools. Show all posts

Monday, 4 July 2011

Curator’s Workbench workshop

I was fortunate enough to attend the Curator’s Workbench workshop at the British Library last week. It was a chance to see the tool, try it out and discuss it with its developers, Greg Jansen and Erin O’Meara from the University of North Carolina. The tool is designed to aid with the accession, arrangement, description and staging of material prior to ingest into a digital repository. Essentially, the tool has an interface designed for archivists to use.

The session featured a walk-through and a chance to experiment, with experts on hand if you had a problem – only necessary as we had the latest ‘unstable’ release, which included the latest enhancements to functionality and the GUI. Stable versions are available for download via GitHub. I am especially smitten with the crosswalk feature, which provides a drag’n’drop interface for mapping the metadata to METS. There is also the date recogniser, which allows you to map the date format to the ISO standard, though there could be issues if the data is in a variety of formats, e.g. 1984 would be transformed to 01/01/1984.
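
The date issue can be illustrated with a minimal Python sketch. This is not how Curator’s Workbench works internally; the function name `to_iso` and the list of accepted formats are assumptions made for the example, and month names are assumed to be in English. It converts a date string to ISO 8601 while keeping only the precision that was actually present, so a bare year stays a bare year.

```python
from datetime import datetime

# (input pattern, output pattern) pairs, tried in order.
# Illustrative only: a real accession would need its own list.
FORMATS = [
    ("%d/%m/%Y", "%Y-%m-%d"),  # 04/07/1984 (day-first is assumed)
    ("%d %B %Y", "%Y-%m-%d"),  # 4 July 1984
    ("%B %Y", "%Y-%m"),        # July 1984
    ("%Y", "%Y"),              # 1984
]


def to_iso(text):
    """Return an ISO 8601 string no more precise than the input, or None."""
    cleaned = text.strip()
    for pattern_in, pattern_out in FORMATS:
        try:
            parsed = datetime.strptime(cleaned, pattern_in)
        except ValueError:
            continue  # this pattern does not fit, try the next one
        return parsed.strftime(pattern_out)
    return None  # unrecognised: leave for a human to review


if __name__ == "__main__":
    for value in ["1984", "July 1984", "04/07/1984", "circa 1984"]:
        print(repr(value), "->", to_iso(value))
```

Returning `None` for unrecognised values, rather than guessing, keeps mixed-format data visible for manual review.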

It places arrangement and description at a different point in the workflow from that intended for Hypatia in the AIMS workflow, but it does raise some interesting questions that I hope to explore in more detail over the next few months.

It was also interesting to hear about the features and functionality on their wish-list, including disc images, multiple users, recording processing notes and PREMIS, and so the list goes on!
The discussion that followed was really enlightening as it highlighted the different approaches that archives are currently adopting to the preservation of born-digital archives.

I picked up some useful pointers to software and tools I haven’t used before (Bulk extractor, Google Refine) and came away determined to throw more stuff at Curator’s Workbench, to join the users’ discussion list (done) and to figure out some of the aspects we have avoided so far, such as PREMIS and METS!

Friday, 4 March 2011

File type categories with PRONOM and DROID

In order to assess a born digital accession, the AIMS digital archivists expressed a need for a report on the count of files grouped by type. The compact listing gives the archivist an overview that is difficult to visualize from a long listing. The category report supplements the full list of all files, and helps with a quick assessment after creation of a SIP via Rubymatica. (In a later post I’ll point out some reasons why pre-SIP assessment is often not practical with born digital.)

At the moment we have six categories. Below is a small example ingest:

Category summary for accession ingested files

data             3
moving image     1
other            2
sound            2
still image     26
textual         12
Total           46
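
A summary of this kind can be sketched in a few lines of Python. This is not the Rubymatica code; the function name `summarise` and the small PUID-to-category table are hypothetical, and the PUIDs should be checked against the PRONOM registry before being relied on. Any PUID without a category falls into "other".

```python
from collections import Counter

# Hypothetical PUID-to-category table; each institution would maintain its own.
PUID_CATEGORIES = {
    "fmt/43": "still image",  # JPEG (JFIF)
    "fmt/18": "textual",      # PDF 1.4
    "x-fmt/111": "textual",   # plain text
    "fmt/134": "sound",       # MP3
}


def summarise(puids, categories=PUID_CATEGORIES, default="other"):
    """Count files per category and append a grand total row."""
    counts = Counter(categories.get(puid, default) for puid in puids)
    rows = sorted(counts.items())
    rows.append(("Total", sum(counts.values())))
    return rows


if __name__ == "__main__":
    # One PUID per ingested file; "fmt/999" stands in for an unmapped format.
    ingest = ["fmt/43", "fmt/43", "fmt/18", "fmt/134", "fmt/999"]
    for name, count in summarise(ingest):
        print(f"{name:<15}{count:>4}")
```

Keeping the mapping in a plain table, separate from the counting, reflects the point that categories will differ between institutions.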


Some time ago we decided to use DROID exclusively as our file identification software. It identifies a broad variety of files well and is constantly being improved. We initially used the file identities from FITS, but the tool behind any particular identity was highly variable: FITS gives a “best” identity based on metadata returned by several utility programs. We wanted a consistent identification, as opposed to some files being identified by DROID, some by the “file” utility and some by Jhove. We currently use the DROID identification by pulling the DROID information out of the FITS XML for each file. This was easy and required very little change to Rubymatica.
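
As a rough illustration of pulling the DROID identity out of FITS output, here is a minimal Python sketch. It is not the Rubymatica implementation. The sample XML, its namespace URI and the `externalIdentifier` layout are assumptions about what FITS output looks like and should be checked against real files; `droid_identity` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed FITS output namespace; verify against your own FITS files.
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written sample standing in for one file's FITS report.
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Portable Document Format" mimetype="application/pdf">
      <tool toolname="Jhove" toolversion="1.5"/>
      <tool toolname="Droid" toolversion="3.0"/>
      <externalIdentifier toolname="Droid" toolversion="3.0" type="puid">fmt/18</externalIdentifier>
    </identity>
  </identification>
</fits>"""


def droid_identity(fits_xml):
    """Return (puid, format name) reported by DROID, or None if absent."""
    root = ET.fromstring(fits_xml)
    for identity in root.iterfind("fits:identification/fits:identity", FITS_NS):
        for ext in identity.iterfind("fits:externalIdentifier", FITS_NS):
            is_droid = ext.get("toolname", "").lower() == "droid"
            if is_droid and ext.get("type") == "puid":
                return (ext.text or "").strip(), identity.get("format")
    return None  # DROID did not identify this file


if __name__ == "__main__":
    print(droid_identity(SAMPLE))
```

Ignoring every identifier that does not come from DROID is what gives the consistent, single-source identification described above.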

PRONOM has the ability to have “classifications” via the XML element FormatTypes. However, there are a couple of issues. The first problem is that the PRONOM team is focused primarily on building new signatures (file identification configurations) and doesn’t have time to focus on low priority tasks such as categories. Second, the categories will almost certainly be somewhat different at each institution.

Happily, I was able to create an easy-to-use web page to manage DROID categories. It took only one day to create this handy tool, which is built into Rubymatica. The Rubymatica file listing report now has three sections: 1) an overview using the categories; 2) a list of donor files in the ingest, with the PRONOM PUID and human-readable format name; 3) the full list of all files (technical and donor) in the SIP.

This simple report seems anticlimactic, but processing born digital materials consists of many small details, which collectively can be a huge burden if not properly managed and automated. Adding this category feature to Rubymatica was a pleasant process, largely because the PRONOM data is open source, readily available, and delivered in a standard format (XML). My thanks and gratitude to the PRONOM people for their continuing work.

http://www.nationalarchives.gov.uk/PRONOM/Default.aspx

http://droid.sourceforge.net/

As I write this I notice that DROID v6 has just been released! The new version certainly includes a greatly expanded set of signatures (technical data for file identifications). We look forward to exploring all the new features.

Tuesday, 25 May 2010

Practical Approaches to Electronic Records

On Friday I attended the excellent Practical Approaches to Electronic Records event in Dundee. The programme included thought-provoking discussions from Dr Ian Anderson (HATII) and Malcolm Todd (TNA), who both stressed the importance of demonstrating ‘value’ and ‘relevance’ to our organisations, and the need to develop new partnerships with colleagues working in digital forensics, ICT departments and universities to tackle the challenge of digital preservation. William Kilbride (Digital Preservation Coalition) offered some interesting personal reflections on digital preservation and concluded that it is not about ‘data’, ‘access’ or ‘risk’ but about people and outcomes.

The afternoon was especially timely as it featured two demonstrations of ingest tools, something the AIMS project is currently working on. Viv Cothey showed us the work he has done at Gloucestershire Archives on the SCAT tool, and this was followed by Peter Cliff demonstrating the BEAM Ingest tool being developed at the Bodleian. Both tools have adopted a modular approach to make use of many of the excellent and widely adopted third-party tools such as PRONOM, Jhove and FITS, and this is the obvious route to follow as we seek to create an ingest tool that is integrated with the Fedora digital repository.

The day ended with Chris Prom summarising his work to identify and compare many of the open source tools that are available. He encouraged everybody to get involved with a software project and listed the elements that he thought made an excellent open source project, citing Archivematica as a good example as it provided regular updates, clear documentation, available source code and a support wiki.