Two weeks ago I spoke at the Archives and Society series of talks held at the Institute of Historical Research about the progress and work at Hull resulting from the AIMS project. Although I highlighted the AIMS White Paper, the bulk of the talk was about the practical steps we have taken at Hull with born-digital archives: starting with a simple survey of collections, followed by photography of media and the creation of a forensic workstation (a tale told in multiple parts: see part 1, part 2 and part 3).
I encouraged those present to download software such as Karen's Directory Printer and DROID and to have a go: working with a few test files will increase your familiarity with many of the issues associated with digital preservation.
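If you would rather start without installing anything, the kind of folder listing these tools produce can be approximated in a few lines of Python. This is only a rough sketch of the idea, not a substitute for either tool: the folder and file names below are made up for the example, and it records nothing beyond path, size and extension.

```python
import os
import tempfile


def list_files(root):
    """Yield (relative path, size in bytes, lower-cased extension) for each file."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit sub-folders in a predictable order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            yield (
                os.path.relpath(full_path, root),
                os.path.getsize(full_path),
                os.path.splitext(name)[1].lower(),
            )


# Build a throwaway folder of test files (hypothetical names), then list it.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "letters"))
    with open(os.path.join(root, "readme.TXT"), "w") as handle:
        handle.write("hello")
    with open(os.path.join(root, "letters", "draft.doc"), "w") as handle:
        handle.write("dear sir")
    listing = list(list_files(root))

for path, size, extension in listing:
    print(f"{size:>8}  {extension:<6} {path}")
```

Note that the extension is only a hint about the file type; identifying formats properly is the job of a tool like DROID.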
I managed to stop in time for questions. These touched on the fact that the issues I raised were not "new", and on whether we would still be making the same case in five years' time (I hope not). They also covered the need for automated tools to help us cope with the sheer volume of material (an obvious need), and the associated risk of releasing material that you haven't explicitly checked because there are simply too many files.
A PDF version of the slides is available - the talk was also recorded and I will add a link to the podcast when it is available.
Monday, 19 March 2012
Friday, 20 January 2012
AIMS White Paper now available
After a huge amount of effort the AIMS White Paper has finally been finished and is now available online.
The White Paper is intended as a framework to guide good practice in terms of the archival tasks and objectives necessary for success. It builds upon the experiences of the four partner institutions (the universities of Hull, Stanford, Virginia and Yale) in processing a range of collections with an array of format and media issues, using different software. We were keen to make the framework software-agnostic and have gone back to the archival principles at the heart of the processes.
In many areas we found strong similarities with existing practice for paper records, and for some aspects we found there were multiple ways of achieving certain goals; we did not want to be prescriptive. Instead, the White Paper highlights key decision points and aspects of policy that may be determined at an institutional level. It is intended to help people making the same journey that we have made: finding out about projects, tools and case studies, and starting to build knowledge, skills and infrastructure.
Although the publication of the White Paper officially marks the end of the AIMS project the institutions intend to continue collaborating and sharing their experiences on this blog.
We welcome your comments and feedback on the White Paper on this blog, whether you have implemented the framework or just found the guidance useful.
Friday, 4 March 2011
File type categories with PRONOM and DROID
In order to assess a born digital accession, the AIMS digital archivists expressed a need for a report on the count of files grouped by type. The compact listing gives the archivist an overview that is difficult to visualize from a long listing. The category report supplements the full list of all files, and helps with a quick assessment after creation of a SIP via Rubymatica. (In a later post I’ll point out some reasons why pre-SIP assessment is often not practical with born digital.)
At the moment we have six categories. Below is a small example ingest:
Category summary for accession ingested files
| data | 3 |
| moving image | 1 |
| other | 2 |
| sound | 2 |
| still image | 26 |
| textual | 12 |
| Total | 46 |
Some time ago we decided to use DROID exclusively as our file identification software. It works well to identify a broad variety of files, and is constantly being improved. We were initially using file identities from FITS, but the identity it reported was highly variable: FITS gives a “best” identity based on metadata returned by several utility programs. We wanted a consistent identification, as opposed to some files being identified by DROID, some by the “file utility” and some by Jhove. We currently get the DROID identification by pulling the DROID information out of the FITS XML for each file. This is easy and required very little change to Rubymatica.
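To give a feel for what "pulling the DROID information out of the FITS XML" can look like, here is a minimal Python sketch. It is not the Rubymatica code (which is written in Ruby). The sample document, the namespace URI and the `externalIdentifier` element with `toolname` and `type` attributes are assumptions based on typical FITS output, and the tool versions shown are placeholders, so check them against the XML your own FITS version produces.

```python
import xml.etree.ElementTree as ET

# Assumed FITS output namespace; verify against your own FITS files.
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}

# Hand-written sample in the assumed FITS layout (not real tool output).
SAMPLE_FITS = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification status="CONFLICT">
    <identity format="Portable Document Format" mimetype="application/pdf">
      <tool toolname="Jhove" toolversion="1.5" />
      <tool toolname="Droid" toolversion="3.0" />
      <externalIdentifier toolname="Droid" toolversion="3.0" type="puid">fmt/18</externalIdentifier>
    </identity>
    <identity format="PDF document" mimetype="application/pdf">
      <tool toolname="file utility" toolversion="5.03" />
    </identity>
  </identification>
</fits>"""


def droid_identity(fits_xml):
    """Return the DROID PUID and format name, or None if DROID gave no PUID."""
    root = ET.fromstring(fits_xml)
    for identity in root.findall("fits:identification/fits:identity", FITS_NS):
        for ext_id in identity.findall("fits:externalIdentifier", FITS_NS):
            from_droid = ext_id.get("toolname", "").lower() == "droid"
            if from_droid and ext_id.get("type") == "puid" and ext_id.text:
                return {"puid": ext_id.text.strip(), "format": identity.get("format")}
    return None  # only other tools identified the file


result = droid_identity(SAMPLE_FITS)
print(result)
```

Ignoring identities that carry no DROID PUID is what gives the consistency described above: every file is either identified by DROID or treated as unidentified.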
PRONOM has the ability to have “classifications” via the XML element FormatTypes. However, there are a couple of issues. First, the PRONOM team is focused primarily on building new signatures (file identification configurations) and doesn’t have time to focus on lower-priority tasks such as categories. Second, the categories will almost certainly be somewhat different at each institution.
Happily I was able to create an easy-to-use web page to manage DROID categories. It only took one day to create this handy tool, and the tool is built into Rubymatica. The Rubymatica file listing report now has three sections: 1) an overview using the categories; 2) a list of donor files in the ingest with the PRONOM PUID and human-readable format name; 3) the full list of all files (technical and donor) in the SIP.
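The overview section boils down to a lookup from PUID to a locally chosen category, followed by a count. A small Python sketch of that idea follows. It is not the Rubymatica implementation: the `PUID_CATEGORIES` mapping and the example ingest are invented for illustration, and each institution would maintain its own assignments.

```python
from collections import Counter

# Illustrative, institution-specific mapping from PRONOM PUID to category.
PUID_CATEGORIES = {
    "fmt/18": "textual",
    "x-fmt/111": "textual",
    "fmt/43": "still image",
    "fmt/134": "sound",
}


def category_summary(puids, categories=PUID_CATEGORIES, default="other"):
    """Count files per category; unmapped or unidentified files go to `default`."""
    counts = Counter(categories.get(puid, default) for puid in puids)
    rows = sorted(counts.items())
    rows.append(("Total", sum(counts.values())))
    return rows


def format_report(rows):
    """Render the rows in the same pipe-delimited style as the summary table."""
    return "\n".join(f"| {label} | {count} |" for label, count in rows)


# Hypothetical ingest: one PUID per donor file, None where nothing was identified.
ingest = ["fmt/43", "fmt/43", "fmt/18", "fmt/134", None, "x-fmt/111"]
summary = category_summary(ingest)
print(format_report(summary))
```

Keeping the mapping as plain data, separate from the counting, is what makes it practical to edit categories from a web page without touching the report code.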
This simple report seems anticlimactic, but processing born digital materials consists of many small details, which collectively can be a huge burden if not properly managed and automated. Adding this category feature to Rubymatica was a pleasant process, largely because the PRONOM data is open source, readily available, and delivered in a standard format (XML). My thanks and gratitude to the PRONOM people for their continuing work.
http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
http://droid.sourceforge.net/
As I write this I notice that DROID v6 has just been released! The new version certainly includes a greatly expanded set of signatures (technical data for file identifications). We look forward to exploring all the new features.
Tuesday, 26 October 2010
Update on the Donor Survey
As our readers may recall, this past July, the AIMS archivists created a donor survey for born-digital archives. My colleague, Peter Chan, wrote fairly extensively on its origins and purpose; please go here to read up on the background of our survey.
A few months have gone by and the archivists have had the opportunity to think more about how we envision the donor survey fitting into both shared and institution-specific born-digital workflows. First of all, we all agreed that we wanted to move away, as much as possible, from creating paper-based forms and records regarding donors and content. Moving the donor survey to a web-based tool, complete with an SQLite database back-end, seemed to be a good way to start (for technical specifics, please see Tom's forthcoming entry regarding the web form - coming up next!). In the web-based survey, we deliberately included a space for the archivist to record comments for each question and answer on the survey. We realized that by creating a place for the archivist to record their findings and/or elaborate on what was recorded by the donor/owner of the personal archive, we could make the process of determining the scope of the personal archive for transfer that much more transparent. As one of our senior archivists on the project pointed out, it's as important to know what was excluded from transfer and why as it is to have a trail of documentation as to what was transferred and why (especially if the processing of the collection follows many months later!). We hope that adding this feature to the survey will help with the recording of that process in a centralized location and perhaps serve as the digital equivalent of a donor file.
As to how the donor survey fits into our shared and institution-specific workflows, that is still a work in progress. Generally speaking, it is intended that the data collected from the survey could be mapped to a submission agreement, which, in turn, would then be part of the SIP (submission information package). We also intend to map portions of what has been collected from the survey and submission agreement into Archivists' Toolkit and Calm (collection management software from the UK) to form an accession record. Ideally, we want to enter/create data once and have it re-purposed as often as is needed throughout our workflow.
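To make the idea of an archivist comment alongside every donor answer a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the survey's actual schema (Tom's entry covers the real web form): the table names, column names and the sample question are all hypothetical.

```python
import sqlite3

# Hypothetical schema: each response pairs the donor's answer with a
# free-text comment from the archivist.
SCHEMA = """
CREATE TABLE question (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL
);
CREATE TABLE response (
    id                INTEGER PRIMARY KEY,
    question_id       INTEGER NOT NULL REFERENCES question(id),
    donor_answer      TEXT,
    archivist_comment TEXT
);
"""


def open_survey(path=":memory:"):
    """Open (or create) a survey database; in-memory by default for testing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_response(conn, question_text, donor_answer, archivist_comment=None):
    """Store a question, the donor's answer and an optional archivist comment."""
    cur = conn.execute("INSERT INTO question (text) VALUES (?)", (question_text,))
    conn.execute(
        "INSERT INTO response (question_id, donor_answer, archivist_comment) "
        "VALUES (?, ?, ?)",
        (cur.lastrowid, donor_answer, archivist_comment),
    )
    conn.commit()


def survey_report(conn):
    """Return (question, donor answer, archivist comment) rows in question order."""
    return conn.execute(
        "SELECT q.text, r.donor_answer, r.archivist_comment "
        "FROM question q JOIN response r ON r.question_id = q.id ORDER BY q.id"
    ).fetchall()


conn = open_survey()
record_response(
    conn,
    "Which computers have you used for your work?",  # made-up question
    "A laptop and an older desktop",
    "Desktop excluded from transfer: donor reports it holds only family photos.",
)
report = survey_report(conn)
print(report)
```

Recording the comment next to the answer it refers to is one way of keeping the reasons for including or excluding material in a single, centralized place.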
We invite you to test out our web survey and to give us your feedback. In our next entry, Tom will be posting a description of the technical side of the survey web form and he'll include a link for access. Other folks have been working on other versions of surveys for electronic records as well. If you're not already familiar with Chris Prom's blog, Practical E-Records, get a readin'. Chris recently posted a version of a donor survey; check it out here.
Liz Gushee
University of Virginia