Wednesday, 28 July 2010

CALM Digital Records meeting

The last week has been a busy one with two deposits of born digital material (16GB representing more than 27,000 digital files) and a meeting with users of the CALM software to discuss digital records.

Adrian Brown (Parliamentary Archives), convenor of the meeting, which was hosted by The National Archives, reported on the main findings from a survey of CALM users conducted at the end of 2009. It was clear from the meeting that many archivists were actively investigating the options and issues surrounding a digital repository, but that the lack of a digital repository within their organisation and the need for training were huge obstacles that needed to be overcome.

I gave a brief outline of the AIMS project and presented a diagram to highlight our current thinking about how Fedora and born-digital material can be integrated into our workflows. [This model is currently still conceptual but we will be working with Axiell to progress this – comments welcomed]. Natalie Walters (Wellcome Library) highlighted their work and how she had found that many of the professional archive skills used to handle and manage paper archives still applied in the born-digital arena. Malcolm Todd (The National Archives) talked about four key aspects of digital repository technology: modularity, interoperability, sustainability and cost effectiveness, all of which are being actively embraced by the AIMS project.

Malcolm Howitt and Nigel Pegg (Axiell) spoke about their plans to extend CALM to link to digital repositories and it is hoped that we can work closely with them on this.

The rest of the meeting was spent discussing and identifying issues surrounding cataloguing and metadata; accession and ingest; user access and best practice. A number of common themes emerged:
• That the differences between paper and digital archives are often exaggerated, with issues like provenance and integrity key to both
• That depositors’ perception of digital archives is very different from their perception of paper, and that acting promptly is the only way archivists can avoid technological obsolescence and a digital dark hole in the historical record
• The need for archives staff to be actively involved in the digital repository and not leave it for ICT staff to develop/manage exclusively
• That born digital archives may open up the archives to new audiences
• A desire to share experiences, documentation etc for the wider benefit of the profession
• A need for more opportunities for “hands-on” experience with born-digital archives and repositories to increase familiarity within the archives profession

Tuesday, 27 July 2010


The Born Digital Archives is the blog of the AIMS team. We hope to stimulate dialog about practical solutions to archiving materials that originate in digital form. We invite all concerned archivists to chime in with questions and comments via the "comments" to each post. AIMS is inclusive with the intention to create open source solutions that are useful to both small and large institutions.

AIMS is “Born Digital Collections: An Inter-Institutional Model for Stewardship”. Funded for two years by The Andrew W. Mellon Foundation, the four partners are The University of Virginia Library, Stanford University, the University of Hull, and Yale University. The purpose of AIMS is to implement best practices. AIMS will create and deploy open source software to manage all the steps in the acquisition, conservation, and eventual dissemination of digital collections. Given that we all have somewhat different administrative, hardware, and software needs, AIMS will strive to use portable open source tools which integrate reasonably seamlessly, and allow archivists a flexible workflow. We plan to process several important collections of born digital media, using Hydra to handle discovery. Hydra is based on the Fedora Commons Repository Software.

The AIMS digital archivists have the mandate to nurture a global community by publishing our lessons online, writing manuscripts, attending conferences, and generally being evangelists for born digital. We are optimistic that we can create (or discover) workable solutions to real-world problems involved with processing and preservation of digital objects. The workflow is wide ranging. Archivists often work with donors early in the process. Many collections pose technical and intellectual challenges such as arrangement or presentation. Legal aspects are involved in authority and access. Collections are sometimes ingested into content management systems. Of course, the eventual goal is to make the documents discoverable by the general public, as well as accessible for scholarly research.

The AIMS team includes a software engineer, and our team is working closely with other software developers. We use normal software development conventions and open source software that runs on commodity hardware. We are agnostic about user interface and operating system so that our solutions will be portable, sustainable, easy to use, and accessible to the broadest possible audience.

Friday, 16 July 2010

Surveying Born Digital Collections

The idea of creating a survey form was initiated by Glynn Edwards (my direct supervisor) at Stanford in May. Glynn wanted to have something to guide the discussion of collecting digital material from a donor in July. She started with the Paradigm records survey (published by the Bodleian Library, Oxford University) and asked people at Stanford for comments. After gathering the comments, I posted the revised survey for the AIMS team to discuss. The main comment was that the donors (mostly not technically savvy) might not be able to understand all the technical terms. I have to thank Michael Forstrom at Yale University for sharing his revision of the Paradigm survey, which avoids the use of technical terms.

The main differences between the original Paradigm records survey and the AIMS Digital Material Survey are:

1. Additional technology questions (e.g. web based backup, mobile device, social networking sites, document sharing sites, etc.)
2. Division of the survey into 2 parts: Part I is designed to be a prompt sheet for phone / face-to-face interview with donors by curators / digital archivists. Part II is to be filled out by digital archivists regarding technical details of the tools used to create digital material.
3. Usage of non-technical terms.

I think the survey should be sent before the actual interview as "something for the donor to start thinking about". If the donor is willing to reply before the interview, it helps the digital archivist to prepare as well. In fact, I sent the survey to a donor in July and she replied before the interview mentioning that she used Eudora for her emails. Since I was not familiar with Eudora, the answer helped me to get prepared for the interview as well.

Finally, I have to thank Susan Thomas, project manager of the Paradigm and futureArch projects, for her comments on the AIMS Digital Material Survey and for sharing her experience of using the Paradigm records survey.

We would like to seek your comments on the survey as well. If you are going to discuss personal digital archives with donors, why not download the survey and give it a try? Even if you are not collecting personal digital archives in the near future, take a look and tell us what you think.

Click below for the survey:
AIMS Digital Material Survey – Personal Digital Archives

Wednesday, 7 July 2010

Digital Lives Research Seminar

On Monday I attended the Digital Lives Research Seminar: Authenticity, Forensics, Materiality, Virtuality and Emulation. The presentations will be appearing online soon via the Digital Lives pages.

There was a packed programme of speakers with a huge array of experience. Of direct relevance to the AIMS work were the following:

Helen Broderick, British Library, described her work as Curator, Modern Literary Manuscripts, including cataloguing the born-digital material in the Ronald Harwood archive. The paper part of the collection had already been listed by a colleague. Helen encouraged hybrid collections to be tackled as a single entity, and this is what I intend to do with the Stephen Gallagher material at Hull.

Helen described using QuickView Plus software to view the files, working with two screens (one to display the digital file and the second to record descriptive notes). Other thorny issues to be tackled include email and how this could be made available to others without infringing Data Protection and other privacy concerns.

Seth Shaw, Duke University, gave an account of the current work at Duke, openly admitting that work on arrangement and description was very sporadic! They are looking to standardise their policies, documentation etc, with the search interface another element on his to-do list! It was clear that practice was being shaped by their experiences, echoing the underlying principle of the best practice guidelines that AIMS will produce based upon our combined experiences.

It was good to see colleague Michael Olson, Stanford University, who gave an account of the forensics work at Stanford, including the approach adopted for the Stephen J Gould material, and outlined the resources in the Forensics Lab.

Gabriela Redwine, Harry Ransom Centre (University of Texas), provided an update on the forthcoming Computer Forensics and Born-Digital Content in Cultural Heritage Collections, due to be published later this year. It came as no surprise to those present that the biggest challenge the research had identified was legacy hardware and software; other challenges included trust and authenticity. This led to a discussion around some of the ethical issues surrounding born digital materials and the suggestion that we should be looking to multiple sources of information to build up a complete picture (metadata, creator and forensics).

Erika Farr & Naomi Nelson, Emory University, gave a fascinating account of their work on the digital material in the Salman Rushdie archive and the multi-disciplinary approach to tackling this collection. After discussion and consideration they agreed to respect the hybrid nature of the material and to balance the needs of the researcher and the donor with the desire to provide an authentic ‘experience’. They had originally distinguished between paper and born-digital material with separate agreements but quickly revised this to one based on content and NOT format. To understand this aspect better, they discussed with the donor his relationship with the PC and how he used it, whether he customised parts, etc. They were even able to recover files from a laptop he had accidentally damaged. The use of an emulator does give a totally different perspective on the born digital material than simply allowing access to the content can ever do. Whether this approach is always possible or practical remains to be seen.

Our host Jeremy John, British Library, described their approach and workstream, including imaging the disk and the creation of digital replicates (viewed via the original software and an emulator) and then facsimile versions for user viewing. He encouraged using hash values generated by two systems as an additional level of verification. The British Library policy was to disk image wherever possible, and they were actively using emulators running in a virtual machine based on the original hardware OS.
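The two-system hash check mentioned above can be sketched in a few lines of Python. This is only an illustration and not the British Library's actual tooling: it assumes one hash value has already been reported by the imaging software, and it recomputes the value independently with Python's standard hashlib module so the two can be compared. The function names (file_digest, matches_reported) and the file name in the usage note are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute the hex digest of a file, reading it in chunks so that
    large disk images do not have to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_reported(path, reported_hex, algorithm="sha256"):
    """Return True if our independently computed digest equals the value
    reported by another system (for example, the imaging tool's log).
    Case and surrounding whitespace in the reported value are ignored."""
    return file_digest(path, algorithm) == reported_hex.strip().lower()
```

For example, matches_reported("disk001.img", value_from_imaging_log) would return True only if both systems agree on the hash of the image. Agreement between two independent implementations gives some reassurance that neither tool has misread or misreported the file.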

I was able to give a quick introduction to the AIMS project and, judging from the questions that followed, some of our work regarding access and use is of particular interest to others.

Jeff Ubois highlighted the main issues that arose from the Personal Digital Archiving Conference 2010 earlier this year, including the complexity of media, with the need to compare donor agreements, interface design and the suitability of tools (re Facebook etc) identified for future consideration and action. He also spoke about the public/private boundaries and mentioned a Research Libraries Group project ‘Good Terms’ about engaging with public companies for digitisation programs.