Tuesday, 16 August 2011

AIMS at SAA

Today's post is just a brief announcement. The AIMS team will be taking part in two events at next week's Society of American Archivists Annual Meeting. The first is a workshop we've developed to give archivists and technologists an opportunity to discuss issues related to collection development, accessioning, appraisal, arrangement and description, and discovery and access of born-digital materials. Unfortunately, space constraints have required us to limit registration, and the workshop is now full. However, we promise to post a longer recap to this blog after the event.

No such limitations exist for our other SAA event, a presentation entitled "Born-Digital Archives in Collecting Repositories: Turning Challenges into Byte-Size Opportunities," which will be given August 27th at 8 a.m. At this presentation the AIMS Digital Archivists will describe a bit of the high-level framework being developed by the AIMS project to characterize workflows for born-digital materials in archival repositories.

We hope to see you there!

Friday, 12 August 2011

Digital Forensics for Digital Archivists

I’ve been very fortunate here at UVa to have at my disposal some wonderful resources for getting up to speed with born-digital theory and practice. First and foremost, UVa is home to Rare Book School, which has offered a course on Born Digital Materials for the past two years (and, I’ve just learned, will offer it again in 2012). I was able to take this course in July along with 11 fellow classmates from around the country. A week and a half later I was off to the headquarters of Digital Intelligence, Inc., makers of our Forensic Recovery of Evidence Device (FRED), for their Computer Forensics with FRED training. This was a two-day course covering basic digital forensic skills as well as the FRED system itself.

Mulder and Scully are concerned about the viability of this forensic evidence gathered next to UVa's FRED...

Given my great bounty, and my belief in professional karma, I’ve decided to give a brief overview of both of these classes here on the blog followed by my thoughts on a potential Digital Forensics for Archivists class/workshop that I’d really like to see developed, by myself or whomever! Two major classes out there that I have not taken are the DigCCurr Professional Institute and SAA’s electronic records workshop. Anyone with experiences in those classes, please add your comparisons in the comments.

RBS L95 — Born Digital Materials: Theory and Practice

Overall, I’d say this class has the perfect name: there’s an almost equal amount of theory and practice. That may sound like faint praise, but it’s really not. It’s something that too few workshops or classes get right. Instructors Naomi Nelson and Matt Kirschenbaum deserve much credit for a well-constructed week that built practice on top of theory.

For someone new to the born-digital field, it’s a great foundation. Concepts like metadata, preservation, “the cloud,” essential characteristics, physicality/materiality and digital humanities are combined with real-life examples from libraries, archives, and the university. This overview allowed us to attack the fundamental question of the class: what should we be trying to accomplish when we attempt to “save” (or steward, curate, safeguard, preserve, “archive”) born-digital materials?

On the practical side of things, digital forensics is covered and students get the opportunity to do a few lab exercises with emulators, floppy drives, and older models of equipment. The syllabus and reading list provide an excellent bibliography for further research.

It’s a relatively high-level class and therefore a great way to get started, or a great way to get administrators thinking intelligently about the issues they need to face. I think that a more practitioner-focused and thorough digital forensics curriculum in the archives or cultural heritage setting could complement the course very nicely.

Computer Forensics with FRED training

The University of Virginia decided to invest in the FRED technology last year and has not regretted it. While the FRED can do lots of neat things, I feel it is important to note that many or all of the same things can be done with other hardware and software; it just takes a bit more persistence. Despite its name, a lot of this course dealt with basic data and file system concepts, as well as a little about the specific hardware most commonly encountered. In the future, DI is going to split this up into two classes: Digital Forensic Essentials and Digital Forensics with FRED. The first is a two-day course covering the hardware, data, and system material; the second is a one-day class covering the specifics of FRED. Although the first class will be more expensive than the current combined class, it will be of more interest to those in the archival world.

As it is geared toward law enforcement, a lot of time was spent on detecting deleted, fraudulent, or hidden material. While all the cops in the room thought that this would be of no use to me, I disagreed. I need to know what I am collecting (whether acquired inadvertently or not), whether it is authentic, and how to communicate with donors to decide how to deal with it. In addition, if we can get donors to agree to let us transfer backup or deleted versions of manuscripts, we’ll gain a wealth of information about how the final version evolved. Knowing that such recovery is possible is one of the more glamorous promises of digital forensics.

We also learned how to create and navigate disk images. While some of this stuff was fairly easy for me to pick up beforehand from Peter Chan’s tutorials, the extra practice and insight was very useful.

Digital Forensics for Archivists

Based on my experiences in these two classes, I would propose a Digital Forensics for Archivists workshop geared specifically for those interested in incorporating forensic techniques into the capture and processing of digital materials. The outline of topics below is probably a bit ambitious for a one-day workshop and would certainly face some hurdles related to provisioning hardware for all. However, these are the areas I’ve come to think of as necessary for an archive to be prepared for the variety of media we will be collecting for the foreseeable future.

Digital Forensics for Archivists


  • Hardware basics

    • IDE, SCSI, SATA, USB, FireWire
    • Floppy drives
    • Optical disks
    • Hard drives
    • Internal basics (motherboard, PCI, power, etc.)

  • Operating Systems

    • DOS
    • Windows
    • Mac OS
    • Linux

  • File system basics

    • FAT
    • NTFS
    • HPFS

  • Forensic vs. logical copying

    • What happens to deleted data
    • How it can be recovered
    • Why you need to know…

  • Write blocking

    • How to achieve it

  • Image files

    • Types
    • Software
    • Uses

  • Emulation and Migration

    • Cost/benefit of each
    • Possible use cases for each
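To make the forensic-vs.-logical distinction concrete, here's a rough sketch in Python. A small stand-in file plays the part of a physical device so it runs anywhere; in real capture work you would image the raw device through a write blocker, and function names here are my own invention:

```python
import hashlib

def forensic_image(device_path, image_path, chunk_size=4096):
    """Byte-for-byte copy of a medium, returning an MD5 of the stream
    so the image can later be verified against the original."""
    digest = hashlib.md5()
    with open(device_path, "rb") as src, open(image_path, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()

# A logical copy, by contrast, only takes the files the file system
# still lists -- e.g. shutil.copy2("doc.txt", "copy.txt") -- so slack
# space and deleted data never make it across.
```

The forensic image preserves everything on the medium, deleted data included, which is exactly why archivists need to know what they are (and are not) capturing.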

So what do you think? Pipe dream? Useful? Impractical? Let me know in the comments…


Monday, 25 July 2011

Forensic workstation pt 1

A key part of dealing with born-digital archives is the ability to receive and process material without making changes to the underlying metadata – date created, date accessed, etc. – data that researchers will be looking to use and rely on. As archivists we place considerable emphasis on our role as custodians, and with digital material it is important that we treat it carefully and appropriately. Fortunately there are tools that help us demonstrate the authenticity of born-digital files, the most obvious of which is the checksum.
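For the curious, a checksum is just a short fingerprint computed from a file's bytes: recompute it later and any change to the file shows up as a different value. A minimal sketch in Python (the function name is my own):

```python
import hashlib

def file_checksum(path, algorithm="md5", chunk_size=8192):
    """Fingerprint a file by hashing it in chunks, so even very
    large files never have to sit wholly in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record the value at deposit; recompute before and after any transfer.
# A matching value is evidence the bytes are unchanged in our custody.
```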

An important legacy of the AIMS project for us at Hull is working towards the ability to take born-digital material from depositors as a normal part of our work. A key component of this is a forensic workstation – by which I mean a PC (or two) through which material can be safely captured following a clear process, in effect replicating the isolation room for receiving paper material. This will allow us to undertake a forensic examination: to check that the material is what we expected or agreed to take and that it does not include viruses, and to generate a manifest of the material to send to the depositor.
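A manifest like the one mentioned above could be generated with a few lines of Python. This is a sketch only – the column names and the choice of MD5 are my assumptions rather than a settled Hull procedure – walking a deposit directory and recording each file's path, size and checksum:

```python
import csv, hashlib, os

def build_manifest(root, out_csv):
    """Walk a deposit directory and record each file's relative path,
    size in bytes and MD5 checksum in a CSV manifest for the depositor."""
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "bytes", "md5"])
        for dirpath, _, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    md5 = hashlib.md5(f.read()).hexdigest()
                writer.writerow([os.path.relpath(path, root),
                                 os.path.getsize(path), md5])
```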

There seem to be two main routes. The first is to purchase a FRED, which stands for Forensic Recovery of Evidence Device (other digital forensic workstation solutions are available). The second, more organic solution – and the one we intend to adopt at Hull – is to start with a new PC and add appropriate hardware and software to provide the equivalent functionality. At the moment we are pondering a name for it, with current suggestions including:
- Hal - Hull Archives Laboratory
- Harold – Hull Archives Recovery Of Legacy Data
- Hilary - Hull Investigator for Library and Archives RecoverY
- Dawn – Digital Archives WorkstatioN
but we are open to other suggestions until the machine is installed and formally named!

We don’t want to become a computer museum with an extensive range of hardware, software and operating system environments for every possible eventuality. We do want a core ability to handle material we reasonably expect to receive – including material on 3.5” floppy disks, zip disks, hard drives etc. We intend to develop and extend our capacity as need dictates – if we receive material in a format we cannot handle, we will consider whether we need to support it ourselves or whether a suitable 3rd party is more appropriate.

Central to this is the need for write blockers, which prevent the workstation from writing to or updating files on the source media. Having read countless websites I felt I knew what they were supposed to do, but had a nagging doubt that my knowledge was incomplete.
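By way of illustration only – a real write blocker sits in hardware between the source drive and the workstation – the read-only principle can be demonstrated in a few lines of Python:

```python
import os, tempfile

def demo_read_only():
    """Open a file read-only and show the OS refuses any write to it.
    Illustration only: real capture work needs a write blocker between
    the source drive and the workstation, not just careful opening."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    ro_fd = os.open(path, os.O_RDONLY)   # read-only file descriptor
    try:
        os.write(ro_fd, b"accidental change")
        return "write allowed"           # would mean the block failed
    except OSError:
        return "write blocked"           # the OS refuses the write
    finally:
        os.close(ro_fd)
        os.remove(path)
```

The point of dedicated hardware is that this refusal happens below the operating system, so even a buggy or malicious program on the workstation cannot touch the source media.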

A tour of the British Library eMss Labs courtesy of Jeremy Leighton John (as featured on the BBC Radio 4 programme 'Tales from the Digital Archives', broadcast in May but still available online) confirmed the simplicity of the theory and the fragility of the media – just having the hardware isn’t enough; you also need some luck that you have the correct drivers to read the specific version of the media. In the next few weeks I hope to place our order for the various bits and pieces and will update you on this exciting journey!

Monday, 4 July 2011

Curator's Workbench workshop

I was fortunate enough to attend the Curator’s Workbench workshop at the British Library last week. It was a chance to see, have a play with, and discuss the tool with its developers, Greg Jansen and Erin O’Meara from the University of North Carolina. The tool is designed to aid with the accession, arrangement, description and staging of material prior to ingest into a digital repository. Essentially, it has an interface designed for archivists.

The session featured a walk-through and a chance to have a play with experts on hand if you hit a problem – only necessary because we had the latest ‘unstable’ release, including the newest enhancements to functionality and the GUI. Stable versions are available for download via GitHub. I am especially smitten with the crosswalk feature, which provides a drag’n’drop interface for mapping metadata to METS. There is also the date recogniser, which allows you to map date formats to the ISO standard, though there could be issues if the data is in a variety of formats – e.g. 1984 would be transformed to 01/01/1984.
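As a sketch of what such a date recogniser does – this is not the Workbench's actual code, and the format list is hypothetical – mixed date strings can be mapped to ISO 8601, with the year-only ambiguity noted above:

```python
from datetime import datetime

# Hypothetical list of formats we might meet in legacy metadata
FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%B %Y", "%Y"]

def to_iso(value):
    """Map a date string to ISO 8601. Missing parts default to
    1 January, reproducing the '1984' -> 1984-01-01 behaviour."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unparseable values for manual review
```

The defaulting is exactly the issue flagged above: a bare year gains a spurious precision of 1 January, so such values may want flagging for review rather than silent conversion.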

It takes a different view of where arrangement and description occur in the workflow from that intended for Hypatia in the AIMS workflow, but it does raise some interesting questions that I hope to explore in more detail over the next few months.

It was also interesting to hear about the features and functionality on their wish-list, including disc images, multiple users, recording processing notes, PREMIS – and so the list goes on!
The discussion that followed was really enlightening, as it highlighted the different approaches that archives are currently adopting to the preservation of born-digital archives.

I picked up some useful pointers to software and tools I haven’t used before – Bulk extractor and Google Refine – and came away determined to throw more stuff at Curator's Workbench, to join the users' discussion list (done), and to figure out some of the aspects we have avoided so far – things like PREMIS and METS!

Tuesday, 21 June 2011

Photographing the digital: creating images of Hull University Archives’ digital media

A guest posting from Nicola Herbert, Digital Project Preservation Assistant at Hull University Archives

Over the last few months I have been working with the AIMS team at Hull University. My role entails getting stuck into some practical processing of the born-digital collections in the Hull University Archives as well as planning aspects of digital preservation. A lot of our work so far has been to discover and document the material that we already hold in what we thought were purely paper collections and I have written a workflow for the discovery of these items and their preparation for ingest into Fedora. As part of this workflow we decided to photograph all of the removable media we currently have and create a process for photography of new deposits when they arrive.

Why bother?
By retaining photographs of the original media alongside the content, we will be able to provide researchers with an image of the original media's appearance if they request it. For the foreseeable future we are storing the image files on a shared drive, but they will eventually be stored as an element of metadata with the digital files in our Fedora Repository. We will be dealing with large numbers of media items, so we need to ensure consistency in the way the media is photographed and the information recorded from those images.

Process
Having not previously numbered the discs, we decided on a simple running number within each accession. Despite our familiarity with labelling paper material, it seemed more complicated with digital. Our conservator advised against sticking labels (even conservation grade) onto the plastic casing of a floppy or Amstrad disc. Though a specialist CD marker can be used to label CDs, we were reluctant to permanently mark the items! After a worryingly long thought process we decided to stick to the old faithful method of writing in pencil on the existing label or case.

I then started planning the process. Despite trying to anticipate the different elements of information to include for each media type, it was only trial runs photographing actual media that gave the full picture - i.e. that Amstrad discs have three aspects to photograph (Side A, Side B and the edge). Lots of seemingly trivial questions arose - like whether to photograph the case or whether to photograph a label if blank. Getting the process right from the start will save time in the long run.


We decided to create a ‘clapperboard’ to photograph with the items for a failsafe way to ensure easy identification. I decided on a reusable form printed on a transparency which we can label with a drywipe marker. Putting theory into practice needed several trial runs; after each one I adapted the form and the procedure.

In addition I wrote up detailed notes describing the procedure for each type of media we anticipate encountering. We worked out a sensible image quality – so as to ensure legibility of the labels without clogging up our servers with unnecessarily large images. Once the photographs have been taken they are renamed and filed. We also maintain an inventory of the items and record the media and label information alongside it. This ensures that if we send items (like our Amstrad discs) away to a third party, we can match them to our records when they return.
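The renaming-and-inventory step could be scripted along these lines – a sketch only, with a made-up naming scheme rather than our actual convention:

```python
import csv, os, shutil

def file_media_photo(src_jpg, accession, item_no, aspect,
                     dest_root, inventory_csv):
    """Copy a photograph of a media item into its accession's folder
    under a predictable name (e.g. 2011-07-004-sideA.jpg) and log it
    in the inventory. The naming scheme is invented for this sketch."""
    name = "{0}-{1:03d}-{2}.jpg".format(accession, item_no, aspect)
    dest_dir = os.path.join(dest_root, accession)
    os.makedirs(dest_dir, exist_ok=True)
    shutil.copy2(src_jpg, os.path.join(dest_dir, name))
    with open(inventory_csv, "a", newline="") as f:
        csv.writer(f).writerow([accession, item_no, aspect, name])
    return name
```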

This process has been satisfying to complete and enables us to tick at least one thing off our to-do list. Anyone can get this part of the process done – even where the content has already been copied to a shared drive, photographing the original media is worthwhile.

Wednesday, 18 May 2011

AIMS: the UnConference



Not two full weeks into my new job as Digital Archivist at UVa on the AIMS grant, I rolled up my sleeves to facilitate and host an unconference with my fellow Digital Archivists. Our unconference would be two full days of discussions, demonstrations, lightning talks, and networking with digital archivists from around the globe. At first the thought was a little terrifying – I’m not even fully sure I know what this job is yet, how could I actually lead discussions on the salient topics? But my fears were baseless: all the unconference attendees were thoughtful, articulate, and lively participants. I learned much more from them than they probably did from me.

The unconference was held on the 13th and 14th of May at the Omni Hotel in Charlottesville. The 27 participants represented libraries, archives, museums, and digital humanities centers across the US, Canada, and the United Kingdom. Despite the differences in our institutions, backgrounds, and training, we learned that we not only shared similar challenges, but also the same hopes for collaboration and innovation.

The first day started off with a round of lightning talks. Each participant had 5 minutes to present a topic, project, problem or idea that they were interested in talking about. The variety of the talks was remarkable to me, traversing the breadth and depth of all that can be thought of as “born-digital” and the many processes involved in managing it. The lightning talks were also a great way to get an introduction to each participant, as well as their perspective and the particular issues they were dealing with at their institution. A brief outline of each of the talks is available on the AIMS Unconference Wiki.

Thursday, 5 May 2011

Workshop on "Using FTK Imager and AccessData FTK to Capture and Process Born Digital Materials"

On April 22, I conducted a 2-hour workshop on "Using FTK Imager and AccessData FTK to Capture and Process Born Digital Materials.” The purpose of the workshop was to give staff a hands-on experience in using FTK Imager and AccessData FTK. Eight colleagues from the Stanford University Libraries attended the workshop – primarily from Special Collections and University Archives and the Humanities and Social Sciences Group.

The workshop covered the following:

FTK Imager – how to:
1. Download and install the software (free software - http://accessdata.com/support/adownloads).
2. Create a forensic image of a USB flash drive.
3. Create a logical image of the same flash drive.

AccessData FTK – how to:
1. Load an image – for this workshop we used a sampling from the Stephen Jay Gould papers.
2. View technical metadata generated by the software.
3. Arrange column settings to see specific file attributes (e.g. duplicate files).
4. Search for social security numbers using pattern search.
5. Test the full-text search function.
6. Flag files containing sensitive information (such as social security numbers) with the "privileged" tag.
7. Use the bookmark feature for hierarchical information and apply it to groups of files (e.g. series, subseries, etc.)
8. Label groups of files with user-defined labels (the controlled vocabulary for computer storage media and document type suggested in the workshop, subject headings, access rights, etc.)
9. View files with specific bookmarks and labels.
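FTK ships with its own pattern definitions, but as an illustration of what the social security number search in step 4 is doing under the hood, here is a rough regular-expression sketch in Python (the validity rules are simplified, not FTK's actual pattern):

```python
import re

# Simplified US Social Security number pattern: SSNs never begin with
# 000, 666 or 9xx, and no group is all zeros. FTK's own definitions
# are more sophisticated; this only sketches the idea.
SSN = re.compile(r"\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")

def find_ssns(text):
    """Return every substring of `text` that looks like a valid SSN."""
    return SSN.findall(text)
```

In practice the same idea extends to credit card numbers, phone numbers, or any other pattern of sensitive data an archivist needs to flag before material reaches researchers.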

Many incoming collections are hybrid collections – containing both analog and digital material. The digital component will become even greater as we move forward. Empowering all archivists to use a tool such as AccessData FTK to process the digital materials would be very useful.