Showing posts with label digital forensics. Show all posts

Wednesday, 4 April 2012

Forensic workstation pt 4

Earlier parts of this series touched on identifying our needs and requirements for a workstation (see part 1), re-purposing an old PC into our first workstation (see part 2) and our early experiences with write-blockers and FTK Imager (see part 3).

Our experiments with our write-blockers have been limited, but each time we get them out of their boxes they seem a little less scary. The recent deposit of some born-digital audio and video material totalling over 200GB has thrown up a number of new issues for us to consider.

The files came to us on an external hard drive formatted for a Mac, which we needed to return to the depositor - at which point we knew the files would be deleted. This placed greater emphasis on getting the capture process right, as we wouldn't be able to go back to the depositor and try again!

We were unable to browse the files in Windows Explorer, but could see them using FTK Imager and our USB write-blocker. The sheer size of the files is something we are going to have to get used to: a 45-minute QuickTime film is 10.1GB and a 43-minute wav file is 671MB.

The workstation already has PaintShop Photo Pro for viewing and converting image files, but only the standard viewers for audio and video content. So we started to look for open source software for viewing and converting the audio and film files. I wanted something with a graphical rather than a command-line interface, as I was keen for other staff to develop skills and experience in handling this type of content.

As with our earlier use of tools like Karen's Directory Printer and DROID, once we have become familiar with a piece of software we document our use of it by creating a simple 'Idiots Guide' - this allows us to record both the issues and the solutions we have encountered.

A bit of browsing and a few recommendations later, we have now installed Audacity v2 and WinFF, but we will also take a look at others, including HandBrake and FFmpeg, before making a final decision.

We are keenly awaiting the forthcoming release of the DPC Technology Watch Report on 'Preserving Moving Pictures and Sound' and revisiting the FutureArch blog entries on media formats.

Thursday, 22 March 2012

UKAD Archives Discovery Forum 2012


A few reflections on the UKAD Archives Discovery Forum 2012, which I attended yesterday.

The day started with a really interesting keynote from Bill Thompson (BBC) about his role in using the archives to forge partnerships with a range of organisations. He highlighted their range and diversity by talking about several projects, including one this summer with the Arts Council, the centenary of the First World War, and an exciting initiative called Digital Public Space (diagram on the Collections Link website).

Joy Palmer (MIMAS) gave a talk about the JISC Discovery programme and the ongoing work to demystify aspects like APIs, persistent URIs, user interfaces and measuring both impact and value.

Teresa Doherty (The Women's Library) then spoke about name authority records and how adding links to the archives from relevant biographical pages on Wikipedia can make your collections more discoverable - something we started doing at Hull several years ago, but this was a useful reminder to revisit a simple approach that can have a huge impact on the visibility of your collections.

After a great networking opportunity called lunch, Lindsay Ould (King's College London) talked us through the JISC-funded FIDO project (Forensic Information in Digital Objects) and their experiences, highlighting a range of technical, skills-based and ethical issues, as well as their use of OSForensics software.

I then gave a presentation about born-digital archives, but took a different approach - instead of focussing on the work we have undertaken at Hull I presented a very brief SWOT analysis to highlight many of the issues we have experienced.

There then followed a series of short presentations: Sam Velumyl (The National Archives) gave an overview of the TNA Finding Archives project, and Teresa Dixon (West Yorkshire Archive Service) spoke about the History to Herstory website, which features over 80,000 images including the Amy Johnson letters held at the Hull History Centre.

Kimberly Kowal (British Library) spoke about a crowd-sourcing map project which saw 725 maps geo-referenced in a week (see a blog entry about this project) and Alison Cullingford spoke about the Research Libraries UK Unique and Distinctive Collections project.

The final session was from Bill Stockting (British Library) about the completion of the Integrated Archives and Manuscripts System, which brings together a vast number of legacy data sources and systems and over 1.5m records; this now sits behind the http://searcharchives.bl.uk/ site.

Alongside all of this there was a host of other sessions I would like to have attended, including those on linked data, The National Archives' new catalogue and the Old Maps Online project.

Update, 2nd April: slides from the sessions have been added to The National Archives website - see the Documenting Collections page.

Monday, 19 March 2012

Archives and Society

Two weeks ago I spoke in the Archives and Society series of talks held at the Institute of Historical Research about the progress and work at Hull as a result of the AIMS project. Whilst highlighting the AIMS White Paper, the bulk of the talk was about the practical steps we had taken at Hull with born-digital archives: starting with a simple survey of collections, followed by photography of media and the creation of a forensic workstation (a tale told in multiple parts - see part 1, part 2 and part 3).

I sought to encourage those present to download software like Karen's Directory Printer and DROID and to have a go - using a few test files will help increase your familiarity with many of the issues associated with digital preservation.

I managed to stop in time for questions. These included the observation that the issues I raised were not "new" and whether we would still be making the same case in five years' time (I hope not), the need for automated tools to help us cope with the sheer volume of material (an obvious need), and the associated risk of releasing material that you haven't explicitly checked because of the sheer volume of files.

A PDF version of the slides is available - the talk was also recorded and I will add a link to the podcast when it is available.

Friday, 20 January 2012

AIMS White Paper now available


After a huge amount of effort the AIMS White Paper has finally been finished and is now available online.

The White Paper is intended as a framework to guide good practice in terms of the archival tasks and objectives necessary for success. It builds upon the experiences of the four partner institutions - the universities of Hull, Stanford, Virginia and Yale - in processing a range of collections with an array of format and media issues, using different software. We were keen to make it software-agnostic and have gone back to the archival principles at the heart of the processes.

In many areas we found similarities with existing practice for paper records, and for some aspects we found there were multiple ways of achieving certain goals, so we didn't want to be prescriptive in any way. Instead, the White Paper highlights key decision points and aspects of policy that may be determined at an institutional level. It is intended to help people making the same journey that we have made - finding out about projects, tools and case studies, and starting to build knowledge, skills and infrastructure.

Although the publication of the White Paper officially marks the end of the AIMS project the institutions intend to continue collaborating and sharing their experiences on this blog.

We welcome your comments and feedback on the White Paper via this blog - whether you have implemented the framework or just found the guidance useful.

Thursday, 6 October 2011

Day of Digital Archives – some personal reflections

To mark the Day of Digital Archives I thought I would add a personal note about the “journey” I have made in the last two years. It was about this time in 2009 that it was announced that the AIMS Project was being funded by the Andrew W. Mellon Foundation and that I would be seconded from my post as Senior Archivist to that of Digital Archivist for the project.

At the time I had considerable experience of digitisation but knew very little about digital archives. So I began reading a few texts and following references and links to other sources of information until I had a pile of paper several inches high of things to read. At first there was a huge amount to take in - new acronyms, especially from the frightening OAIS, and plenty of projects like the EU-funded Planets initiative. It seemed the learning would never stop - there was always another link to follow, another article to read - and it was really difficult to determine how much was making sense.

Talking to colleagues already active in this field also revealed how little digital media we actually had at the University of Hull Archives - just over two years ago we literally had a handful of digital media, whilst others were already talking about terabytes of stuff. Fortunately the AIMS project sought to break the workflow down into four distinct sub-functions and placed emphasis on understanding the process compared to 'traditional' paper archives, which reduced the sense of being overwhelmed by it all.

Since then I feel I have come a long way - I have attended a large number of events, spoken at a fair few, and quickly become both familiar and comfortable with the language. I do appreciate the time I have been able to dedicate solely to the issue of digital archives, and that many colleagues are embracing this "challenge" without that luxury.

The biggest recommendation I can make is to start having a play with the software - many of the tools we use at Hull University are free: Karen's Directory Printer for creating a manifest, including checksums, of the records that have been received; FTK Imager for disk images; and so on. Nor do you have to wait for digital archives, or risk changing key metadata whilst experimenting - you can use any series of digital files or old media lurking in the back of a drawer. We have also created a forensic workstation and shared our experiences via this blog.

Once we had started to experiment, we created draft workflows and documentation and refined these as we experimented further - all tasks, from photographing media to using write-blockers, become less daunting the more frequently you do them. Having learnt from many colleagues, we have started to add content to the born-digital archives section of the History Centre website. I have also used some of my own email to play with the MUSE visualisation tool, to understand how it might allow us to provide significantly enhanced access to this material in the future.

Although the project funding has now finished and I have returned to my "normal" job, I do think digital archives have become part of my normal work: each depositor is now specifically asked about digital archives, and in public tours of the building we explicitly mention both the challenges and the opportunities of digital archives. We don't have all of the answers yet - archiving e-mail in particular still scares me - but I don't feel as daunted as I did two years ago.

Tuesday, 30 August 2011

Forensic Workstation pt3

A guest posting from Nicola Herbert, Digital Project Preservation Assistant at Hull University Archives

Once we had the forensic workstation up and running (see part 1 and part 2 in this on-going series) we installed MS Office and Mozilla Thunderbird (for working with Outlook .pst files). We also installed FTK Imager, Karen’s Directory Printer, DROID and the MUSE e-mail visualisation tool (in beta, but provides a very interesting perspective on the data). We are also planning to purchase Quickview Plus, a piece of software that enables viewing a range of file formats without requiring the original software on your PC.

We had already played around with these tools on our normal PCs and had run them on files copied from digital media prior to setting up the workstation.

Having received our two Tableau write-blockers we were eager to combine the separate processes we had developed into an integrated workflow. We have two write-blockers, one for USB devices (T8-R2) and one for internal hard drives from PCs and laptops (T35es). Simon’s visit to Jeremy John at the British Library had whetted our appetite for getting our mini digital forensics lab in operation.

USB devices
After a thorough read-through of the instructions we tested the USB write-blocker first. Setting it up is relatively simple; the vital thing is to make the connections between device and write-blocker, and between write-blocker and forensic PC, before switching on power to the write-blocker. The forensic workstation recognises the USB device as normal, and off you go.

We then ran FTK Imager to create a logical image of the device. We tested the various formats and settings available and eventually decided that creating true forensic images would raise too many trust issues with potential depositors, given that they would allow us to restore deleted files. For this reason we will create 'Folder contents only' images, which recreate the device as it would appear in normal use. From here we are exploring our options for exporting the files from the disk image, but we have found that the exported files display an altered Accessed date - any comments or suggestions on this issue would be gratefully received.
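One workaround we are considering for the altered Accessed date is to record the original timestamps before the copy and re-apply them to the exported file afterwards. A minimal Python sketch of the idea (the function name and paths are our own, not part of any tool we use):

```python
import os
import shutil

def copy_preserving_times(src, dst):
    """Copy a file and re-apply its original accessed/modified times.

    shutil.copy2 copies the timestamps it sees *after* the data copy,
    by which point reading the source may already have updated its
    accessed time - so we capture both timestamps first and set them
    explicitly on the destination with os.utime.
    """
    st = os.stat(src)                           # capture times before copying
    shutil.copy2(dst := dst, src=src) if False else shutil.copy2(src, dst)
    os.utime(dst, (st.st_atime, st.st_mtime))   # re-apply accessed/modified
    return dst
```

This only preserves the timestamps on our working copy, of course; the authoritative record of the original dates remains the disk image itself.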

We also create directory listings of the contents, with MD5 and SHA-1 checksums. From the disk image and directory listing we can start to consider the arrangement of the collection, using Quickview Plus to preview file contents.
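For anyone without Karen's Directory Printer to hand, the same kind of listing can be produced with a few lines of Python. A minimal sketch (the `manifest` function and its output format are illustrative, not the format we actually use):

```python
import hashlib
import os

def manifest(root):
    """Walk a directory tree and return (relative path, size in bytes,
    MD5, SHA-1) for every file - the core of a transfer manifest."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            md5, sha1 = hashlib.md5(), hashlib.sha1()
            with open(path, "rb") as f:
                # hash in 1MB chunks so large audio/video files
                # never need to fit in memory
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    md5.update(chunk)
                    sha1.update(chunk)
            rows.append((os.path.relpath(path, root),
                         os.path.getsize(path),
                         md5.hexdigest(),
                         sha1.hexdigest()))
    return rows
```

The resulting rows can be written out as a CSV and sent to the depositor alongside the deposit receipt.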

Our second write-blocker can be used with IDE and SATA hard drives...but more of this in part 4!

Monday, 22 August 2011

Forensic Workstation pt2

When we moved from the University campus to our new joint facilities with Hull City Archives and the Local Studies Library we took the opportunity to upgrade many of our PCs – leaving a few older specimens “just in case” anybody was so desperate that they were willing to accept a machine that was reluctant to start-up!

Recently the library has been re-organising its stock and space utilisation ahead of a major refurbishment. Our old PC was discovered in the basement and earmarked for disposal (well, recycling really, but disposal is less ambiguous). It was at this point, and with a new-found digital archives perspective, that I realised the potential of this machine to become our first digital forensics workstation. With an internal 3.5" floppy drive, a CD drive and two USB ports, it seemed to promise possibilities for dealing with a range of media, as well as the chance to transfer the files once they had been extracted. The PC, with its slightly grubby keyboard and monitor, was shipped to its new home at the History Centre.

I had by this time started to identify requirements for a new PC to act as a workstation for the capture of hard drives and other large volumes of material. This request intrigued a colleague in ICT, Tom, and a visit was duly arranged; Tom was really interested in our work and offered to help. He took our PC and returned it a few days later with a clean Windows XP image installed, as well as an internal zip drive added.

Tom has also promised to put aside a couple of internal 3.5" floppy drives as an insurance policy against drive failure, as Jeremy Leighton John at the British Library had reported mixed results when using external USB floppy drives. Having two workstations, one old and one new, will give us options for dealing with some media formats: a USB drive for 3.5" floppy disks and an external 250MB zip drive. The latter was found when clearing out an old cupboard and came with all its cables and even its original installation CD - proving that assembling a forensic workstation does not have to cost a fortune; I have heard several tales of kit assembled via eBay purchases.

Friday, 12 August 2011

Digital Forensics for Digital Archivists

I’ve been very fortunate here at UVa to have at my disposal some wonderful resources for getting up to speed with born-digital theory and practice. First and foremost, UVa is home to Rare Book School, which has offered a course on Born Digital Materials for the past two years (and, I’ve just learned, will offer it again in 2012). I was able to take this course in July along with 11 fellow classmates from around the country. A week and a half later I was off to the headquarters of Digital Intelligence, Inc., makers of our Forensic Recovery of Evidence Device (FRED), for Computer Forensics with FRED, a two-day course covering basic digital forensic skills as well as the FRED system.

Mulder and Scully are concerned about the viability of this forensic evidence gathered next to UVa's FRED...

Given my great bounty, and my belief in professional karma, I’ve decided to give a brief overview of both of these classes here on the blog followed by my thoughts on a potential Digital Forensics for Archivists class/workshop that I’d really like to see developed, by myself or whomever! Two major classes out there that I have not taken are the DigCCurr Professional Institute and SAA’s electronic records workshop. Anyone with experiences in those classes, please add your comparisons in the comments.

RBS L95 — Born Digital Materials: Theory and Practice

Overall, I’d say this class has the perfect name: there’s an almost equal amount of theory and practice. That may sound like faint praise, but it’s really not. It’s something that too few workshops or classes get right. Instructors Naomi Nelson and Matt Kirschenbaum deserve much credit for a well constructed week that built practice on top of theory.

For someone new to the field of the born-digital it’s a great foundation. Concepts like metadata, preservation, “the cloud,” essential characteristics, physicality/materiality and digital humanities are combined with real-life examples from libraries, archives, and the university. This overview allowed us to attack the fundamental question of the class: what should we be trying to accomplish when we attempt to “save” (or steward, curate, safeguard, preserve, “archive”) born-digital materials.

On the practical side of things, digital forensics is covered and students get the opportunity to do a few lab exercises with emulators, floppy drives, and older models of equipment. The syllabus and reading list provide an excellent bibliography for further research.

It’s a relatively high-level class and therefore a great way to get started, or a great way to get administrators thinking intelligently about the issues they need to face. I think that a more practitioner-focused and thorough digital forensics curriculum in the archives or cultural heritage setting could complement the course very nicely.

Computer Forensics with FRED training

The University of Virginia decided to invest in the FRED technology last year and has not regretted it. While the FRED can do lots of neat things, I feel it is important to note that many or all of the same things can be done with other hardware and software; it just takes a bit more persistence. Similarly, despite the name, a lot of this course dealt with basic data and file system concepts, as well as a little about some of the specific hardware most commonly found. In the future, DI is going to split this up into two classes: Digital Forensic Essentials and Digital Forensics with FRED. The first is a two-day course covering the hardware, data, and system material; the second is a one-day class covering the specifics of FRED. Although the first class will be more expensive than the current combined class, it will be of more interest to those in the archival world.

As it is geared towards law enforcement, a lot of time was spent on detecting deleted, fraudulent, or hidden material. While all the cops in the room thought that this would be of no use to me, I disagreed. I need to know what I am collecting (whether inadvertent or not), whether it is authentic, and how to communicate with donors to decide how to deal with it. In addition, if we can get donors to agree to let us transfer backup or deleted versions of manuscripts, we’ll gain a wealth of information about how the final version evolved. Knowing that such recovery is possible is one of the more glamorous promises of digital forensics.

We also learned how to create and navigate disk images. While some of this stuff was fairly easy for me to pick up beforehand from Peter Chan’s tutorials, the extra practice and insight was very useful.

Digital Forensics for Archivists

Based on my experiences in these two classes, I would propose a Digital Forensics for Archivists workshop geared specifically towards those interested in incorporating forensic techniques into the capture and processing of digital materials. The outline of topics I would expect to see on the syllabus below is probably a bit ambitious for a one-day workshop and would certainly face some hurdles related to provisioning hardware for everyone. However, these are the areas I’ve come to think of as necessary for an archive to be prepared for the variety of media that we will be collecting for the foreseeable future.

Digital Forensics for Archivists


  • Hardware basics

    • IDE, SCSI, SATA, USB, FireWire
    • Floppy drives
    • Optical disks
    • Hard drives
    • Internal basics (motherboard, PCI, power, etc.)

  • Operating systems

    • DOS
    • Windows
    • Mac OS
    • Linux

  • File system basics

    • FAT
    • NTFS
    • HPFS

  • Forensic vs. logical copying

    • What happens to deleted data
    • How it can be recovered
    • Why you need to know…

  • Write blocking

    • How to achieve it

  • Image files

    • Types
    • Software
    • Uses

  • Emulation and migration

    • Cost/benefit of each
    • Possible use cases for each

So what do you think? Pipe dream? Useful? Impractical? Let me know in the comments…


Monday, 25 July 2011

Forensic workstation pt 1

A key part of dealing with born-digital archives is the ability to receive and process material without making changes to the underlying metadata, including date created, date accessed etc – data that researchers will be looking to use and rely on. As archivists we place considerable emphasis on our role as custodians, and with digital material it is important that we treat it carefully and appropriately. Fortunately there are tools that help us establish the authenticity of born-digital files, the most obvious of which is the checksum.
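To illustrate, a checksum is a short fingerprint computed from a file's bytes: if a freshly computed value matches the one recorded at deposit, the file has not changed. A minimal sketch in Python (SHA-1 is used here purely for illustration; the function names are our own):

```python
import hashlib

def sha1_of(path):
    """Compute the SHA-1 checksum of a file, reading in 1MB chunks
    so even very large files never need to fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def fixity_ok(path, recorded):
    """True if the file still matches the checksum recorded at deposit."""
    return sha1_of(path) == recorded
```

Re-running such a check periodically, and comparing against the values recorded when the material was received, is the basic fixity routine behind most digital preservation workflows.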

An important legacy of the AIMS project for us at Hull is working towards our ability to take born-digital material from depositors as a normal part of our work. A key component of this is a forensic workstation – by which I mean a PC (or two) through which material can be safely captured following a clear process, in-effect replicating the isolation room for receiving paper material. This will allow us to undertake a forensic examination – to check the material is what we expected or agreed to take, including the ability to generate a manifest of the material to send to the depositor, and that it does not include viruses etc.

There seem to be two main routes. The first is to purchase a FRED, which stands for Forensic Recovery of Evidence Device (other digital forensic workstation solutions are available). The second, more organic solution - and the one we intend to adopt at Hull - is to start with a new PC and add appropriate hardware and software to provide the equivalent functionality. At the moment we are pondering a name for it, with current suggestions including:
- Hal - Hull Archives Laboratory
- Harold – Hull Archives Recovery Of Legacy Data
- Hilary - Hull Investigator for Library and Archives RecoverY
- Dawn – Digital Archives WorkstatioN
but we are open to other suggestions until the machine is installed and formally named!

We don’t want to become a computer museum with an extensive range of hardware, software and operating system environments for every possible eventuality. We do want a core ability to handle material we reasonably expect to receive – including material on 3.5” floppy disks, zip disks, hard drives etc. We intend to develop and extend our capacity as need dictates – if we receive material in a format we cannot yet handle, we will consider whether we need to support it ourselves or whether a suitable third party is more appropriate.

Central to this is the need for write-blockers, which prevent you from writing to or updating the files. Having read countless websites I felt I knew what they were supposed to do, but had a nagging doubt that my knowledge was incomplete.

A tour of the British Library eMss Labs courtesy of Jeremy Leighton John (as featured on the BBC Radio 4 programme 'Tales from the Digital Archives', broadcast in May but still available online) confirmed the simplicity of the theory and the fragility of the media – just having the hardware isn’t enough; you also need some luck that you have the correct drivers to read the specific version of the media. In the next few weeks I hope to place our order for the various bits and pieces, and will update you on this exciting journey!

Wednesday, 7 July 2010

Digital Lives Research Seminar

On Monday I attended the Digital Lives Research Seminar, Authenticity, Forensics, Materiality, Virtuality and Emulation; the presentations will be appearing online soon via the Digital Lives pages.

There was a packed programme of speakers with a huge array of experience; of direct relevance to the AIMS work were the following:

Helen Broderick (British Library) described her work as Curator of Modern Literary Manuscripts, including cataloguing the born-digital material in the Ronald Harwood archive. The paper part of the collection had already been listed by a colleague, and Helen encouraged hybrid collections to be tackled as a single entity - this is what I intend to do with the Stephen Gallagher material at Hull.

Helen described using QuickView Plus software for viewing, with two screens (one to display the digital file and the second to record descriptive notes). Other thorny issues to be tackled include email, and how this could be made available to others without infringing Data Protection and other privacy concerns.

Seth Shaw (Duke University) gave an account of the current work at Duke, openly admitting that work on arrangement and description was very sporadic! They are looking to standardise their policies, documentation and so on, with the search interface another item on his to-do list! It was clear that practice was being shaped by their experiences, echoing the underlying premise of the best practice guidelines that AIMS will produce based upon our combined experiences.

It was good to see colleague Michael Olson (Stanford University), who gave an account of the forensics work at Stanford, including the approach adopted for the Stephen Jay Gould material, and outlined the resources in their Forensics Lab.

Gabriela Redwine (Harry Ransom Center, University of Texas) provided an update on the forthcoming Computer Forensics and Born-Digital Content in Cultural Heritage Collections report (see http://mith.info/forensics/), due to be published later this year. It came as no surprise to those present that the biggest challenge the research had identified was legacy hardware and software; other challenges included trust and authenticity. This led to a discussion of some of the ethical issues surrounding born-digital materials, and the view that we should look to multiple sources of information - metadata, creator and forensics - to build up a complete picture.

Erika Farr and Naomi Nelson (Emory University) gave a fascinating account of their work on the digital material in the Salman Rushdie archive and the multi-disciplinary approach to tackling this collection. After discussion and consideration they agreed to respect the hybrid nature of the material: to balance the needs of the researcher and the donor, but also the desire to provide an authentic ‘experience’. They had originally distinguished between paper and born-digital material with separate agreements, but quickly revised this to one based on content and NOT format. They discussed with the donor his relationship with the PC and how he used it - whether he customised parts, etc. - to understand this aspect better. They were even able to recover files from a laptop he had accidentally damaged. Using an emulator gives a totally different perspective on born-digital material than simply allowing access to the content ever can. Whether this approach is always possible or practical remains to be seen.

Our host Jeremy John (British Library) described their approach and workstream, including imaging the disk and creating digital replicates - viewed via the original software and an emulator - with facsimile versions then made for user viewing. He encouraged using hash values generated by two systems as an additional level of verification. The British Library's policy is to disk image wherever possible, and they are actively using emulators via virtual machines based on the original hardware and OS.

I was able to give a quick introduction to the AIMS project, and the questions that followed suggested that some of our work regarding access and use is of particular interest to others.

Jeff Ubois highlighted the main issues arising from the Personal Digital Archiving Conference 2010 earlier this year, including the complexity of media, with the need to compare donor agreements, interface design and the suitability of tools (re Facebook etc.) identified for future consideration and action. He also spoke about public/private boundaries and mentioned a Research Libraries Group project, 'Good Terms', about engaging with public companies for digitisation programmes.