Friday, 8 April 2011

Data Management Planning?

Guest blogger: Andrew Sallans

Following on Tom's generous invitation to write a post for the AIMS partner blog, I am finally getting around to doing so. Tom and I have been holding monthly discussions about our respective projects since sometime last summer, and have talked at great length about the commonalities between what my group (the Scientific Data Consulting Group) is dealing with in research data management and what the AIMS group is dealing with in born-digital archival material.

We have found that there are many areas of similarity, and that we face many of the same challenges, although we approach the problem quite differently and of course have entirely different terminology given our relative perspectives.

To get started: I have a pretty good understanding of the born-digital problem set, but I have not been keeping detailed notes on the workflows and solutions that the AIMS group has identified as best practices over the life of the project. My intention for this post is to share the issues we are dealing with in research data management and to suggest areas where there may be overlap and opportunities for greater information sharing and collaboration.

On January 18, 2011, the National Science Foundation (NSF) put into effect a new implementation of its pre-existing data management planning requirement. Researchers must now submit a two-page data management plan (DMP) that specifies the steps they will take to share the data underlying their published results. The DMP undergoes formal peer review and must be reported on in interim and final reports and in all future proposals. In effect, what one says must then be done, or one risks losing future funding opportunities or, worse, losing all of the institution's funding from that agency. Although the requirement is focused on data sharing, such an initiative cannot succeed without first addressing a mass of other data management issues, ranging from the technical to policy to culture. As we often point out these days, it is far easier to improve data management up front, during the operational phase of a project, than it is to begin thinking about how to share the data at the end. I would expect that those attacking the born-digital archive problem can fully relate.

Here in the Scientific Data Consulting (SciDaC) Group in the UVA Library, we have been collecting and developing our local set of data management best practices for some time, and we serve as advisors to researchers on both research data management and DMP development (the two are of course interrelated, but often carry different levels of urgency). In doing so, we have developed what we call a "data interview/assessment" (based in large part on the visionary work of others, including Purdue's Data Curation Profiles and the UK's Digital Curation Centre): a series of questions that address many areas of data management, including context, technical specifications (formats, file types, sizes, software, etc.), policies, opinions, and needs. We meet with researchers to have a conversation, educate them on emerging trends and regulations in data sharing, and listen to their concerns and challenges. In the end, we try to recommend ways they can improve their data management processes, and then we offer to connect them with people who can help with the specific details (if that isn't us). For DMPs, we have a series of templates configured for the respective program requirements. Here again, we do some education, then offer feedback and advice on what qualifies as good data management for a particular community. Behind all of these efforts, we know we don't have all the answers, but we do know most of the questions to ask and whom to pull together to figure out the solutions. That's our basic operating principle.

So, does this sound a bit familiar? Based on conversations with Tom, and on reading some of the posts in the AIMS blog myself, it sounds like we are up against very similar challenges on the front end of the issue: education, conducting inventories and assessments, and figuring out how to manage processes before it comes down to managing the information itself and providing access to it for others. Appraisal and selection are incredibly important to us, but are usually driven more by the type of data. For example, reproducible data generated by a big machine might not be important to keep, but the instructions and the context in which it was generated would be invaluable; data from natural observations (e.g., climate data), on the other hand, would be critical to save. These considerations are not always apparent to researchers, who often think within the context of their own work rather than that of others. I would expect that the back end is even more similar, as we are all ultimately dealing with bits and bytes, formats, standards, and figuring out how to decide what to keep and how to keep it.

Lastly, for now, I also would like to mention that I had the opportunity to attend the annual Duke-Dartmouth Advisory Council meeting at the Fall CNI Forum several months ago.

As you'll read, this project aims to bring together stakeholders from all areas of digital information across the institution to talk and plan in a collaborative, strategic way. They aim to tackle the challenges of management, technology, policy, and, hardest of all, culture. I was incredibly impressed by the vision of this undertaking, and I hope that we can continue to refine our own efforts at developing a collaborative digital information management strategy as well. In practical terms, we all need to be attentive to how our effort plugs in with others around the institution. Digital information management is undoubtedly a very big problem, and it requires coordination and collaboration across many experts in order to treat the various bits we encounter appropriately. Doing so will hopefully also allow us to carry best practices from one challenge to another.


Andrew is currently the Head of Strategic Data Initiatives and the Scientific Data Consulting Group at the UVA Library.

Contact info: Andrew Sallans, Email:, Twitter: asallans

Friday, 1 April 2011

Digital Collaboration Colloquium

On Tuesday I attended the Digital Collaboration Colloquium event in Sheffield organised to mark the end of the White Rose Libraries LIFE-SHARE Project.

The day included a number of talks about how institutions can collaborate, including an interesting account of the Wales Higher Education Libraries Forum (WHELF) and experiences from the Victoria & Albert Museum. Although the majority of examples focussed on digitisation, the principles and lessons learnt were equally applicable in a born-digital context.

As part of the day I presented a Pecha Kucha session on the AIMS project and some of the digital collaboration tools that we have found effective, including Skype and Google Docs. If you are not familiar with this format, it involves a presentation of 20 slides that advance automatically every 20 seconds, and despite cutting the content quite heavily I still found myself racing to keep up with the changes. Other sessions looked at digitisation in situ in a public setting (bringing behind-the-scenes work in front of the curtain), the Knitting patterns project at Southampton, the Addressing History project based at EDINA, and the Yorkshire Playbills project.

The afternoon included a presentation from our hosts on the LIFE-SHARE project and their experiences of the collaboration continuum, followed by a roundtable session that led to a good discussion between the panel and the audience. With a lot covered in a relaxed and friendly atmosphere, there was plenty of networking, and I'm sure everybody took something away from the day.

The presentations are available via SlideShare.