Following up on Tom's generous invitation to write a post for the AIMS partner blog, I am finally getting around to doing so. Tom and I have been holding monthly discussions about our respective projects since sometime last summer, and have talked at great length about the commonalities between what my group (the Scientific Data Consulting Group) is dealing with in research data management and what the AIMS group is dealing with in born-digital archival material.
We have found that there are many areas of similarity and that we face many of the same challenges, although we approach the problem quite differently and, of course, use entirely different terminology given our respective perspectives.
To be clear at the outset: I have a pretty good understanding of the born-digital problem set, but I have not kept detailed notes on the workflows and solutions that the AIMS group has identified as best practices over the life of this project. My intention for this post is to share the issues we are dealing with in research data management and to suggest areas where there may be overlap and opportunities for greater information sharing and collaboration.
On January 18, 2011, the National Science Foundation (NSF) put into effect a new implementation of its pre-existing data management planning requirement. This revision requires that researchers submit a two-page data management plan (DMP) specifying the steps they will take to share the data that underlies their published results. The DMP undergoes formal peer review, and compliance must be reported in interim and final reports and in all future proposals. In effect, what one says must then be done, or else one risks losing future funding opportunities or, worse, losing all funding for the institution from that particular agency. Although this requirement focuses on data sharing, no such initiative can succeed without first addressing a mass of other data management issues, ranging from the technical to the policy-related to the cultural. As we often point out these days, it is far easier to improve data management up front, during the operational phase of a project, than to begin thinking about how to share the data at the project's end. I would expect that those attacking the born-digital archive problem can fully relate.
Here in the Scientific Data Consulting (SciDaC) Group in the UVA Library, we have been collecting and developing our local set of data management best practices for some time, and we have served as advisors to researchers in both research data management and DMP development (the two are of course interrelated, but they sometimes carry different levels of urgency). In doing so, we have developed what we call a "data interview/assessment" (based in large part upon the visionary work of others, including Purdue's Data Curation Profile and the UK's Digital Curation Centre), which is a series of questions that address many different areas of data management: context, technical specifications (formats, file types, sizes, software, etc.), policies, opinions, and needs. We meet with researchers to have a conversation, educate them on emerging trends and regulations in data sharing, and listen to their concerns and challenges. In the end, we try to make recommendations on how they can improve their data management processes, and then we offer to connect them with people who can help with the specific details (if it isn't us). For DMPs, we have a series of templates specifically configured for the respective program requirements. Here again, we do some education, then offer feedback and advice on what qualifies as good data management practice for a particular community. Behind all of these efforts, we know we don't have all the answers, but we do know most of the questions to ask and whom we need to pull together to figure out the solutions. That's our basic operating principle.
So, does this sound a bit familiar? Based on my conversations with Tom, and on reading some of the posts in the AIMS blog myself, it sounds like we are up against very similar challenges on the front end of the issue: education, conducting inventories and assessments, and figuring out how to manage processes before it comes down to managing the information itself and providing access to it for others. Appraisal and selection are incredibly important to us, but they are usually driven more by the type of data. For example, reproducible data generated by a big machine might not be important to keep, but the instructions and context in which they were generated would be invaluable. On the other hand, data from natural observations (e.g., climate data) would be critical to save. These considerations are not always apparent to researchers, who often think within the context of their own work rather than that of others. I would expect that the back end is even more similar, as we are all ultimately dealing with bits and bytes, formats, and standards, and figuring out how to decide what to keep and how to keep it.
Lastly, for now, I would also like to mention that I had the opportunity to attend the annual Duke-Dartmouth Advisory Council meeting at the Fall CNI Forum several months ago.
As you'll read, this project aims to bring together stakeholders from all areas of digital information across the institution to talk and plan in a collaborative, strategic way. They aim to tackle the challenges of management, technology, policy, and, hardest of all, culture. I was incredibly impressed by the vision of this undertaking, and I hope that we can continue to refine our own efforts at developing a collaborative digital information management strategy as well. In practical terms, we all need to be attentive to how our efforts plug in with others around the institution. Digital information management is undoubtedly a very big issue, and it requires coordination and collaboration across many experts in order to appropriately treat the various bits we encounter. Doing so will hopefully also allow us to carry best practices from one challenge to another.
Andrew is currently the Head of Strategic Data Initiatives and the Scientific Data Consulting Group at the UVA Library. Contact info: Andrew Sallans, Email: firstname.lastname@example.org, Twitter: asallans