Data management is exciting!

Trust me.

No, this is a reflection of the level of enthusiasm we were asked to have as part of our data management subject at Uni this semester. Our first assignment was to write a press release explaining research data management to the general public in a way that wouldn’t send them straight to sleep. I chose to take a narrative approach and promised that if I got a good mark, I’d reproduce it here. If you work in data management, skip the next bit – but if you’re not in an academic or research library and you’re curious about what we are all talking about with data, you might like this.

Sydney is playing host this weekend to social science researchers from around the world as the inaugural Social Science Research Futures conference gets underway.

“Managing research data output will be a focus of the papers presented”, conference organiser Clare McKenzie said today. “Imagine the impact on your life if you lost your laptop with all the contacts, photos and other personal information in it. Now imagine you are a researcher on a project that has interviewed 500 homeless people about their situation and that the laptop was storing all the responses to the questions.”

While such loss of data can be catastrophic to a project, managing research data is not just about avoiding disaster. As many research projects are funded with public money, there has been a push in recent years to make the results of that research publicly available at the end of the project.

What exactly are research data? Broadly, they are the factual information collected and recorded during a research project in order to prove or disprove the original research question (Carlson 2011). The Australian public’s responses to the Australian Bureau of Statistics (ABS) census are data, as are the daily air temperature recordings a high school science student collects as part of a school project. The data are rarely meaningful without analysis, so the ABS puts the data together in combinations to look for trends and the high school student may graph the daily temperature to compare against the average for the time of year in order to draw conclusions.  All of this is research data.

Making arrangements for backup and proper storage of research data is just one aspect of data management and is part of what’s known as data management planning. Jane Smith, a senior social sciences researcher at City University, has developed a data management plan at the beginning of her last two research projects and likens it to the idea of business planning. “You don’t normally plan for your business to fail, but you can fail to plan for your business”, she says. “Research projects are the same. If you don’t plan for the fact that someone may wish to access your data in twenty years when the technology is different and the original research team long dispersed, then all your hard work during the project can’t be shared or expanded.”

Researchers need to think about planning for storage, rights of use by others, naming the data in such a way that others can find it, putting details of the data in a repository where it can be found, as well as the possibility that files created today may become an obsolete format in the future (ICPSR 2012).  These details are known as metadata – literally “data about data” – and are a way of attaching useful information to an object such as a dataset.

When it comes to data management planning, it doesn’t matter whether the research is social sciences or the ‘hard sciences’. Both McKenzie and Smith advise that time spent creating a data management plan (DMP) at the start of a research project can save a lot of time further down the track, particularly if the project is large and collaborative with many individual researchers. Establishing file formats and file naming conventions, such as the complex file naming system the ABS uses (Australian Bureau of Statistics 2009), ensures consistency and accuracy of records no matter who is working on the project at the time. Smaller projects need not go to this level of complexity, but writing it all down in a DMP can help ensure these details are not forgotten or lost. In fact, some research funding bodies have made preparation and submission of a DMP a condition of applying for a grant (Van den Eynden et al. 2011).

Sharing and re-use of data become easier if that data has been managed properly. Making data accessible to others or allowing re-use and re-purposing of that data later on for another project is part of making research more collaborative and reduces the chance that money will be wasted on ‘re-inventing the wheel’ (Van den Eynden et al. 2011). It may also help establish trends, such as comparing the interviews with the homeless (from the lost laptop scenario above) to information collected again in five years’ time.

Smith comments that for one of her recent projects she was able to search Research Data Australia (RDA http://researchdata.ands.org.au/), an online catalogue of research datasets, to find details of a project from a number of years ago that had data relevant to her project. Through contact details in the RDA listing, Smith, in her words, “got access to the most wonderful population data from five years ago that I was able to re-use in the context of my current research project”.

As with preparing a DMP, research funding bodies in Australia and overseas are beginning to make continuing access to research data a condition of funding.

The future of publicly funded research in Australia is going to depend on good planning.

I enjoyed the subject; it was serendipitous timing with my secondment to Library Repository Services. And, like all my uni subjects, I’m now glad it’s over.

Data management, open access and more

Drinks & Data by Andrew Turner via Flickr CC

I seem to remember promising some sort of #newjob update. As I’ve now been there nearly 2 months (time flies!) it’s a good time to stop and think about what I’ve learned and what I’m doing.

I’m reading and reading and reading about research data management, funding body requirements, data management planning and data citation at the moment. It’s more interesting than I’ve just made that sound – but it is a fairly dry subject to write about! My first uni assignment this semester is to write a press release for the general public on the importance of research data management. Really? I’ve had to have the jargon-buster out on that one, trust me.

I ended up falling back on the good old narrative, story, analogy, what have you. How do you begin to describe the need for data management to people outside either the narrow data librarian world or the (some would say equally narrow) research world? By likening it to losing your mobile phone or laptop with all your photos, contacts and documents you were working on. A hook? Perhaps. Time (and whoever does my marking) will tell. If I get a good mark, I’ll share the press release here 🙂

I do get to practise some of this in the real world soon. As part of my secondment to the library repository services team, I’m taking joint responsibility with a colleague for putting together some research data management information sessions for our academic services librarians (which is where I’m from). It’s an interesting juxtaposition – on the one hand I still feel like an academic services librarian, but I’m also starting to get my head around this data management stuff in a way that I hadn’t been able to before my secondment. The test will be if I can get a workshop written that convinces my former colleagues!

At the same time, I’m (slowly) getting a repository project underway. One of our faculties is about to acquire a collection of films and we are negotiating to store it in an open access repository. Enter a whole technical world full of phrases like harvesting, data streams, web-interface, deposit tool and wireframes. Yep.

Then there’s copyright. Particularly in relation to the upcoming film collection, I’ve spent weeks trying to get my head around copyright and licensing issues. Copyright in film is particularly complex – of course it is. I’m about to start adapting our legal-office-approved rights agreements that relate to theses and other written research outputs to suit film.

While most days I still feel like I’m going around in circles, it is starting to make sense, and writing it all down here has further crystallised some things for me. Proving that I need to blog more.

Data is the new black

Black-Eyed Susan 227 from cygnus921 via flickr CC

If you work in academic libraries, sooner or later you are going to come across the issue of research data management. Increasingly, we are also working in an e-research space where everything from finding journal articles for a literature review through to making a copy of the finished work available in an institutional repository happens in an online space.

My previous posts on digital humanities sent out a call for libraries to be more involved in this process and to come to the table as partners and collaborators with researchers. This is an area of librarianship I didn’t know existed before starting at MPOW 15 months ago and it has caught my interest in a big way.

I did say much of this #blogjune from me would be about data. Now I can reveal that I have a new job for the next 12 months and am going to be working in our library’s repository services team, talking about research data management all day long. It starts next week – stay tuned!