Funding agencies now expect researchers to manage and share their research data in line with international standards and good practice in their field. But do you know what “research data” actually means? Before we get into the how, let’s focus on the “what”.
The definition of research data adopted by the Swiss National Science Foundation (SNSF) is as follows:
“Research data are the evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical).”
Concordat on Open Research Data, published on 28 July 2016
You might notice that the form of the data matters less than its purpose. If something – anything – serves as a basis for research findings, it becomes research data. Financial data and statistics are the obvious examples, but in the social sciences, data can be both qualitative and quantitative.
A historian or an anthropologist might, for example, base their research on texts, photographs, films, sounds, or interviews – all of these are research data. Surveys are data too – both the questions and the answers. Protocols, lab notes, field notes, transcriptions, and codebooks all fit the definition, and your funding organisation expects them to be stored, documented, and shared appropriately.
What about non-digital data?
Objects, physical samples and artefacts are generally excluded for practical reasons, but their description and recorded characteristics are considered data within that context. Your handwritten lab notes or sketchbooks can, and often should, be digitised for sharing.
What about software and code?
Did the researcher write code or prepare software to answer their research question? If so, the code qualifies as research data. Software that wasn’t developed for the project is only considered a tool – Stata and R no more qualify as research data than an anthropologist’s recorder does – but a custom package should probably be preserved and shared.
What about data I got from someone else?
The question of secondary data usage and copyright is a long and separate matter, which will be covered here soon. Secondary data are research data, but licensing and copyright mean they will need specific treatment, and, depending on their origin, it might not be possible to share them. You might even have transformed the data so much to fit your needs that they become original – really, each case will be different.
What about things that are just related to my research?
Just because something happened during the research process doesn’t mean outputs were based on it. Early musings, drafts, peer reviews, plans for future research or e-mails to colleagues are not research data: they are not necessary to support your findings.
What about ethics, privacy, security, or confidentiality requirements?
Don’t worry – the SNSF doesn’t expect you to share any and all data used in your research. Funding institutions simply want you to manage and preserve your data, or sometimes anonymise or destroy them, following best practices, including those intended to protect research subjects from any inconvenience. We’ll talk about that some more in a future article.
What if I just have NO data?
In most cases, researchers do use or produce data. There are exceptions in specific disciplines, though: many researchers in international law, for example, base their research on legal texts and theory rather than empirical research, and little other than their bibliography qualifies as research data.
Don’t worry – this doesn’t mean a project will not be accepted. It just means that a data management plan (DMP) will not be necessary for the grant application. But what is a DMP, you ask? Again, something we’ll cover in the near future.
This is the first in a series of posts about research data at the Institute. Let us know what we should talk about next – contact Guillaume Pasquier or Ask a librarian!
Original illustration (cropped): Research data management, by Janneke Staaks, released under a Creative Commons licence (CC BY-NC 2.0).