Documenting data: Readme.txt

Sharing a dataset is nice, but to make it truly open you must make sure it can be interpreted and used in a meaningful way. This means your data should always include documentation that explains everything a third party should know, and a Readme file is perhaps the easiest sort of documentation you can create.

When should you start?

Documentation should start before you even collect any data. A simple document can describe your plans, which will then help you organise and document your data in a more usable way. As the project progresses, this document will grow to describe the context, structure and contents of the whole dataset.

Once your data is collected, include this document at the root of your dataset and give it a filename that makes it stand out, such as “readme.txt”. It should contain any relevant information, from the study level (about yourself and your project) down to the description of individual files or variables.

What is in a readme file?

A Readme file is usually a simple text document (in .txt or any other durable file format) that contains all the basic information about your dataset:

general information (title and creators of the dataset)
what the dataset contains and how it is organised
what can be done with it, and what it was used for
methodology, collection, and other useful information
how the data was processed, and with which software
a description of codes, symbols, abbreviations or variables (codebook)
etc.

You do not need any technical knowledge to create that kind of documentation. Just be organised and describe what anyone should know about your dataset to be able to use it appropriately.
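This step is purely optional, but if you are comfortable with a little scripting, you can generate an empty skeleton and fill it in as your project progresses. The sketch below is only an illustration: the section headings are adapted from the list above, and the names `SECTIONS`, `build_readme` and `write_readme` are hypothetical, not part of any standard or tool.

```python
from pathlib import Path

# Section headings adapted from the list above; the wording is illustrative only.
SECTIONS = [
    "General information (title, creators)",
    "Contents and organisation",
    "Possible uses and original purpose",
    "Methodology and collection",
    "Processing and software",
    "Codebook (codes, symbols, abbreviations, variables)",
]


def build_readme(title: str, creators: list[str]) -> str:
    """Return the text of a readme skeleton with one heading per section."""
    lines = [f"Title: {title}", f"Creators: {', '.join(creators)}", ""]
    for heading in SECTIONS:
        lines.append(heading.upper())
        lines.append("(to be completed)")
        lines.append("")
    return "\n".join(lines)


def write_readme(dataset_root: str, title: str, creators: list[str]) -> Path:
    """Write readme.txt at the root of the dataset folder and return its path."""
    path = Path(dataset_root) / "readme.txt"
    path.write_text(build_readme(title, creators), encoding="utf-8")
    return path
```

Calling `write_readme("my_dataset", "Example survey", ["A. Researcher"])` would create `my_dataset/readme.txt`, assuming the folder already exists. A text editor does the job just as well; the script only saves some typing if you manage many datasets.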

A more detailed description can be found in the dedicated section of our research data management guide.

Tell me more about codebooks and metadata!

If you are collecting quantitative data, you might need to document it using a codebook. This can also be included within your readme file, as mentioned in the bullet list above… but we will cover quantitative data further in a later post.
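To give a rough idea of what a codebook section could look like, here is a minimal sketch that turns a list of variable descriptions into plain text you can paste into your readme file. The variables (`age`, `sex`), their codes and the function name `format_codebook` are invented for illustration and do not come from any real dataset or standard.

```python
# Hypothetical variables for an imaginary survey; replace them with your own.
CODEBOOK = [
    {
        "name": "age",
        "description": "Age of respondent",
        "unit": "years",
        "codes": {},
    },
    {
        "name": "sex",
        "description": "Sex of respondent",
        "unit": "",
        "codes": {1: "female", 2: "male", 9: "not stated"},
    },
]


def format_codebook(entries: list[dict]) -> str:
    """Render each variable as a name/description line followed by its unit and codes."""
    lines = []
    for entry in entries:
        lines.append(f"{entry['name']}: {entry['description']}")
        if entry["unit"]:
            lines.append(f"  unit: {entry['unit']}")
        for code, label in entry["codes"].items():
            lines.append(f"  {code} = {label}")
    return "\n".join(lines)


print(format_codebook(CODEBOOK))
```

The point is simply that every variable name and every coded value gets a human-readable explanation, whichever way you choose to write it down.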

For qualitative media metadata (for example picture collections), you might be interested in reading our previous post on the subject.

Illustration: Papers, by Jerzy Gorecki (Pixabay license, cropped)