Creating a data management plan (DMP)
A data management plan is a living, written document explaining what you intend to do with your data during and after your research project. Many funders require one. Even when it is not required, a data management plan can save you time and effort during your research because it forces you to organize your data, prepare it for the next stage of its lifecycle, and clarify who will have access to it, how, and when.
If you plan to share your data, a data management plan helps you identify, in advance, the issues you need to address to make sharing simple. Finally, a data management plan helps ensure that your data remains usable by you, your collaborators, and other researchers after your project ends.
Which datasets will be generated in your research?
Describe the source of each dataset.
- Describe how each dataset will be generated by your research
Categorize each dataset
Describe the type of dataset generated
Describe the form of each dataset
- Text (eg, field or laboratory notes, survey responses)
- Numeric (eg, tables, counts, measurements)
- Audiovisual (eg, images, sound recordings, video)
- Models, computer code
- Discipline-specific (eg, FITS in astronomy, CIF in chemistry)
- Instrument-specific (eg, equipment outputs)
Categorize the stability of each dataset
- Fixed dataset (never changes after being collected or generated)
- Growing dataset (new data may be added, but the old data is never changed or deleted)
- Revisable dataset (new data may be added, and old data may be changed or deleted)
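The three stability categories can be stated precisely by comparing two versions of a dataset. The minimal Python sketch below assumes each version is a list of records kept in a stable order, with new records appended at the end; the function name `classify_change` is hypothetical and not part of any tool.

```python
from typing import Sequence


def classify_change(old: Sequence[tuple], new: Sequence[tuple]) -> str:
    """Classify how a dataset changed between two versions.

    Assumes records keep a stable order and new records are appended at
    the end (a hypothetical convention for this sketch).
    """
    old_rows, new_rows = list(old), list(new)
    if new_rows == old_rows:
        return "unchanged"  # consistent with a fixed dataset
    if len(new_rows) > len(old_rows) and new_rows[: len(old_rows)] == old_rows:
        return "growing"  # old records untouched, new ones only appended
    return "revisable"  # at least one existing record was changed or deleted


v1 = [("2024-03-01", 3.2)]
v2 = v1 + [("2024-03-02", 3.5)]                  # a record appended
v3 = [("2024-03-01", 3.3), ("2024-03-02", 3.5)]  # an old value corrected
print(classify_change(v1, v2))  # growing
print(classify_change(v2, v3))  # revisable
```

A single comparison describes only one revision; a dataset stays in the fixed or growing category only if every revision passes the corresponding check.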
How will your datasets be named and referenced?
Create unique names for each dataset.
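A consistent naming convention makes unique names easier to maintain. The sketch below uses a hypothetical `project_dataset_YYYYMMDD_vNN.ext` pattern chosen only for illustration; the function name and fields are assumptions, so adapt them to whatever convention your project agrees on.

```python
import re
from datetime import date


def dataset_filename(project: str, dataset: str, collected: date, version: int, ext: str) -> str:
    """Build a name following a hypothetical project_dataset_YYYYMMDD_vNN.ext convention."""

    def slug(text: str) -> str:
        # Lowercase, then replace each run of non-alphanumeric characters with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{slug(project)}_{slug(dataset)}_{collected:%Y%m%d}_v{version:02d}.{ext}"


print(dataset_filename("Sleep Study", "Survey responses", date(2024, 3, 5), 1, "csv"))
# sleep-study_survey-responses_20240305_v01.csv
```

A name like this is unique only within your project; it does not replace a persistent identifier.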
Assign persistent unique identifiers for each dataset.
Choose unique identifier schemes that are
- Globally unique across the internet
- Persistent for at least the life of your data but ideally beyond
Common examples include
DOI (Digital Object Identifier)
- DOIs become clickable links when appended to the resolver address https://doi.org/.
- Registering DOIs directly with a registration agency usually involves fees.
- DOIs generated on the OSF are free for users.
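Turning a DOI into a clickable link amounts to prefixing it with the https://doi.org/ resolver. The Python sketch below does this for a bare DOI or one written with a `doi:` prefix; the DOI in the example is made up, and the check that it starts with `10.` is a simple sanity test rather than full validation.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare or 'doi:'-prefixed DOI into a resolvable https://doi.org/ link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    if not doi.startswith("10."):
        # Every DOI begins with the "10." directory indicator.
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in URLs (such as '#', '?', spaces) but keep '/'.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1234/example.5678"))
# https://doi.org/10.1234/example.5678
```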
Which file formats will be used for each dataset?
Describe the file formats used for each dataset.
Choose file formats that are likely to remain accessible.
File formats that are likely to remain accessible are those that are
- Open, with documented standards
- In common usage by the research community
- Using standard character encodings (eg, ASCII, UTF-8)
- Uncompressed (space permitting)
Recommended formats include
- For spreadsheets: Comma-Separated Values (.csv)
- For text: plain text (.txt), or if formatting is needed, PDF/A (.pdf)
- For presentations: PDF/A (.pdf)
- For images: TIFF (.tif, .tiff) or PNG (.png)
- For videos: MPEG-4 (.mp4)
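As an illustration of the spreadsheet and encoding recommendations, the sketch below exports tabular records to CSV with Python's standard `csv` module, naming UTF-8 explicitly rather than relying on the platform's default encoding. The function name is hypothetical.

```python
import csv
from pathlib import Path


def save_table_as_csv(rows: list[dict], path: Path) -> None:
    """Write records (dicts sharing the same keys) to a UTF-8 encoded CSV file."""
    if not rows:
        raise ValueError("no rows to write")
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module control line endings, as its documentation recommends.
    with path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

For example, `save_table_as_csv([{"site": "Zürich", "count": 3}], Path("site_counts.csv"))` writes a header line and one data row, with the non-ASCII characters stored as UTF-8.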
What data standards and metadata standards will each dataset follow?
Use your discipline's existing standards when possible.
When there are no standards, describe the metadata you will create.
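When no discipline standard exists, a small machine-readable record stored next to the data is one way to describe it. The sketch below is hypothetical: its keys borrow element names from the Dublin Core Metadata Element Set, the required-field list is an assumption made for this example, and every value is a placeholder.

```python
import json

# Keys borrow Dublin Core element names; all values are placeholders.
metadata = {
    "title": "Survey responses, spring 2024",
    "creator": ["Example Researcher"],
    "date": "2024-03-05",
    "description": "De-identified questionnaire responses, one row per participant.",
    "format": "text/csv",
    "identifier": "https://doi.org/10.1234/example.5678",
    "rights": "CC0 1.0 Universal",
    "language": "en",
}

# Hypothetical minimum set of fields this project requires before sharing.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier", "rights")


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or empty."""
    return [key for key in REQUIRED_FIELDS if not record.get(key)]


def write_metadata(record: dict, path: str) -> None:
    """Save the record as human-readable UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, ensure_ascii=False)


print(missing_fields(metadata))  # []
```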
Who will have access to your datasets? How and when will you share your datasets, if applicable?
Describe who will have access to each dataset at each stage of the research
- How will access be managed and by whom?
- Will there be embargo periods?
Will you encrypt your datasets?
- Who will be able to decrypt the datasets, when, and how?
Describe how you will share each dataset, if applicable
How will your datasets be shared?
- How will the datasets be altered or transformed to prepare them for sharing (eg, de-identification, aggregation)?
- What are the technical mechanisms for dissemination?
- What restrictions will you place on data sharing (eg, confidentiality, privacy)?
- If you cannot share a dataset, provide reasons to justify your decision.
What software and hardware are needed to reuse your datasets?
- How do you plan for hardware and software obsolescence?
How will you archive and preserve your datasets?
Describe how you will preserve your datasets
How long you will preserve your data
- How long are you required to preserve your data (eg, funder policy)?
- How long will your dataset be useful?
- Associated costs of preservation and how they will be covered
- What repository will you use?
Describe how you will back-up your datasets
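Whatever backup arrangement you describe, it helps to confirm that each copy is intact. One common approach, sketched below with Python's standard `hashlib` module, is to compare SHA-256 checksums of the original file and its backup; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original: Path, copy: Path) -> bool:
    """True if the backup copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(copy)
```

Recording the checksum alongside the dataset also lets you recheck a copy later without needing the original at hand.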
Describe how you will secure sensitive datasets
- How will you prevent unauthorized access online (eg, encryption)?
- How will you prevent unauthorized access on site (eg, locked rooms, locked drawers)?
- How will you prevent unauthorized computer access (eg, passwords)?
How will you license your datasets?
Describe how each dataset will be licensed
- Data is generally not copyrightable. However, a presentation of data (such as a chart or table) may be.
- Data can be licensed. Some data providers apply licenses that limit how the data can be used, either to protect the privacy of study participants or to guide downstream uses of the data (eg, requiring attribution or forbidding for-profit use).
If you want your data to be shared, use the Creative Commons CC0 Public Domain Dedication.
- See the lesson on how to license your research for more information on why you should consider using CC0.
How will you deal with privacy or confidentiality, if applicable?
Describe whether each dataset contains direct or indirect identifiers
Describe how your plan complies with HIPAA (the US Health Insurance Portability and Accountability Act), if applicable
Describe whether consent to share the data will be gathered during the informed consent process
Describe how shared data will be anonymized, if applicable.
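Removing direct identifiers and coarsening indirect ones can be sketched as a small transformation applied before sharing. The column names below are hypothetical, and this step alone does not guarantee anonymity: combinations of the remaining indirect identifiers can still re-identify people, so treat it as one part of a broader de-identification plan.

```python
# Hypothetical column names treated as direct identifiers; adjust to your data.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "street_address"}


def age_band(age: int, width: int = 10) -> str:
    """Aggregate an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def prepare_for_sharing(records: list[dict]) -> list[dict]:
    """Drop direct identifiers and replace exact ages with age bands."""
    shared = []
    for record in records:
        out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        if "age" in out:
            out["age_band"] = age_band(out.pop("age"))
        shared.append(out)
    return shared


raw = [{"name": "A. Participant", "email": "a@example.org", "age": 34, "score": 7}]
print(prepare_for_sharing(raw))  # [{'score': 7, 'age_band': '30-39'}]
```

The function builds new records rather than editing the originals, so the identifiable source data stays intact for secure internal storage.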
DMPTool is a useful tool for generating a data management plan. An example data management plan based on DMPTool's NIH-Generic template, using the OSF as an example repository, may be found here.
Numerous institutional libraries, such as the University of Notre Dame's, provide detailed data management documentation on their websites.
For researchers outside of the United States, Jisc provides relevant advice guides.