How to Make a Data Dictionary

A data dictionary is critical to making your research more reproducible because it allows others to understand your data. The purpose of a data dictionary is to explain what all the variable names and values in your spreadsheet really mean.

Example of a Data Dictionary

Variable names

The first column should contain your variable names exactly as they appear in your spreadsheet.

Readable variable name

This column should contain short but human-readable variable names.

  • For instance, if ‘VAR1’ is a variable name referring to weight, then an appropriate readable variable name for VAR1 is ‘weight’.
  • Unlike the raw variable name, a readable name can contain spaces, special characters, and capital letters.
  • This is the name that you would use to label graphs and other figures.
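As a minimal sketch of how readable names might be put to use, the Python snippet below relabels one row of data with the names recorded in a data dictionary. Only ‘VAR1’ and ‘weight’ come from the example above; `VAR2`, `VAR3`, `READABLE_NAMES`, and `relabel` are hypothetical names chosen for illustration.

```python
# Hypothetical mapping from raw variable names (as they appear in the
# spreadsheet) to the readable names recorded in the data dictionary.
READABLE_NAMES = {
    "VAR1": "weight",
    "VAR2": "Time to complete task",
}


def relabel(row, names=READABLE_NAMES):
    """Return a copy of `row` keyed by readable names.

    Raw names that have no entry in the mapping are kept unchanged.
    """
    return {names.get(key, key): value for key, value in row.items()}


row = {"VAR1": 72.5, "VAR2": 14, "VAR3": "yes"}
print(relabel(row))
```

The same mapping could supply axis labels or table headers, so that figures show ‘weight’ rather than ‘VAR1’.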

Measurement units

This column should contain the measurement units for the variable.

  • For instance, if a column contains measurements of time, it should be clear whether they are measured in hours, minutes, or seconds.

Allowed values

This column should contain the range of values or the set of accepted values for the variable.

  • This helps identify data entry errors.
  • For numeric variables, the minimum and maximum values should be included.
  • For categorical variables, the accepted values (e.g., “male”, “female”) should be listed and explained, if needed, in the description column (see below).
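One way allowed values can help catch data entry errors is sketched below in Python. The structure is an assumption made for illustration: a (minimum, maximum) pair for numeric variables and a set of accepted values for categorical ones. The names `ALLOWED` and `find_invalid`, the weight range, and the sample rows are all hypothetical.

```python
# Hypothetical allowed-values entries: a (min, max) tuple for numeric
# variables, a set of accepted values for categorical ones.
ALLOWED = {
    "weight": (0.0, 500.0),
    "sex": {"male", "female"},
}


def find_invalid(rows, allowed=ALLOWED):
    """Return (row_index, variable, value) for each value that is not allowed.

    A missing value (None) is reported as invalid as well.
    """
    problems = []
    for index, row in enumerate(rows):
        for variable, rule in allowed.items():
            value = row.get(variable)
            if isinstance(rule, tuple):
                low, high = rule
                ok = isinstance(value, (int, float)) and low <= value <= high
            else:
                ok = value in rule
            if not ok:
                problems.append((index, variable, value))
    return problems


rows = [
    {"weight": 72.5, "sex": "female"},
    {"weight": -3.0, "sex": "male"},   # negative weight: likely a typo
    {"weight": 80.1, "sex": "mael"},   # misspelled category
]
print(find_invalid(rows))
# → [(1, 'weight', -3.0), (2, 'sex', 'mael')]
```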

Definition of the variable

This column should contain a definition of the variable.

  • The variable definition reflects the way you use the term and intend the term to be used by others who wish to understand your work.
  • While there are many kinds of definition, where possible, please provide a definition with the following genus-differentia form:

“A is a B that Cs.”

  • For instance, “An (A) attitude is a (B) disposition (C) to think or feel about something or someone, typically one that is reflected in a person's behavior.”
  • Avoid circular definitions (e.g., “A baseball is a ball used in baseball.”)

Synonyms for the variable name (optional)

  • This column should contain, if relevant, one or more words that could be substituted for the variable name.
  • These synonyms should reflect the meaning of the variable name as you use it, and not merely as the variable name might be used in a different context.
  • Again, the purpose is to convey the meaning of the variable term you use in your data.

Description of the variable (optional)

The final column should contain, where needed, a longer explanation of the variable.

  • This is a human-readable description with enough information for others to understand what the variable refers to.
  • It should also explain terms in the variable’s definition in more depth if needed. For instance, a description of the variable might clarify what is intended by ‘disposition’ in the above definition.
  • It could provide sources for definitions if those definitions are not the researcher’s own.
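To show how the columns described in this article might fit together, here is a minimal Python sketch that writes a data dictionary as CSV text using only the standard library. The column headers mirror the sections above; the single entry (its units, range, and definition), along with the names `COLUMNS`, `ENTRIES`, and `to_csv`, are hypothetical and purely illustrative.

```python
import csv
import io

# Column headers follow the sections of this article.
COLUMNS = [
    "Variable name",
    "Readable variable name",
    "Measurement units",
    "Allowed values",
    "Definition",
    "Synonyms",
    "Description",
]

# Hypothetical entry; the content is illustrative only.
ENTRIES = [
    {
        "Variable name": "VAR1",
        "Readable variable name": "weight",
        "Measurement units": "kg",
        "Allowed values": "0 to 500",
        "Definition": "A weight is a measurement that records "
                      "the body mass of a participant.",
        # Optional columns (Synonyms, Description) are omitted here
        # and will be written as empty cells.
    },
]


def to_csv(entries, columns=COLUMNS):
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    # restval="" fills any column missing from an entry with an empty cell.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


print(to_csv(ENTRIES))
```

Keeping the dictionary in a plain format such as CSV means it can sit next to the data file and be opened in any spreadsheet program.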


This article is licensed under CC0 for maximum reuse.
