How to Make a Data Dictionary
A data dictionary is critical to making your research more reproducible because it allows others to understand your data. The purpose of a data dictionary is to explain what all the variable names and values in your spreadsheet really mean.
- Variable names
- Readable variable name
- Measurement units
- Allowed values
- Definition of the variable
- Synonyms for the variable name (optional)
- Description of the variable (optional)
- Other resources
Variable names
The first column should contain your variable names exactly as they appear in your spreadsheet.
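One way to confirm that the names match exactly is to compare the spreadsheet header against the first column of the data dictionary. The following is a minimal sketch, assuming both files are stored as CSV; the column header `variable_name`, the function name `compare_variable_names`, and the sample contents are hypothetical.

```python
import csv
import io


def compare_variable_names(data_csv, dictionary_csv, name_column="variable_name"):
    """Return (missing_from_dictionary, missing_from_data) as sorted lists."""
    # The first row of the spreadsheet holds the variable names.
    data_header = next(csv.reader(io.StringIO(data_csv)))
    # The data dictionary lists one variable per row.
    documented = {row[name_column] for row in csv.DictReader(io.StringIO(dictionary_csv))}
    columns = set(data_header)
    return sorted(columns - documented), sorted(documented - columns)


# Hypothetical sample contents; in practice these would be read from files.
DATA = "VAR1,VAR2,sex\n70.5,12,female\n"
DICTIONARY = "variable_name,readable_name\nVAR1,weight\nvar2,time\n"

missing_from_dictionary, missing_from_data = compare_variable_names(DATA, DICTIONARY)
print("Not documented:", missing_from_dictionary)
print("Not in spreadsheet:", missing_from_data)
```

The comparison is case-sensitive on purpose, so `var2` in the dictionary is reported as not matching `VAR2` in the spreadsheet.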
Readable variable name
This column should contain short but human-readable variable names.
- For instance, if ‘VAR1’ is a variable name referring to weight, then an appropriate readable variable name for VAR1 is ‘weight’.
- You can use spaces, special characters, and capital letters.
- This is the name that you would use to label graphs and other figures.
Measurement units
This column should contain the measurement units for the variable.
- For instance, if a column contains measurements of time, it should be clear whether they are measured in hours, minutes, or seconds.
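The readable name and the measurement unit can be combined to label figures consistently. The following is a minimal sketch, assuming the data dictionary has been loaded into a Python dict; the keys `readable_name` and `units`, the function `axis_label`, and the sample entries are hypothetical.

```python
def axis_label(entry):
    """Build a figure label such as 'weight (kg)' from one dictionary entry."""
    name = entry["readable_name"]
    unit = entry.get("units", "").strip()
    # Variables without a unit (for example, categories) get the name only.
    return f"{name} ({unit})" if unit else name


# Hypothetical entries keyed by the variable name used in the spreadsheet.
DATA_DICTIONARY = {
    "VAR1": {"readable_name": "weight", "units": "kg"},
    "VAR2": {"readable_name": "reaction time", "units": "seconds"},
    "sex": {"readable_name": "sex", "units": ""},
}

labels = {variable: axis_label(entry) for variable, entry in DATA_DICTIONARY.items()}
for variable, label in labels.items():
    print(variable, "->", label)
```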
Allowed values
This column should contain the range of values or accepted values for the variable.
- This helps identify data entry errors.
- Minimum and maximum values should be included.
- Chosen values (e.g., “male”, “female”) should be included and detailed, if needed, in the description column (see below).
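Allowed values recorded this way can be used to flag data entry errors automatically. The following is a minimal sketch, assuming each rule holds either a minimum and maximum or a set of accepted values; the rule format, the function `find_invalid_values`, the weight range, and the sample rows are hypothetical.

```python
def find_invalid_values(rows, rules):
    """Return (row_number, variable, value) for every value that breaks a rule."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for variable, rule in rules.items():
            value = row[variable]
            if "choices" in rule:
                # Categorical variable: the value must be one of the accepted values.
                ok = value in rule["choices"]
            else:
                # Numeric variable: the value must fall within the stated range.
                ok = rule["min"] <= value <= rule["max"]
            if not ok:
                problems.append((row_number, variable, value))
    return problems


# Hypothetical rules taken from the "Allowed values" column.
RULES = {
    "VAR1": {"min": 30.0, "max": 250.0},  # weight, assumed to be in kg
    "sex": {"choices": {"male", "female"}},
}

# Hypothetical spreadsheet rows.
ROWS = [
    {"VAR1": 70.5, "sex": "female"},
    {"VAR1": 705.0, "sex": "male"},  # out of range, possibly a misplaced decimal
    {"VAR1": 82.0, "sex": "mael"},   # not an accepted value
]

problems = find_invalid_values(ROWS, RULES)
for row_number, variable, value in problems:
    print(f"Row {row_number}: {variable} has unexpected value {value!r}")
```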
Definition of the variable
This column should contain a definition of the variable.
- The variable definition reflects the way you use the term and intend the term to be used by others who wish to understand your work.
- While there are many kinds of definition, where possible, please provide a definition with the following genus-differentia form:
“A is a B that Cs.”
- For instance, “An a) attitude is a b) disposition c) to think or feel that is about something or someone, typically one that is reflected in a person's behavior.”
- Avoid circular definitions (e.g., “A baseball is a ball used in baseball.”).
Synonyms for the variable name (optional)
- This column should contain, if relevant, one or more words that could be substituted for the variable name.
- These synonyms should reflect the meaning of the variable name as you use it, and not merely as the variable name might be used in a different context.
- Again, the purpose is to convey the meaning of the variable term you use in your data.
Description of the variable (optional)
The final column should contain, where needed, a longer explanation of the variable.
- This is a human-readable description with enough information for others to understand what the variable refers to.
- It should also explain terms in the variable’s definition in more depth if needed. For instance, a description of the variable might clarify what is intended by ‘disposition’ in the above definition.
- It could provide sources for definitions if those definitions are not the researcher’s own.
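Putting the columns together, a data dictionary can be stored as a small CSV file alongside the data. The following is a minimal sketch, assuming one row per variable; the column headers, the function `write_data_dictionary`, and the two sample entries (including their definitions and ranges) are hypothetical illustrations.

```python
import csv
import io

# Hypothetical column headers, one per section described above.
COLUMNS = [
    "variable_name",
    "readable_name",
    "units",
    "allowed_values",
    "definition",
    "synonyms",
    "description",
]

# Hypothetical entries; optional columns may be omitted.
ENTRIES = [
    {
        "variable_name": "VAR1",
        "readable_name": "weight",
        "units": "kg",
        "allowed_values": "30-250",
        "definition": "Weight is a measurement that records how heavy a participant is.",
        "synonyms": "body weight",
    },
    {
        "variable_name": "sex",
        "readable_name": "sex",
        "allowed_values": "male; female",
        "definition": "Sex is a category that a participant reports on the intake form.",
    },
]


def write_data_dictionary(entries, columns=COLUMNS):
    """Serialize the entries as CSV text, leaving omitted columns empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


dictionary_csv = write_data_dictionary(ENTRIES)
print(dictionary_csv)
```

Keeping the dictionary in a plain CSV file means it can be opened in the same spreadsheet software as the data it describes.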
Other resources
- The Data Documentation Initiative (DDI) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences. Learn more at: http://www.ddialliance.org/
- Example data dictionaries provided by the USGS: https://www.usgs.gov/products/data-and-tools/data-management/data-dictionaries