Evaluating Data Viability

Recap:

Once we’ve found data that looks relevant, the next step is to evaluate the datasets: are they of sufficient quality to be reused, and are they relevant to the research question? Four guidelines help with this evaluation: metadata quality, documentation completeness, indicators of use, and licensing. Before reading this help guide, you should check out evaluating data for reuse. See Finding Relevant Data for step one.

Try This!

Evaluate Datasets: Start with Metadata Quality

We can first look at the metadata quality using the FAIR guidelines. FAIR stands for findable, accessible, interoperable, and reusable. Applying those concepts to this record, some things we can see are:

Findability refers to how easily the data can be located. In this case, it has a DOI, making it easy to find in the future.
Accessibility refers to the ease of access. In this case, the full file is readily available for download.
Interoperability refers to the use of standards. At a glance, we can see that this dataset uses standard formats like CSV.
Finally, reusability can be indicated by something like a license, which tells us how the data may be used.

These aren’t the only aspects of the record that meet the FAIR criteria; they are just some examples.
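
The four checks above can be sketched as a simple checklist in code. This is only an illustrative sketch: the DatasetRecord class, its field names, and the list of "standard" formats are assumptions made for this example, not part of OSF or of the FAIR guidelines themselves, and a real evaluation would look at more than these four signals.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    doi: Optional[str] = None
    download_url: Optional[str] = None
    file_format: Optional[str] = None
    license: Optional[str] = None


# Assumed set of open, widely supported formats; adjust for your own field.
STANDARD_FORMATS = {"csv", "tsv", "json", "xml", "txt"}


def fair_checklist(record: DatasetRecord) -> Dict[str, bool]:
    """Return one rough yes/no signal for each FAIR concept."""
    file_format = (record.file_format or "").lower().lstrip(".")
    return {
        "findable": bool(record.doi),             # has a persistent identifier
        "accessible": bool(record.download_url),  # the file can be downloaded
        "interoperable": file_format in STANDARD_FORMATS,
        "reusable": bool(record.license),         # a license is stated
    }


if __name__ == "__main__":
    # Placeholder values, not a real dataset.
    record = DatasetRecord(
        doi="10.1234/example",
        download_url="https://example.org/data.csv",
        file_format="CSV",
        license="CC0",
    )
    for concept, passed in fair_checklist(record).items():
        print(f"{concept}: {'yes' if passed else 'no'}")
```

A "no" on any line is a prompt to look more closely at the record, not proof that the data cannot be reused.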

Documentation Quality

When judging whether data and documentation are complete, we can look for things like logically structured directories, read-me files with good descriptions, and meaningful file names. These are all good signs that we will have enough information to reuse the data. Let's look at an example dataset.

Good data documentation also tells the prospective user about the following:

Who collected the data, or who is the subject of it?
What type of data is it? In this case, software.
When was it created? How recently was it revised? What time period does it cover if it is time-based?
Where is the data from? Knowing the institution or repository can help instill trust. This could also refer to the geographies covered or the sources of origin.

Best Practice: Data Citations

Finally, a best practice is to record the citation of the data as soon as you gather it, so that you have all the relevant details and don’t lose track of the source. A good data citation includes:

Author
Title
Version
Publisher
Date
Persistent identifier (PID)
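
One way to keep these six elements together is to store them in a small structured record at the moment you download the data. The sketch below is a minimal example under stated assumptions: the DataCitation class, the missing_fields helper, and the layout of the formatted string are all hypothetical choices made for illustration, and the values shown are placeholders. Follow the citation style your field or publisher requires.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataCitation:
    """The six citation elements listed above, stored as plain strings."""
    author: str
    title: str
    version: str
    publisher: str
    date: str
    pid: str  # persistent identifier, for example a DOI

    def as_text(self) -> str:
        # Illustrative layout only; it is not an official citation style.
        return (
            f"{self.author} ({self.date}). {self.title} "
            f"(Version {self.version}). {self.publisher}. {self.pid}"
        )


def missing_fields(citation: DataCitation) -> List[str]:
    """List the elements that were left blank so they can be filled in."""
    return [f.name for f in fields(citation) if not getattr(citation, f.name).strip()]


if __name__ == "__main__":
    # Placeholder values, not a real dataset.
    citation = DataCitation(
        author="Doe, J.",
        title="Example Survey Data",
        version="1.0",
        publisher="Example Repository",
        date="2020",
        pid="https://doi.org/10.1234/example",
    )
    print(citation.as_text())
    print("Missing:", missing_fields(citation))
```

Checking for blank elements right away is the point of the exercise: a missing version or identifier is much easier to recover on the day of download than months later.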

Want to Know More?

For more information and other OSF tips and tricks, please see our support guides or contact OSF Support.

This Article Is Licensed Under CC0 For Maximum Reuse.