Evaluating Data Viability


Once we’ve found data that we think is relevant, the next step is to evaluate the datasets: are they of sufficient quality to be reused, and are they relevant to the research question? Useful guidelines for this evaluation cover metadata quality, documentation completeness, indicators of use, and licensing. Before reading this guide, you should review Evaluating Data for Reuse. For step one, see Finding Relevant Data.

Try This!

Evaluate Datasets: Start with Metadata Quality

We can start by judging metadata quality against the FAIR guidelines: Findable, Accessible, Interoperable, and Reusable. Applying those principles to this record, we can see the following:

  • Findable: the dataset has a DOI (Digital Object Identifier), a persistent identifier that makes it easy to locate in the future.
  • Accessible: accessibility refers to how easily the data can be obtained; here, the full data file is available for direct download.
  • Interoperable: interoperability refers to the use of standards; at a glance, we can see that this dataset uses standard formats such as CSV.
  • Reusable: a license stating how the data may be used is one indicator of reusability.

These aren’t the only features that satisfy these criteria; they are just a few examples.
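If you screen many candidate datasets, you can turn these four signals into a quick first-pass check. The Python sketch below is illustrative only: the record's field names (`doi`, `download_url`, `formats`, `license`) and the `OPEN_FORMATS` list are hypothetical assumptions, not the schema of any real repository, so adapt them to whatever metadata your source exposes. A "yes" only means the signal is present; you still need to read the record yourself.

```python
import re

# Hypothetical metadata record: these field names are illustrative
# assumptions, not the schema of any particular repository.
record = {
    "doi": "10.1234/example.5678",
    "download_url": "https://example.org/data/survey.csv",
    "formats": ["CSV"],
    "license": "CC-BY-4.0",
}

# Assumed list of open, widely supported formats; adjust for your field.
OPEN_FORMATS = {"CSV", "TSV", "JSON", "TXT", "NETCDF"}

# A DOI name starts with "10.", a numeric registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def fair_signals(rec: dict) -> dict:
    """Return True/False for each FAIR-style signal found in a metadata record."""
    return {
        "findable": bool(DOI_PATTERN.match(rec.get("doi", ""))),
        "accessible": bool(rec.get("download_url")),
        "interoperable": any(f.upper() in OPEN_FORMATS for f in rec.get("formats", [])),
        "reusable": bool(rec.get("license")),
    }


for principle, ok in fair_signals(record).items():
    # Missing signals are flagged for a closer manual look, not rejected outright.
    print(f"{principle:<14} {'yes' if ok else 'check manually'}")
```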

Documentation Quality

When judging whether data and documentation are complete, we can look for logically structured directories, README files with clear descriptions, and meaningful file names. These are all good signs that we will have enough information to reuse the data. Let's look at an example dataset.
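Once a dataset is downloaded, some of these documentation checks can be partly automated. The Python sketch below walks a directory, reports whether a README is present and non-empty, counts subdirectories, and flags file names that say little about their contents. The `VAGUE_NAME` pattern is a hypothetical heuristic of our own, not a standard, and a clean report does not replace actually reading the README.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical heuristic: stems such as "data1", "final", or "untitled"
# describe little about what a file contains.
VAGUE_NAME = re.compile(r"^(data|file|final|new|untitled|copy|test)[\W_]*\d*$", re.IGNORECASE)


def documentation_report(root: Path) -> dict:
    """Summarize simple documentation signals for a dataset directory."""
    files = [p for p in root.rglob("*") if p.is_file()]
    readmes = [p for p in files if p.stem.lower().startswith("readme")]
    vague = [p.relative_to(root).as_posix() for p in files if VAGUE_NAME.match(p.stem)]
    return {
        "file_count": len(files),
        "has_readme": bool(readmes),
        "readme_nonempty": any(p.stat().st_size > 0 for p in readmes),
        "subdirectories": sum(1 for p in root.rglob("*") if p.is_dir()),
        "vague_file_names": sorted(vague),
    }


# Demo on a small throwaway directory that mimics a downloaded dataset.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "README.md").write_text("Household survey responses; variables described in codebook.csv")
    (root / "raw" / "survey_responses_2021.csv").write_text("id,age\n1,34\n")
    (root / "raw" / "data1.csv").write_text("x\n1\n")
    print(documentation_report(root))
```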