Finding and Reusing Data on a Generalist Repository

Finding relevant data

The OSF is a generalist repository, meaning it is not limited to a single discipline. Other repositories may focus on particular disciplines, types of resources, or data created by specific institutions (for an exploration of different types of repositories, see this article from PLOS ONE). Generalist repositories can also accommodate data that has no existing disciplinary repository. Because the data can come from many fields, the common fields used to describe and categorize data can be fairly broad. Once you are familiar with the repository, though, there are ways to home in on metadata related to specific topics.

Try This!

Citation chaining 

The first strategy is citation chaining: mining the citations in relevant literature to find more sources. While many of these will point to other published articles, you can also find citations to datasets. When checking citations, look specifically for persistent identifier (PID) references such as a DOI. A PID is a good sign that the data will still be available somewhere, since it indicates a commitment to persistent access. In OSF, anyone publishing a preprint can also post and link to their data in an OSF project. These links appear as Supplemental Resources.
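When working through a long reference list, a small script can help flag which citations carry a DOI. The following is a minimal sketch, not an official tool: the pattern is a commonly used heuristic for modern DOIs and may miss unusual ones, and the sample citations (including the DOI 10.1234/example.5678) are made up for illustration.

```python
import re

# Heuristic for modern DOIs: "10.", a 4-9 digit prefix, "/", then a suffix.
# This is an approximation, not a full validator.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(citation):
    """Return DOI-like strings found in a citation.

    Trailing punctuation is trimmed because citations often end a
    sentence right after the DOI. This can clip a DOI that really
    ends in one of those characters.
    """
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(citation)]


# Hypothetical citations, invented for this example.
citations = [
    "Doe, J. (2020). Example dataset. https://doi.org/10.1234/example.5678.",
    "Roe, R. (2019). An article with no identifier. Journal of Examples, 4(2).",
]

for citation in citations:
    print(find_dois(citation))
```

A citation that returns an empty list is not necessarily a dead end, but it is worth checking by hand whether the data behind it is still reachable.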

Previous Reuse

Another helpful method for finding reusable datasets is to look specifically at studies that successfully reproduce earlier results. Such studies are, by default, using reproducible data.

In OSF Registries you can search specifically for replication studies by limiting results to the “Replication Recipe” types.

The “Replication Recipe” registration type is a standardized template for describing studies that are intended to reproduce original results. There are two types in OSF: one for pre-registration (before the replication study is done) and one for post-completion.

In each of these cases, the fact that the study is being reproduced provides a good clue that this might be usable data (since it has already been reused once).

Targeted searching

When moving on to search directly for datasets, it’s good to understand the metadata structure of the repository. In OSF, metadata is a mix of “free-text” descriptive fields in which anything can be put (title, description, tags), and more controlled fields that use a list of specific terms like resource types and disciplines. Using the controlled terms from those lists in your searches may help you find more relevant data. 
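The difference between free-text and controlled fields can be illustrated with a small local example. This sketch does not call OSF; the records, field names, and the resource type and discipline values are all hypothetical stand-ins. It shows why combining a keyword with a controlled term narrows results more reliably than a keyword alone.

```python
# Hypothetical metadata records, invented for this example.
records = [
    {
        "title": "Sleep diary survey data",
        "description": "Responses from a sleep study",
        "tags": ["sleep", "survey"],
        "resource_type": "Dataset",
        "discipline": "Psychology",
    },
    {
        "title": "Sleep study analysis plan",
        "description": "Planned analyses",
        "tags": ["sleep"],
        "resource_type": "Study Design",
        "discipline": "Psychology",
    },
    {
        "title": "River temperature readings",
        "description": "Hourly sensor readings",
        "tags": ["hydrology"],
        "resource_type": "Dataset",
        "discipline": "Environmental Sciences",
    },
]


def search(records, keyword, resource_type=None):
    """Match a keyword against free-text fields, then optionally
    require an exact match on the controlled resource type."""
    hits = []
    for record in records:
        free_text = " ".join(
            [record["title"], record["description"], *record["tags"]]
        ).lower()
        if keyword.lower() not in free_text:
            continue
        if resource_type is not None and record["resource_type"] != resource_type:
            continue
        hits.append(record)
    return hits


print([r["title"] for r in search(records, "sleep")])
print([r["title"] for r in search(records, "sleep", resource_type="Dataset")])
```

The keyword alone matches both sleep-related records, while adding the controlled term keeps only the one that is actually a dataset.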

A way to further pinpoint materials in OSF is to use filters when searching (we have them for OSF content type, tags, and licenses, and will be adding more in the future). You can also look for key indicators like the badges on registries.

Finally, as a general tip or best practice, try to document your search strategy. Keep a record of the terms used, the filters, and other refinements, as well as the dates and repositories searched. This will help you to avoid repetition in one repository while helping you replicate the same strategies in others.
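One lightweight way to keep such a record is a structured log with one entry per search. The sketch below is only a suggestion; the field names and the sample values are assumptions chosen to mirror the items listed above (terms, filters, dates, repositories), not a prescribed format.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class SearchRecord:
    """One search, with enough detail to repeat it later."""

    repository: str
    terms: list
    filters: dict = field(default_factory=dict)
    notes: str = ""
    # Defaults to today's date in ISO format (YYYY-MM-DD).
    searched_on: str = field(default_factory=lambda: date.today().isoformat())


def to_json_line(record):
    """Serialize a record as one JSON line, suitable for appending to a log file."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical example entry.
entry = SearchRecord(
    repository="OSF",
    terms=["sleep", "survey"],
    filters={"license": "CC0"},
    notes="Narrowed by license after the first pass returned too many results.",
)
print(to_json_line(entry))
```

Appending each line to a plain text file gives a dated history that can be reviewed before repeating a search or reused when moving to another repository.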

This article is licensed under CC0 for maximum reuse.

