Version Control
Managing and tracking research materials can prove a significant challenge during the course of research, especially in collaborative projects. Improving the ability to consistently track and retrieve each version of a file can lead to more efficient collaboration and increased accuracy of research results. This is most effectively accomplished with version control systems, which automate much of the storage and record keeping.
Selecting Systems that Work in Theory and Practice
Version control systems vary widely in capability and complexity, from automatic tools that sync sequential versions of your files with various cloud storage offerings, to tools like Git (see also GitLab and GitHub) that allow for active management of multiple branching and merging versions of the same file. Consider both the theoretical capabilities of the tool and how those capabilities will actually be used in practice.
When selecting a version control system for your research, the primary consideration should be what can be most consistently implemented in your current environment. If your chosen tools are too challenging to use or are not available on all platforms where research materials need to be managed, individual researchers may start using their own tools and systems, resulting in the same kind of confusion that you hoped to avoid. Existing familiarity with a particular tool, availability of tools, and ease of configuring those tools on new devices are all important considerations when selecting a version control system.
Once you have identified which systems can be implemented consistently in your research environment, some additional considerations are worth including in your evaluation. In addition to general software selection considerations (price, size of the community using and supporting the tool, amount of community control, Free and Open Source Software (FOSS) licensing terms, etc.), the following specific questions are worth asking.
Questions to Ask
- How much complexity is functionally required by your particular task and workflows? For instance, collaborative software development may benefit from more complex tools than are needed to track sequential versions of a data file during analysis.
- How many versions do you need to keep and for how long? Some tools may only allow you to keep a certain number of versions, to only keep them for a certain period of time, or may limit whether you can migrate those versions to other tools. Ensure that any limitations are in line with your research and archival needs or select tools without such limits.
- How much of the metadata and provenance tracking data specified in your Data Management Plan will this tool record as it is actually implemented in your workflows? If your team delegates all interactions with the version control system to a small number of team members, it may end up with less complete or less accurate information than if everyone interacted with it directly. For instance, if you have a single data manager who enters all the files into the group version control tool, it may be unclear who authored particular changes and when, whereas if each researcher had entered their own changes directly, that information would have been automatically captured by the version control tool.
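The last two questions can be made concrete with a minimal Python sketch. It is an illustration only and does not reflect how Git, OSF, or any other tool works internally: the `Version` and `VersionLog` names, the `max_versions` retention setting, and the sample authors and data are all hypothetical. It shows the kind of record a version control tool might capture automatically at save time (author, timestamp, content checksum) and how a retention limit silently discards older versions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    """Hypothetical record of one saved version of a file."""
    number: int
    author: str
    saved_at: datetime
    sha256: str


class VersionLog:
    """Hypothetical in-memory log of sequential versions of a single file."""

    def __init__(self, max_versions=None):
        # Assumed retention setting: None means keep every version.
        self.max_versions = max_versions
        self.versions = []
        self._next_number = 1

    def save(self, content: bytes, author: str) -> Version:
        # Author, time, and checksum are recorded at the moment of saving,
        # so they describe whoever actually performed the save.
        version = Version(
            number=self._next_number,
            author=author,
            saved_at=datetime.now(timezone.utc),
            sha256=hashlib.sha256(content).hexdigest(),
        )
        self._next_number += 1
        self.versions.append(version)
        if self.max_versions is not None:
            # Drop the oldest entries once the retention limit is exceeded.
            excess = len(self.versions) - self.max_versions
            if excess > 0:
                del self.versions[:excess]
        return version


# Illustrative data only: three saves against a limit of two versions.
log = VersionLog(max_versions=2)
log.save(b"id,value\n1,10\n", author="researcher_a")
log.save(b"id,value\n1,10\n2,20\n", author="researcher_b")
log.save(b"id,value\n1,10\n2,25\n", author="data_manager")

for v in log.versions:
    print(v.number, v.author, v.sha256[:12])
```

With a limit of two, the first version and its author are no longer retrievable after the third save. Likewise, if every save were made under the `data_manager` name, the log would attribute all changes to that one person regardless of who actually made them.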
Managing Versions on OSF
OSF has built-in version control for all files stored in your project, can render hundreds of different file types, and allows you to edit plain text files (including R and Python scripts) directly in the browser. See the following guides for details on how to use OSF to store, update, and interact with multiple versions of files.