Research Guides: Open Data: Tools to Support Open Data Practices

Workflow Management

Workflow-based systems provide an explicit representation of the structure of experiments, automate repetitive tasks and computations, and transparently capture provenance information.

Provenance in workflow systems includes:

Prospective provenance - description of the experiment workflow structure, such as modules, connections, and inputs.

Retrospective provenance - information on the execution of the workflow and what happened when it was run.

Workflow evolution - history and versions of the workflow (especially when data is iteratively refined).
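As a rough illustration of retrospective provenance, the sketch below wraps a single processing step and appends a record of what ran, on which input, and when. It is a minimal example using only the Python standard library, not the mechanism of any particular workflow system; the names `run_with_provenance`, `sha256_of_file`, and the JSON-lines log format are assumptions made for this example.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_with_provenance(step, input_path, params, log_path):
    """Run step(input_path, **params) and append one provenance record to log_path.

    Hypothetical helper: the record layout is an assumption, not a standard.
    """
    record = {
        "step": step.__name__,
        "input": str(input_path),
        "input_sha256": sha256_of_file(input_path),  # identifies the exact input
        "params": params,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "started": datetime.now(timezone.utc).isoformat(),
    }
    result = step(input_path, **params)
    record["finished"] = datetime.now(timezone.utc).isoformat()
    # One JSON object per line, so the log can be appended to on every run.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return result
```

Keeping the log file under version control alongside the code would also give a simple, partial view of workflow evolution.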

To capture research provenance, three classes of tools need to be installed: a shell or terminal program for access to the command line; a plain text editor or a development environment; and software that allows the user to write and execute code in a chosen programming language.

Examples of Workflow Management tools include:

Registration and Registered Reports

When you preregister your research, you're simply specifying your research plan in advance of your study and submitting it to a registry. In clinical research in particular, the preregistration of a study is mandatory.

Preregistration separates hypothesis-generating (exploratory) from hypothesis-testing (confirmatory) research. Both are important. However, the same data should not be used both to generate and to test a hypothesis; this can happen unintentionally, and it reduces the credibility of your results. Addressing this problem through planning improves the quality and transparency of your research. This helps you clearly report your study and helps others who may wish to build on it.

For more information on registration, please check the following links:

The Preregistration Revolution

Open Science Framework
by Lora Lennertz Last Updated Feb 6, 2024

Data Processing and Analysis Best Practices

Data cleanup and processing are key components of replicability and reproducibility.

Here are some best practices:

Document all operations fully and automate as much as possible.
Record each step of the process in enough detail that the cleaning strategy can be replicated.
Encode the instructions for data processing as computer code that reads the raw data. If processing is done manually, accompany the file with a very detailed, human-readable description saved in a separate text file.
Design a workflow as a sequence of small steps that are glued together, with intermediate outputs from one step feeding into the next step as inputs.
Comment your code.
Version control your code.
Use free and open tools.
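The idea of small steps joined by intermediate outputs could be sketched as follows. This is only an illustration under assumed conventions: the function names, the numbered intermediate file names, and the two cleaning rules (drop incomplete rows, then normalize one column) are hypothetical and would be replaced by the steps your own data needs.

```python
import csv
from pathlib import Path


def drop_incomplete_rows(raw_path, out_path):
    """Step 1: copy rows to out_path, skipping any row with an empty cell."""
    with open(raw_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            if all(v is not None and str(v).strip() for v in row.values()):
                writer.writerow(row)
    return out_path


def normalize_column(in_path, out_path, column):
    """Step 2: trim whitespace and lower-case the values in one column."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row[column] = row[column].strip().lower()
            writer.writerow(row)
    return out_path


def run_pipeline(raw_path, work_dir, column):
    """Chain the steps; the raw file is only read, never modified."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # Each intermediate file is kept so any step can be inspected or rerun.
    step1 = drop_incomplete_rows(raw_path, work_dir / "01_complete_rows.csv")
    step2 = normalize_column(step1, work_dir / "02_normalized.csv", column)
    return step2
```

Because every step writes its own file, a problem in the final output can be traced back by opening the intermediate files in order.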

Data File Management
by Lora Lennertz Last Updated May 24, 2023

Acquiring Data and Electronic Lab Notebooks

Best practices for data acquisition include:

Create a spreadsheet and save your work to a plain text file (e.g., CSV).
Clearly name your working files.
Create and save a metadata file (a data dictionary or README.txt) in a simple text format to document the source of the data and any information about it.
Use an appropriate directory structure.
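As one possible way to follow these practices in code, the sketch below saves a table as CSV and writes a plain-text README with a data dictionary next to it. The function name `save_with_metadata`, the `<name>_README.txt` naming pattern, and the README layout are assumptions made for this example, not a required format.

```python
import csv
from datetime import date
from pathlib import Path


def save_with_metadata(rows, columns, out_dir, name, source, retrieved=None):
    """Write rows to <name>.csv and a data dictionary to <name>_README.txt.

    columns maps each column name to a one-line description.
    Hypothetical helper: the README layout is illustrative only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / f"{name}.csv"
    readme_path = out_dir / f"{name}_README.txt"

    with open(data_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

    # Default to today's date when no retrieval date is supplied.
    retrieved = retrieved or date.today().isoformat()
    lines = [
        f"File: {data_path.name}",
        f"Source: {source}",
        f"Retrieved: {retrieved}",
        f"Rows: {len(rows)}",
        "",
        "Data dictionary:",
    ]
    lines += [f"  {col}: {desc}" for col, desc in columns.items()]
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return data_path, readme_path
```

A call might pass `columns={"site": "sampling site code", "temp_c": "air temperature in degrees Celsius"}` so that units and meanings travel with the data file.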

Electronic Lab Notebooks are frequently used to encourage reproducible data acquisition practices.

Lab Notebooks Go Digital - Nature, v. 560, 9 Aug 2018
Article describing the steps in selecting an Electronic Lab Notebook.
Electronic Lab Notebooks - for prospective users - The Gordon Institute - University of Cambridge
Provides information useful for researchers in selecting ELNs as well as a comparison guide for Electronic Lab Notebooks.
Electronic Lab Notebooks - Harvard Biomedical Data Management
Discusses best practices for ELNs and maintains a comparison grid of current Electronic Lab Notebooks on over 50 features.

File versioning

File versioning is a great way to maintain order in your research computations as well as to allow for better collaboration. There are many collaborative tools that support file versioning.

Check our guide for information on principles of good file naming, versioning and maintenance.

Data File Management
by Lora Lennertz Last Updated May 24, 2023

Git Bash
While there are other ways to get Bash, Git Bash integrates well with GitHub and GitLab.
Subversion - Apache
CVS - Concurrent Versions System
Mercurial
Bazaar
Darcs
DataLad
Dat
git-annex

Data Validation and Code Checking

goodtables
Gerrit
a Git-based code review tool that facilitates peer review of changes before they are submitted to the central code repository
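To give a sense of what tabular data validation checks for, here is a minimal sketch in plain Python. It does not use or imitate the goodtables API; the function `validate_csv` and its two rules (required columns must be present and non-empty, selected columns must be numeric) are assumptions chosen for illustration.

```python
import csv


def validate_csv(path, required_columns, numeric_columns=()):
    """Return a list of human-readable problems found in a CSV file.

    Hypothetical helper with two simple rules; real validators check far more.
    """
    problems = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [c for c in required_columns if c not in header]
        if missing:
            # Without the expected columns, row-level checks are meaningless.
            return [f"missing column: {c}" for c in missing]
        for row in reader:
            line_no = reader.line_num  # physical line of the row just read
            for col in required_columns:
                if not (row[col] or "").strip():
                    problems.append(f"line {line_no}: empty value in '{col}'")
            for col in numeric_columns:
                value = (row.get(col) or "").strip()
                if value:
                    try:
                        float(value)
                    except ValueError:
                        problems.append(
                            f"line {line_no}: '{col}' is not numeric: {value!r}"
                        )
    return problems
```

An empty list means the file passed these particular checks; running such a check before every analysis step helps catch errors introduced during manual editing.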

Data Sharing Best Practices

Host code on a collaborative platform
Obtain a DOI for your data and code.
Avoid spreadsheets and proprietary file formats when possible; plain-text data is preferable.
Clearly separate, label, and document all data, files, and operations that occur on data and files
Share using open licensing
Upload preprints; try ScholarWorks.
Release code near time of paper submission
Add a reproducibility statement
Keep an up-to-date web presence
Describe software properly with versions and software dependencies
Describe fully the environment of your computations
Include data-cleaning scripts with the research materials, along with commentary explaining key decisions about missing and discarded data
Include a README file
Whenever possible, use computational software whose license is permissive enough to allow users to use the software, reproduce the results, and extend them.
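Describing software versions and the computing environment can be partly automated. The sketch below, written for Python as an assumed analysis language, collects the interpreter version, operating system, and the installed versions of named packages; the function name `describe_environment` and the package names in the usage example are placeholders.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict describing the interpreter, OS, and named package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package names; list the ones your analysis actually imports.
    print(json.dumps(describe_environment(["numpy", "pandas"]), indent=2))
```

Saving this output as a text file next to the README gives readers a starting point for rebuilding the environment.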

Data Storage and Repositories
by Lora Lennertz Last Updated Apr 10, 2024

Code documentation tools

Code documentation provides human-readable elements embedded in your code so that other users can follow it.

Many of these documentation tools will also allow you to create runnable code within a text. You can also make complete documents and even books!

Jupyter Notebooks
Jupyter may be used with Python, Julia, R and C++
R Markdown
rmarkdown lets you insert R code into a markdown document. R then generates a final document, in a wide variety of formats, that replaces the R code with its results.
Doxygen
reStructuredText
Sphinx
Markdown Guide
Markdown Live Preview

Packaging and sharing

Packaging tools allow you to collect multiple files (data, text, etc.) into...well...packages for easier portability. They are very helpful in moving files between a computer and a cloud environment.

Some packaging systems are being developed to also provide the appropriate software that is needed to run analyses within the code packages. CodeOcean is one of these systems.

Sharing and Preserving

Data Storage and Repositories
by Lora Lennertz Last Updated Apr 10, 2024