Data Storage and Repositories

This guide provides current information on data storage for short- and long-term uses.

New Guidance from the National Science and Technology Council

Organizational Infrastructure

Free and Easy Access: The repository provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and policy requirements related to maintaining privacy and confidentiality, Tribal and national data sovereignty, and protection of sensitive data.
Clear Use Guidance: The repository ensures datasets are accompanied by documentation describing terms of dataset access and use (e.g., reuse licenses and need for approval by a data use committee).
Risk Management: The repository has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data.
Retention Policy: The repository provides documentation on policies for data retention.
Long-term Organizational Sustainability: The repository has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets, and has contingency plans to ensure data are available and maintained during and after unforeseen events.

Digital Object Management

Unique Persistent Identifiers: The repository assigns a dataset a citable, unique persistent identifier (PID), such as a digital object identifier (DOI), to support data discovery, reporting (e.g., of research progress), and research assessment (e.g., identifying the outputs of Federally funded research). The unique PID points to a persistent location that remains accessible even if the dataset is de-accessioned or no longer available.
Metadata: The repository ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schemas that are appropriate to, and ideally widely used across, the communities that the repository serves.
Curation and Quality Assurance: The repository provides or facilitates expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata.
Broad and Measured Reuse: The repository ensures datasets are accompanied by metadata that describe terms of reuse and provides the ability to measure attribution, citation, and reuse of data (e.g., through assignment of adequate and openly accessible metadata and unique PIDs).
Common Format: The repository allows datasets and metadata to be accessed, downloaded, or exported from the repository in widely used, preferably non-proprietary, formats consistent with standards used in the disciplines the repository serves.
Provenance: The repository has mechanisms in place to record the origin, the chain of custody, version control, and other modifications to submitted datasets and metadata.

Technology

Authentication: The repository supports authentication of data submitters and has technical capabilities that facilitate associating submitter PIDs with those assigned to their deposited digital objects, such as datasets.
Preservation: The repository has a plan for long-term management of data, building on a stable technical infrastructure and funding plans.
Security and Integrity: The repository has documented measures in place to meet well-established cybersecurity criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data (e.g., the NIST Cybersecurity Framework: https://www.nist.gov/cyberframework).

Types of Repositories - Pros and Cons

Domain-specific data repository
Pros: Most likely to provide domain expertise for data retention and searching; most visible to your colleagues.
Cons: More selective; higher standards for metadata and documentation.

General-purpose data repository
Pros: Most likely to provide useful search and navigation tools.
Cons: Greater need to review the contract terms for copyright, long-term preservation, and suitability for your funder and/or publisher.

Institutional data repository
Pros: Most likely to accept a variety of data and to ensure long-term preservation.
Cons: ScholarWorks@uark.edu is happy to accept your papers; however, it is unable to support data files.

Journal supplementary material services
Pros: Most likely to comply with the publisher's requirements.
Cons: May be costly and may not support long-term storage and preservation needs.

Departmental, project, or personal webpages and collections
Pros: Tailored to your data and collection; a traditional method of sharing.
Cons: Less visible to potential users, requires personal or departmental upkeep, and unlikely to sustain long-term access to your data.

 

Additional Considerations for Repositories Storing Human Data

Fidelity to Consent: The repository employs documented procedures to restrict dataset access and use to those consistent with participant consent (such as use only within the context of research on a specific disease or condition) and with any changes in consent.
Security: The repository implements and provides documentation of appropriate approaches (e.g., tiered access, credentialing of data users, security safeguards against potential breaches) to protect human subjects’ data from inappropriate access.
Limited Use Compliant: The repository employs documented procedures to communicate and enforce data use limitations, such as preventing re-identification or redistribution to unauthorized users.
Download Control: The repository controls and audits access to and download of datasets.
Request Review: The repository makes use of an established and transparent process for reviewing data access requests.
Plan for Breach: The repository has security measures that include a response plan for detected data breaches.
Accountability: The repository has procedures for addressing violations of terms-of-use and data mismanagement.