Data File Management

This guide discusses best practices for data file naming, file formats, and file versioning and control.

Data Services

For more information or for assistance, please contact Data Services at datalib@uark.edu.

 

Introduction to data file formats

Research data takes many forms. A single project may include several of the following types of data:

  • Documents, spreadsheets
  • Questionnaires, surveys
  • Laboratory and field notebooks
  • Audio and video files
  • Photographs, film, slides
  • Digital objects
  • Artifacts, samples, specimens
  • Databases
  • Data files
  • Procedures and protocols

Open and Reproducible Data Practices

While gathering your data, you may not have a choice about the formats in which it is recorded. However, when saving or sharing your data, it is a BEST PRACTICE to follow open data guidelines. Open formats help your work tolerate software and operating system changes. You or someone else may need to use your data in the future!

When sharing data:

  • Keep data files in their original raw format, AND
  • Save data to share in a non-proprietary (open) file format. If conversion to an open data format will result in some data loss from your files, consider saving the data in both the proprietary format and an open format. When it is necessary to save files in a proprietary format, consider including a readme.txt file in your directory that documents the name and version of the software used to generate the file, as well as the company that made the software.
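The readme.txt suggestion above could be automated along the following lines. This is a minimal sketch, not a prescribed tool: the function name write_format_readme and the field names "file", "software", "version", and "vendor" are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def write_format_readme(directory, entries):
    """Write readme.txt describing the software behind each proprietary file.

    `entries` is a list of dicts with the keys "file", "software",
    "version", and "vendor" (hypothetical field names for this sketch).
    """
    lines = ["Proprietary file formats in this directory", ""]
    for entry in entries:
        lines.append(f"File:     {entry['file']}")
        lines.append(f"Software: {entry['software']}")
        lines.append(f"Version:  {entry['version']}")
        lines.append(f"Vendor:   {entry['vendor']}")
        lines.append("")  # blank line between entries
    readme_path = Path(directory) / "readme.txt"
    readme_path.write_text("\n".join(lines), encoding="utf-8")
    return readme_path
```

Calling the function with a directory and one entry per proprietary file writes a single readme.txt alongside the data, so the software details travel with the files when the directory is shared.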

Elements of an appropriate file format:

When selecting file formats for long-term storage, the formats should ideally be:

  • Non-proprietary (use open software)
  • Not encrypted
  • Not compressed
  • Familiar to your research community
  • Interoperable among diverse platforms and applications
    • Fully published and available royalty-free
    • Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology
    • Developed and maintained by an open standards organization with a well-defined inclusive process for evolution of the standard

The Library of Congress has published a Recommended Formats Statement that discusses this topic in great depth.

 

Best practices in file formats

Avoid proprietary file types where possible; open file types are preferred:

Data Type     | Proprietary (AVOID)               | Open (PREFERRED)
--------------+-----------------------------------+--------------------------------------------------------
Containers    | (none listed)                     | TAR, GZIP, ZIP
Databases     | (none listed)                     | XML, CSV
Text          | Word (DOC, DOCX)                  | ODF (Open Document Format), TXT, PDF/A, XML, HTML
Tabular       | Excel (XLS, XLSX)                 | CSV, TSV, TAB
Images        | Photoshop (PSD), Illustrator (AI) | TIFF, JPG/JPEG 2000, PNG, BMP, GIF
Audio         | Windows Media Audio (WMA)         | FLAC, WAV, AIFF, MXF
Video         | Windows Media Video (WMV)         | MOV, MPEG-4, AVI, MXF
Presentations | Microsoft PowerPoint (PPT, PPTX)  | PDF/A, EPUB
Geospatial    | CAD (DXF), MapInfo (MIF)          | GeoTIFF, GeoPDF, Shapefile (SHP, SHX, PRJ, DBF), NetCDF

 

See the Library of Congress' Sustainability of Digital Formats website for more complete listings and discussions of formats.
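As a rough illustration of how the format recommendations in this section might be applied to an existing project directory, the sketch below groups files by extension. The extension sets are an assumption: they are derived from the formats named in this guide, with a few common extensions added by guesswork (.gz for GZIP, .odt/.ods/.odp for ODF, .pdf for PDF/A, .mp4 for MPEG-4, .jp2 for JPEG 2000, .nc for NetCDF), and they are not exhaustive. An extension alone also cannot confirm that a .pdf file is actually PDF/A.

```python
from pathlib import Path

# Extensions derived from the formats named in this guide.
# The mapping is illustrative only and is not exhaustive.
PROPRIETARY = {
    ".doc", ".docx", ".xls", ".xlsx", ".psd", ".ai",
    ".wma", ".wmv", ".ppt", ".pptx", ".dxf", ".mif",
}
OPEN = {
    ".tar", ".gz", ".zip", ".xml", ".csv", ".tsv", ".tab",
    ".odt", ".ods", ".odp", ".txt", ".pdf", ".html",
    ".tif", ".tiff", ".jpg", ".jpeg", ".jp2", ".png", ".bmp", ".gif",
    ".flac", ".wav", ".aiff", ".mxf", ".mov", ".mp4", ".avi",
    ".epub", ".shp", ".shx", ".prj", ".dbf", ".nc",
}


def classify(path):
    """Return "proprietary", "open", or "unknown" based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in PROPRIETARY:
        return "proprietary"
    if ext in OPEN:
        return "open"
    return "unknown"


def report(directory):
    """Group every file under `directory` (recursively) by format category."""
    groups = {"proprietary": [], "open": [], "unknown": []}
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            groups[classify(path)].append(path)
    return groups
```

Files reported as "proprietary" are candidates for an additional open-format copy; files reported as "unknown" need a manual check against the Library of Congress listings.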