Archives as Data

This guide explains how to use archival data, including how to extract data sets from archival materials.

Steps for extracting archival data

When you extract archival data for the first time, it can be helpful to have a framework for the work. Note, however, that while this process is presented here as a series of suggested steps, the research process is iterative, not linear. For a given topic (for example, incarceration rates in Arkansas), you will likely need to go through this process for more than one collection or source, and you may need to revisit specific steps based on your findings. The process will vary by topic and by the types of sources from which you plan to extract data, but the following steps give a basic outline of extracting data from archival sources.

Extracting Data Sets from Archival Sources:

  1. What is the question you are trying to answer?
  2. What types of information or data do you need to answer that question? Determine what kind of information you are looking for and why.
  3. Search for appropriate data
    1. Who is likely to have collected the information you need? Answering this will help you narrow your search. For research on incarceration rates, for example, you would look to legal documents as opposed to newspapers or school yearbooks.
    2. Decide where to look for archival data:
      • Archives/libraries
      • Online
      • Government agencies
      • All of the above
    3. Are there archival sources with this kind of data? 
    4. Once you've found the data you are looking for, determine what type(s) of archival sources you will be using (these could include census records, legal documents, research collections, rare books, government documents, environmental reports, or newspaper articles).
    5. Do these archival sources provide data sets, or will you need to extract the data yourself?
    6. Devise a systematic collection strategy
  4. Initial analysis of data
    1. Is there any information missing?
    2. Has the data been inconsistently collected? (For example, legal records may be available for the years 1880-1900 but not 1910-1915 because those records were lost to water damage.)
    3. Can you verify this data with other available records?
  5. Translate text-based or image-based data into numerical data
    1. This can be done through coding, that is, assigning each piece of text or image content to a defined category or numeric value.
  6. Decide how you plan to use the data now that you have it. Will it help you understand the context of an issue? Establish a baseline? Identify trends?
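
Steps 4 and 5 above can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the records, the year range, and the codebook are all hypothetical, and the function names (missing_years, code_records) are invented for this example. It assumes you have already transcribed entries from a source into (year, text) pairs. The sketch checks which years have no records at all, then translates the text categories into numeric codes.

```python
from collections import Counter

# Hypothetical transcribed records: (year, offense as written in the source).
records = [
    (1880, "Larceny"),
    (1881, "larceny "),
    (1881, "Assault"),
    (1883, "Burglary"),
    (1884, "assault"),
]

# Hypothetical codebook assigning a numeric code to each category.
CODEBOOK = {"larceny": 1, "assault": 2, "burglary": 3}


def missing_years(rows, start, end):
    """Return the years in [start, end] that have no records at all."""
    covered = {year for year, _ in rows}
    return [y for y in range(start, end + 1) if y not in covered]


def code_records(rows, codebook):
    """Replace each text category with its numeric code (None if uncoded)."""
    return [(year, codebook.get(text.strip().lower())) for year, text in rows]


gaps = missing_years(records, 1880, 1884)
coded = code_records(records, CODEBOOK)
counts_by_year = Counter(year for year, _ in coded)

print("Years with no records:", gaps)
print("Coded records:", coded)
print("Records per year:", dict(counts_by_year))
```

Normalizing the text (trimming spaces and lowercasing) before looking it up in the codebook keeps spelling variants in the source from producing separate categories. Entries that come back as None are ones the codebook does not yet cover, which is a prompt to revisit the coding scheme rather than to discard the record.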

For more information on collecting and using archival data, you may want to consult the resources below: