StaffGuide: CONTENTdm Cookbook

Recipes for Metadata Entry for the University of Arkansas Libraries

Tips for Working with Compound Objects

Definitions

Compound objects are defined in CONTENTdm as “two or more files bound together with an XML structure.”  There are four types:

  • Document.  A compound object that does not have chapters, page limitations or other structure. Examples include diaries, letters, photo albums, reports, and sheet music.  This is the most common type of compound object.
  • Monograph.  A compound object that has a hierarchical structure, such as sections, chapters, and pages.
  • Picture cube.  A compound object that contains up to six images linked together, generally to provide views of a three-dimensional object.
  • Postcard.  A compound object that contains both the front and back images of two-sided items, such as tickets, flyers, and baseball cards.

PDFs

Multiple-page PDF files can be formatted in CONTENTdm for viewing as single items (navigated by scrolling), or, if PDF conversion is enabled, they can be automatically converted to PDF compound objects with individual pages that must be clicked through.  For University Libraries digital collections, generally prefer to format PDFs as single items.  However, when making a determination about format, take into account the length of the documents to be digitized, the language(s) involved, whether the document is handwritten, the document’s complexity, etc.  In any case, for the convenience of the end-user, use only one formatting method within a single collection.

Metadata

There are two levels of metadata for compound objects:

  • Compound object-level metadata.  The information applicable to the compound object as a whole, such as Title, Description, Subjects, or Creator(s).
  • Page-level metadata. The information about each page, which may include a full text transcript.

Prefer to record as much information as possible at the object level, and use page-level metadata only for recording characteristics unique to an individual component part.  Such page-level characteristics might include the Transcript, Type, or Format.  With the exception of the title, do not repeat data from the object-level metadata in the page-level description.  (Note on using templates: Before importing compound objects, clear “Project template (general)” and any other item-specific templates that would match the objects being imported.  Select “Compound object template” and edit the template to add the appropriate information.)

In cases where the component parts of a proposed compound object would require extensive page-level metadata, consider whether these various parts should instead be digitized as individual items.  When deciding, take into account the nature of the materials in question.  Does each part bear a distinct title?  Are different persons responsible for each part?  That is, are there truly distinct creators (not multiple persons performing separate functions, as is the case with co-creators or co-contributors)?  Also, were the parts derived from different source objects or publications?  Do the items belong together conceptually and/or bibliographically?  (For instance, an audio file of a song might logically be paired with a text transcript of its lyrics, even though the digital formats of the two are different.)

Note that when multiple-page PDFs are loaded as single items, only object-level metadata is created, and the entire transcript will be stored in that single record.  In contrast, when they are formatted as compound objects, each page of the PDF file receives a metadata record with the corresponding portion of the transcript.

Compound Objects and Tab-Delimited Text Files

To add compound objects one at a time, simply use the “Compound Objects Wizard.” Tab-delimited files may be used but are not necessary. 

More often, however, you will be working with an Excel spreadsheet (converted to tab-delimited text) and loading multiple compound objects.

The two methods for doing so are described below, along with a summary of how they differ.

Note that CONTENTdm considers PDFs to be single files, even if there are multiple pages, so batches of PDFs should be loaded with a tab-delimited text file using the “Add Multiple Items” function, not one of the compound object functions listed below.

Object List Method

(Complete instructions are at http://www.contentdm.org/help6/objects/multiple4.asp.)

  • Requires one tab-delimited text file containing the object-level metadata for all the items you wish to import. One designated field (Excel column) must contain the name of the compound object subdirectory for each item. 
  • Does not allow you to enter page-level metadata.
  • Limited to importing one type of compound object at a time.
  • Must establish a directory that contains all the files to be added.  Within this root directory are subdirectories containing the files of each compound object, and these subdirectories must be named exactly as listed in the designated field of the tab-delimited file.
  • Must include a transcripts subdirectory if you want to add full text transcripts.  The text files must be named with the same root name as the files that make up the compound object. For instance, item.txt would be the transcript for a scanned file named item.tif.

Preparing the Spreadsheet and Objects

  • The first row of the spreadsheet must contain the CONTENTdm field names corresponding to the metadata in each column.  Use the field name label, not the Dublin Core name.
  • Subsequent rows will contain the metadata for each object—one object per row.
  • Make sure the order of the fields on the spreadsheet matches the order of the fields in CONTENTdm.
  • The last column of the spreadsheet must contain the name of the folder (“subdirectory”) where the files for each object are stored.  For example, File10_Item1 might be the name of the folder where the component files MC1380_Box21_File10_Item1_image0001.tif and MC1380_Box21_File10_Item1_image0002.tif are stored.  Call this column “Directory Name.”
  • Save the spreadsheet as a Unicode Text (*.txt) file.  For the required steps, see the appendixes “Tips for Encoding Transcripts & Spreadsheets” and “Tips for Preparing Data in Spreadsheet for Conversion to Tab Delimited Text.”
  • Move all of the object folders (“subdirectories” with individual files) to be uploaded into a single large folder (the “root directory”). This is the folder you will choose as the “scans directory” during import. 
  • The metadata spreadsheet needs to be in a separate folder from the object folders and files.  This .txt file will be the one you choose as the “file name” during import.
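
Before starting an Object List import, it can help to confirm that every folder named in the spreadsheet actually exists in the root directory. The following Python sketch is one way to do that outside of CONTENTdm; it is not part of CONTENTdm itself. The function name check_object_list is hypothetical, and the sketch assumes the last column is called “Directory Name” as recommended above.

```python
import csv
import io
from pathlib import Path


def check_object_list(text, root_dir, directory_column="Directory Name"):
    """Return a list of problems found; an empty list means the checks passed.

    text is the content of the tab-delimited metadata file, and root_dir is
    the root directory that holds one folder per compound object.
    """
    root = Path(root_dir)
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter="\t")
    fields = reader.fieldnames or []
    # The guide asks for the folder name to be in the last column.
    if not fields or fields[-1].strip() != directory_column:
        return [f"last column must be named {directory_column!r}"]

    problems = []
    for row in reader:
        line = reader.line_num  # line of the text file this row came from
        name = (row.get(fields[-1]) or "").strip()
        if not name:
            problems.append(f"line {line}: no folder name given")
            continue
        folder = root / name
        if not folder.is_dir():
            problems.append(f"line {line}: folder {name!r} not found")
        elif not any(p.is_file() for p in folder.iterdir()):
            problems.append(f"line {line}: folder {name!r} contains no files")
    return problems
```

To run the check against a saved file, read the file into a string first, for example with Path("metadata.txt").read_text(encoding="utf-16"). The file name is a placeholder, and the encoding is an assumption based on Excel's “Unicode Text” format normally being written as UTF-16; adjust it if your file was saved differently.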

Directory Structure Method

(See instructions at http://www.contentdm.org/help6/objects/multiple3.asp.)

  • Each compound object must have its own tab-delimited text file that defines the metadata for the compound object. The tab-delimited text file must have the same name as the compound object directory. Optionally, the tab-delimited text file may also define the compound object’s structure (for the “Monograph” type) and contain page-level metadata.
  • Allows you to enter page-level metadata.
  • Limited to importing one type of compound object at a time.
  • Must establish a directory that contains all the files to be added.  Within this root directory are subdirectories containing the files of each compound object that you are adding (i.e., the compound object directories).
  • Within each compound object directory are stored the tab-delimited text file for the object and a further subdirectory for the files that make up the object.  The CONTENTdm default name for this subdirectory is “Scans,” but it can be changed.  
  • If you want to add full-text transcripts, the compound object directory must also include a “Transcripts” subdirectory.  The text files must be named with the same root name as the files that make up the compound object. For instance, item.txt would be the transcript for a scanned file named item.tif.
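
The layout described above can be checked before import with a short script. The following Python sketch is an illustration only, not a CONTENTdm tool: the function name check_compound_object_dir is hypothetical, and it assumes the subdirectories use the names “Scans” and “Transcripts” and that the tab-delimited file has a .txt extension.

```python
from pathlib import Path


def check_compound_object_dir(object_dir, scans_name="Scans",
                              transcripts_name="Transcripts"):
    """Return a list of problems found in one compound object directory."""
    object_dir = Path(object_dir)
    problems = []

    # The tab-delimited file must share its name with the directory.
    metadata_file = object_dir / f"{object_dir.name}.txt"
    if not metadata_file.is_file():
        problems.append(f"missing metadata file {metadata_file.name}")

    # The files that make up the object live in their own subdirectory.
    scans = object_dir / scans_name
    if not scans.is_dir():
        problems.append(f"missing {scans_name} subdirectory")
        return problems
    scan_stems = {p.stem for p in scans.iterdir() if p.is_file()}

    # Transcripts are optional, but each one must match a scanned file,
    # for example item.txt for item.tif.
    transcripts = object_dir / transcripts_name
    if transcripts.is_dir():
        for transcript in sorted(transcripts.glob("*.txt")):
            if transcript.stem not in scan_stems:
                problems.append(
                    f"transcript {transcript.name} has no matching scan")
    return problems
```

Calling the function once for each subdirectory of the root directory gives a quick report for the whole batch; an empty list means no problems were found for that object.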

For more information, also consult the “Compound Objects” section of CONTENTdm Online Help at http://www.contentdm.org/help6/objects/index.asp

Import Settings

When using either of the two methods outlined above, CONTENTdm requires that you make several decisions about the import. 

  • Select the option to generate display images.
  • At “Specify Page Names,” always choose to use a sequence starting with the default “Page” and the number “1” (unless you will be naming pages using data in the tab-delimited text file).
  • Transcripts.  If you will be importing transcripts, browse to locate and select the name of the directory in which they are located.  Otherwise choose “No transcripts.”
  • Choose to create a PDF version of the compound object for printing.

 

6/12/14   D. Kulczak