CONTENTdm has specific requirements for ingesting encoded text. To ensure that text loads and ingests smoothly, documents must be encoded as follows.
Text documents must be encoded in UTF-8 format.
To preserve this encoding, edit and save transcripts in Notepad++ with the encoding set to “UTF-8 Without BOM”. This setting is necessary to fully preserve diacritics and ensure proper ingestion of the transcript data.
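Before loading a transcript, it can help to confirm that the saved file really is UTF-8 and carries no byte order mark. The following is a minimal Python sketch of such a check, not part of CONTENTdm or Notepad++. The function names check_utf8_no_bom and check_file, and the file name in the usage note, are hypothetical.

```python
import codecs


def check_utf8_no_bom(data):
    """Return a list of problems found in the raw bytes; empty means OK."""
    problems = []
    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 byte order mark (BOM)")
    # Strict decoding fails on any byte sequence that is not valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append("not valid UTF-8 at byte offset %d" % err.start)
    return problems


def check_file(path):
    """Hypothetical helper: read a file in binary mode and run the check on it."""
    with open(path, "rb") as handle:
        return check_utf8_no_bom(handle.read())
```

For example, check_file("transcript.txt") returns an empty list when the file is valid UTF-8 with no BOM. A file saved in a legacy encoding such as Latin-1 will usually be reported as invalid as soon as it contains a diacritic.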
Tab-Delimited Files Generated from Spreadsheets
- In Excel, use the font “Arial Unicode MS” for data entry.
- When data entry is complete, use the Save As command to save the spreadsheet as a tab-delimited text file. In the Save As dialog box, select “Unicode Text” from the “Save as type” menu.
- After selecting “Unicode Text”, select the “Tools” drop-down to the left of the “Save” button and choose “Web Options”. Select the “Encoding” tab in the “Web Options” dialog box. In the “Save this document as” box, select “Unicode (UTF-8)” from the drop-down menu. Select “OK”, then “Save” in the Save As dialog box.
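If an exported tab-delimited file does not come out as UTF-8 (some Excel versions appear to write “Unicode Text” as UTF-16 with a byte order mark, which is an assumption to verify against the actual file), it can be re-encoded afterward. The following is a minimal Python sketch under that assumption. The names reencode_to_utf8 and convert_file, and the file names in the usage note, are hypothetical.

```python
import codecs


def reencode_to_utf8(data):
    """Return the same text as UTF-8 bytes without a BOM.

    Assumes the input is either UTF-16 with a BOM or UTF-8 (with or
    without a BOM); any other encoding raises UnicodeDecodeError.
    """
    if data.startswith(codecs.BOM_UTF8):
        # Drop the three BOM bytes, then decode the remainder.
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and removes it.
        text = data.decode("utf-16")
    else:
        text = data.decode("utf-8")
    # Tabs and line endings pass through unchanged.
    return text.encode("utf-8")


def convert_file(source_path, target_path):
    """Hypothetical helper: re-encode one exported file into a new file."""
    with open(source_path, "rb") as source:
        converted = reencode_to_utf8(source.read())
    with open(target_path, "wb") as target:
        target.write(converted)
```

For example, convert_file("export.txt", "export-utf8.txt") writes a copy of the spreadsheet export as UTF-8 without a BOM and leaves the original file untouched.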
6/23/14 J. Dean