Appendix A: Finding Aids Platform Product Details

Feature

CONTENTdm 4.2
with Oregon API applied

CONTENTdm v.n beta

DLXS 12

 

System Requirements—Describes compatibility with Library hardware and operating systems

Operating System

Solaris 10 

Solaris 10 

Solaris, Linux

Programming language(s)

PHP and HTML for Web template customization.  XSLT for EAD 2002 transformation

PHP and HTML for Web template customization.  XSLT for EAD 2002 transformation

SGML, XML, XSLT, HTML, Perl

Other software needed

n/a

n/a

XPAT (indexing software), Apache

Data storage (xml, html, db, etc.)

XML, XSL and files

XML, XSL and files

XML, DTD, HTML, XSLT; databases are MySQL and XPAT

Data platform(s); Data Structure

Internal CONTENTdm proprietary system

Internal CONTENTdm proprietary system

DLXS open source system developed at University of Michigan; XPAT indexing

Build from precompiled binaries

N/A

N/A

Yes

Backup and recovery

by DLT

By DLT

By DLT


 

 

Back End Processing — Describes steps, tools, and technical skills used to input finding aids

Describe steps, tools, and technical skills used to input finding aids by the batch. 

See “Local Customization Required” for complete instructions.

 

Loading by batch is implicit in the design features of CONTENTdm.

  • Generate valid EAD2002 finding aids.
  • Use CONTENTdm XML Importer with Acquisition Station to import (single or batch)
  • Use tool to map EAD2002 to Qualified DC
  • Upload finding aids into CONTENTdm

 

See Appendix A for complete instructions.

All inputting is done as a batch, usually about 120-140 finding aids at a time, 5 times a year, taking about 2 hours to run the entire batch if there are no glitches. 

Time-consuming difficulties arise from non-standard EADs. Therefore, most of the inputting work lies in validating the EAD documents before input.

 

See attachment of step by step instructions from Chris Powell. 

Bill Tanzen follows similar procedures.  Cornell could not get their EAD 2002 to run with DLXS 12.

 

Skills: UNIX, XML, XSLT, Perl, DLXS knowledge, XPAT, FTP
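The CONTENTdm beta steps above include mapping EAD2002 to Qualified DC. The following is a minimal Python sketch of that kind of mapping, not the tool OCLC or Oregon actually uses. The EAD_TO_DC table, the function names, and the sample finding aid are all assumptions made for illustration; a real mapping would follow the BPG chosen locally.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: EAD2002 element name -> Dublin Core field.
# The real mapping used by any importer is not given in this appendix.
EAD_TO_DC = {
    "unittitle": "title",
    "unitdate": "date",
    "persname": "contributor",
    "corpname": "contributor",
    "famname": "contributor",
    "subject": "subject",
    "genreform": "type",
    "scopecontent": "description",
    "bioghist": "description",
}


def local_name(tag):
    """Strip an XML namespace prefix such as '{urn:...}unittitle'."""
    return tag.rsplit("}", 1)[-1]


def ead_to_dc(xml_text):
    """Collect text from mapped EAD elements into lists keyed by DC field."""
    record = {}
    for elem in ET.fromstring(xml_text).iter():
        field = EAD_TO_DC.get(local_name(elem.tag))
        if field is None:
            continue
        # Join nested text and collapse runs of whitespace.
        text = " ".join("".join(elem.itertext()).split())
        if text:
            record.setdefault(field, []).append(text)
    return record


# Invented sample finding aid, for illustration only.
SAMPLE = """<ead><archdesc level="collection"><did>
<unittitle>Jane Doe papers</unittitle><unitdate>1900-1950</unitdate>
</did><bioghist><p>Jane Doe was a teacher.</p></bioghist>
<controlaccess><persname>Doe, Jane</persname><subject>Education</subject></controlaccess>
</archdesc></ead>"""

if __name__ == "__main__":
    for field, values in ead_to_dc(SAMPLE).items():
        print(field, values)
```

Keeping the mapping in a single table makes it easy to add fields for indexing without touching the traversal code.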

 

File Naming conventions

n/a

n/a

DLXS uses successive levels of directories, with an A-Z naming scheme.

Can load single finding aids or in batches

Yes.  Batches more typical.

Yes. 

Batches more typical

Inputting and maintaining collections

 

CONTENTdm Acquisition Station  - easy to import through Excel spreadsheet.  Editing can be done through Acq. Station or Web interface.

Ingests XML directly

No back end tools available. 

 

Data (Finding Aid)—Describes formats and standards required for successful input of finding aids

Describe Best Practice Guidelines for the EAD finding aid that the system requires for successful input of finding aids

Requires EAD2002 conforming to BPG of Northwest Digital Archives (or alteration of VB script for ingestion)

 

4.2 with XML generated via Oracle export – none

 

UTF8 testing needs to be done with PSUL data to see what, if any, work must be done before generating the XML files.

n/a

DLXS is extremely particular about any characters in the finding aid that deviate from UTF8 and from the BPG of the Bentley Historical Library. The BPG used by Bentley Historical Library does not include folder numbers, which we require.  The University of Minnesota BPG includes folder numbers.  Adapting the finding aid to include folder numbers may require significant experimentation with our finding aids to develop a BPG of our own.

 

Format required for ingestion

EAD 2002 converted to tab delimited file

Valid XML

EAD 2002
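The first column above lists a tab-delimited file as the ingest format. As a rough illustration only, the Python sketch below writes extracted finding aid fields to tab-delimited text. It is not the Oregon VB script, and the COLUMNS list is a hypothetical column order, since the columns the importer expects are not given here.

```python
import csv
import io

# Hypothetical column order; the importer's real columns are not stated here.
COLUMNS = ["title", "date", "creator", "description"]


def clean(value):
    """Collapse tabs, newlines and repeated spaces so a value stays in one cell."""
    return " ".join(str(value).split())


def to_tab_delimited(records):
    """Return tab-delimited text with a header row and one line per record."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for rec in records:
        # Missing fields become empty cells so every row has the same width.
        writer.writerow([clean(rec.get(col, "")) for col in COLUMNS])
    return out.getvalue()


if __name__ == "__main__":
    print(to_tab_delimited([{"title": "Jane Doe papers", "date": "1900-1950"}]))
```

Cleaning embedded tabs and line breaks before writing keeps each finding aid on a single row.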

Best Practices

Yes, requires use of BPG (currently configured to use BPG of Northwest Digital Archives)

n/a

Yes, works best when in conformity with format used by Bentley Historical Library or University of Minnesota Special Collections

Validation Method for EAD

All versions:  Validation of EAD2002 files must be completed before ingest.  PSUL will need to investigate validation tools to determine the best one(s) for local use.

 

Note:  Oregon uses RLG “EAD Report Card” to validate XML prior to running VB script.

http://www.rlg.org/en/page.php?Page_ID=20513   There are many validation tools available.

EAD2002 XML Validation – any tool that does this will work.

Validate against the EAD 2.0 DTD using a tool called onsgmls, which is available on SourceForge.

Normalize each finding aid with the EAD 2.0 DTD (to ensure the attributes are in the same order as in the DTD) using osgmlnorm, which is available on SourceForge.

Because osgmlnorm has the unfortunate effect of turning XML into SGML, convert the files back to XML with osx, a tool that is available on SourceForge.
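Since every platform above needs validated files before ingest, a quick first pass can catch files that are not even well-formed XML. The Python sketch below does only that; it is a pre-check under stated assumptions, not a replacement for DTD validation with onsgmls or a similar tool. The function names and the "*.xml" file pattern are assumptions.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_well_formed(xml_data):
    """Return None if the data parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        return str(err)
    return None


def check_batch(directory):
    """Map each failing *.xml file name in `directory` to its error message."""
    failures = {}
    for path in sorted(Path(directory).glob("*.xml")):
        # Read bytes so the parser can honor the file's encoding declaration.
        error = check_well_formed(path.read_bytes())
        if error is not None:
            failures[path.name] = error
    return failures


if __name__ == "__main__":
    for name, error in check_batch(".").items():
        print(f"{name}: {error}")
```

Files reported here can be repaired before the slower DTD validation step is run on the batch.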

 

Special character recognition—UTF8 compliant

Version 3.5, or greater, fully supports the Latin 1 character set (which includes Western European languages as well as others) in all of the software components including Acquisition Station, Server, and Search Client. 

Fully supported

Full Unicode support in place as of Version 12.

Validation Method for UTF8

n/a

n/a

As a batch, check to see if there are any character entities other than the five allowed by XML using a script called "findentities.pl" that is part of the DLXS package.

Check to see if there are any Windows smart quotes using a tool called xpatutf8check that is part of the DLXS package.

Information on all these steps is at http://www.dlxs.org/training/workshop200607/conversion/index.html
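The two DLXS checks described above look for named entities beyond the five that XML predefines and for Windows smart quotes. The Python sketch below illustrates the same two ideas. It is not findentities.pl or xpatutf8check, and their actual behavior may differ; all function names here are assumptions.

```python
import re

# The five entities predefined by XML itself.
XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}
# Named entity references only; numeric references such as &#233; are skipped.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9._-]*);")
# Windows-1252 byte values for curly single and double quotes.
CP1252_SMART_QUOTES = {0x91, 0x92, 0x93, 0x94}


def find_extra_entities(text):
    """Return sorted names of named entities outside the XML five."""
    return sorted({m.group(1) for m in ENTITY_RE.finditer(text)} - XML_ENTITIES)


def find_invalid_bytes(raw):
    """Return (offset, byte) pairs where `raw` is not valid UTF-8."""
    bad = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            offset = pos + err.start
            bad.append((offset, raw[offset]))
            pos = offset + 1  # resume scanning just past the bad byte
    return bad


def find_smart_quotes(raw):
    """Return offsets of invalid bytes that match cp1252 smart quote values."""
    return [off for off, byte in find_invalid_bytes(raw) if byte in CP1252_SMART_QUOTES]
```

For example, find_extra_entities("Caf&eacute; &amp; more") reports only eacute, because amp is one of the five entities XML allows.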

 

Licensing and Support---describes licenses to operate system and support offered by vendor or user community

Support for installation on Solaris

Commercial support (no additional cost)

Commercial support (no additional cost)

User Community

Commercial with license, commercial for fee, open source for fee, free open source

 

Commercial support (no additional cost)

Commercial support (no additional cost)

User Community Support

 

Product support and user community

OCLC

OCLC

DLXS middleware is open source and maintained at U of Mich.  XPAT search engine is commercial.

Digital object types supported

Text, Images, Multimedia, Finding Aids

Text, Images, Multimedia, Finding Aids, Newspapers

Four classes: Text, Image, Bib, Finding Aid

Licensing issues, restrictions, etc.

Yes – enterprise unlimited license permits 200 collections of up to 1 million items each.  Cost already covered by UL.

Yes – enterprise unlimited license permits 200 collections of up to 1 million items each.  Cost already covered by UL.  May include additional cost for CONTENTdm XML importer as add-on (TBD)

DLXS – open source

XPAT license required

Staff Client licensing?

Basic clients included in licensing.  OCR and JPEG2000 licenses available at additional cost (Not applicable for finding aids)

Basic clients included in licensing.  OCR and JPEG2000 licenses available at additional cost (Not applicable for finding aids).  May include additional cost for CONTENTdm XML importer as add-on (TBD)

None

Rights Management—describes if we can restrict access to the file at the image or collection level

Authorization at Digital Object level

Yes

Yes

Not intrinsic to system. (If the digital object is a collection - yes)

Authorization at collection level

Yes

Yes

Yes

Limit public view at field level

Yes

Yes

Yes


 

Search Functionality—describes search delimiters  that enhance precision and recall

Indexes across full text of finding aid

Version 4.2 indexes up to first 128,000 characters in full text.

 

Yes (Million plus character fields need to be tested)

 

Yes
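Because version 4.2 indexes only the first 128,000 characters of full text, it may help to flag long finding aids before ingest. The Python sketch below counts text content only, with markup excluded. How CONTENTdm itself counts characters is not stated in this appendix, so the counting rule and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

INDEX_LIMIT = 128_000  # characters indexed by version 4.2, per the row above


def full_text_length(xml_text):
    """Count characters of text content, ignoring element markup."""
    return len("".join(ET.fromstring(xml_text).itertext()))


def exceeds_index_limit(xml_text, limit=INDEX_LIMIT):
    """Return True when the text content is longer than the indexing limit."""
    return full_text_length(xml_text) > limit


if __name__ == "__main__":
    sample = "<ead><p>" + "a" * 200_000 + "</p></ead>"
    print(full_text_length(sample), exceeds_index_limit(sample))
```

Finding aids flagged this way are the ones whose later sections would not be searchable under the 4.2 limit.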

Meets minimum requirements for search: EAD fields unittitle, persname, corpname, genreform, famname, subject, scopecontent, bioghist

Ability to add additional fields for indexing by editing VB script.  Or, if XML generated from Oracle, indexing is possible on any field that we select and map to qualified DC.

Search all EAD2002 fields mapped to qualified DC discretely plus full text of XML finding aid

Yes

Limit search to specified, multiple fields (Ex: date and corpname)

Yes

Yes

Yes

Limit search to a single  special collections unit (local data)

Yes

Yes

Yes

Search single group of collections or across other groups of collections

Yes

Yes

Cross class searching is not anticipated, but is in development.

Browse lists?

Yes

Yes

Yes

Able to search by formats (such as Dublin Core, books, digital object)

Yes

Yes

No

Able to search by index terms (such as format, names, places)

Yes

Yes

If customized

Able to search by date?

Yes

Yes

If customized


 

End User Output—describes the formats in which data can be exported and the data displayed on the result screen

Export output/download formats

Favorites saves link.
Also save HTML or link to URL, or export to PowerPoint through plugin.

Favorites saves link.
Also save HTML or link to URL, or export to PowerPoint through plugin.

Bookbag - saves link

Capacity to export to METS/MODS/DC, etc.

Yes to METS

Yes to METS, DC

?

Persistent navigation

Yes via custom XSLT interface.