Appendix A: Finding Aids Platform Product Details

Feature

CONTENTdm 4.2 with Oregon API applied

CONTENTdm v.n Beta

DLXS 12

 

System Requirements—Describes compatibility with Library hardware and operating systems

Operating System

Solaris 10 

Solaris 10 

Solaris, Linux

Programming language(s)

PHP and HTML for Web template customization.  XSLT for EAD 2000 transformation

PHP and HTML for Web template customization.  XSLT for EAD 2000 transformation

SGML, XML, XSLT, HTML, Perl

Other software needed

n/a

n/a

XPAT (indexing software), Apache

Data storage (xml, html, db, etc.)

XML, XSL and files

XML, XSL and files

XML, DTD, HTML, XSLT; DB is MySQL and XPAT

Data platform(s); Data Structure

Internal CONTENTdm proprietary system

Internal CONTENTdm proprietary system

DLXS open source system developed at University of Michigan; XPAT indexing

Build from precompiled binaries

N/A

N/A

Yes

Backup and recovery

by DLT

By DLT

By DLT


 

 

Back End Processing — Describes steps, tools, and technical skills used to input finding aids

Describe steps, tools, and technical skills used to input finding aids by the batch. 

See “Local Customization Required” for complete instructions.

 

Loading by batch is implicit in the design features of CONTENTdm.

  • Generate valid EAD2002 finding aids.
  • Use CONTENTdm XML Importer with Acquisition Station to import (single or batch)
  • Use tool to map EAD2002 to Qualified DC
  • Upload finding aids into CONTENTdm

 

See Appendix A for complete instructions.

All inputting is done as a batch, usually about 120-140 finding aids at a time, 5 times a year, taking about 2 hours to run the entire batch if there are no glitches. 

Time consuming difficulties arise from non-standard EADs. Therefore, most of the work in inputting is in validating the EAD documents before input. 

 

See attachment of step by step instructions from Chris Powell. 

Bill Tanzen follows similar procedures.  Cornell could not get their EAD 2002 to run with DLXS 12.

 

Skills: UNIX, XML, XSLT, Perl, DLXS knowledge, XPAT, FTP

 

File Naming conventions

n/a

n/a

DLXS uses successive levels of directories, with an A-Z naming scheme.

Can load single finding aids or in batches

Yes.  Batches more typical.

Yes. 

Batches more typical

Inputting and maintaining collections

 

CONTENTdm Acquisition Station  - easy to import through Excel spreadsheet.  Editing can be done through Acq. Station or Web interface.

Ingests XML directly

No back end tools available. 

 

Data (Finding Aid)—Describes formats and standards required for successful input of finding aids

Describe Best Practice Guidelines for the EAD finding aid that the system requires for successful input of finding aids

Requires EAD2002 conforming to BPG of Northwest Digital Archives (or alteration of VB script for ingestion)

 

4.2 with XML generated via Oracle export – none

 

UTF8 testing needs to be done with PSUL data to see what if any work must be done before generating the XML files.

n/a

DLXS is extremely particular about any characters in the finding aid that deviate from UTF8 and about departures from the BPG of the Bentley Historical Library. The BPG used by Bentley Historical Library does not include folder numbers, which we require.  The University of Minnesota BPG includes folder numbers.  Adapting the finding aid to include folder numbers may require significant experimentation with our finding aids to develop a BPG of our own.

 

Format required for ingestion

EAD 2002 converted to tab delimited file

Valid XML

EAD 2002

Best Practices

Yes, requires use of BPG (currently configured to use BPG of Northwest Digital Archives)

n/a

Yes, works best when in conformity with format used by Bentley Historical Library or University of Minnesota Special Collections

Validation Method for EAD

All versions:  Validation of EAD2002 files must be completed before ingest.  PSUL will need to investigate validation tools to determine the best one(s) for local use.

 

Note:  Oregon uses RLG “EAD Report Card” to validate XML prior to running VB script.

http://www.rlg.org/en/page.php?Page_ID=20513   There are many validation tools available.

EAD2002 XML Validation – any tool that does this will work.

Validate against the EAD 2.0 DTD using a tool called onsgmls that is available on Sourceforge.

Normalize each finding aid with the EAD 2.0 DTD (to ensure the attributes are in the same order that they are in the DTD) using osgmlnorm, that is available on Sourceforge.

Because osgmlnorm has the side effect of turning XML into SGML, convert the files back to XML with osx, a tool that is available on Sourceforge.
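Before running any of the validation tools named in this row, a batch can be screened for files that are not even well-formed XML. The following is a minimal Python sketch of such a pre-check; it is not part of CONTENTdm, the Oregon API, or DLXS, the function names are illustrative, and it does not validate against the EAD DTD, so a DTD-aware validator is still needed afterwards. Files that rely on entities declared in an external DTD may also be reported as not well-formed by this parser.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def precheck_ead(path):
    """Return (ok, message) for one finding aid file.

    Checks only that the file parses as XML and that the root element is <ead>.
    It does NOT validate against the EAD DTD.
    """
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as err:
        return False, f"not well-formed: {err}"
    # Drop a namespace prefix of the form "{namespace}ead" if one is present.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "ead":
        return False, f"unexpected root element <{local_name}>"
    return True, "ok"


def precheck_batch(directory):
    """Pre-check every *.xml file in a directory; return {filename: (ok, message)}."""
    return {p.name: precheck_ead(p) for p in sorted(Path(directory).glob("*.xml"))}
```

Run against a batch directory, the returned dictionary lists which files should be repaired before the batch is handed to the validator.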

 

Special character recognition—UTF8 compliant

Version 3.5, or greater, fully supports the Latin 1 character set (which includes Western European languages as well as others) in all of the software components including Acquisition Station, Server, and Search Client. 

Fully supported

Full Unicode support in place as of Version 12.

Validation Method for UTF8

n/a

n/a

As batch, check to see if there are any character entities other than the five allowed by XML using a script called "findentities.pl" that is part of the DLXS package.

Check to see if there are any Windows smart quotes using a tool called xpatutf8check that is part of the DLXS package.

Information on all these steps is at http://www.dlxs.org/training/workshop200607/conversion/index.html
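The two checks described in this row (character entities beyond the five that XML predefines, and Windows smart quotes) can be illustrated with a short Python sketch. This is not a reimplementation of findentities.pl or xpatutf8check, whose exact behavior is not described here; the function names and the smart-quote hint are assumptions made for illustration.

```python
import re

# The five entities predefined by XML itself.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Named entity references such as &eacute; (numeric references like &#233; are not matched).
NAMED_ENTITY = re.compile(rb"&([A-Za-z][A-Za-z0-9._-]*);")

# Windows-1252 byte values for curly single and double quotes.
CP1252_SMART_QUOTES = {0x91, 0x92, 0x93, 0x94}


def find_extra_entities(data: bytes):
    """Return the sorted named entities in data other than the five XML ones."""
    names = {m.group(1).decode("ascii") for m in NAMED_ENTITY.finditer(data)}
    return sorted(names - XML_PREDEFINED)


def check_utf8(data: bytes):
    """Return None if data is valid UTF-8, else a description of the first bad byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        bad = data[err.start]
        hint = " (likely a Windows smart quote)" if bad in CP1252_SMART_QUOTES else ""
        return f"invalid byte 0x{bad:02x} at offset {err.start}{hint}"
    return None
```

Both functions take the raw bytes of a finding aid, so they can be applied to each file in a batch before ingest.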

 

Licensing and Support---describes licenses to operate system and support offered by vendor or user community

Support for installation on Solaris

Commercial support (no additional cost)

Commercial support (no additional cost)

User Community

Commercial with license, commercial for fee, open source for fee, free open source

 

Commercial support (no additional cost)

Commercial support (no additional cost)

User Community Support

 

Product support and user community

OCLC

OCLC

DLXS middleware is open source and maintained at U of Mich.  XPAT search engine is commercial.

Digital object types supported

Text, Images, Multi media, Finding Aids

Text, Images, Multi media, Finding Aids, Newspapers

Four classes: Text, Image, Bib, Finding Aid

Licensing issues, restrictions, etc.

Yes – enterprise unlimited license permits 200 collections of up to 1 million items each.  Cost already covered by UL.

Yes – enterprise unlimited license permits 200 collections of up to 1 million items each.  Cost already covered by UL.  May include additional cost for CONTENTdm XML importer as add on (TBD)

DLXS – open source

X-PAT license required

Staff Client licensing?

Basic clients included in licensing.  OCR and JPEG2000 licenses available at additional cost (Not applicable for finding aids)

Basic clients included in licensing.  OCR and JPEG2000 licenses available at additional cost (Not applicable for finding aids).  May include additional cost for CONTENTdm XML importer as add on (TBD)

None

Rights Management—describes if we can restrict access to the file at the image or collection level

Authorization at Digital Object level

Yes

Yes

Not intrinsic to system. (If the digital object is a collection - yes)

Authorization at collection level

Yes

Yes

Yes

Limit public view at field level

Yes

Yes

Yes


 

Search Functionality—describes search delimiters  that enhance precision and recall

Indexes across full text of finding aid

Version 4.2 indexes up to first 128,000 characters in full text.

 

Yes (Million plus character fields need to be tested)

 

Yes

Meets minimum requirements for search: EAD fields, unittitle, persname, corpname, formgenre, famname, subject, scopecontent, bioghist 

Ability to add additional fields for indexing by editing VB script.  Or, if XML generated from Oracle, indexing is possible on any field that we select and map to qualified DC.

Search all EAD2002 fields mapped to qualified DC discretely plus full text of XML finding aid

Yes

Limit search to specified, multiple fields (Ex: date and corpname)

Yes

Yes

Yes

Limit search to a single  special collections unit (local data)

Yes

Yes

Yes

Search single group of collections or across other groups of collections

Yes

Yes

Cross class searching is not anticipated, but is in development.

Browse lists?

Yes

Yes

Yes

Able to search by formats (such as Dublin core books, digital object)

Yes

Yes

No

Able to search by index terms (such as format, names, places)

Yes

Yes

If customized

Able to search by date?

Yes

Yes

If customized


 

End User Output—describes the formats in which data can be exported and the data displayed on the result screen

Export output/download formats

Favorites saves link.
Also save HTML or link to URL, or export to Powerpoint through plugin.

Favorites saves link.
Also save HTML or link to URL, or export to Powerpoint through plugin

Bookbag - saves link

Capacity to export to METS/MODS/DC, etc.

Yes to METS

Yes to METS, DC

?

Persistent navigation

Yes via custom XSLT interface.

Yes via custom XSLT interface

Outline view of finding aids has a persistent index.  Driven by XSLT files.

Accommodates persistent URLs to individual finding aids   (PURLS)

 

Yes

Yes

If customized

Breadcrumbs for site

Yes via custom XSLT interface.

Yes via custom XSLT interface.

No but this could be done through custom interface work

 

Customizable output/display

Yes – driven by XSLT files

Yes – driven by XSLT files

Yes – driven by XSLT files

Output includes links to both outline view or full view

Yes – driven by XSLT files

Yes – driven by XSLT files

Yes – driven by XSLT files

Search term highlighted in results list (brief) and full finding aid view

No but could be done as local development

 

Yes, opens window to display highlighted terms.  Also highlights terms in outline view and full view of finding aid.

Results list configurable?

Yes

Yes

No, XPAT returns results in the order that they occurred in the concatenated XML file used for ingest. Can be sorted as below.  Page display is configured by XSLT

Sort and rank search results by relevance

No relevancy ranking available

No relevancy ranking available

Yes

Sort by author, title, date, etc.

Yes

Yes

Yes, not date.

Mark results and perform operations on the marked list (print, review, email, etc.)

Yes

Yes

Yes

Email a page or item

No except through browser features

Through browser

Email through Bookbag

Personal collections or course groupings  - Save marked list

Yes - Favorites allows you to save favorites to a web page on your local drive.  Web page contains durable links back to main collection/work.  Ability to export to Powerpoint with plugin.

Yes - Favorites allows you to save favorites to a web page on your local drive.  Web page contains durable links back to main collection/work.  Ability to export to Powerpoint with plugin.

Yes with Bookbag

Refine search from results list

No

No

Yes

Print Options

Web page printout

Web page printout

Web page printout

Compatible within Angel

Yes

Yes

Yes

Inter institutional sharing of collections/items, etc.

Yes via OAI

Yes via OAI

Yes via OAI

Individual contributions of material to library collections (p2p-like)

No but LionShare has been developed to work with CONTENTdm

No but LionShare has been developed to work with CONTENTdm

No

Can be incorporated into federated search?

Yes via Z39.50 and

Can also be aggregated through WorldCat

Yes via Z39.50 and

Can also be aggregated through WorldCat

?

Ability to add link to CAT record in Finding Aid metadata

Yes

Yes

?

ADA Compliant (AD54)

Out of the box – does not validate. Local API development possible (PHP & XSLT)

Out of the box – does not validate. Local API development possible (PHP & XSLT)

Out of the box – does not.  Local API development possible (XSLT)

Search results display title, date, extent, summary abstract

Yes

Yes

Yes

Search results display finding aid file size

Yes

Yes

Yes

List of result set linked to an alpha list as intermediate navigation (e.g., A, B, C …) rather than to number ranges (e.g., 1-300, 301-500, etc.)

No – could develop custom API

No – could develop custom API

No, Customizable?


 

Discovery/Sharing—this describes how accessible finding aids will be by Google, RLG, and OAI

OAI Harvesting

Yes, one click in collection config for DC metadata

 

Yes, one click in collection config for DC metadata

 

Yes
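For context on what a harvester sends once OAI is enabled, the sketch below builds a standard OAI-PMH ListRecords request for Dublin Core (`oai_dc`) metadata. The base URL and set name are hypothetical placeholders; each platform's actual OAI endpoint would need to be confirmed locally.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access is performed)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Restrict the harvest to one set, e.g. one collection.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
example = list_records_url("https://example.org/oai/oai.php", set_spec="findingaids")
```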

Findable/crawlable by RLG spiders, etc.  describe

Complete Finding Aid files found by RLG spiders based on file location.

Built in features to expose collections to WorldCat.

Complete Finding Aid files found by RLG spiders based on file location.

Built in features to expose collections to WorldCat

??

Findable by Google, etc.

Yes – CONTENTdm DC data

Yes – CONTENTdm DC data

Yes, recent discussion on DLXS listserv suggested ways to enhance Google visibility.


 

Cost—this describes all costs associated with installing and maintaining the platform

 

Initial cost

Already purchased and in production for images and text

– not yet certain whether CONTENTdm importer will be an add on or part of the base software release

DLXS is free open source software. XPAT search engine is $15,000 one time, followed by $5,000 annual renewal.  The $15,000 has already been paid.

Annual cost

Two server licenses $5,780 each annual renewal 

Two server licenses $5,780 each annual renewal 

 XPAT search engine $5,000 annual renewal
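The annual figures in this row can be compared directly. The short calculation below uses only the license amounts stated here; it leaves out staff and developer time, and the five-year horizon is an arbitrary assumption chosen for illustration.

```python
# License figures as stated in the table (US dollars).
CONTENTDM_LICENSE_PER_SERVER = 5780   # annual renewal, per server license
CONTENTDM_SERVER_LICENSES = 2
XPAT_ANNUAL_RENEWAL = 5000            # the one-time $15,000 is already paid


def annual_license_cost():
    """Return the yearly license cost for each platform."""
    return {
        "CONTENTdm": CONTENTDM_LICENSE_PER_SERVER * CONTENTDM_SERVER_LICENSES,
        "DLXS (XPAT)": XPAT_ANNUAL_RENEWAL,
    }


def cumulative_license_cost(years):
    """Return total license cost over a number of years (illustrative horizon)."""
    return {name: cost * years for name, cost in annual_license_cost().items()}
```

On these figures CONTENTdm licensing comes to $11,560 per year against $5,000 per year for XPAT.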

Hidden costs - beyond expected normal workflow (processing, look/feel, metadata, etc.)

If used "out of the box" hidden costs are primarily look and feel.

If used "out of the box" hidden costs are primarily look and feel.

Developer time to construct workflow tools to enable distribution of processes beyond server based command line access.

 

Developer and support time to maintain single-purpose software

Free trial/downloads available?

Yes

Yes

 


 

 

Local Customization Required—describes the time and skill required to bring the platform into compliance with Special Collections specifications

Describe the customization needed to search and display result as needed by Special Collections

A VB script reads the XML finding aid and transforms the appropriate fields into tab delimited text that would then be imported into CONTENTdm.  CONTENTdm indexes the first 128,000 characters of the finding aid as full text. This would be a limitation on our longer finding aids.

The “out of the box” display of the finding aid material would be replaced with XSLT driven displays (short and long) that conform to more standard ways of viewing finding aids.
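To make the XML-to-tab-delimited step concrete, here is a minimal Python sketch of the same idea. It is not the Oregon VB script and makes no claim about how that script works; the field list is a subset of the EAD elements named under Search Functionality, and the choice to join repeated values with "; " is an assumption.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative subset of the EAD elements listed under Search Functionality.
FIELDS = ["unittitle", "persname", "corpname", "famname", "subject"]


def local_name(tag):
    """Strip a "{namespace}" prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_fields(ead_xml, fields=FIELDS):
    """Collect the text of each listed element from one EAD document string."""
    root = ET.fromstring(ead_xml)
    values = {name: [] for name in fields}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in values:
            # Join nested text and collapse runs of whitespace.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                values[name].append(text)
    return values


def to_tab_delimited(records, fields=FIELDS):
    """Write a header row plus one tab-delimited row per finding aid."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for rec in records:
        writer.writerow(["; ".join(rec[name]) for name in fields])
    return buf.getvalue()
```

Adding a locally preferred index would then be a matter of extending the field list, which is the kind of change the row above describes making to the VB script.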

 

Additional considerations:

Because PSUL finding aids will eventually be produced from Oracle data tables, we could bypass the VB script and parse the data from Oracle directly into a tab delimited format including any information that we choose.  This eliminates the need to customize the VB script to include PSUL preferred indexes. The VB script also requires conformity to BPG from NWDA standards.  We would eliminate this issue by deriving data directly from Oracle.

 

XSLT style sheets available as part of this shared API would need to be modified to include local variations.

 

An investigation will need to be conducted to determine how we might break long finding aids into multiple indexed fields to bypass the 128,000 character/field full text indexing limitation.
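One possible approach to that investigation is sketched below in Python: split the full text into pieces no longer than the limit, cutting at spaces where possible, so that each piece could be loaded into its own field. Whether CONTENTdm would index such fields as intended is untested; the function name and the space-based cutting rule are assumptions.

```python
FIELD_LIMIT = 128_000  # full-text indexing limit per field, as stated above


def split_for_fields(text, limit=FIELD_LIMIT):
    """Split text into chunks of at most `limit` characters, cutting at spaces."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Prefer the last space at or before the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space available: fall back to a hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

A long box list split this way yields one chunk per target field, and the number of chunks indicates how many text fields the collection would need.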

 

Next version (Fall 07)

CONTENTdm will include a new CONTENTdm importer that will map XML into CONTENTdm (indexing the fields that we specify) and import full XML file formats.

 

This will eliminate the need to use the custom API from Oregon.  Additionally, field size limits will be significantly improved so that search engine and indexing tools go to very large limits (exact number not available).

 


  • Generate valid EAD2002 finding aids.
  • Use CONTENTdm XML Importer with Acquisition Station to import (single or batch)
  • Use tool to map EAD2002 to Qualified DC
  • Upload finding aids into CONTENTdm

 

 

XSLT style sheets will need to be modified to include local variations.

 

DLXS is a complex system; configuring it for EAD 2002 has proven difficult here and at Cornell.  Configuring and inputting trial finding aids would require extensive support or a consultant from Michigan or Minnesota, where this has been successful.

 

Customizing DLXS display requires strong skills in XML as well as XSLT, and any customization will have to be upgraded with new versions of DLXS.

 

Our finding aids will have to be customized to include folder numbers, and this will have to be tested to find out whether they work well with DLXS.

Interface  Look / Feel

XSLT

XSLT

XSLT

Initial configuration and implementation

Load PHP API, provided by CDM, tweak VB script to index local fields.

Modify EAD to comply with BPG of Northwest Digital Archives

Local customizations of XSLT displays

Extensive work required for installation, purl scripts, and validation of finding aids.

May recommend consultant

Additional customization to meet Special Collections minimal specifications

Box list will have to be broken and mapped into separate CDM txt fields to overcome the 128,000 character limit.

 

 

TBD

Configure EAD to include file numbers.

 

 
