<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>What is Digital Library Architecture?</title>
        <link>http://www.personal.psu.edu/mjg36/blogs/</link>
        <description>Thoughts around digital libraries, technical architecture, content curation, preservation services, and so on.</description>
        <language>en</language>
        <copyright>Copyright 2012</copyright>
        <lastBuildDate>Sat, 27 Oct 2012 21:06:19 -0500</lastBuildDate>
        <generator>http://www.sixapart.com/movabletype/</generator>
        <docs>http://www.rssboard.org/rss-specification</docs>
        
        <item>
            <title>Understanding (e.g.) DOIs for data sets</title>
            <description><![CDATA[<div>
Data citation is a topic that frequently comes up in conversations around data management. During a call with a community of data curators yesterday, I was asked whether ScholarSphere supported <a href="http://www.doi.org/">DOIs</a> for citing data sets.
</div>

<div><br/></div>
<div>
I have to admit that while I understand the value of data citation &#8212; tracking use &amp; re-use, measuring impact of data sets independent of their publications, giving credit to data publishers, &amp;c. &#8212; I continually get stuck on how identifiers such as DOIs from <a href="http://datacite.org/">DataCite</a> or <a href="https://wiki.ucop.edu/display/Curation/ARK">ARKs</a> from <a href="http://www.cdlib.org/services/uc3/ezid/">EZID</a> fit into the picture. Or, rather, why such indirect identifiers are valued more than the native HTTP URIs that are minted and managed by data repositories. Here I assume that these data repositories are run by institutions whose missions &amp; business interests include a commitment to persistence of content and identifiers held within their repositories. (Is that a faulty or naïve assumption?)
</div>

<div><br/></div>
<div>
The argument for indirect identifiers &#8212; identifiers that point at and resolve to other identifiers &#8212; like DOIs usually goes like this: hey there, cultural heritage organizations and publishers have done a pretty poor job of persisting their identifiers so far, partly because they didn&#8217;t grok the commitment they were undertaking, or because they weren&#8217;t deliberate about crafting sustainable URIs from the outset, or because they selected software with brittle URIs, or because they fell flat on some area of sustainability planning (financial, technical, or otherwise), and so because you can&#8217;t trust these organizations or their software with your identifiers, you should use this other infrastructure for minting and managing quote persistent unquote identifiers.
</div>

<div><br/></div>
<div>
<b>SIDEBAR</b>: That&#8217;s a lot of becauses, all of which (to be perfectly frank) are painfully true. As an employee of a service provider within a very large academic library, I find this unacceptable. The solution from my perspective is not to punt responsibility for persistent identifiers. The solution is to confront each of those becauses and learn from our mistakes, and (as information service providers who oughta know better) to better steward and manage identifiers for data sets (and other deposits). I digress.
</div>

<div><br/></div>
<div>
Are there other compelling arguments for using indirect identifiers to cite data sets? This is where you come in. 
</div>

<div><br/></div>
<div>
Back to the main point.  Here is the million-dollar question about using (e.g.) DOIs for data sets: who manages these DOIs?  Is it the service provider (such as DataCite, or <a href="http://scholarsphere.psu.edu/">Penn State ScholarSphere</a>)?  Or is it the owner of the data set?
</div>

<div><br/></div>
<div>
If it&#8217;s the service provider, how are they to know when data owners move their content elsewhere?  And how does that scale?
</div>

<div><br/></div>
<div>
If it&#8217;s the data owner, uh, really? Do we realistically expect data owners to manage their own DOIs? I may be being cynical here, but I somehow don&#8217;t see that happening on any scale that has an appreciable impact on the broader issue of data citability and identifier persistence.
</div>
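<div><br/></div>
<div>
To make the management question concrete, here is a minimal sketch in Python of what an indirect identifier amounts to: a lookup table from a citable name to whatever URL currently hosts the data set. Everything in it is hypothetical: the <code>Resolver</code> class, the DOI-like string, and the example.org URLs are invented for illustration, and this is not how DataCite, EZID, or ScholarSphere actually work.
</div>

<pre>
```python
# Hypothetical sketch: an indirect identifier is an entry in a lookup table
# that someone must keep pointing at the data set's current location.
# The DOI-like name and the example.org URLs are made up for illustration.


class Resolver:
    """Toy stand-in for a DOI- or ARK-style resolution service."""

    def __init__(self):
        # Maps each indirect identifier to its current target URL.
        self._targets = {}

    def register(self, identifier, url):
        self._targets[identifier] = url

    def update(self, identifier, new_url):
        # Nothing calls this automatically; a person or a system has to.
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = new_url

    def resolve(self, identifier):
        return self._targets[identifier]


resolver = Resolver()
doi = "10.9999/example-dataset"
old_url = "http://repo-a.example.org/datasets/1"
new_url = "http://repo-b.example.org/datasets/1"

resolver.register(doi, old_url)

# The data owner moves the data set, and nobody tells the resolver.
live_urls = {new_url}
stale = resolver.resolve(doi) not in live_urls

# The citation works again only once the mapping is updated by hand.
resolver.update(doi, new_url)
fixed = resolver.resolve(doi) in live_urls

print("stale before update:", stale)
print("fixed after update:", fixed)
```
</pre>

<div>
Whoever ends up responsible for calling <code>update</code>, service provider or data owner, is the party the questions above are really asking about.
</div>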
]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2012/10/understanding-eg-dois-for-data-sets.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2012/10/understanding-eg-dois-for-data-sets.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">content stewardship</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">data citation</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">data management</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">DOIs</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">identifier management</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">identifiers</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">persistent identifiers</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">repositories</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">URIs</category>
              
            
            <pubDate>Sat, 27 Oct 2012 21:06:19 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Ingest: Lessons learned</title>
            <description><![CDATA[<div>Now that we have a by-no-means-complete-but-still-useful <a href="http://www.personal.psu.edu/mjg36/blogs/2011/02/ingest-is-a-barrier-to-ingest.html">list of common barriers to ingest</a>, I thought I'd share the lessons learned. &nbsp;We hope to apply these lessons in building <a href="http://github.com/MaxFisher/caps">CAPS</a>, our prototype curation services platform.</div><div style="font-weight: bold; "><b><br /></b></div><div><ul><li><span class="Apple-style-span" style="font-weight: 800;">Create a namespace, or namespaces, for identifiers that far exceeds foreseeable needs </span>--<span class="Apple-style-span" style="font-weight: 800;">&nbsp;</span>Our first namespace can accommodate&nbsp;7,072,810,000 identifiers. &nbsp;We're using the <a href="https://wiki.ucop.edu/display/Curation/ARK">Archival Resource Key specification</a> for identifiers (each of which will be mapped to HTTP URIs), and the Python-based <a href="http://github.com/mjgiarlo/arkpy">arkpy</a> library for minting.</li><li><b>Decouple the ingest process from the publication process</b> -- We plan to build a small suite of applications and tools upon our curation services platform, the first of which is for what we've been unimaginatively calling "generic ingest &amp; management." &nbsp;The application is for authenticated, authorized users only -- it's a tool for curatorial operations, not for end-user display. 
&nbsp;The ingest application will never automatically publish objects, and it assumes that all objects are private to the curator until the curator decides otherwise.</li><li><b><span class="Apple-style-span" style="font-weight: normal; "><b>Plan for scale and test performance from the outset</b>&nbsp;-- The current phase of development on CAPS was given a very ambitious deadline, so we have not had the time to focus on performance and scale as much as we would have liked. &nbsp;We have a list of areas to address in the next phase, however, and a laundry list of technologies to vet and test for our scaling needs. &nbsp;We've also lined up a small team of folks to help out with system testing &amp; QA.</span></b></li><li><b>Make metadata input optional</b> -- We believe that curators, not systems, curate, and thus allow them to decide how richly objects ought to be described. &nbsp;We intend to provide curators with tools that allow (and perhaps encourage) rich metadata to be attached to objects, but as far as the "generic" ingest application (and the curation services platform underneath) is concerned, all elements in the data dictionary are repeatable and none are required. &nbsp;We will be building similar "profiled" ingest applications for specific purposes in the near future, such as an ingest application for electronic business records, which will, however, be more stringent about metadata (and also about file formats, which the generic ingest app couldn't care less about).</li><li><b>Allow stakeholders to drive decisions and, above all, communicate with users</b> -- This may be a meta-lesson, and it feels like the most important of them all. &nbsp;Our development team for CAPS consists of our lead developer, a digital curator, an archivist, a metadata librarian, a project manager, and an architect. 
&nbsp;Our project team is made up of our development team plus stakeholders from across the University Libraries including representation from our Digitization &amp; Preservation department, the Arts &amp; Architecture library, and University Archives. &nbsp;Our project team meets for one hour a week -- a Herculean task to find a mutually convenient slot, let me tell you -- and our development team meets for fifteen minutes every morning. &nbsp;The point here is that our stakeholders -- the primary eventual users of the ingest application -- are invested in the project, and they get a chance to see, criticize, and drive what we've developed every week as it evolves. Because we're in the same room so often, we get to communicate certain points regularly, e.g., what you ingest is only as permanent as you wish; identifiers are not precious at all; the curation platform is there for you to use in ways useful to you, so the typical "don't put that into the repo yet" mindset doesn't apply; your stuff is as findable as the richness of your metadata, but we can provide other ways to find your stuff (full-text search, etc.); and so forth.</li></ul><div><br /></div></div><div><span class="Apple-style-span" style="font-weight: 800;"><br /></span></div> ]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2011/02/ingest-lessons-learned.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2011/02/ingest-lessons-learned.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">content stewardship</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">digital library architecture</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">electronic records</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">repositories</category>
              
            
            <pubDate>Thu, 17 Feb 2011 08:13:46 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Ingest is a barrier to ingest</title>
            <description><![CDATA[<div>Last week I attended the latest iteration of one of my favorite conferences, <a href="http://code4lib.org/conference/2011">Code4Lib 2011</a>, which included a full-day <a href="http://curatecamp.org/">CURATEcamp</a> hackfest as a pre-conference session&nbsp;(sponsored by the&nbsp;<a href="http://www.clir.org/dlf.html" style="text-decoration: underline; ">Digital Library Federation</a>). &nbsp;Rather than writing up a full report of the event -- no one really reads those, right? -- I wanted to comment on a conversation from the hackfest.</div><div><br /></div><div>A group gathered to discuss digital forensics, specifically in the context of forensics work done pre-ingest. &nbsp;I've heard other folks talk about pre-ingest processes and so I wondered aloud: what does it say about our repositories, and the ingest process, that we do so much pre-ingest? &nbsp;The consensus was that the ingest process is frequently expensive. &nbsp;A subgroup split off to explore this.</div><div><br /></div><div>The ingest process is a topic I'm keenly interested in since we (Penn State's <a href="http://stewardship.psu.edu/">digital stewardship program</a>) are in the middle of building a prototype ingest application ("<a href="http://github.com/MaxFisher/caps">CAPS</a>"). &nbsp;If we can learn some lessons from our peers about how to make ingest easier and faster, the timing is right to build on these lessons and make novel, more interesting mistakes rather than boring, well-known ones.</div><div><br /></div><div>Here are the barriers to ingest that were identified:</div><div><br /></div><div><ul><li><b>Identifiers are precious</b> -- Ingesting an object usually kicks off a series of processes, one of which mints a new identifier for an object. &nbsp;There is a perception that identifiers are a limited commodity, that they are somehow precious or rare. 
&nbsp;</li><li><b>Promise of permanence</b>&nbsp;-- There is a perception that ingesting an object creates a contract for the permanence of that object. &nbsp;The contract may be illusory depending on the "repository" into which the object was ingested.</li><li><b>Findability</b> -- Once an object is ingested, it is difficult to find. &nbsp;I would have liked to pursue this point a bit further. &nbsp;What it suggests to me is that in some contexts, the repository has not been sufficiently incorporated into the workflows or work environments of those doing the ingest, so it feels like alien territory rather than the local filesystems and mapped drives they are accustomed to. &nbsp;Pure speculation on my part.</li><li><b>Complex downstream workflows</b> -- Given that ingest is a series of processes, there is concern that "just ingesting something" might cause breakage downstream. &nbsp;For instance, if an object is ingested, is it automatically published somewhere end-users can get to it, and has the object been fully prepared for publication? &nbsp;One such workflow might be automatic generation of derivatives, which is an expensive operation for certain formats and large files.</li><li><b>Rights</b> -- Related to the above bullet, there is concern that end-user access rights must be cleared in advance of ingest, for fear that the object will wind up in the wrong hands.</li><li><b>Metadata</b> -- The ingest process requires too much metadata input. 
&nbsp;This concern is tied to findability above, and together they suggest an all-too-familiar tension: how much metadata is enough to make an object findable later, and how much is enough to make the ingest process cumbersome?</li><li><b>Psychological factors</b> -- There is a mindset wherein curation happens outside of the repository and preservation happens inside -- that these are distinctly different activities which happen serially if at all -- in which case one might be loath to ingest an object until it's "ready" for the repository, whatever that means.</li><li><b>Personal time</b>&nbsp;-- The ingester simply lacks the time to push the right buttons.</li><li><b>Software performance</b> -- The ingest process is slow due to lack of optimization, lack of attention to scale, lack of performance tuning, and so forth.</li></ul><div>There are a number of lessons to be learned from the above. &nbsp;I'll write soon about those and how we're applying them to our CAPS project at Penn State.</div></div><div><br /></div><div>Have any barriers to add to the list?</div> ]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2011/02/ingest-is-a-barrier-to-ingest.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2011/02/ingest-is-a-barrier-to-ingest.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">code4lib</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">CURATEcamp</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">ingest</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">repositories</category>
              
            
            <pubDate>Fri, 11 Feb 2011 14:35:14 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Impressions from Open Repositories 2010</title>
            <description><![CDATA[One minor concern I brought to the <a href="http://or2010.fecyt.es/publico/Home/index.aspx">conference</a>, which has roots in my attendance at the 2007 conference, was whether it would be too system-oriented to be relevant, since Penn State doesn't plan to use Fedora, DSpace, or ePrints.&nbsp; I was pleased to see the increased attention to alternative approaches to preservation and to repositories as a set of services rather than (necessarily) as a system.<br /><br />Penn State's institutional digital stewardship program is investigating <a href="http://www.cdlib.org/services/uc3/curation/">curation microservices</a>, such as those developed by the University of California Curation Center, as an architecture for digital curation. So I came to OR2010 with an eye towards development in this space.&nbsp; I wasn't the only one; both the <a href="http://sun-pasig.ning.com/">PASIG</a> session and the <a href="http://duraspace.org/">DuraSpace</a> strategic overview identified microservices as a trend, and a number of microservices seem likely to be built into the 1.7 release of DSpace.<br /><br />I attended the curation microservices BOF, which was well attended considering that it was up against a <a href="http://or2010.fecyt.es/Publico/Developer/index.aspx">developer challenge event</a> -- institutions represented included Universitat Autònoma de Barcelona, Harvard, U. of Hull, California Digital Library, MIT, UNC-Chapel Hill, San Diego Supercomputer Center, Penn State, Northwestern, U. 
of Pennsylvania, and Princeton.<br /><br />We discussed our interests in the topic, experiences with the microservices approach, development of a community around microservices, the California Digital Library's role in sustaining said community, and governance of collaborative software development and of the community.<br /><br />The BOF covered a lot of ground in a short period of time, and we agreed to start having periodic open teleconferences to share information about microservices development.&nbsp; We'll also use the <a href="http://groups.google.com/group/digital-curation">digital-curation Google Group</a> for virtual communication, and use events such as Open Repositories, <a href="http://www.dcc.ac.uk/events/conferences/6th-international-digital-curation-conference">IDCC</a>, and <a href="http://www.ifs.tuwien.ac.at/dp/ipres2010/">iPRES</a> -- in addition to <a href="http://curatecamp.org/">Curation Technology Camp (CURATEcamp) events</a> -- for microservices get-togethers. ]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/07/impressions-from-open-repositories-2010.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/07/impressions-from-open-repositories-2010.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">CURATEcamp</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">microservices</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">or2010</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">repositories</category>
              
            
            <pubDate>Wed, 28 Jul 2010 09:34:11 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Braindump for Q3 2010</title>
            <description><![CDATA[<ul><li><b><a href="http://www.personal.psu.edu/mjg36/blogs/2010/01/e-content-stewardship-program-kick-off.html">Reviewing
 digital library platforms</a> for the e-Content Stewardship Council -- </b>Patricia and I have completed all user interviews and platform demonstration sessions, and have finished evaluating all four in-scope platforms (CONTENTdm, Olive, DPubS, and ETD-db) along a set of twenty-odd criteria defined in a comparative analysis project at Purdue.&nbsp; Next up is identifying themes from the evaluation for our report's executive summary.&nbsp; We had hoped to finish this work in May, but apparently summer is a hard time to get stuff done.&nbsp; Who knew?</li><li><b>Institutional repository of 
electronic records -- </b>Work has begun on our e-records system via the inclusion of records use cases in another pilot project.&nbsp; More on that later.</li><li><b>Learning more about "<a href="http://www.aspeninstitute.org/publications/promise-peril-big-data">big
 data</a>" and continuing the <a href="https://blogs.psu.edu/mjg36/blogs/2010/01/data-management-discussion.html">data management discussion</a> -- </b>I attended <a href="http://www.asis.org/Conferences/IA10/ResearchDataAccessSummit2010.html">Research Data Access and Preservation Summit</a> in April. A number of themes emerged from RDAP: 1) methods for
involving researchers in curation activities, 2) the user-friendliness of the data deposit process, and 3) the boundary between preservation and curation, caused by the dynamic nature of research data and barriers to repository ingest such as complicated processes and a write-once assumption.&nbsp; We at Penn State have not yet gotten our big data focus group, under ITANA, off the ground but hope to do so later this year.</li><li><b>Storage strategies -- </b>Following the dissolution of the Data Storage Working Group, Digital Library Technologies continued the discussion of storage strategies to guide purchase, allocation, and management of storage from the short- to the mid-term.&nbsp; We have just this week written a project charter to explore the idea, culminating in a strategic plan for storage in December.</li><li><b>Evaluating next-generation information 
discovery tools for the libraries with the Libraries' Department of Information Technology</b> -- The RFP process has finished and we have selected a product that meets our many needs.&nbsp; We will be announcing our decision as soon as the ink dries on the paper.</li><li><b>Working on requirements for a draft institutional 
identifier standard with the <a href="http://www.niso.org/workrooms/i2">NISO
 I2 </a>working group -- </b>The I2 group distributed a survey about features and requirements of the draft I2 standard, and has begun analyzing the results.&nbsp; Feedback has been provided primarily from the library sector, and has largely validated our work thus far.</li><li><b>Attending Open Repositories 2010 -- </b>See <a href="http://www.personal.psu.edu/mjg36/blogs/2010/07/impressions-from-open-repositories-2010.html">conference report</a>.</li><li><b>Planning Curation Technology Camp (CURATEcamp) 2010</b> -- Since I last <a href="http://www.personal.psu.edu/mjg36/blogs/2010/06/digital-curation-community.html">wrote</a> about the camp, the conference planning group has been busy dotting "i"s and crossing "t"s.&nbsp; We're all looking forward to the camp, which is coming up soon (mid-August).<br /></li><li><b>Curation microservices pilot</b> -- A short-term pilot project involving software developers and curators will explore a number of strategic aims of the Content Stewardship Program: defining curatorial requirements, building and testing a curation architecture, engaging software developers and curators at other institutions, treating data in a cross-platform manner, exploring roles and workflows that cross unit boundaries, and building a testbed for electronic records curation services.&nbsp; Project work will include curating copies of a small sample of data selected from e-records, CONTENTdm, Olive, DPubS, and ETD-db; building and integrating existing lightweight digital curation tools based upon curation microservice specifications; applying those tools and specifications to curate the sample dataset; examining the benefits, costs, and limitations of the microservice approach; and determining whether a microservice-based curation architecture is viable at Penn State.</li><li><b>MetaArchive implementation roadmap</b> -- Penn State is now a member of the MetaArchive distributed digital preservation cooperative.&nbsp; I am working with a team of four on 
an implementation roadmap, detailing a timeline, new roles that will need to be defined for our involvement, and hardware specifications.&nbsp; This is a short-term project.</li></ul>]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/07/braindump-for-q3-2010.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/07/braindump-for-q3-2010.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">archival storage</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">big data</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">content stewardship</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">CURATEcamp</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">electronic records</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">microservices</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">NISO I2</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">update</category>
              
            
            <pubDate>Fri, 02 Jul 2010 13:54:35 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Digital curation community</title>
            <description><![CDATA[<div>I wrote <a href="http://www.personal.psu.edu/mjg36/blogs/2010/04/braindump-for-q2-2010.html">before</a> about a potential <a href="http://groups.google.com/group/digital-curation/web/curation-technology-sig">curation technology unconference</a>, which has been dubbed CURATEcamp 2010.&nbsp; Not in my wildest dreams could I have imagined just how receptive folks would be to the idea.&nbsp; <br /><br />An <i>ad hoc</i> planning team -- consisting of folks from Penn State, the California Digital Library, the University of California-San Diego, and the Library of Congress -- has been hard at work bringing this idea to life. &nbsp;On June 15th, <a href="http://groups.google.com/group/digital-curation/msg/be42ea348491ff6b">we</a> <a href="http://twitter.com/mjgiarlo/status/16244857753">announced</a> that registration for the event had opened.&nbsp; Eight days later, we <a href="http://twitter.com/mjgiarlo/status/16875193143">announced</a> that all seventy-five slots had been filled.&nbsp; Fret not, though; you can still be added to a <a href="http://curatecamp2010.eventbrite.com/">waitlist</a>.&nbsp; <br /><br />The camp is now yours, digital curation community -- let's see what you've got.<br /></div><br />CURATEcamp 2010 is but one of many events within our community.&nbsp; I'd like to highlight some others.<br /><br /><ul><li>Delphine Khanna (UPenn) and Stephen Abrams (CDL) have planned a <a href="http://groups.google.com/group/digital-curation/msg/2a6f6464bea38ea7">microservices BOF session</a> at <a href="http://or2010.fecyt.es/publico/Home/index.aspx">Open Repositories 2010</a>.&nbsp; It will focus on discussion of community around curation microservices.</li><li><a href="http://twitter.com/pjvangarderen">Peter Van Garderen</a> has <a href="http://groups.google.com/group/digital-curation/msg/e40d43baa13c5cdd">proposed</a> a microservices BOF at <a href="http://www.ifs.tuwien.ac.at/dp/ipres2010/">iPRES 2010</a>.</li><li>Delphine 
Khanna has <a href="http://groups.google.com/group/digital-curation/msg/cf2b36dd10d51f3f">proposed</a> a pre- or post-conference session on curation microservices for the <a href="http://www.diglib.org/forums.htm">DLF Fall Forum</a> this November.</li><li><a href="http://twitter.com/pmhswe">Patricia Hswe</a>, <a href="http://twitter.com/declan">Declan Fleming</a>, and I have had a workshop proposal <a href="http://groups.google.com/group/digital-curation/msg/3bfd70ea2812bce5">accepted</a> for <a href="http://www.dcc.ac.uk/events/conferences/6th-international-digital-curation-conference">IDCC10</a>, and the provisional name for this workshop, a full-day unconference, is CURATEcamp II.<br /></li></ul>Whereas CURATEcamp 2010 focuses on the curation microservices approach, CURATEcamp II is a bit more general.<br /><br /><blockquote><meta http-equiv="content-type" content="text/html; charset=utf-8"><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><span class="Apple-style-span"><b>ABSTRACT</b>: As the community of digital curation practitioners has grown, so has the need for collaboration and community. &nbsp;A small number of communities have been formed around digital curation, a few of which focus on the technical aspects of the practice.&nbsp; Extant communities address the implementation and support needs of specific curation platforms, without broader focus on common services and potential points of intersection. There is however a rich ecosystem of tools, practices, and standards around these platforms, and some that require no such platforms, that have potential to benefit the wider community of practitioners. 
CURATEcamp II is an unconference-style workshop for practitioners of digital curation to share best practices and discuss tools and technologies in a free-form and highly interactive forum. &nbsp;Topics of interest might include identifiers, versioning, transfer, packaging, object structure, filesystem usage, archiving / storage, metadata standards / vocabularies, discovery, and interoperability. The unconference format ensures that all participants are actively engaged in the workshop and gives everyone an opportunity to contribute. &nbsp;Activities may include roundtable discussions, presentations, whiteboard sessions, collaborative software development, and whatever else emerges from the collective creativity of participants.</span><br /><br /></span><meta http-equiv="content-type" content="text/html; charset=utf-8"><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><span class="Apple-style-span"><b>AIMS</b>: CURATEcamp II is an opportunity to build a community of practice around curation tools, that bridges system-specific gaps that have formed in the community. It will encourage discussion about curation tools and practices across software-, project-, and institution-specific boundaries, and attempt to identify best practices and points of collaboration across these boundaries.&nbsp; The community that CURATEcamp II nurtures is intended to persist beyond the end of the IDCC, so another point of discussion will be around how to maintain connections between face-to-face gatherings. 
The informal approach of CURATEcamp II might also serve as a way to model knowledge sharing for the curator community, not unlike what occurs at BarCamp events, which are loosely structured but highly productive participatory sessions.<br /><br /><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><span class="Apple-style-span"><b>AUDIENCE</b>: </span></span><meta http-equiv="content-type" content="text/html; charset=utf-8"><span class="T3__Char Apple-style-span"></span>CURATEcamp II will be of interest to digital curation practitioners (curators and technologists alike), especially those who have been using and building tools and architectures, and digital curators with experience assessing or evaluating curation tools and services.&nbsp;&nbsp; <br class="Apple-interchange-newline" /></span></span></blockquote>This is an exciting time to be working in the digital curation community!&nbsp; Wondering how to get involved?&nbsp; Hop on over to the <a href="http://groups.google.com/group/digital-curation">digital-curation Google group</a> and join the discussion; it's just getting started.<br /><br /><div>P.S. 
CURATEcamp 2010 would not be happening without the active engagement of all the folks doing the planning.&nbsp; Thanks to: <a href="http://twitter.com/declan">Declan</a> from UCSD for proposing the idea for the camp over Belgian beer at the <a href="http://monkpub.com/">Thirsty Monk</a> during <a href="http://code4lib.org/conference/2010">Code4Lib 2010</a>; <a href="http://twitter.com/dchud">Dan</a> from LC for focusing on the practical; <a href="http://twitter.com/edsu">Ed</a> from LC for evangelism and support; <a href="http://twitter.com/cpwillett">Perry</a> from CDL for all of his work with the conference venue and the registration system; my colleagues at Penn State for their support; and both Penn State and CDL, without whose contributions and commitment the camp would not have been possible.<br /></div>]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/06/digital-curation-community.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/06/digital-curation-community.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">CURATEcamp</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">dlf10f</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">idcc10</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">iPRES2010</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">microservices</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">or2010</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">update</category>
              
            
            <pubDate>Thu, 24 Jun 2010 21:44:05 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Institutional Identifiers and RDF</title>
            <description><![CDATA[<meta http-equiv="content-type" content="text/html; charset=utf-8"><div>In my last <a href="http://www.personal.psu.edu/mjg36/blogs/2010/04/braindump-for-q2-2010.html">braindump</a>, I wrote:</div><div><br /></div><blockquote class="webkit-indent-blockquote" style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><span class="Apple-style-span" style="color: rgb(0, 0, 0); font-family: georgia; line-height: 19px; ">The I2 working group is putting finishing touches on a draft standard and on core metadata required to identify institutions.&nbsp; We hope to share this draft and put out a request for comments in the coming months.&nbsp; I've been modeling the I2 domain in RDF both for more RDF experience and also with the hope that an eventual I2 core service will be exposed as&nbsp;<a href="http://linkeddata.org/" style="text-decoration: underline; outline-style: none; outline-width: initial; outline-color: initial; color: rgb(8, 98, 11); ">linked data</a>.</span></div></blockquote><meta http-equiv="content-type" content="text/html; charset=utf-8"><div><br /></div><div>I've now documented this experience on <a href="http://lackoftalent.org/michael/blog/2010/05/19/i2-resource-description/">my other blog</a>, which is the home of my <a href="http://lackoftalent.org/michael/blog/category/projects/niso-i2/">I2 ramblings</a>.</div><div><br /></div>]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/05/institutional-identifiers-and-rdf.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/05/institutional-identifiers-and-rdf.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">blogs</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">linked data</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">modeling</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">NISO I2</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">rdf</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">update</category>
              
            
            <pubDate>Wed, 19 May 2010 09:38:51 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Braindump for Q2 2010 </title>
            <description><![CDATA[My my, has it really been three months since I wrote up my <a href="http://personal.psu.edu/mjg36/blogs/2010/01/my-agenda-for-q1-2010.html">agenda</a>?&nbsp; I've been busy chipping away at the agenda so I thought I'd document my progress now that Q2 is underway.<br /><br /><ul><li><b><a href="http://www.personal.psu.edu/mjg36/blogs/2010/01/e-content-stewardship-program-kick-off.html">Reviewing
 digital library platforms</a> for the e-Content Stewardship Council </b><br /><br />The platform review project that our digital collections curator and I have undertaken continues.&nbsp; We began the project by having folks demonstrate each platform and how they use it, and have been busy with small, informal interview sessions with many of the same folks but also others who work outside of the Libraries.&nbsp; We have a few more interview sessions to conduct and document, so the data gathering portion of the project is nearly complete.&nbsp; In the meantime we've been discussing evaluation criteria.&nbsp; We started off with a short list of criteria, but then noticed the criteria Purdue are using for their <a href="http://blogs.lib.purdue.edu/rep/">comparative analysis of institutional repository software</a> and adopted those instead.&nbsp; We sketched out a structure for the final report, which we hope to finish in May.<br /><br /></li><li><b>Reviewing
 functional requirements for an institution-wide repository of 
electronic records</b><br /><br />This work is still under way.&nbsp; We have a set of well-documented functional requirements for an e-records repository service but have yet to make progress on building anything.&nbsp; We've been talking about applying for a grant to help fund some additional staffing which might be used to help build out proof-of-concept curation services (preservation, provenance, description, discovery) for e-records.&nbsp; I'm really keen on applying <a href="http://www.cdlib.org/services/uc3/curation/">curation micro-services</a>, such as those used at <a href="http://www.cdlib.org/">CDL</a>, to the e-records domain.&nbsp; I see this effort as benefiting both the curation micro-services community and the e-records community -- not to mention our own electronic records initiatives here at Penn State.&nbsp; An all-around win, if you ask me, but then I'm biased.&nbsp; This will be a major activity in the latter half of this year continuing into the next.<br /><br /></li><li><b>Learning more about "<a href="http://www.aspeninstitute.org/publications/promise-peril-big-data">big
 data</a>" and continuing the <a href="https://blogs.psu.edu/mjg36/blogs/2010/01/data-management-discussion.html">data management discussion</a></b><br /><br />Our content stewardship program will doubtless need to address research data.&nbsp; We're not there yet.&nbsp; In the meantime, Penn State's ITANA chapter will be pulling together a working group on the technological and architectural challenges of research data.&nbsp; <a href="http://www.personal.psu.edu/nucci/">Jeff Nucciarone</a> and I will be chairing the group.&nbsp; Research data has also been on my mind lately for two reasons: <a href="http://www.lesk.com/mlesk/">Michael Lesk</a> gave a <a href="http://ist.psu.edu/newsevents/?pageID=736&amp;HeadlineID=2028">talk</a> at the information school urging libraries to turn their attention to research data; and I'll be attending the <a href="http://www.asis.org/Conferences/IA10/ResearchDataAccessSummit2010.html">Research Data Access and Preservation Summit</a> in Phoenix later this week.<br /><br /></li><li><b>Evaluating the <a href="http://dlt.its.psu.edu/">DLT</a> <a href="https://lib.stanford.edu/files/PSU_Archival_Storage_Prototype_Final_Report_External%20v5.pdf">archival
 storage prototype</a> and joining the technical team of the Data 
Storage Working Group</b><br /><br />The Data Storage Working Group effort has been repurposed.&nbsp; The steering team will continue to meet informally and discuss archival storage and curation needs across the campuses.&nbsp; The technical team has been dissolved, and the majority of us (who already work together in DLT in support of the same mission) will continue to work in this space.<br /><br /></li><li><b>Evaluating next-generation information 
discovery tools for the libraries with the Libraries' <a href="http://www.libraries.psu.edu/psul/itech.html">Department of Information Technology</a></b> <br /><br />The RFP process continues.&nbsp; We hope to have wrapped up our evaluation by the beginning of summer.<br /><br /></li><li><b>Evaluating
 change management solutions with a team from Penn State's <a href="http://www.personal.psu.edu/kxm/blogs/LibertyRoad/2007/08/itana-itsitana-1.html">ITANA</a>
 group</b><br /><br />I haven't found much time to stay involved with this team, unfortunately, but their work continues apace.<br /><br /></li><li><b>Working on requirements for a draft institutional 
identifier standard with the <a href="http://www.niso.org/workrooms/i2">NISO
 I2 </a>working group</b><br /><br />The I2 working group is putting finishing touches on a draft standard and on core metadata required to identify institutions.&nbsp; We hope to share this draft and put out a request for comments in the coming months.&nbsp; I've been modeling the I2 domain in RDF both for more RDF experience and also with the hope that an eventual I2 core service will be exposed as <a href="http://linkeddata.org/">linked data</a>.<br /><br /></li><li><b>Attending <a href="http://code4lib.org/conference/2010">Code4Lib 2010</a></b><br /><br />You can tell how good a code4lib conference is by how little you 
remember of it.&nbsp; By that measure, this year's conference was the best 
yet.&nbsp; Some of the highlights for me: 1) Linked data, a pattern for exposing 
resources and metadata via the web, continues to be a hot topic among 
cutting-edge library developers. There was a focus this year on how to 
participate in the linked data web in practical and lowish-barrier 
ways.&nbsp; The speed with which concepts move, at code4lib, from "novel, and
 interesting to a few" to "widely talked about and deployed" is 
dizzying; 2) Software development practices continue to mature in 
libraries.&nbsp; We're talking more and more about test-driven design and 
agile development.&nbsp; While these methodologies are beneficial to 
developers themselves, I find this remarkable because it means the gap 
between coders and stakeholders is being bridged, and that means better 
and more usable software, and happier users; 3) Repositories are not 
typically a hot topic at code4lib, but there were a number of prepared 
talks, lightning talks, and breakout sessions on the topic.&nbsp; Fedora 
tends to be the repository most often talked about, if only because it 
is the repository that requires the most hacking -- and these are the 
people doing the hacking.&nbsp; What I found interesting this year was the 
dissatisfaction with monolithic repository software packages, and the 
movement towards "homebrewed", though standards-based, repository 
services, such as those being advocated by the California Digital 
Library.<br /><br /></li><li><b>Meetings,
 meetings, <a href="http://www.flickr.com/photos/sara013/4269735919/">meetings</a></b><br /><br />The meetings, they continue.&nbsp; <br /><br /></li><li><b>Continuing
 to absorb as many of the following as possible: strategic plans, 
project portfolios, process management documents, and various and sundry
 reports, wikis, and blogs</b><br /><br />And this continues as well, though it's hard to find time to contextualize when you've got actual tasks and deadlines.</li></ul><br />And here are some new and upcoming things.<br /><br /><ul><li>I've written about my <a href="http://personal.psu.edu/mjg36/blogs/2010/01/whats-in-a-title.html">search for a practice-oriented curation technology/architecture community</a>, and I'm glad to say I've made some progress on finding said community.&nbsp; I've been part of a conversation revolving loosely around the <a href="http://groups.google.com/group/digital-curation">digital-curation group</a>, and that conversation has now turned to planning a <a href="http://groups.google.com/group/digital-curation/web/curation-technology-sig">curation technology workshop</a>, which we're calling CURATEcamp (CURAtion TEchnology Camp).&nbsp; I hope to have more details to share soon.<br /></li><li>I am attending <a href="http://or2010.fecyt.es/publico/Home/index.aspx">Open Repositories 2010</a> in Madrid this July.&nbsp; I expect to learn about how folks are using repository systems such as Fedora, DSpace, and EPrints, but am more interested in all the other stuff happening on the periphery.&nbsp; There has also been talk of a curation micro-services birds-of-a-feather session, which might serve as a good event to get potential CURATEcampers talking.<br /></li><li>I'll be in Washington, DC in a few weeks working on a team to evaluate <a href="http://www.imls.gov/applicants/grants/NationalLeadership.shtm">IMLS National Leadership grant</a> applications.&nbsp; This will be a new experience for me, and one to which I need to devote a significant chunk of time between now and then, so I'm excited.&nbsp; It will be interesting to see what folks are doing outside of Penn State, and also to get an idea of what sorts of projects wind up getting funded.<br /></li><li>I have some vague ideas for project charters but have yet to really flesh them 
out.&nbsp; One involves some collaborative development on tools around curation microservices, to be used and evaluated by honest-to-goodness curators with honest-to-goodness data, and the other is about benchmarking some distributed filesystems.</li><li>Techies at Penn State need to talk more.&nbsp; I want a BarCamp-style event for PSU techies so that we can discuss issues across departmental boundaries.&nbsp; Administrators have been nothing but supportive of the idea, and now I just need to find some time to sketch what I have in mind.<br /></li><li>Digital Library Technologies, my department, is hiring!&nbsp; We're looking for someone to come develop software to support our content stewardship program.&nbsp; Like writing code?&nbsp; Interested in how data is curated, stored, and discovered at scale?&nbsp; Consider applying.&nbsp; (Will link to position when it goes public later this week.)</li></ul>Braindump complete.&nbsp; Brain now empty, except to say: boy, State College sure is lovely in the spring.<br /><br /> ]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/04/braindump-for-q2-2010.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/04/braindump-for-q2-2010.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">archival storage</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">big data</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">content stewardship</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">CURATEcamp</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">data management</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">data storage working group</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">electronic records</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">hiring</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">IMLS</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">ITANA</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">linked data</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">microservices</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">NISO I2</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">or2010</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">PSUTechCamp</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">update</category>
              
            
            <pubDate>Tue, 06 Apr 2010 11:31:28 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Web discovery and library resources, or, SEO 101</title>
            <description><![CDATA[I attended the Penn State library faculty research <a href="http://search.twitter.com/search?q=psulrq">colloquium</a> on Wednesday, during which I learned all sorts of things about the interesting research being done by my colleagues in the University Libraries.&nbsp; One very interesting talk was by Doris Malkmus, one of our archivists, who was studying how history professors use primary sources, online and otherwise, in their undergraduate lectures.&nbsp; It was no surprise to me that the #1 discovery method for online primary sources was Google, and that institutional repositories hardly rank at all.<br /><br />(Sidebar: I wonder: with online content fragmenting, multiplying, and getting remixed and aggregated, does the definition of "primary source" strain for digital networked resources?)<br /><br />This discovery elicited a number of responses about how difficult search engine optimization is and how we really need to ramp up our marketing efforts.<br /><br />I wouldn't argue with either reaction, really.&nbsp; I do sense a huge missed opportunity here, though, one that we are perfectly capable of not missing.&nbsp; And let me be perfectly clear: I'm no SEO expert.&nbsp; But let me also say that I've seen, firsthand, major SEO advancements in libraries I've worked at, and much of the work was pretty straightforward.<br /><br />I <a href="http://twitter.com/mjgiarlo/statuses/9257943271">tweeted</a> my "SEO for dummies" list and got a couple of <a href="http://twitter.com/dchud/statuses/9266883971">very</a> <a href="http://twitter.com/vphill/statuses/9259297130">good</a> responses in addition to a retweet or two.&nbsp; Here's what I said:<br /><br /><blockquote><span class="status-body"><span class="entry-content">Googleability = 
increased findability + low-cost marketing. How to do it: 1. allow 
crawlers; 2. clean URLs; 3. rich item metadata; 4. links.</span></span><br /><span class="status-body"><span class="entry-content"></span></span></blockquote><span class="status-body"><span class="entry-content">To this list, folks suggested I add "0. stable application" and "5. sitemaps", both key suggestions, though I don't have much experience with sitemaps so I won't say more about those.<br /><br />What's my point?&nbsp; It's not rocket science to get our web resources discoverable on Google and the other major search engines.&nbsp; <br /><br />What's the value in that?&nbsp; More people are going to find library materials via a Google search than by navigating the dark alleys and dead ends of library websites.&nbsp; Yes, our silo boundaries have been useful to us to keep dissimilar materials apart for management and such, but no, they are totally useless to our users.&nbsp; My former colleague <a href="http://inkdroid.org/ehs">Ed Summers</a> reminded me today that a silo is not really a silo if it's on the web.&nbsp; Merely being on the web isn't enough, though, and here are the simple and practical lessons I've learned that may be the difference between getting found on Google and "NO JUICE FOR YOU!"<br /><br /></span></span><ol><li><span class="status-body"><span class="entry-content"><b>Stable application</b>: If your site isn't reliably up, user-agents will have a hard time finding it.&nbsp; That means disgruntled users and crawlers who never fully find you.<br /></span></span></li><li><span class="status-body"><span class="entry-content"><b>Allow web crawlers</b>: Unless you have a really compelling (read: legal) reason to disallow crawlers (and robots and spiders, oh my), you really ought to allow them.&nbsp; But only if you care about discoverability.&nbsp; If your app cannot handle the load of crawlers, go back to #1 and start over.&nbsp; Hire an engineer who knows about scale and performance, preferably.&nbsp; (See anecdote 1 later in this post.)<br 
/></span></span></li><li><span class="status-body"><span class="entry-content"><b>Clean URLs</b>: I'm not sure this is entirely necessary for SEO, to be honest, but it does seem like a common practice among those who are good web citizens. <br /></span></span></li><li><span class="status-body"><span class="entry-content"><b>Rich item metadata</b>: Collection-level metadata is not good enough.&nbsp; Collections are a useful abstraction for librarians but less so for users.&nbsp; Rather than impose a collection view upon users, move relationships among items and common metadata elements into item pages.&nbsp; (See anecdote 2 later in this post.)<br /></span></span></li><li><span class="status-body"><span class="entry-content"><b>Links</b>: Link out to stuff.&nbsp; Get folks to link in to your stuff.&nbsp; (See anecdote 3 later in this post.) <br /></span></span></li></ol><span class="status-body"><span class="entry-content"><br /></span></span><span class="status-body"><span class="entry-content"><b>Anecdote 1</b>: The Library of Congress has a digital newspaper application called <a href="http://chroniclingamerica.loc.gov/">Chronicling America</a>.&nbsp; At the time it was created, it served as a test bed for some technologies that had not seen wide uptake at the Library, but in time its developers realized the architecture couldn't keep up with the traffic coming in from the web crawlers.&nbsp; A robots.txt file was created restricting crawlers and time went by.&nbsp; The application was rebuilt from the ground up with the intent "to increase the usability of [the] application by providing faster responses to HTTP requests, allowing these requests via standardized APIs, as well as allowing all pages to be crawled by search engines."</span></span>&nbsp; The results were remarkable: average hits per day grew from roughly 75,000 to nearly 500,000.&nbsp; <br /><span class="status-body"><span class="entry-content"><br /><b>Anecdote 2</b>: When the Library of Congress 
went live with the <a href="http://www.wdl.org/en/">World Digital Library</a>, clearly helped by a massive press event at UNESCO in Paris (the largest such event in UNESCO history, apparently), its developers watched the mentions roll in via <a href="http://search.twitter.com/search?q=wdl">Twitter Search</a>.&nbsp; The most interesting thing I learned that day was that, despite all the cool maps and timelines and facets, users were primarily linking directly to item pages (each of which benefited from surfacing all of the rich descriptive metadata as well as links to related and similar items).<br /><br /><b>Anecdote 3</b>: The digital 
initiatives team at the University of Washington libraries has done some
 <a href="http://www.dlib.org/dlib/may07/lally/05lally.html">studies</a>
 assessing the impact of adding links to their digital collections from 
Wikipedia pages.</span></span>&nbsp; Usage spiked after the links were put in place, thanks to Wikipedia's popularity and the mechanics of Google's PageRank algorithm for judging relevance.<br /><br />These are practical steps we can take, and frankly may be the best marketing (judging cost v. impact) libraries can do to increase usage of our digital materials. <br />]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/02/web-discovery-and-library-resources-or-seo-101.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/02/web-discovery-and-library-resources-or-seo-101.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">discovery</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">http</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">SEO</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">web architecture</category>
              
            
            <pubDate>Fri, 19 Feb 2010 15:08:15 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>What&apos;s in a title?</title>
            <description><![CDATA[Hamlet.&nbsp; The Declaration of Independence.&nbsp; Gutenberg Bible.&nbsp; Holy Roman Emperor.&nbsp; All of these are words, but more than being words they are titles.&nbsp; Titles are names we give things, things such as works of art and stations we hold.&nbsp; <br /><br />Names are important to the extent that being able to talk about things is important.&nbsp; Names identify things.&nbsp; The way we indicate things when we talk about them, typically, is by names or substitutes therefor, e.g., <a href="http://en.wikipedia.org/wiki/Deixis" bitly="BITLY_PROCESSED">deixis</a>.&nbsp; <br /><br /><i>Ergo</i>, titles are important.&nbsp; That's been my line of thinking, at least, as I continue to struggle -- here meant in a light sense, such as "I am struggling with eating this delicious cookie" -- with what precisely it means to have a title with the word "architect" in it.&nbsp; <br /><br />I say that with a twinge of cognitive dissonance, for I know and understand very well <a href="http://www.personal.psu.edu/mjg36/blogs/2010/01/my-agenda-for-q1-2010.html" bitly="BITLY_PROCESSED">what I do on a daily basis and what I will be doing in the near future</a>.&nbsp; Perhaps then titles are not very important, or I should say, more important than titles is knowing what is expected of you and exceeding those expectations.&nbsp; Shape your title rather than allowing it to shape you.<br /><br /><i>Ergo</i>, maybe titles aren't equally important in all contexts.<br /><br />Self-help tropes aside, I still wonder about what folks' expectations of a digital library architect are.&nbsp; There is a line of thinking in libraries that our problems are unique rather than of a class.&nbsp; Some argue fiercely that library issues are, in fact, <a href="http://www.libraryjournal.com/blog/1090000309/post/800010880.html" bitly="BITLY_PROCESSED">not special</a>.&nbsp; <br /><br />I'm undecided.&nbsp; For instance, would a digital library architect have any 
concerns or areas of expertise an IT/enterprise architect would not?&nbsp; Or does digital library architecture amount to little more than a re-brand reflecting the "we're special" way of thinking?<br /><br />Relatedly, I would wager that the number of titular digital library architects is much smaller than the number of folks doing architecture work in digital libraries.&nbsp; Digital repository librarians and library systems analysts, etc., I'm looking at you.<br /><br />Why am I thinking about this?&nbsp; In my last two jobs as a software developer in academic and research libraries I was spoiled by being in a large and vibrant community of similar folks: <a href="http://code4lib.org/" bitly="BITLY_PROCESSED">code4lib</a> and also the <a href="http://vre2.upei.ca/access2009/" bitly="BITLY_PROCESSED">Access</a> folks up north.&nbsp; I'm looking for the same in my current job: some forum, conference, mailing list, or what have you, where there is discussion of architectural issues in the digital libraries context.<br /><br />For now, I have contented myself with the idea that a digital library architect is a technologist who thinks architecturally about digital libraries.&nbsp; What does that mean?&nbsp; Someone who, to mix metaphors mightily, puts his or her arms around the big picture (rather like an art thief).&nbsp; <br /><br />And what does that mean?&nbsp; Someone who knows all of the systems and standards and protocols and workflows and operations and the connections between them, in an institution, in the context (typically) of serving digital content over the web (though this context is expanding into other areas such as institutional e-records management and research data curation).&nbsp; Said someone will probably have been hired in fact not merely to know all of that mess but to think systematically and strategically about whether all of that mess meets needs and requirements and best practices, and not only think about that but work deliberately to make that so.<br /><br 
/>Is there a community for such folks?&nbsp; There are many possible related communities (and here I'm intentionally casting a broad net by mingling conferences, lists, and professional organizations): code4lib, Access, <a href="http://or2010.fecyt.es/publico/Home/index.aspx" bitly="BITLY_PROCESSED">Open Repositories</a>, <a href="http://groups.google.com/group/digital-curation" bitly="BITLY_PROCESSED">digital-curation</a>, <a href="http://www.itana.org/" bitly="BITLY_PROCESSED">ITANA</a>, <a href="http://net.educause.edu/e10/" bitly="BITLY_PROCESSED">EDUCAUSE</a>, <a href="http://www.asis.org/Conferences/IA10/ResearchDataAccessSummit2010.html" bitly="BITLY_PROCESSED">ASIS&amp;T</a>, <a href="http://www.ils.unc.edu/digccurr/institute.html" bitly="BITLY_PROCESSED">DigCCurr</a>, <a href="http://www.ifs.tuwien.ac.at/dp/ipres2010/" bitly="BITLY_PROCESSED">iPres</a>, <a href="http://www.arl.org/sparc/media/09-0223.shtml" bitly="BITLY_PROCESSED">SPARC</a>, <a href="http://www.cni.org/tfms/" bitly="BITLY_PROCESSED">CNI</a>, <a href="http://www.diglib.org/" bitly="BITLY_PROCESSED">DLF</a>/<a href="http://www.clir.org/" bitly="BITLY_PROCESSED">CLIR</a>, and so on.&nbsp; Heck if I know.&nbsp; Do you?<br /><br />]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/01/whats-in-a-title.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/01/whats-in-a-title.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">digital library architecture</category>
              
            
            <pubDate>Wed, 20 Jan 2010 08:31:32 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>My agenda for Q1 2010</title>
            <description><![CDATA[My agenda for the first few months of 2010 is becoming clearer.  Consider this a snapshot of the things that will be crossing my desk and bouncing around my mind.<br /><br /><ul><li><a href="http://www.personal.psu.edu/mjg36/blogs/2010/01/e-content-stewardship-program-kick-off.html">Reviewing digital library platforms</a> for the e-Content Stewardship Council</li><li>Reviewing functional requirements for an institution-wide repository of electronic records</li><li>Learning more about "<a href="http://www.aspeninstitute.org/publications/promise-peril-big-data">big data</a>" and continuing the <a href="http://www.personal.psu.edu/mjg36/blogs/2010/01/data-management-discussion.html">data management discussion<br /></a></li><li>Evaluating the <a href="http://dlt.its.psu.edu/">DLT</a> <a href="https://lib.stanford.edu/files/PSU_Archival_Storage_Prototype_Final_Report_External%20v5.pdf">archival storage prototype</a> and joining the technical team of the Data Storage Working Group</li><li>Evaluating next-generation information discovery tools for the libraries with <a href="http://www.libraries.psu.edu/psul/itech.html">I-Tech</a></li><li>Evaluating change management solutions with a team from Penn State's <a href="http://www.personal.psu.edu/kxm/blogs/LibertyRoad/2007/08/itana-itsitana-1.html">ITANA</a> group</li><li>Working on requirements for a draft institutional identifier standard with the <a href="http://www.niso.org/workrooms/i2">NISO I2 </a>working group</li><li>Attending <a href="http://code4lib.org/conference/2010">Code4Lib 2010</a><br /></li><li>Meetings, meetings, <a href="http://www.flickr.com/photos/sara013/4269735919/">meetings</a><br /></li><li>Continuing to absorb as many of the following as possible: strategic plans, project portfolios, process management documents, and various and sundry reports, wikis, and blogs<br /></li></ul>My plate is rather full; fortunately, for now, I've got a big appetite.<br /><br 
/>]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/01/my-agenda-for-q1-2010.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/01/my-agenda-for-q1-2010.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">archival storage</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">big data</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">code4lib</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">content stewardship</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">data management</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">data storage working group</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">electronic records</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">ITANA</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">meetings</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">next-gen catalogs</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">NISO I2</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">transparency</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">update</category>
              
            
            <pubDate>Wed, 13 Jan 2010 09:43:50 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>e-Content Stewardship program kick-off</title>
            <description><![CDATA[One of my primary foci is a new program jointly undertaken by the libraries and ITS, known as e-Content Stewardship.&nbsp; (For more background information, Mairead Martin <a href="http://www.personal.psu.edu/mum28/blogs/Mairead/2009/04/e-content-stewardship-program.html">set the scene</a>.)&nbsp; The program is largely the brainchild of Mairead (Sr. Director of Digital Library Technologies), <a href="http://www.libraries.psu.edu/psul/admin/adsc.html">Mike Furlough</a> (Assistant Dean for Scholarly Communications), and <a href="http://www.libraries.psu.edu/psul/admin/adtcs.html">Lisa German</a> (Assistant Dean for Technical and Collections Services), but has doubtless been shaped by many others in the libraries and scholarly communication circles.<br /><br />Yesterday those three got together with me and Patricia Hswe, Digital Collections Curator, to orient us -- it's Patricia's and my first week at Penn State -- and also to set some direction for how the two of us might carve out an initial project.&nbsp; <br /><br />Our first project is quite a clever idea (for which I take no credit, since it came from Mairead, Lisa, and the other Mike): we will review the digital library platforms currently being used by the libraries.&nbsp; In so doing we will better orient ourselves for later efforts, so it's beneficial to us, but it's also beneficial to the libraries and ITS to have new and fresh eyes looking at how folks are using our systems.&nbsp; I'm particularly interested in what APIs the platforms support and how we might get them interoperating, in addition to how the products themselves are evolving -- is the software moribund or under active development?&nbsp; What's the support like?&nbsp; What's the user community like? 
<br /><br />Interop is but one piece of the puzzle.&nbsp; There's also structure of collections, relationships between items, organization of information, and so forth.&nbsp; We will also want to talk with users of the systems to suss out usage patterns, features liked and hated, quirks, and so forth.&nbsp; Administrators and developers, too, of course. <br /><br />I'm looking forward to making some headway on this review.&nbsp; I have some suspicions about what we may conclude, but I'm keeping those to myself and trying to remain as objective as I can so as to give the platforms a fair shake.<br />]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/01/e-content-stewardship-program-kick-off.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/01/e-content-stewardship-program-kick-off.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">content stewardship</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">digital libraries</category>
              
            
            <pubDate>Fri, 08 Jan 2010 13:46:51 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Data management discussion</title>
            <description><![CDATA[My first week on campus is cruising by.&nbsp; On Monday I sat in on a meeting called by our Chief Information Officer (and <a href="http://www.personal.psu.edu/mum28/blogs/Mairead/">my boss</a>'s boss), <a href="http://www.personal.psu.edu/kxm/blogs/LibertyRoad/">Kevin Morooney</a>, to discuss what data management means to folks in Penn State's central information technology group, <a href="http://its.psu.edu/">Information Technology Services</a>.&nbsp; <br /><br />Attendees came from all across the ITS organization: Administrative
Information Services, Security Operations and Services, Consulting and
Support Services, Teaching and Learning with Technology, Marketing and
Communications, Research Computing and Cyberinfrastructure, Identity
and Access Management, and Digital Library Technologies (my department).&nbsp; <br /><br />The meeting was chiefly for exchanging information, for reconstituting the discussion about what we talk about when we talk about data management.&nbsp; Some of the preceding high-level work done in this area has been around a business intelligence initiative -- though I'm not sure what this means, exactly -- and development of a University data classification scheme.&nbsp; <br /><br />I wasn't terribly surprised to learn how different the perspectives
around the table were, but there were also some common themes such as
security and identity management.<br /><br />We've all got data, and so data management is done by just about everyone everywhere.&nbsp; It gets very tricky, naturally, when you start talking about data management planning across an institution as large and diverse as Penn State.&nbsp; Kevin asked for folks to mention examples of institutions of higher learning that have tackled data management at the institutional level.&nbsp; Most of the examples given were private, resource-rich schools -- no shock there, perhaps.&nbsp; <br /><br />I've been somewhat disconnected from academia for a few years now, so I was hesitant to mention my perhaps outdated examples.&nbsp; I've had a chance to poke around and verify what I suspected, in the meantime.<br /><br /><ul><li>Indiana University's Information Policy Office has published a <a href="http://informationpolicy.iu.edu/data/">Data Management website</a> listing policies, guidelines, a classification scheme and dictionary, data managers, and the membership of the Committee of Data Stewards, "a group... responsible for establishing policies, procedures, and
guidelines for management of institutional data across Indiana
University."<br /><br /></li><li>The University of Washington has also been active in this area.&nbsp; <a href="https://www.washington.edu/uwtech/">UW Technology</a> and the <a href="http://escience.washington.edu/">eScience Institute</a> published a report, <a href="http://www.washington.edu/lst/research_development/papers/2009/Conversations_UW_Research-Leaders.pdf">Conversations with University of Washington Research Leaders</a>, on "a large-scale effort to assess the information technology needs of the University of Washington's top researchers. ... [T]he goals of the project were (1) to understand how UW researchers currently use technology and anticipate using technology in the future to support their research activities, and (2) to identify the resources and services they need to maintain and build upon their remarkable record of success.&nbsp; To accomplish these goals [UW Technology and the eScience Institute] interviewed 127 researchers."&nbsp; The first recommendation in the report, which contains an entire section on data management, is that the University should provide a new data management paradigm.</li></ul>Many big questions remain.&nbsp; What does "data management" even mean?&nbsp; Who are the stakeholders and what are their expectations?&nbsp; How would data management responsibilities be divvied up?&nbsp; In which directions should outreach be concentrated?&nbsp; What data is even out there?<br /><br />Among the big and challenging topics, thoroughly intertwingled, are: security, privacy, and access control; scalability and performance; provenance and auditing; metadata and discoverability; persistence; access vectors; buy-in; the notion of "one-size-fits-allness"; trust (five huge gigantic scary letters); incorporation w/ existing workflows; and probably eleventy zillion others.&nbsp; <br /><br />All of which I solved in a dream last night -- and then forgot.&nbsp; It probably involved a cloud (or SOA, right?).<br /><br />The meeting adjourned 
shortly after determining that there were no immediate follow-ons or action items, except to keep thinking about data management and looking for good reasons to reconvene the group.&nbsp; (Sidebar: I was involved with an effort within the Libraries at the University of Washington to develop and conduct an institutional data census, a difficult and involved process aimed at answering at least one question -- what data is out there? -- and so I'd be stoked to see a similar effort at Penn State.)&nbsp; This <i>ad hoc</i> group will probably meet once or twice a year and I look forward to watching things develop in this space.&nbsp; I learned a bunch from the meeting, and that's extremely valuable while I put my work here into context.&nbsp; <br /><br />When isn't learning valuable?<br />]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2010/01/data-management-discussion.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2010/01/data-management-discussion.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">data management</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">ITS</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">meetings</category>
              
            
            <pubDate>Wed, 06 Jan 2010 15:52:09 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Post-fork</title>
            <description><![CDATA[One last, short post for the night. &nbsp;I have maintained another blog for the past three or four years that has some content relevant to the topic of this blog,&nbsp;<a href="http://lackoftalent.org/michael/blog/">τεχνοσοφια</a>. &nbsp; Some of the categories over there may be of interest.]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2009/12/post-fork.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2009/12/post-fork.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">administrivia</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">blogs</category>
              
            
            <pubDate>Tue, 22 Dec 2009 01:59:48 -0500</pubDate>
	    
	    
        </item>
        
        <item>
            <title>Micro-musings after a night of oenophilia</title>
            <description><![CDATA[<div>Here's a list of questions I hope to grapple with. &nbsp;And, yes, I am fixating on terms a bit here. &nbsp;I blame it on the philosophy courses.</div><div><br /></div><ul><li>What is a digital library architecture? &nbsp;</li><li>When we talk about digital library architectures, do we tend to talk about systems?</li><li>Is it wise to have a system-centric view of digital library architectures -- i.e., "just install Fedora/DSpace/Drupal/aDORe!" -- or should we think instead in terms of APIs, standards, and access requirements? &nbsp;Maybe this is a false dichotomy.</li><li>Do digital library architectures need to be so esoteric, or may they reduce to garden-variety information architectures? &nbsp;Formulated otherwise: are our problems really that special?</li><li>How do repositories and digital library architectures intersect?</li><li>Would a UNIX filesystem w/ certain naming and directory conventions suffice as a digital library architecture? &nbsp;Formulated otherwise: would the California Digital Library's curation microservices suffice?</li><li>How do HTTP, web architecture, linked data, RDF-based ontologies, and REST help us with digital library architectures?</li><li>How might messaging architectures such as AMQP, XMPP, and OpenSRF fit into the digital library problem space?</li><li>Am I overthinking this, or fixating too much on the phrase "digital library architecture"?</li></ul><div>I've been trying to track down relevant literature on digital library architecture and have found a modest number of articles. &nbsp;Any suggestions would be much appreciated.</div><div><br /></div><div>Happy holidays, folks.</div>]]></description>
            <link>http://www.personal.psu.edu/mjg36/blogs/2009/12/micro-musings-after-a-night-of-oenophilia.html</link>
            <guid>http://www.personal.psu.edu/mjg36/blogs/2009/12/micro-musings-after-a-night-of-oenophilia.html</guid>
            
            
              
                <category domain="http://www.sixapart.com/ns/types#tag">amqp</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">digital library architecture</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">http</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">linked data</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">microservices</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">rdf</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">repositories</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">rest</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">web architecture</category>
              
                <category domain="http://www.sixapart.com/ns/types#tag">xmpp</category>
              
            
            <pubDate>Mon, 21 Dec 2009 23:16:58 -0500</pubDate>
	    
	    
        </item>
        
    </channel>
</rss>
