What is it and what's it good for?

I've been meaning to do an update on the e-Content Stewardship Program for a while now. In our joint strategic plan for 2008/9 - 2013, ITS and the University Libraries agreed to work together to create a new program to support existing and emerging content management needs. In our respective strategic plans, we referred to this by the rather unwieldy title of the "Cyberinfrastructure, e-Content and Data Stewardship Program," which we've now abbreviated to the "e-Content Stewardship Program." In the last few months, we've been working on getting this program up on its feet.

We are using the word "program" rather than "initiative" or "project" since our intent is to pursue a formal, collaborative, and systematic approach that will enable us to develop services and infrastructure that can be extended to and reused in a variety of contexts. These contexts include scholarly communications, electronic records, electronic theses and dissertations, publishing services, and research data management.

We are using the word "stewardship" in the program's title rather than "management" because our focus is on supporting content across its whole lifecycle. Implicit in the word "stewardship" is an expectation on the part of the user that we can be trusted to care for their content; durability and sustainability are thus fundamental to our approach.

If this seems like an ambitious undertaking, then you've got the picture. However, if you're aware of the resources, time, and energy it takes to bring up a single application, then you'll likely appreciate the rationale behind investing in reusable service and application components. If you've experienced the frustration of searching across siloed content management systems, enough said. If you're responsible for archiving data for an established duration of time for compliance reasons, then an approach that is founded on sustainability should be attractive. If you've listened to our constituents talk about their content, you've likely recognized a common expectation that the content that we've enabled them to create and publish will be persistent, and that we'll take care of that. (Indeed, our stewardship role may end up being what distinguishes us from the Googles and Amazons of the world. This is a traditional role for libraries, needless to say.)

In the blog postings to follow, I'll write about where things stand with planning. I'll also describe a couple of early "anchor" projects and how we're handling them, which should help illustrate the approach we're taking.

Data Storage Working Group

Neal Vines, Director of IT in the College of Agricultural Sciences, and I are co-chairing a new working group to explore common requirements and common solutions for data storage across Penn State. The group is composed of members from the College IT Directors and Campus IT Directors groups as well as the chair of the ITS ITANA storage working group (ITANA is the ITS architectural collaborative). We are developing three use cases to begin with: archival storage, storage of sensitive data, and e-science/research data storage, and extracting requirements from these. We will then work with ITANA storage to consider how we might address some of our common issues and look at possible central storage service offerings. 

Report on Digital Preservation and Access

A report from an NSF and Mellon-funded panel on Digital Preservation and Access has just been released and is interesting reading - Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation. This is the first report from the panel, focusing on the state of play, and conclusions/recommendations will appear in a future report. This is an area that IT and Libraries will need to work together intensively on going forward.

Update on Strategic Planning


ITS and the University Libraries have now completed their respective strategic plans. A new development in this strategic planning cycle was the decision by both organizations to plan together and be emphatic in our plans about our joint strategic directions. Of course there are many ways we already collaborate, and our future work together naturally builds on these, but our plans target two key areas that we will invest in together over the next five years.


The first of these is broadly termed "Joint Service Delivery" and will focus largely on the implementation of the Knowledge Commons facility in Pattee/Paterno Library. The Knowledge Commons will be an open collaborative space where students - undergraduates initially - will have access to reference, IT and academic consultation services as well as state-of-the-art computing resources and the UL's rich collection of online and print resources. The Knowledge Commons concept isn't location-bound, however, and elements such as collaborative workspaces and Digital Commons multimedia facilities are already in development at various library locations at UP and the campuses.


The second area we're targeting in the next five years is the creation of a Cyberinfrastructure, e-Content and Data Stewardship Program. Here's how this program is described in our strategic plans: 


Complementing ITS’s existing high performance computing and networking infrastructure and the University Libraries’ developing scholarly communications program, we will partner to develop a Cyberinfrastructure, e-Content, and Data Stewardship program. E-science or e-research is typically defined as collaborative, distributed, large-scale and data-intensive. ITS and the UL will develop sustainable strategies for the stewardship of the outputs of e-science over its lifecycle – providing a cohesive suite of access, discovery, preservation, curation, repository, archival and storage services. Our phased approach will initially entail needs assessments and prototyping of beta services while building out infrastructure that can be extended to other areas of digital content management.


Both of these programs are described in our respective strategic plans;  if you go to the ITS Strategic Planning wiki, you'll see them in the Appendix section. 


These programs are the result of six months of discussion, planning and input at various levels of ITS and the University Libraries. In February of this year, three open forums were held to foster discussion and gather input; the results of these discussions are also available on the ITS Strategic Planning wiki under the Planning Framework section. What you see in the two  programs is a result of the discussion at the forums; what was really reinforced in those sessions was how well the expertise and strengths of our organizations complement each other as well as the wisdom of our working together rather than duplicating effort. 


We have a lot of details to work out now and a lot of organization to do. The descriptions of both programs were put together for resource planning purposes, and they aren't set in stone by any means (don't get too attached). More to follow on next steps.

This week, several of us are at the Sun PASIG SIG in San Francisco: Mark Saussure, Lynn Garrison and Ben Grissinger from DLT and Sue Kellerman, Head of Digitization and Preservation in the University Libraries. The agenda is here.

The CIC Librarians and e-Science conference was held Monday and Tuesday of this week at Purdue University and here's a quick report back. 

The conference was attended by science librarians (the majority), data librarians, library administrators, IT staff (central and library), archivists and, from what I could tell, one HPC/visualization expert: George Otto from Penn State.

There were essentially three types of presentations at the conference: overviews of exemplar projects by scientists/researchers; examples of how libraries are currently engaging in supporting such projects; and descriptions of new course offerings at a couple of Library Science schools to create data curators or data librarians. 

In the first category, projects discussed included the Large Synoptic Survey Telescope, the Blue Waters computing project at NCSA, the Virtual Observatory projects at the US National Observatory, and the Large Hadron Collider at CERN in Geneva. While the data generated in these projects is typically on an unprecedented scale, there was also discussion of small science needs and how they differ from big science: small science typically doesn't use supercomputing capabilities (at least not yet), often relies on informal and manual data collection, and typically isn't funded by major federal funding agencies. However, small science data needs share a common characteristic with big science: the need to capture and retain data over extended time periods.

The Large Hadron Collider already has a well-structured way of managing and distributing its data: what to keep locally, and what to distribute to regional and national nodes.

E-Science, as discussed at this conference, is characterized as collaborative (single or interdisciplinary), distributed (regionally, nationally, globally), using high performance supercomputing capabilities and generating large amounts of data. The question posed by Thom Dunning of NCSA was not just how science and engineering can benefit from the engagement of libraries in e-science, but how libraries might benefit from the data generated by petascale computing and its derivatives. While the need for curation of this data is being addressed to a small degree right now, the preservation, storage and archiving of the data is not. However, the NSF DataNet program is attempting to address that gap. (It was openly disclosed that several collaborations of CIC schools submitted proposals to this program but didn't make the final cut.)

Examples of how libraries are engaging in e-science/e-research included activities that support or foster traditional research, such as VIVO from the Life Sciences library at Cornell, a database of researchers, their research interests, organizations, publications, etc. Purdue has created a Distributed Data Curation Center to "investigate and pursue innovative solutions for curation issues of organizing, facilitating access to, archiving for and preserving research data and data sets in complex environments." We heard a very interesting case study of how the Agricultural Sciences librarian at Purdue worked with an agronomist there to support her data curation needs (this was clearly an example of "small science"). The University of Minnesota has created a Research Cyberinfrastructure Alliance, which includes their library as well as their supercomputing institute and several academic departments; here's a project update on this effort from the recent CNI meeting.

Neil Rambo of the Association of Research Libraries characterized the areas for library engagement in e-science as data curation, new forms of publications, virtual organizations and policy development and said that this will likely require organizational restructuring of libraries, as well as requiring a more balanced "risk averse vs. risk capable" approach.

Other common themes across the presentations: 

  • Need for continuing education and new course offerings in Library Science and I-Schools; 
  • The value of collaboration in the "grey area" between Libraries and central IT; 
  • The value of domain experts (aka subject specialists); 
  • Engagement further upstream in scholarly communications and publication lifecycle (at creation or authoring rather than focusing on the publication stage for the most part);
  • Scientists/researchers often don't know or understand what librarians do;
  • Look for ways to "embed" the data librarian or librarian in the research activity (this is likely to be more successful if that librarian is a domain expert);
  • The skills needed to manage e-science data are not new to libraries, but the concept of libraries integrating "raw data" sets into their holdings is;
  • The Semantic Web offers hope for the management of data and its integration with other resources;
  • What will it mean for science if data goes away? How do we decide what data to keep?
  • Scientists don't always want to share data; various reasons for this including research competitiveness, and data that's too raw and not easily interpreted or open to misinterpretation.

In breakout sessions at the end of the conference, we discussed what role the CIC might play in fostering collaborations between research libraries and the e-science community. One question raised was what it would take for CIC schools to be competitive in the NSF DataNet funding program. Another: how about a conference for scientists, librarians and IT?

Cliff Lynch at the Coalition for Networked Information (CNI) sent this announcement out on the CNI list this morning. Looks like a very interesting report.

Interim Report  
Assessing the Future Landscape of Scholarly Communication:  An In-depth Study of Faculty Needs and Ways of Meeting Them

Principal Investigator: Diane Harley, Ph.D., Senior Researcher
Research Associates: Sarah Earl-Novell, Ph.D., Sophia Krzys Acord, Shannon Lawrence
Principal Investigator: C. Judson King, Professor, Provost Emeritus and Director

The Center for Studies in Higher Education, with generous funding from the Andrew W. Mellon Foundation, is conducting research to understand the needs and desires of faculty for in-progress scholarly communication (i.e., forms of communication employed as research is being executed) as well as archival publication. In the interest of developing a deeper understanding of how and why scholars do what they do to advance their fields, as well as their careers, our approach focuses on fine-grained analyses of faculty values and behaviors throughout the scholarly communication lifecycle, including sharing, collaborating, publishing, and engaging with the public. Well into our second year, we have posted a draft interim report describing some of our early results and impressions based on the responses of more than 150 interviewees in the fields of astrophysics, archaeology, biology, economics, history, music, and political science.
Our work to date has confirmed the important impact of disciplinary culture and tradition on many scholarly communication habits. These traditions may override the perceived “opportunities” afforded by new technologies, including those falling into the Web 2.0 category. As we have listened to our diverse informants, as well as followed closely the prognostications about the likely future of scholarly communication, we note that it is absolutely imperative to be precise about terms. That includes being clear about what is meant by “open access” publishing (i.e., using preprint or postprint servers for work published in prestigious outlets, versus publishing in new, untested open access journals, or the more casual individual posting of working papers, blogs, and other non-peer-reviewed work). Our work suggests that enthusiasm for technology development and adoption should not be conflated with the hard reality of tenure and promotion requirements (including the needs and goals of final archival publication) in highly competitive professional environments. 

For more information about the research project see the Future of Scholarly Communication website: http://cshe.berkeley.edu/research/scholarlycommunication/

CIC Librarians & e-Science Conference

The CIC is holding a two-day conference at Purdue May 12-13 on "Librarians & e-Science: Focusing towards 20/20". The agenda is available here. Lisa German, John Meier, Bonnie Osif, and Linda Musser are attending from University Libraries while George Otto and I are attending from ITS. Blog reporting will follow.

Thoughts from OR 2008

Day Two of the International Open Repositories conference in Southampton, UK. Content has been excellent so far. Here's a quick report on what's topical this year, whether in sessions or in reception/coffee discussions:

  • At last year's conference, a lot of emphasis on workflow. This year the themes are preservation and sustainability, and the integration of social networking with repositories as well as presentations on working scientific repositories.
  • Emphasis on the critical need to make the repository useful to scientists through easy ingest and integration from the initial authoring or data generation point. The repository has to be on the researcher's desktop. The repository needs to be in the laboratory. Have to support all phases of the research lifecycle.
  • Repositories to date have focused on the last stages of the research lifecycle.
  • Repository back-ends for blogging and other collaborative tools.
  • Tension between scientists/scholars not seeing the IR or data repository as useful, but at the same time a critical concern is the amount of data being lost.
  • A lot of tools refined now for ingest with a focus on easy ingest.
  • A couple of very interesting presentations on text/data mining; one demonstrated the value of getting knowledge that's currently locked in e-theses into the published domain  - obvious value of the Semantic Web standards.
  • The Australians rule the world! Helps that their federal government has invested very substantially in this and other HE infrastructure areas. Their national library has produced a service framework composed of 39 services with standards and guidelines for each area. As a result, 80% of Australian universities have a production repository.
  • Institutional data preservation policy required.
  • The University of Southampton has been an excellent host despite the challenge of the attendance being nearly double that of last year (about 485 people here). Fortunately the students are on vacation.

Open Repositories 2008

Next week I'm going to be attending the Open Repositories 2008 conference at the University of Southampton in the UK. Mike Halm from ITS TLT is also attending. I attended this conference last year in San Antonio and it was excellent; it's largely focused around user group meetings for the Fedora, DSpace and EPrints initiatives. The program is now available here. My goal is to post a few blog entries while there. If you see anything on the program that you'd like to hear more about, let me know.