Report from CIC Librarians and e-Science Conference

The CIC Librarians and e-Science conference was held Monday and Tuesday of this week at Purdue University. Here's a quick report.

The conference was attended by science librarians (the majority), data librarians, library administrators, IT staff (central and library), archivists and, from what I could tell, one HPC/visualization expert: George Otto from Penn State.

There were essentially three types of presentations at the conference: overviews of exemplar projects by scientists/researchers; examples of how libraries are currently engaging in supporting such projects; and descriptions of new course offerings at a couple of Library Science schools to create data curators or data librarians. 

In the first category, the projects discussed included the Large Synoptic Survey Telescope, the Blue Waters computing project at NCSA, the Virtual Observatory projects at the US National Observatory, and the Large Hadron Collider at CERN in Geneva. While the data generated in these projects is typically on an unprecedented scale, there was also discussion of small science needs and how they differ from big science: small science typically does not use supercomputing capabilities (at least not yet), often relies on informal and manual data collection, and is typically not funded by major federal funding agencies. However, small science shares a common data need with big science: the need to capture and retain data over extended time periods.

The Large Hadron Collider already has a well-structured way of managing and distributing its data: what to keep locally, and what to distribute to regional and national nodes.

E-Science, as discussed at this conference, is characterized as collaborative (single or interdisciplinary), distributed (regionally, nationally, globally), using high performance supercomputing capabilities and generating large amounts of data. The question posed by Thom Dunning of NCSA was not just how science and engineering can benefit from the engagement of libraries in e-science, but how libraries might benefit from the data generated by petascale computing and its derivatives. While the need for curation of this data is being addressed to a small degree right now, the preservation, storage and archiving of the data is not. However, the NSF DataNet program is attempting to address that gap. (It was openly disclosed that several collaborations of CIC schools submitted proposals to this program but didn't make the final cut.)

Examples of how libraries are engaging in e-science/e-research included activities that support or foster traditional research, such as VIVO from the Life Sciences library at Cornell, a database of researchers, their research interests, organizations, publications, etc. Purdue has created a Distributed Data Curation Center to "investigate and pursue innovative solutions for curation issues of organizing, facilitating access to, archiving for and preserving research data and data sets in complex environments." We heard a very interesting case study of how the Agricultural Sciences librarian at Purdue worked with an agronomist there to support her data curation needs (this was clearly an example of "small science"). The University of Minnesota has created a Research Cyberinfrastructure Alliance, which includes their library as well as their supercomputing institute and several academic departments; a project update on this effort was given at the recent CNI meeting.

Neil Rambo of the Association of Research Libraries characterized the areas for library engagement in e-science as data curation, new forms of publications, virtual organizations and policy development. He said that this will likely require organizational restructuring of libraries, as well as a more balanced "risk averse vs. risk capable" approach.

Other common themes across the presentations: 

  • Need for continuing education and new course offerings in Library Science and I-Schools; 
  • The value of collaboration in the "grey area" between Libraries and central IT; 
  • The value of domain experts (aka subject specialists); 
  • Engagement further upstream in the scholarly communications and publication lifecycle (at creation or authoring, rather than focusing mostly on the publication stage);
  • Scientists/researchers often don't know or understand what librarians do;
  • Look for ways to "embed" the data librarian or librarian in the research activity (this is likely to be more successful if that librarian is a domain expert);
  • The skills needed to manage e-science data are not new to libraries, but the concept of libraries integrating "raw data" sets into their holdings is;
  • The Semantic Web offers hope for the management of data and its integration with other resources;
  • What will it mean for science if data goes away? How do we decide what data to keep?
  • Scientists don't always want to share data; the reasons include research competitiveness and data that is too raw to be easily interpreted or that is open to misinterpretation.

In breakout sessions at the end of the conference, we discussed what role the CIC might play in fostering collaborations between research libraries and the e-science community. One question raised was what it would take for CIC schools to be competitive in the NSF DataNet funding program. Another suggestion: how about a conference for scientists, librarians and IT?




About this Entry

This page contains a single entry by MAIREAD MARTIN published on May 15, 2008 10:30 AM.
