Software projects

Modern life sciences research increasingly relies on information technology: data storage, custom algorithms, advanced search and query mechanisms, streamlining of recurring tasks, and presentation and visualization interfaces. I’m most interested in applying modern, sophisticated computational solutions to diverse biological problems. I have led the development of several software projects, some with teams, others as sole developer. Here is a quote that best captures my philosophy:

One of the big insights in the last few years, through work by the internet search engines but also tools like Udi Manber’s glimpse, is that data with no meaningful structure can still be very powerful if the tools to help you search the data are good.

In fact, structure can be bad if the structure you have doesn’t fit the problem you’re trying to solve today, regardless of how well it fit the problem you were solving yesterday.

So I don’t much care any more how my data is stored; what matters is how to retrieve the relevant pieces when I need them [...]. Expect more liberation as searching replaces structure as the way to handle data. (by Rob Pike)

Data analysis software

  • 2008-2009: BooleanNet

    Boolean network simulation software for the life sciences.

    See: http://booleannet.googlecode.com

  • 2007-2008: GeneTrack

    A bioinformatics software package for storing, querying and visualizing interval-oriented data.

    See: http://genetrack.googlecode.com

  • 2006-2008: Project director

    Bioinformatics project director of the Genome Cartography Project.

    See: http://atlas.bx.psu.edu

  • 2006-2007: MiniDB

    Lead developer of MiniDB, a data storage system for microarray research. A collaboration with Frank Pugh (folded into the Genome Cartography Project).

  • 2004-2006: Galaxy

    Lead developer of Galaxy, a web-based data analysis framework (funded by NSF; served as Co-PI from 2004 to 2005). A collaboration with Anton Nekrutenko, James Taylor and Ross Hardison (2005).

    See: http://galaxy.psu.edu

  • 2004-2005: LionDB

    Lead developer of LionDB, a laboratory data management system. In continuous operation since September 2004, it serves the data exchange needs of life science researchers at Penn State. A collaboration with Naomi Altman and Craig Praul (2004).

    See: http://liondb.atlas.bx.psu.edu

Guides and tutorials

  • Implementing AJAX with Django

    Written in 2006, it still attracts about 1000 new and unique visitors per month (as of April 2008), making it potentially my most popular work ever (oh, the irony).

Past work

Software projects that I have worked on in the past. The libraries listed below were written some time ago and may not work on current computing platforms.

  • Lead developer of MovieLens.

    A movie recommendation site maintained by the GroupLens research group at the University of Minnesota, used to test novel prediction algorithms and user interface elements. The site has over 30 thousand registered users and manages millions of ratings. I was lead developer, in charge of implementing the database and server infrastructure (2001-2003). The site was built with XML/XSLT and JavaServer (Apache Tomcat) technologies.

  • Approximate string matching library

  • A document fingerprinting module

    64-bit Rabin codes, based on a port of the Modula-3 fingerprinting module to C by Mark Mitchell.

  • Python wrapper for lowess fitting
