Modern life sciences research increasingly relies on information technology: data storage, custom algorithms, advanced search and query mechanisms, automation of recurring tasks, and presentation and visualization interfaces. I’m most interested in applying modern, sophisticated computational solutions to diverse biological problems. I have led the development of several software projects, some with teams, others as sole developer. Here is a quote that best captures my philosophy:
One of the big insights in the last few years, through work by the internet search engines but also tools like Udi Manber’s glimpse, is that data with no meaningful structure can still be very powerful if the tools to help you search the data are good.
In fact, structure can be bad if the structure you have doesn’t fit the problem you’re trying to solve today, regardless of how well it fit the problem you were solving yesterday.
So I don’t much care any more how my data is stored; what matters is how to retrieve the relevant pieces when I need them [...]. Expect more liberation as searching replaces structure as the way to handle data. (by Rob Pike)
2008-2009: BooleanNet
Boolean network simulation software for the life sciences. See: http://booleannet.googlecode.com
2007-2008: GeneTrack
A bioinformatics software package for storing, querying and visualizing interval-oriented data.
2006-2008: Project director
Bioinformatics project director of the Genome Cartography Project.
2006-2007: MiniDB
Lead developer of MiniDB, a data storage system for microarray research. A collaboration with Frank Pugh (folded into the Genome Cartography Project).
2004-2006: Galaxy
Lead developer of Galaxy, a web-based data analysis framework (funded by NSF; served as Co-PI from 2004 to 2005). A collaboration with Anton Nekrutenko, James Taylor and Ross Hardison (2005).
2004-2005: LionDB
Lead developer of LionDB, a laboratory data management system. In continuous operation since September 2004, it serves the data exchange needs of the life science researchers at Penn State. A collaboration with Naomi Altman and Craig Praul (2004).
Implementing AJAX with Django
Written in 2006, it still attracts about 1000 new and unique visitors per month (as of April 2008), making it potentially my most popular work ever (oh, the irony).
Software projects that I have worked on in the past. The libraries listed below were written some time ago and may not work on current computing platforms.
Lead developer of MovieLens.
MovieLens is a movie recommendation site maintained by the GroupLens research group at the University of Minnesota and used to test novel prediction algorithms and user interface elements. The site has over 30 thousand registered users and manages millions of ratings. As lead developer, I was in charge of implementing the database and server infrastructure (2001-2003). The site was built with XML/XSLT and JavaServer (Apache Tomcat) technologies.
Approximate string matching library
A document fingerprinting module
64-bit Rabin codes, based on a port of the Modula-3 fingerprinting module to C by Mark Mitchell.
Python wrapper for lowess fitting