Penn State Official Shield

 

Prasenjit Mitra,
Assistant Professor,

 

Office: 313F IST Building,
The Pennsylvania State University,
University Park, PA 16802.
Office Phone: +1 (814) 865-4454

Fax: +1 (814) 865-6426
Email: pmitra AT ist.psu.edu

 

 


 

Affiliations:

 

Intelligent Information Systems Laboratory
Cyber Security Laboratory
North-East Visualization and Analytics Center

Institute for CyberScience

College of Information Sciences and Technology

Department of Computer Science and Engineering (Graduate Faculty)

Department of Industrial and Manufacturing Engineering (Affiliate Faculty)

Education:

Stanford University: Doctor of Philosophy in Electrical Engineering, August 2004.

The University of Texas at Austin: Master of Science in Computer Science, December 1994.

Indian Institute of Technology, Kharagpur: Bachelor of Technology (with Honors) in Computer Science and Engineering, May 1993.

 

Industrial Experience:

Global IDs: Chief Scientist, Member of the Board of Advisors, 2007 to present.

DBWizards: Senior Software Engineer, 2002-2003.

Narus: Senior Software Engineer, 2000-2001.

Oracle Corporation: Senior Member of Technical Staff (Server Technologies Division), 1995-2000.

 

Research Interests:

 

Core Problems: Information Extraction, Information Integration

General Areas: Database Systems, Digital Libraries, Visual Analytics, Data Mining, Semantic Web, Information Retrieval.

My primary research focus is information extraction from documents, especially documents retrieved from the World Wide Web. Beyond extracting information from the web, we have started looking into automatically extracting information from tables and images. Automated geo-spatial information extraction is of special interest to me. I typically work with domain scientists, such as chemists, archaeologists, and geo-scientists, who have various applications for the extracted information.

 

For a more detailed idea of my research interests, please refer to my publication list and list of sponsored projects.


My three major research projects are as follows:

  1. ChemXSeer (co-PI): In this project, we are investigating the issues involved in constructing an integrated database and digital library for chemical kinetics data. We have developed a search engine for chemical names and formulae, and we are investigating novel information extraction, document segmentation, and indexing schemes. We have also developed a table search engine, TableSeer, which uses a novel ranking function, TableRank, to rank tables extracted automatically from digital documents. Because experimental data is often presented as two-dimensional plots in the figures of digital documents, we also aim to extract the data from 2-D plots automatically. Other topics of interest are web crawling (especially focused crawling), query expansion, and analysis of blogs and social networks.
  2. NEVAC (co-PI): I am a co-PI in the North-East Visualization and Analytics Center. The center's objective is to enable efficient processing of large text corpora. We are pursuing research on machine learning algorithms for tasks such as relationship extraction between named entities and geographic disambiguation. We have developed the FactXtractor system, which extracts relationships between entities from text. We have also designed FEMARepViz, which extracts information (such as topics) from daily FEMA situation update reports, performs geographic disambiguation, and visualizes the extracted information on a map.
  3. GeoCam (co-PI): This project aims to extract “accounts of movement” automatically from text documents, disambiguate descriptions of motion, and combine the extracted information with information from a geographic information system.

 

Teaching:

IST 512: Information Processing Technologies and Architectures, Spring 2007, 2008
IST 461: Database Systems Management and Administration, Fall 2006
IST 220: Computer Networks and Telecommunications, Spring 2004-2006, Fall 2007, 2008
IST 402: Emerging Topics in Database Systems, Fall 2004, 2005

Other Interesting Links:

Some Maps