Biography:
Prasenjit Mitra received his Doctor of Philosophy degree in Electrical Engineering from Stanford University in 2004. Prior to that, he received a Master of Science degree in Computer Science from The University of Texas at Austin in December 1994. His Bachelor of Technology (with Honours) degree in Computer Science and Engineering is from the Indian Institute of Technology, Kharagpur, awarded in May 1993.
From 1995, he worked for five years at Oracle Corporation in Redwood Shores, CA, as a senior member of the technical staff in the Server Technologies division, developing database software. He also worked part-time as a senior engineer at Narus and DBWizards.
(Old) Curriculum Vitae (including publication list): [MS-Word] [Postscript]
Research Interests:
Database Systems, Digital Libraries, Data Mining, Semantic Web, Information Retrieval, Artificial Intelligence.
My primary research focus is on information extraction from documents, especially documents retrieved from the World Wide Web. Apart from extracting information from web pages, we have started looking into extracting information from tables and images automatically. Of special interest to me is automated geo-spatial information extraction. Typically, I work with domain scientists who have various applications for the extracted information.
For a more detailed idea of my research interests, please refer to my publication list and my list of sponsored projects.
My three major research projects are as follows:
- ChemXSeer (co-PI): In this project, we are investigating the issues involved in constructing an integrated database and digital library for chemical kinetics data. We have developed a chemical name and formula search engine, and we are investigating novel information extraction, document segmentation, and indexing schemes. We have also developed a table search engine, TableSeer, which uses a novel ranking function, TableRank, to rank tables extracted automatically from digital documents. Experimental data is often presented as two-dimensional plots in the figures of digital documents, and we aim to extract the data from these 2-D plots automatically. Other topics of interest are web crawling (especially focused crawling), query expansion, and the analysis of blogs and social networks.
- NEVAC (co-PI): I am a co-PI in the North East Visualization and Analytics Center. The center's objective is to enable efficient processing of large text corpora. We are pursuing research on machine learning algorithms for tasks such as relationship extraction between named entities and geographic disambiguation. We have developed the FactXtractor system, which extracts relationships between entities from text. We have also designed FEMARepViz, which extracts information (such as topic) from daily FEMA situation update reports, performs geographic disambiguation, and visualizes the extracted information on a map.
- GeoCam (co-PI): This project aims to extract “accounts of movement” automatically from text documents, disambiguate descriptions of motion, and combine the extracted information with information from a geographic information system. See link.
Teaching:
IST 512: Information Processing Technologies and Architectures, Spring 2007, 2008
IST 461: Database Systems Management and Administration, Fall 2006
IST 220: Computer Networks and Telecommunications, Spring 2004-2006, Fall 2007
IST 402: Emerging Topics in Database Systems, Fall 2004, 2005
Other Interesting Links:
Some Maps