

Prasenjit Mitra


College of Information Sciences and Technology

Department of Computer Science and Engineering        (Graduate Faculty)
Department of Industrial and Manufacturing Engineering   (Affiliate Faculty)


Biographical Sketch:

Prasenjit Mitra is a Professor in the College of Information Sciences and Technology at The Pennsylvania State University. He serves on the graduate faculty of the Department of Computer Science and Engineering and is an affiliate faculty member of the Department of Industrial and Manufacturing Engineering. His current research interests are in big data analytics, applied machine learning, and visual analytics. In the past, he has contributed to data interoperation, data cleaning, and digital libraries, especially tabular data extraction and citation recommendation.

Mitra received his Ph.D. from Stanford University in 2004, where he investigated the modeling of data and data semantics in information integration systems. At Penn State, he has pursued research on a broad range of topics, including data mining on the web and social media, scalable data cleaning, political text mining, extraction of chemical formulae and names from documents, and extraction of data and metadata from figures and tables in digital documents.

He was the principal investigator of the DOES project, funded by an NSF CAREER Award. He has also been a co-principal investigator on the CiteSeerX, ChemXSeer, and ArchSeer digital library projects, the North-East Visualization and Analytics Center (NEVAC), and the GeoCAM visual analytics project. Mitra serves as the director of the Cancer Informatics Initiative at Penn State. His research has been supported by the NSF, Microsoft Corporation, DoD, DHS, DoE, NGA, and DTRA.

Mitra obtained his Bachelor of Technology with honors from the Indian Institute of Technology, Kharagpur, in 1993. In 1994, he obtained an M.S. in Computer Science from The University of Texas at Austin. From 1995 to 2000, he worked in the Server Technologies Division at Oracle Corporation as a Senior Member of Technical Staff on the Oracle Parallel Server in the Languages and Relational Technologies group. He has served as a consultant for several startups and as a member of the Board of Advisors of Global IDs, Inc.

Mitra has co-authored approximately 150 articles in top conferences and journals. His work with his co-authors produced a visual analytics system that won the IEEE VAST '08 Grand Challenge award in the Data Integration area. He has served as co-chair of the IEEE SOCIETY conference, as an area chair at CIKM, and as a senior program committee member at IJCAI. Mitra has been a member of the Best Paper Award committee for CIKM'15 and co-chair of four workshops, including SNAKDD'09, WIDM'09, and WIDM'12. He has also served on the program committees of several top conferences, including SIGMOD, VLDB, AAAI, IJCAI, WWW, CIKM, WSDM, KDD, and ICDM, and serves on the editorial board of the Journal of Data Mining and Digital Humanities. He has supervised more than 15 Ph.D. students and several M.S. students.



Education:

         Ph.D., Electrical Engineering, Stanford University, 2004
         M.S., Computer Science, The University of Texas at Austin, 1994
         B.Tech. (Hons.), Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 1993


Selected Awards:

         NSF CAREER Award, 2009-2014
         National Talent Search, 1989-1993


Intelligent Information Systems Laboratory
Cyber Security Laboratory
North-East Visualization and Analytics Center

Institute for CyberScience


Industrial Experience:

Global IDs: Chief Scientist, Member of the Board of Advisors, 2007 to present.

DBWizards: Senior Software Engineer, 2002-2003

Narus: Senior Software Engineer, 2000-2001

Oracle Corporation: Senior Member of Technical Staff (Server Technologies Division), 1995-2000.

Research Interests:

General Areas: Database Systems, Digital Libraries, Visual Analytics, Data Mining, Semantic Web, Information Retrieval.
Core Problems: Information Extraction, Information Integration, Information Visualization.


  The DOES Project on DOcument-element Extraction and Search, NSF CAREER

  Semantic CiteSeerX, NSF
  ChemXSeer: An Integrated Digital Library and Data Repository, Dow Corporation (Past: NSF)
  VACCINE: Visual Analytics for Command, Control, and Interoperability Environments, DHS University Center of Excellence

  Analysis and Intelligent Search for Cypriot Works of Art and Secretariat Corpus, NSF

  Contextualization for Accounts of Movement (GeoCAM), NGA

  IGERT - Big Data Social Science: An Integrative Research Program in Social Data Analytics, NSF (Senior Personnel/Faculty)

I am also a co-director of the Cancer Informatics Initiative (CANI) at Penn State.


IST 552: Data and Knowledge Management, Spring 2010-12
IST 512: Information Processing Technologies and Architectures, Spring 2007, 2008
IST 461: Database Systems Management and Administration, Fall 2006
IST 220: Computer Networks and Telecommunications, Spring 2004-06, 2010; Fall 2007-11
IST 402: Emerging Topics in Database Systems, Fall 2004, 2005


Office: 313F IST Building,
The Pennsylvania State University,
University Park, PA 16802.
Office Phone: +1 (814) 865-4454

Fax: +1 (814) 865-6426
Email: pmitra AT