I am a PhD candidate (ABD) in Computer Science & Engineering at Penn State, where I am
advised by Professor C. Lee Giles. I am a member of the Intelligent Information Systems Research Laboratory.
My research interests are in data science, big data, information retrieval and
extraction, applied machine learning, and data mining. I also enjoy building and
contributing to large-scale systems on both the architectural and algorithmic sides.
During my time at Penn State I have contributed to
CiteSeerX, and
I created the new ChemXSeer Tagger, an information extractor
that identifies chemical formulae and names in text. I am also the
creator of AckSeer,
a search engine and repository for acknowledgments in scientific documents.
It indexes the organizations and persons acknowledged within papers
in the CiteSeerX repository. I have also created YouSeer, an open-source
framework for building search engines.
Last summer I was a data science fellow at the Eric & Wendy Schmidt Data Science for Social Good Fellowship, where we used machine learning to automatically extract earmarks from congressional bills. Before that I interned in the data analytics group at QCRI, and before that I spent two summers at Microsoft.
Publications
- Madian Khabsa and C. Lee Giles. “Chemical Entity Extraction Using CRF and an Ensemble of Extractors.” Journal of Cheminformatics, Suppl. 1 (2015): S12.
- Madian Khabsa and C. Lee Giles. “The Number of Scholarly Documents on the Public Web.” PLoS ONE 9, no. 5 (2014): e93949. [Media coverage in Science, Nature, ACM News, PSU]
- Madian Khabsa, Pucktada Treeratpituck, and C. Lee Giles. “Large Scale Author Name Disambiguation in Digital Libraries.” In Proceedings of the IEEE International Conference on Big Data 2014.
- Kyle Williams, Lichi Lu, Madian Khabsa, Jian Wu, Patrick Shih, and C. Lee Giles. “A Web Service for Scholarly Big Data Information Extraction.” In Proceedings of the IEEE International Conference on Web Services (ICWS) 2014.
- Hung-Hsuan Chen, Madian Khabsa, and C. Lee Giles. “The Feasibility of Investing in Manual Correction of Metadata for a Large-Scale Digital Library.” In Proceedings of ACM Digital Libraries 2014.
- Zhaohui Wu, Jian Wu, Madian Khabsa, Kyle Williams, Hung-Hsuan Chen, Wenyi Huang, Suppawong Tuarob, Sagnik Ray Choudhury, Alexander Ororbia, Prasenjit Mitra, and C. Lee Giles. “Towards Building a Scholarly Big Data Platform: Challenges, Lessons and Opportunities.” In Proceedings of ACM Digital Libraries 2014.
- Jian Wu, Kyle Williams, Madian Khabsa and C. Lee Giles. “The Impact of User Corrections on A Crawl-Based Digital Library: A CiteSeerX Perspective.” The 10th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom ’14). [Invited Paper]
- Jian Wu, Pradeep Teregowda, Kyle Williams, Madian Khabsa, Douglas Jordan, Eric Treece, Zhaohui Wu, and C. Lee Giles. “Migrating a Digital Library to a Private Cloud.” In Proceedings of the IEEE International Conference on Cloud Engineering (IC2E) 2014. [Best paper award nominee]
- Jian Wu, Kyle Williams, Hung-Hsuan Chen, Madian Khabsa, Douglas Jordan, and C. Lee Giles. “CiteSeerX: AI in a Digital Library Search Engine.” In Proceedings of the Twenty-Sixth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-14), 2014. [Voted as one of the best AI applications]
- Jian Wu, Alexander Ororbia, Kyle Williams, Madian Khabsa, Zhaohui Wu, and C. Lee Giles. “Utility-Based Control Feedback in a Digital Library Search Engine: Cases in CiteSeerX.” In 9th International Workshop on Feedback Computing (Feedback Computing 14) 2014.
- Kyle Williams, Jian Wu, Sagnik Ray Choudhury, Madian Khabsa, and C. Lee Giles. “Scholarly Big Data Information Extraction and Integration in the CiteSeerX Digital Library.” In ICDE 2014 Workshop on Information Integration on the Web (IIWEB) 2014.
- Madian Khabsa and C. Lee Giles. “An Ensemble Information Extraction Approach to the BioCreative CHEMDNER Task.” In Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, Vol. 2, 2013.
- Jian Wu, Pradeep B. Teregowda, Madian Khabsa, Eric Treece, Douglas Jordan, Stephen Carman, Prasenjit Mitra, and C. Lee Giles. “Scalability Bottlenecks of the CiteSeerX Digital Library Search Engine.” In WSDM 2013 Workshop on Large-Scale and Distributed Systems for Information Retrieval (LSDS-IR '13).
- Madian Khabsa, Pucktada Treeratpituck, and C. Lee Giles. “Entity Resolution Using Search Engine Results.” In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM '12), 2012.
- Madian Khabsa, Pucktada Treeratpituck, and C. Lee Giles. “AckSeer: A Repository and Search Engine for Automatically Extracted Acknowledgments from Digital Libraries.” In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2012.
- Madian Khabsa, Stephen Carman, Sagnik Ray Choudhury, and C. Lee Giles. “A Framework for Bridging the Gap Between Open Source Search Tools.” In SIGIR Workshop on Open Source Information Retrieval 2012.
- Madian Khabsa, Sharon Koppman and C. Lee Giles. “Towards Building and Analyzing a Social Network of Acknowledgments in Scientific and Academic Documents.” In Social Computing, Behavioral-Cultural Modeling and Prediction 2012.
- Pradeep B. Teregowda, Madian Khabsa, and C. Lee Giles. “A System for Indexing Tables, Algorithms and Figures.” In Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 343-344. ACM, 2012.
- Jian Wu, Pradeep Teregowda, Madian Khabsa, Stephen Carman, Douglas Jordan, Jose San Pedro Wandelmer, Xin Lu, Prasenjit Mitra, and C. Lee Giles. “Web Crawler Middleware for Search Engine Digital Libraries: A Case Study for CiteSeerX.” In Proceedings of the Twelfth International Workshop on Web Information and Data Management, pp. 57-64. ACM, 2012.
- Sumit Bhatia, Cornelia Caragea, Hung-Hsuan Chen, Zhaohui Wu, Madian Khabsa, and C. Lee Giles. “Specialized Research Datasets in the CiteSeerX Digital Library.” D-Lib Magazine 18, no. 7 (2012): 7.
- Pradeep B. Teregowda, Isaac G. Councill, Juan Pablo Fernández R, Madian Khabsa, Shuyi Zheng, and C. Lee Giles. “SeerSuite: Developing a Scalable and Reliable Application Framework for Building Digital Libraries by Crawling the Web.” In USENIX Conference on Web Application Development ’10.
- Arvind Karunakaran, Hyun-Woo Kim, and Madian Khabsa. “iSchools and Social Identity: A Social Network Analysis.” In the iConference, 2009.
News
Our PLoS ONE paper has received media coverage in:
- ACM Tech News
- Penn State News
- Harvard's Journalist's Resource
- Nature Blogs
- Inside Higher Ed
- Knowledge Wire (In Japanese)
- Naseej Blog (In Arabic)
Software / Code
- ChemXSeer Tagger: An open source information extractor for identifying chemical formulae and names in chemistry documents.
- YouSeer: An open-source framework for building search engines using Apache Solr and Heritrix. See the accompanying SIGIR 2012 workshop paper.
- Machine learning toolkit for identifying earmarks in congressional bills.
Office: 310 IST Building, The Pennsylvania State University, University Park, PA 16802
Email: first name [at] psu [dot] edu