
[ Research ]

Research Interests

The primary goal of my research is to learn and mine knowledge efficiently from massive data. My research topics include, but are not limited to:
  • Data mining: entity resolution, specifically name disambiguation and Cross Document Coreference (CDC) in large datasets; social network analysis in large collaboration networks.
  • Machine learning: supervised learning methods, including linear learning methods for very large-scale data, boosting and ensemble learning approaches, and active learning strategies for SVMs.
  • Information retrieval and natural language processing: applied machine learning techniques for text mining; question answering.

Publications

  • Jian Huang, Omid Madani, C. Lee Giles. Error-Driven Generalist+Experts (EDGE): A Multi-stage Ensemble Framework for Text Categorization. To appear in Proceedings of ACM 17th Conference on Information and Knowledge Management (CIKM 2008). Napa Valley, CA. October 2008.
  • Omid Madani, Jian Huang. On Updates that Constrain the Features' Connections During Learning. To appear in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2008). Las Vegas, NV. August 2008.
  • Steven Crain, Jian Huang, Hongyuan Zha. A Scalable Assistant Librarian: Hierarchical Subject Classification of Books. In Proceedings of The 31st Annual International ACM SIGIR Conference (SIGIR 2008), poster. Singapore. July 2008.
  • Jian Huang, Ziming Zhuang, Jia Li, C. Lee Giles. Collaboration Over Time: Characterizing and Modeling Network Evolution. In Proceedings of The 1st ACM International Conference on Web Search and Data Mining (WSDM 2008). Palo Alto, California, USA. February 2008.
  • Jian Huang, Sarah M. Taylor, C. Lee Giles. An Efficient Framework for Large Scale Cross Document Coreference (CDC). In Proceedings of The Conference of American Association for Corpus Linguistics (AACL 2008). Provo, Utah, USA. March 2008.
  • Seyda Ertekin, Jian Huang, Léon Bottou, C. Lee Giles. Learning on the Border: Active Learning in Imbalanced Data Classification. In Proceedings of The ACM 16th Conference on Information and Knowledge Management (CIKM 2007), pp. 127-136. Lisbon, Portugal, November 2007.
    - Also an NEC Laboratories America Technical Report, May 2007.
  • Jian Huang, Seyda Ertekin, Yang Song, Hongyuan Zha, C. Lee Giles. Efficient Multiclass Boosting Classification with Active Learning. In Proceedings of The 7th SIAM International Conference on Data Mining (SDM 2007), pp. 297-308. Minneapolis, MN, USA. April 2007.
    - IBM Research Travel Award Winner.
  • Yang Song, Jian Huang, Ding Zhou, Hongyuan Zha, C. Lee Giles. IKNN: Informative K-Nearest Neighbor Classification. In Proceedings of The 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2007), pp. 248-264. Warsaw, Poland. Sept. 2007.
  • Yang Song, Jian Huang, Isaac Councill, Jia Li, C. Lee Giles. Efficient Topic-based Unsupervised Name Disambiguation. In Proceedings of The ACM/IEEE Joint Conference on Digital Libraries (JCDL 2007), pp. 342-351. Vancouver, Canada. June 2007.
  • Seyda Ertekin, Jian Huang, C. Lee Giles. Active Learning for Class Imbalance Problem. In Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR 2007), poster. Amsterdam, Netherlands, July 2007.
    - SIGIR Travel Award Winner.
  • Yang Song, Jian Huang, Jia Li, C. Lee Giles. Generative Models for Name Disambiguation. In Proceedings of The 16th International World Wide Web Conference (WWW 2007), poster. Banff, Canada. May 2007.
  • Isaac Councill, Huajing Li, Levent Bolelli, Yang Song, Ziming Zhuang, Jian Huang, Yang Sun, Ding Zhou, Wang-Chien Lee, Anand Sivasubramaniam, and C. Lee Giles. CiteSeerX: Next-Gen CiteSeer. In The 2nd International Conference on Open Repositories. San Antonio, TX, USA. January 2007.
  • Jian Huang, Seyda Ertekin, C. Lee Giles. Efficient Name Disambiguation for Large Scale Databases. In Proceedings of The 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2006), pp. 536-544. Berlin, Germany. Sept. 2006.
    - The poster of this paper won the Best Poster Award in the Greater NY area DB/IR day, Fall 2006.
    - Media coverage of this system appeared in Science Daily (New System Solves The 'Who Is J. Smith' Puzzle), as well as in other sources.
  • Yang Song, Ding Zhou, Jian Huang, Isaac Councill, Hongyuan Zha, C. Lee Giles. Boosting the Feature Space: Text Classification for Unstructured Data on the Web. In Proceedings of The 6th IEEE International Conference on Data Mining (ICDM 2006), pp. 1064-1069. Hong Kong, China. December 2006.
  • Jian Huang, Seyda Ertekin, C. Lee Giles. Fast Author Name Disambiguation in CiteSeer. IST Technical Report No. 0019, The Pennsylvania State University. University Park, PA, USA. Sept. 2006.
  • Jian Huang, Xuanjing Huang. A Statistical Comparison of the Usage of Chinese in Different Chinese-speaking Regions. In Proceedings of The International Conference on Chinese Computing 2005 (ICCC 2005). Singapore. March 2005.
  • Jian Huang, Xuanjing Huang, Lide Wu. Hot Spot Passage Retrieval in Question Answering. In Proceedings of The 7th International Conference of Asian Digital Libraries (ICADL 2004), pp. 483-491. Shanghai, China. October 2004.

Projects

  • CiteSeerX (Next Generation CiteSeer) is a popular computer and information science literature search engine and digital library. This poster showcases the architecture of the Next Generation CiteSeer.
  • FDUQA is a question answering system developed at the Media Computing and Web Intelligence Lab (MCWIL), Fudan University. It is a competitive participant in TREC's QA track.

Miscellaneous

  • Reviewer: SDM, WWW, JCDL, SIGIR, etc.
  • ACM student member.

* More details are available in my CV upon request.