Title:

Multidimensional Text Mining with Limited Supervision

Abstract:

Unstructured text, one of the most important forms of data, plays a crucial role in domains such as cybersecurity, healthcare informatics, and cyber-physical systems. In many emerging applications, people's information needs from text data are becoming multidimensional: they demand useful insights along multiple aspects of a given text corpus. However, acquiring multidimensional knowledge from massive text data challenges existing data mining techniques. In this talk, I will present a structuring-and-mining framework that facilitates the acquisition of multidimensional knowledge from text data. It organizes unstructured text into a multidimensional and multi-granular structure, from which end users can easily select relevant data with declarative queries and then apply any data mining primitives. I will detail two core algorithms in this framework: (1) a weakly supervised text classification algorithm and (2) an abnormal event detection algorithm. All algorithms in the framework require little supervision and are thus particularly appealing in scenarios where labeled data are expensive to acquire.

Bio:

Chao Zhang is an Assistant Professor in the College of Computing at the Georgia Institute of Technology. His research areas are data mining and machine learning. He is particularly interested in developing label-efficient and robust learning techniques, with applications in text mining and spatiotemporal data mining. Chao has published more than 40 papers in top-tier conferences and journals, such as KDD, WWW, SIGIR, VLDB, and TKDE. He is the recipient of the ECML/PKDD Best Student Paper Runner-up Award (2015) and the Chiang Chen Overseas Graduate Fellowship (2013). Before joining Georgia Tech, he obtained his Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign in 2018.