Domains like engineering and the life sciences are producing vast amounts of data. Although mining this data is challenging, it can yield valuable insights that explain mechanisms and support design, decision making, control and prediction of the underlying system. In this talk, I will present my work on mining data from domains such as Biology, Web and Social media, Application Logs and the Petroleum industry. The focus will be on knowledge discovery in the domain of Biology.
Proteins are the workhorses of many cellular functions in living organisms. They are made up of amino acid residues. A small number of these residues (typically 1%) are involved in protein function and are hence called functional residues. I will present a method for predicting functional residues from a given three-dimensional protein structure. In our scheme, we represent each such structure as a weighted undirected residue interaction network, with residues as the nodes. Spatially proximal, and hence interacting, residues in the structure are connected by an edge in the network. The weight on an edge captures the correlation between the labels of the two residues it connects. We then obtain the optimal label assignment for the protein by minimizing the combined cost of residue-wise label misclassification and violation of label correlation constraints. We solve this optimization problem in two stages: the first stage minimizes the residue-wise label misclassification cost, and the second is an iterative collective inference scheme that adjusts the labels predicted in the first stage so as to minimize violations of label correlations. Our approach significantly outperforms state-of-the-art methods on a standard benchmark dataset. It achieves 35.3% precision at 50% recall and 88% recall at 18.5% precision, which translates to improvements of 8 percentage points in precision at 50% recall and 19 percentage points in recall at 18.5% precision.
I will close with a brief summary of my research and teaching plans.
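For readers who want a concrete picture of the two-stage scheme described in the abstract, the following is a minimal sketch and not the speaker's implementation. It assumes one representative 3D coordinate per residue, a hypothetical 6.0 angstrom contact cutoff, stage-one probabilities supplied by some per-residue classifier, and edge weights that penalize disagreeing labels. Iterated conditional modes is used here as a simple stand-in for the collective inference step; the function names `build_network` and `collective_inference` are illustrative.

```python
import numpy as np


def build_network(coords, cutoff=6.0):
    """Return a boolean adjacency matrix linking spatially proximal residues.

    coords: (n, 3) array, one representative atom per residue (assumption).
    cutoff: contact distance in angstroms (hypothetical value).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Residues within the cutoff are neighbours; no self-loops.
    return (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)


def collective_inference(p_func, adj, weights, max_iters=20):
    """Adjust stage-one labels to reduce label-correlation violations.

    p_func:  (n,) stage-one probability that each residue is functional.
    adj:     (n, n) boolean adjacency from build_network.
    weights: (n, n) edge weights; a positive weight is the assumed cost
             of two neighbours carrying different labels.
    """
    eps = 1e-9
    # Misclassification cost of assigning label 0 or label 1 to each residue.
    unary = np.stack([-np.log(1.0 - p_func + eps),
                      -np.log(p_func + eps)], axis=1)
    labels = (p_func >= 0.5).astype(int)   # stage-one decision
    w = np.where(adj, weights, 0.0)        # keep weights on real edges only

    for _ in range(max_iters):
        changed = False
        for i in range(len(labels)):
            # Total cost = own misclassification cost + disagreement with neighbours.
            cost0 = unary[i, 0] + (w[i] * (labels != 0)).sum()
            cost1 = unary[i, 1] + (w[i] * (labels != 1)).sum()
            new_label = int(cost1 < cost0)
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:                    # converged: no label moved
            break
    return labels
```

On a toy chain in which a weakly scored residue sits between two confidently functional neighbours, the second stage flips the middle residue to functional while leaving an isolated low-scoring residue unchanged, which is the kind of correction the label correlation term is meant to provide.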
Ashish completed his PhD in the Department of Computer Science and Engineering at IIT Bombay in 2009. He worked with the Department of Computer Science and Engineering at IIT Madras for two years as a visiting assistant professor, and later with the Tata Institute of Fundamental Research for a year. Ashish is currently working with Reliance Industries Ltd in Mumbai as a senior data scientist in the Big Data Analytics Team. He is interested in Machine Learning and its application to discovering new knowledge in fields such as Biology, Web and social media, and Engineering.