Data-driven research has become an important avenue for novel discoveries in a number of scientific, engineering, medical, and government applications. These applications generate gigabytes to terabytes of data per day. My research focuses on storing, processing, exploring, and mining this data to make new scientific discoveries, improve the quality of health care, and model Internet worms. I seek to exploit the rich interdependence between the theory and practice of computer science to provide effective solutions for these data-intensive applications.
Grid-based architectures have become pervasive for developing data-intensive, geographically distributed applications in disciplines such as High Energy Physics, Medicine, and Astronomy. Grids typically represent resources that change over time and are differentially available to a given user or application based on administrative policies. With support from an NSF ITR grant and Intel Corporation, we have developed a framework called SPHINX that can administer grid policies and schedule complex, data-intensive scientific applications in an adaptive resource environment. The novelty lies in making effective decisions in the presence of time-varying resources and achieving fault tolerance when some of these resources fail. I also presented an invited talk on this work at the 2006 Grid and Pervasive Computing Conference in Taiwan. Energy consumption has recently become a critical issue in large-scale grids and data centers, which may require megawatts of power. We are currently developing novel algorithms to reduce the energy requirements of complex workflows on computational grids.
Data mining of high-dimensional datasets is very important for understanding the underlying patterns and relationships in many data-intensive applications. I am developing novel data mining algorithms and software for this purpose. In a joint project with the College of Pharmacy at the University of Florida, supported by NSF, we are developing methods for understanding the temporal resistance trends of different bacterial organisms based on the amount of antibiotics used at a given hospital. A better understanding of antibiotic use and its relationship to resistance patterns can lead to a significant reduction in health care costs. We are also developing novel data mining methods for analyzing CGH (Comparative Genomic Hybridization) datasets from cancer patients. We have shown that our methods can effectively derive the genetic imbalances for a given cancer type, as well as the similarities and differences between genetic imbalances across multiple cancer types. A better understanding of these imbalances and the corresponding genes can lead to more effective cancer treatments in the future.
I am actively involved in a number of university-wide initiatives for high performance computing and bioinformatics. I recently led a team of UF researchers to develop a networking and storage infrastructure called CASTOR. CASTOR, funded by NSF and Cisco Corporation, has enabled UF to link major HPC facilities on campus by a 10 Gigabit/s network and to provide 24 Terabytes of high-performance storage, one of the largest and fastest such systems in US academe.
I believe that educating the next generation of scientists and engineers in leading-edge concepts is crucial for the US to maintain its technological edge. I am very active in transferring my research into the graduate and undergraduate curriculum. I have developed and offered new graduate courses on data mining and high performance computing. As part of the NSF-funded CHEPREO project, I am developing tutorials on grid computing for practitioners working on large-scale, data-intensive applications.