Statement of Purpose (Research Statement)

A one to two page statement, written by the student, of the student's professional interests in research, teaching, service, and/or product development.

My research interests involve the application of mathematics to problems in a variety of areas, including graph theory and management science, mathematical modeling and statistics, artificial intelligence and natural language processing, computational biology, and software engineering. Formally trained as a mathematician, I have developed an affinity for multidisciplinary applied scientific research, and I enjoy working as part of a research group that is continuously discovering and learning new aspects of scientific knowledge that bear significance to real life.

After becoming a member of a production-oriented research team at JGI/LANL, I realized that while conducting research on an everyday basis does not require a PhD degree, many opportunities I would like to pursue, such as starting and leading research projects, obtaining funding through internal or external sources, and mentoring, do require one. Beyond fulfilling these requirements for my personal development, I am motivated by the strong belief that the research part of my PhD work will bring novel approaches and views to software measurement and analysis. I can see immediate practical benefits from the implementation of these ideas for the Hackystat user community in particular and the software metrics practitioners' community in general.

My PhD research involves an in-depth study of software development activity patterns. Using the Hackystat system as infrastructure, I am developing new software modules with extended data mining functionality, which involves indexing software process and product data for further analysis through the comparison of multidimensional telemetry stream time series. The methods I am working on enable unsupervised discovery of development activity patterns through real-time sensor data analysis. While the algorithmic and software development challenges are a large and very interesting part of my research, the main focus of this work lies beyond the implementation of algorithms and software modules: my goal is to improve our understanding of the software development process as a human activity. One outcome of this research, if successful, will be a taxonomy of software development activity patterns for use in software development project analysis and management.

I started my time-series data mining journey with a straightforward application of statistical analyses such as moving averages, seasonality, and autoregressive models, which did not yield useful results. Next, I applied spectral decomposition techniques, including the Discrete Fourier Transform and wavelet-based analysis. These methods also proved insufficient, and my focus moved from attempts at periodicity discovery to the direct comparison and indexing of time series. I started with a simple Euclidean distance metric for the time series and the implementation of a 3D trajectory GUI browser. I finished this aspect of my research by implementing a Dynamic Time Warping Java library and developing a plugin for the Hackystat ProjectBrowser, the Hackystat project management and telemetry visualization module.
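To illustrate the difference between the two comparison approaches mentioned above, here is a minimal sketch of Euclidean distance and textbook Dynamic Time Warping. It is written in Python for brevity and is not the Java library described in this statement. The function names and the absolute-difference cost are assumptions chosen for illustration.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance; only defined for equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Textbook dynamic time warping, O(len(a) * len(b)) time and memory.

    Uses absolute difference as the local cost (an illustrative choice)
    and no warping-window constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to b[j-1] again
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[n][m]


if __name__ == "__main__":
    stream = [0, 1, 2, 1, 0]
    delayed = [0, 0, 1, 2, 1, 0]  # same shape, one sample late
    print(dtw_distance(stream, delayed))
```

Because warping lets one point align with several neighbors, the delayed copy of a stream can match the original at zero cost, whereas Euclidean distance penalizes the shift or cannot be computed at all when the lengths differ.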

At the beginning of the Spring '09 semester, I started applying Symbolic Aggregate approXimation (SAX) to my research problem. I have developed an open source, Java-based implementation of SAX called the JMotif library. To date, SAX seems to overcome all of the problems I encountered in my previous research and allows not only real-time telemetry data indexing but also fast clustering, alignment, and search. Automatic motif discovery highlights similar features in the telemetry streams and turns telemetry analysis from tedious "eyeballing" into an interactive, user-friendly process. I have yet to explore the "bitmap" visualization of the telemetry data.
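For readers unfamiliar with SAX, the core idea can be sketched in a few lines: z-normalize the series, average it over equal-width segments (piecewise aggregate approximation), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The sketch below is in Python for brevity and does not reflect the JMotif API. All function names are hypothetical, and the requirement that the series length be divisible by the number of segments is a simplifying assumption.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series, eps=1e-8):
    """Rescale to zero mean and unit variance; flat series map to zeros."""
    mu, sigma = mean(series), pstdev(series)
    if sigma < eps:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each segment.

    Simplifying assumption: len(series) is divisible by `segments`.
    """
    if segments <= 0 or len(series) % segments != 0:
        raise ValueError("series length must be divisible by segments")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def sax_word(series, segments, alphabet_size):
    """Convert a numeric series into a SAX word over the letters 'a', 'b', ..."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    # Breakpoints cut the standard normal curve into equiprobable regions.
    breakpoints = [
        NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)
    ]
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        letters[bisect_right(breakpoints, value)]
        for value in paa(znorm(series), segments)
    )


if __name__ == "__main__":
    # A hypothetical telemetry stream: low activity followed by high activity.
    print(sax_word([1, 1, 1, 1, 5, 5, 5, 5], segments=2, alphabet_size=4))
```

Because the series is normalized first, two streams with the same shape but different scale produce the same word, which is what makes the symbolic form convenient for indexing and search.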

Currently, I have a basic Hackystat-SAX "proof-of-concept" toolkit implemented and am able to see it in action through the Java/Swing/MySQL-based telemetry visualization module called Hackystat-Trajectory. This initial release of Trajectory has revealed many issues with my initial index database schema design, the user-interface design, and the underlying Hackystat architecture. My immediate goals are: implementation of batch access to the Hackystat central data repository; improvement of the indexing algorithm; creation of an index database; Trajectory GUI improvement; implementation of multidimensional SAX-based indexing; and bitmap visualization.

My future plans include developing an offline Hackystat Trajectory Browser featuring 2D and 3D visualization, trajectory alignment, motif highlighting, and augmented visual querying. Using this tool, I plan to identify and classify patterns (motifs) found within software development telemetry datasets and to implement a Hackystat module for real-time telemetry stream analysis, automatic software project evaluation, and process defect detection.

Thank you for your consideration.