dmse

Ahmed E. Hassan

Queen's University, Canada

Tao Xie

North Carolina State University, USA

Technical Briefing Slides (PDF 5.5MB)

An ICSE 2010 Tutorial T18 Tuesday, 4 May 2010 (afternoon)

Mining Software Engineering Data

Ahmed E. Hassan

Queen's University, Canada

Tao Xie

North Carolina State University, USA

Tutorial Slides (PPT, 4.0MB) Tutorial Notes (6 slides per page, PDF, 2.22MB)

Software engineering data (such as code bases, execution traces, historical code changes, mailing lists, and bug databases) contains a wealth of information about a project's status, progress, and evolution. Using well-established data mining techniques, practitioners and researchers can explore the potential of this valuable data in order to better manage their projects and to produce higher quality software systems that are delivered on time and on budget.

This tutorial presents the latest research in mining Software Engineering (SE) data, discusses challenges associated with mining SE data, highlights SE data mining success stories, and outlines future research directions. Attendees will acquire the knowledge and skills needed to perform research or conduct practice in the field and to integrate data mining techniques into their own research or practice. More information about the tutorial can be found at https://sites.google.com/site/asergrp/dmse.

An ICSE 2009 Tutorial (Tuesday May 19 morning) on

Mining Software Engineering Data

Tao Xie

North Carolina State University, USA

Ahmed E. Hassan

Queen's University, Canada

An ICSE 2008 Tutorial on

Mining Software Engineering Data

Ahmed E. Hassan

Queen's University, Canada

Tao Xie

North Carolina State University, USA

Invited talks at West Virginia U., HKUST, CUHK, U. Calgary, Motorola Labs, Accenture Labs

Improving Software Productivity and Quality via

Mining Program Source Code

Tao Xie

North Carolina State University

Talk Slides (PPT, 1.7MB)

Since the late 1990s, various data mining techniques have been applied to analyze software engineering data, and have achieved many notable successes. This talk will first present recent research at North Carolina State University on mining program source code, including mining API usage patterns for software reuse and API properties for static defect detection. The research exploits a model checker to generate static traces for mining without requiring system tests or runtime execution. The research also exploits a code search engine to expand the scope of mining to billions of lines of open source code. The related research papers can be found at http://www.csc.ncsu.edu/faculty/xie/research.htm#minestatic and more general information on mining software engineering data can be found in tutorial slides presented at KDD 2006, ICSE 2007, and ICDM 2007 as well as a comprehensive bibliography: https://sites.google.com/site/asergrp/dmse.
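As a rough illustration of what mining API usage patterns from traces can look like, here is a minimal Python sketch. It is not the technique from the papers linked above: the traces are hand-written stand-ins (the talk's research obtains static traces from a model checker), and the function names `mine_call_pairs` and `find_violations`, the sample call sequences, and the support threshold are all assumptions made for this example. The sketch counts ordered call pairs that recur across traces, then flags traces that break a frequent pair as candidate defects.

```python
from collections import Counter
from itertools import combinations


def mine_call_pairs(traces, min_support):
    """Return ordered call pairs (a, b), with a occurring before b,
    that appear in at least `min_support` traces."""
    counts = Counter()
    for trace in traces:
        pairs = set()  # count each pair at most once per trace
        for i, j in combinations(range(len(trace)), 2):
            if trace[i] != trace[j]:
                pairs.add((trace[i], trace[j]))
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}


def find_violations(traces, pair):
    """Return indices of traces that call `a` but never call `b` afterwards."""
    a, b = pair
    violations = []
    for idx, trace in enumerate(traces):
        if a in trace and b not in trace[trace.index(a) + 1:]:
            violations.append(idx)
    return violations


# Hypothetical call sequences, written by hand for illustration only.
traces = [
    ["fopen", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fread"],
]

patterns = mine_call_pairs(traces, min_support=2)
suspects = find_violations(traces, ("fopen", "fclose"))
```

With these sample traces, the pair `("fopen", "fclose")` is frequent, and the third trace opens a file without closing it, so it is reported as a candidate violation. A real miner would need richer pattern types and a principled way to rank violations.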

An ICDM 2007 Tutorial on

Mining for Software Reliability

Chao Liu

Yahoo! Research

Tao Xie

North Carolina State University

Jiawei Han

Univ. of Illinois at Urbana-Champaign

Tutorial Slides (PPT, 10.2MB)

Software is ubiquitous in our daily life. It brings great convenience, but also a big headache regarding reliability: software is never bug-free, and software bugs keep incurring monetary loss or even catastrophes. In the pursuit of better reliability, software engineering researchers have found that a huge amount of data in various forms can be collected from software systems, and that these data, when properly analyzed, can help improve software reliability. Unfortunately, the huge volume of complex data renders simple analysis techniques inadequate; consequently, researchers have been resorting to data mining for more effective analysis. In the past few years, we have witnessed many studies on mining for software reliability reported in data mining as well as software engineering forums. These studies either develop new data mining techniques or apply existing ones to tackle reliability problems from different angles. To keep data mining researchers abreast of the latest developments in this growing research area, we propose this tutorial on mining for software reliability. In this tutorial, we will present a comprehensive overview of this area, examine representative studies, and lay out challenges to data mining researchers. In particular, every effort will be made to let data mining researchers appreciate the challenges and impact posed by software reliability, and be stimulated to contribute.

An ICSE 2007 Tutorial on

Mining Software Engineering Data

Tao Xie

North Carolina State University, USA

Ahmed E. Hassan

University of Victoria, Canada

Some tutorial slides are adapted from KDD 06 tutorial slides co-prepared by Jian Pei from Simon Fraser University, Canada

A KDD 2006 Tutorial on

Data Mining for Software Engineering

Tao Xie

North Carolina State University, USA

Jian Pei

Simon Fraser University, Canada

Tutorial Slides (PDF, 1.70MB) (PPT, 3.46MB)

Since the late 1990s, various data mining techniques have been applied to analyze software engineering data, and have achieved many notable successes. Substantial experience, development, and lessons of data mining for software engineering pose interesting challenges and opportunities for new research and development. In this tutorial, we shall present a survey of the research problems, the latest progress, the challenges, and the potential of data mining practice in software engineering. The tutorial will focus on the inherent challenges of mining software engineering data, offer a shortcut to the current research and development frontier, and illustrate a few case studies. The tutorial will answer questions such as which software engineering tasks can be helped by data mining, what kinds of software engineering data are available for mining, and how data mining techniques can be used in software engineering. The tutors, Drs. Tao Xie and Jian Pei, are active and prolific researchers in software engineering and data mining, respectively. The tutorial website is at: http://ase.csc.ncsu.edu/dmse/

Tutorials on Mining Software Engineering Data

Target Audience: both Practitioners and Researchers from the Software Engineering/Development or Data Mining community.

If you are interested in inviting any of us to give this tutorial at your company, research lab, or university, please contact Tao Xie!

The normal duration of the tutorial is 2.5 to 3 hours, including a 10-minute break and a 15-minute Q&A session, but the duration can be customized as needed.

Venues of Tutorial Presentations:

  • 04/2009: given by Tao Xie at ABB Research (slides)
  • 10/2007: given by Chao Liu and Tao Xie at ICDM 2007 (on Mining for Software Reliability)
  • 05/2007: given by Tao Xie and Ahmed E. Hassan at ICSE 2007
  • 08/20/2006: given by Tao Xie and Jian Pei at KDD 2006

You may also be interested in Tao Xie's presentations on Improving Automation in Developer Testing.

Tao Xie's Research on Mining Software Engineering Data