SETasks


Mining Software Engineering Data Bibliography

-- What software engineering tasks can be helped by data mining? --

Programming

  • Data Mining Library Reuse Patterns in User-Selected Applications. Amir Michail. ASE 1999. [PDF]
  • Data Mining Library Reuse Patterns using Generalized Association Rules. Amir Michail. ICSE 2000. [PDF]
  • CVSSearch: Searching through Source Code using CVS Comments. Annie Chen, Eric Chou, Joshua Wong, Andrew Y. Yao, Qing Zhang, Shao Zhang, and Amir Michail. ICSM 2001. [PDF]
  • A New Approach to Data Mining for Software Design. Walid Taha, Scott Crosby, and Kedar Swadi. CSITeA 2004. [PDF]
  • MUDABlue: An Automatic Categorization System for Open Source Repositories. Shinji Kawaguchi, Pankaj K. Garg, Makoto Matsushita, and Katsuro Inoue. APSEC 2004. [PDF]
  • Mining Jungloids: Helping to Navigate the API Jungle. David Mandelin, Lin Xu, Rastislav Bodik, and Doug Kimelman. PLDI 2005. [PDF] [Prospector tool web interface]
  • Synthesis of Interface Specifications for Java Classes. Rajeev Alur, Pavol Cerny, Gunjan Gupta, P. Madhusudan, Wonhong Nam, and Anshuman Srivastava. POPL 2005. [PDF]
  • Permissive Interfaces. Thomas A. Henzinger, Ranjit Jhala, and Rupak Majumdar. ESEC/FSE 2005. [PDF]
  • MAPO: Mining API Usages from Open Source Repositories. Tao Xie and Jian Pei. MSR 2006. [PDF]
  • XSnippet: Mining for Sample Code. Naiyana Tansalarak and Kajal T. Claypool. OOPSLA 2006 [PDF]
  • QUARK: Empirical Assessment of Automaton-based Specification Miners. David Lo and Siau-Cheng Khoo. WCRE 2006. [PDF]
  • SMArTIC: Towards Building an Accurate, Robust and Scalable Specification Miner. David Lo and Siau-Cheng Khoo. FSE 2006 [PDF]
  • Context-Sensitive Domain-Independent Algorithm Composition and Selection. Troy A. Johnson and Rudolf Eigenmann. PLDI 2006 [PDF]
  • Mica: A Web-Search Tool for Finding API Components and Examples. Jeffrey Stylos and Brad A. Myers. VL/HCC 2006. [PDF] [Web]
  • Static Specification Mining Using Automata-Based Abstractions. Sharon Shoham, Eran Yahav, Stephen Fink and Marco Pistoia. ISSTA 2007 [PDF]
  • Efficient Mining of Iterative Patterns for Software Specification Discovery. David Lo, Siau-Cheng Khoo and Chao Liu. KDD 2007. [PDF]
  • Recommending Random Walks. Zachary M. Saul, Vladimir Filkov, Premkumar Devanbu, and Christian Bird. ESEC/FSE 2007 [PDF] [FRAN implementation]
  • Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers. Raphael Hoffmann, James Fogarty, and Daniel S. Weld. UIST 2007. [PDF]
  • Searching the Library and Asking the Peers: Learning to Use Java APIs on Demand. Yunwen Ye, Yasuhiro Yamamoto, Kumiyo Nakakoji, Yoshiyuki Nishinaka, and Mitsuhiro Asada. PPPJ 2007. [PDF]
  • Mining Concepts from Code with Probabilistic Topic Models. Erik Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, and Pierre Baldi. ASE 2007 [PDF]
  • PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web. Suresh Thummalapenta and Tao Xie. ASE 2007. [PDF]

Static defect detection

  • Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code. Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. SOSP 2001. [PDF]
  • CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code. Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. OSDI 2004. [PDF]
  • PR-Miner: Automatically Extracting Implicit Programming Rules and Detecting Violations in Large Software Code. Zhenmin Li and Yuanyuan Zhou. ESEC/FSE 2005. [PDF]
  • Locating Matching Method Calls by Mining Revision History Data. Benjamin Livshits and Thomas Zimmermann. BUG 2005. [PDF]
  • DynaMine: Finding Common Error Patterns by Mining Software Revision Histories. Benjamin Livshits and Thomas Zimmermann. ESEC/FSE 2005. [PDF]
  • Mining Temporal Specifications for Error Detection. Westley Weimer and George Necula. TACAS 2005. [PDF]
  • Mining Change and Version Management Histories to Evaluate an Analysis Tool. Danhua Shao, Sarfraz Khurshid, and Dewayne E. Perry. 2005. [PDF]
  • When Do Changes Induce Fixes? Jacek Sliwerski, Thomas Zimmermann, and Andreas Zeller. MSR 2005. [PDF] [slides]
  • Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques. Chadd C. Williams and Jeffrey K. Hollingsworth. TSE 2005. [PDF]
  • Perracotta: Mining Temporal API Rules from Imperfect Traces. Jinlin Yang, David Evans, Deepali Bhardwaj, Thirumalesh Bhat, and Manuvir Das. ICSE 2006. [PDF]
  • Mining Metrics to Predict Component Failures. Nachiappan Nagappan, Thomas Ball, and Andreas Zeller. ICSE 2006. [PDF]
  • Predicting Faults from Cached History. Sunghun Kim, Thomas Zimmermann, E. James Whitehead Jr., and Andreas Zeller. ICSE 2007. [PDF]
  • Path-Sensitive Inference of Function Precedence Protocols. Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan. ICSE 2007. [PDF]
  • Static Specification Inference Using Predicate Mining. Murali Krishna Ramanathan, Ananth Grama, Suresh Jagannathan. PLDI 2007 [PDF]
  • Static Error Detection Using Semantic Inconsistency Inference. Isil Dillig, Thomas Dillig and Alex Aiken. PLDI 2007. [PDF]
  • Finding What's Not There: A New Approach to Revealing Neglected Conditions in Software. Ray-Yaung Chang, Andy Podgurski, and Jiong Yang. ISSTA 2007. [PDF]
  • Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications. Mithun Acharya, Tao Xie, Jian Pei, and Jun Xu. ESEC/FSE 2007. [PDF]
  • Detecting Object Usage Anomalies. Andrzej Wasylkowski, Andreas Zeller, and Christian Lindig. ESEC/FSE 2007. [PDF]
  • /* iComment: Bugs or Bad Comments? */. Lin Tan, Ding Yuan, Gopal Krishna and Yuanyuan Zhou. SOSP 2007. [PDF]
  • NEGWeb: Static Defect Detection via Searching Billions of Lines of Open Source Code. Suresh Thummalapenta and Tao Xie. NCSU CSC 2007. [PDF]
  • Mining Patterns and Violations Using Concept Analysis. Christian Lindig. [PDF]

Testing

  • The Concept of Dynamic Analysis. Tom Ball. ESEC/FSE 1999. [PDF]
  • Dynamically Discovering Likely Program Invariants to Support Program Evolution. Michael D. Ernst, Jake Cockrell, William G. Griswold, and David Notkin. TSE 2001. [PDF] [Daikon tool implementation] [Publications using Daikon]
  • Encoding Program Executions. Steven P. Reiss and Manos Renieris. ICSE 2001 [PDF]
  • Finding Failures by Cluster Analysis of Execution Profiles. William Dickinson and David Leon and Andy Podgurski. ICSE 2001. [PDF]
  • Pursuing Failure: the Distribution of Program Failures in a Profile Space. William Dickinson, David Leon, and Andy Podgurski. FSE 2001. [PDF]
  • Tracking Down Software Bugs Using Automatic Anomaly Detection. Sudheendra Hangal and Monica S. Lam. ICSE 2002. [PDF] [DIDUCE tool implementation]
  • Automatic Extraction of Object-Oriented Component Interfaces. John Whaley, Michael C. Martin, and Monica S. Lam. ISSTA 2002 [PDF]
  • Mining Specifications. Glenn Ammons, Rastislav Bodik, and James R. Larus. POPL 2002 [PDF]
  • Discovering Algebraic Specifications from Java Classes. Johannes Henkel and Amer Diwan. ECOOP 2003. [PDF]
  • Detecting AAA Vulnerabilities by Mining Execution Profiles. Zhan Xu, David Leon, Andy Podgurski, and Vincenzo Liberatore. IEEE S&P 2004. [PDF]
  • Active Learning for Automatic Classification of Software Behavior. James Bowring, James Rehg, and Mary Jean Harrold. ISSTA 2004. [PDF]
  • Automatic Extraction of Object-Oriented Observer Abstractions from Unit-Test Executions. Tao Xie and David Notkin. ICFEM 2004. [PDF]
  • Automatic Extraction of Sliced Object State Machines for Component Interfaces. Tao Xie and David Notkin. SAVCBS 2004. [PDF]
  • Dynamically Inferring Temporal Properties. Jinlin Yang and David Evans. PASTE 2004. [PDF] [Terracotta tool implementation]
  • Inference of Component Protocols by the kBehavior Algorithm. Leonardo Mariani and Mauro Pezzè. 2004 Tech Report [PDF] [kBehavior tool implementation]
  • Behavior Capture and Test: Automated Analysis of Component Integration. Leonardo Mariani and Mauro Pezzè. ICECCS 2005. [PDF]
  • Automatic Extraction of Abstract-Object-State Machines Based on Branch Coverage. Hai Yuan and Tao Xie. RETR 2005. [PDF]
  • Automatically Identifying Special and Common Unit Tests for Object-Oriented Programs. Tao Xie and David Notkin. ISSRE 2005. [PDF]
  • Improving the Classification of Software Behaviors using Ensembles of Control-Flow and Data-Flow Classifiers. James Bowring, Mary Jean Harrold, and James Rehg. GIT-CERCS-05-10. [PDF]
  • Mining Object Behavior with ADABU. Valentin Dallmeier and Christian Lindig and Andrzej Wasylkowski and Andreas Zeller. WODA 2006. [PDF]
  • Automatic Extraction of Abstract-Object-State Machines from Unit-Test Executions. Tao Xie, Evan Martin, and Hai Yuan. ICSE 2006 Demo. [PDF]
  • Inferring Access-Control Policy Properties via Machine Learning. Evan Martin and Tao Xie. POLICY 2006. [PDF]
  • Inference and Enforcement of Data Structure Consistency Specifications. Brian Demsky, Michael D. Ernst, Philip J. Guo, Stephen McCamant, Jeff H. Perkins, and Martin Rinard. ISSTA 2006. [PDF]

Debugging

  • Automated Support for Classifying Software Failure Reports. Andy Podgurski, David Leon, Patrick Francis, Wes Masri, Melinda Minch, Jiayang Sun, and Bin Wang. ICSE 2003. [PDF]
  • Debugging Temporal Specifications with Concept Analysis. Glenn Ammons, David Mandelin, Rastislav Bodik, James Larus. PLDI 2003. [PDF]
  • Bug Isolation via Remote Program Sampling. Ben Liblit, Alex Aiken, Alice X. Zheng, and Michael I. Jordan. PLDI 2003. [PDF]
  • Tree-Based Methods for Classifying Software Failures. Patrick Francis, David Leon, Melinda Minch, Andy Podgurski. ISSRE 2004. [PDF]
  • Automatic Bug Triage Using Text Classification. Davor Cubranic and Gail Murphy. SEKE 2004. [PDF]
  • Scalable Statistical Bug Isolation. Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan. PLDI 2005. [PDF]
  • SOBER: Statistical Model-based Bug Localization. Chao Liu, Xifeng Yan, Long Fei, Jiawei Han, and Samuel P. Midkiff. ESEC/FSE 2005. [PDF]
  • Mining Behavior Graphs for Backtrace of Noncrashing Bugs. Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, and Philip S. Yu. SDM 2005. [PDF]
  • Lightweight Defect Localization for Java. Valentin Dallmeier, Christian Lindig, and Andreas Zeller. ECOOP 2005. [PDF] [Ample tool implementation]
  • Detecting Failure-Related Anomalies in Method Call Sequences. Valentin Dallmeier. Thesis 2005. [PDF]
  • Data Mining Approaches to Software Fault Diagnosis. R. P. J. C. Bose and S. H. Srinivasan. RIDE-SDMA 2005. [PDF]
  • Helping Users Avoid Bugs in GUI Applications. Amir Michail and Tao Xie. ICSE 2005. [PDF]
  • Data Mining and Cross-checking of Execution Traces. Tristan Denmat, Mireille Ducasse, and Olivier Ridoux. ASE 2005. [PDF]
  • Software Defect Association Mining and Defect Correction Effort Prediction. Qinbao Song, Martin Shepperd, Michelle Cartwright, and Carolyn Mair. TSE 2006. [PDF]
  • Who Should Fix This Bug? John Anvik, Lyndon Hiew, and Gail C. Murphy. ICSE 2006. [PDF] [BugTriage project]
  • Coping With Open Bug Repositories. John Anvik, Lyndon Hiew, and Gail C. Murphy. eTX 2006. [PDF] [BugTriage project]
  • How Long Did It Take to Fix Bugs? Sunghun Kim, E. James Whitehead, Jr. MSR 2006 [PDF]
  • Mining Control Flow Abnormality for Logic Error Isolation. Chao Liu, Xifeng Yan, and Jiawei Han. SDM 2006 [PDF]
  • If Your Bug Database Could Talk. Adrian Schröter, Thomas Zimmermann, Rahul Premraj, and Andreas Zeller. Technical Report, Saarland University, June 2006. [PDF] [Eclipse Bug Data]
  • Using FogBUGZ to Get Crash Reports From Users - Automatically! Joel Spolsky. [HTML]
  • Bug Classification Using Program Slicing Metrics. Kai Pan, Sunghun Kim, and E. James Whitehead, Jr. SCAM 2006 [PDF]
  • A Linguistic Analysis of How People Describe Software Problems in Bug Reports. Andrew Ko, Brad Myers, and Duen Horng Chau. VL/HCC 2006 [PDF]
  • How Bayesians Debug. Chao Liu, Zeng Lian and Jiawei Han. ICDM 2006 [PDF]
  • Failure Proximity: A Fault Localization-Based Approach. Chao Liu and Jiawei Han. FSE 2006 [PDF]
  • Cost-Sensitive Decision Tree Learning for Forensic Classification. Jason V. Davis, Jungwoo Ha, Hany E. Ramadan, Christopher J. Rossbach, and Emmett Witchel. ECML 2006. [PDF]
  • Detection of Duplicate Defect Reports Using Natural Language Processing. Per Runeson, Magnus Alexandersson, and Oskar Nyholm. ICSE 2007. [PDF]
  • Improved Error Reporting for Software that Uses Black-Box Components. Jungwoo Ha, Christopher J. Rossbach, Jason V. Davis, Indrajit Roy, Hany E. Ramadan, Don E. Porter, David L. Chen and Emmett Witchel. PLDI 2007 [PDF]
  • Statistical Debugging Using Compound Boolean Predicates. Piramanayagam Arumuga Nainar, Ting Chen, Jake Rosin, and Ben Liblit. ISSTA 2007 [PDF]
  • Mining Specifications of Malicious Behavior. Mihai Christodorescu, Somesh Jha, and Christopher Kruegel. ESEC/FSE 2007. [PDF]
  • Which Warnings Should I Fix First? Sunghun Kim and Michael D. Ernst. ESEC/FSE 2007. [PDF]
  • An Approach to Detecting Duplicate Bug Reports using Natural Language and Execution Information. Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, and Jiasu Sun. ICSE 2008. [PDF]

Maintenance

  • Discovering Models of Software Processes from Event-Based Data. Jonathan E. Cook and Alexander L. Wolf. TOSEM 1998 [PDF]
  • Interaction-Pattern Mining: Extracting Usage Scenarios from Run-time Behavior Traces. M. El-Ramly, E. Stroulia, and P. Sorenson. KDD 2002. [PDF] [CELLEST] [Mohammad El-Ramly] [Stroulia]
  • Mining the Maintenance History of a Legacy Software System. Jelber Sayyad-Shirabad, Timothy Lethbridge, Stan Matwin. ICSM 2003 [PDF]
  • Automatic Categorization Algorithm for Evolvable Software Archive. Shinji Kawaguchi, Pankaj K. Garg, Makoto Matsushita, and Katsuro Inoue. IWPSE 2003. [PDF] [More papers]
  • Mining System-User Interaction Logs for Interaction Patterns. Mohammad El-Ramly and Eleni Stroulia. MSR 2004. [PDF]
  • Mining Version Histories to Guide Software Changes. Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, and Andreas Zeller. ICSE 2004. [PDF] [eRose tool implementation]
  • Predicting Source Code Changes by Mining Change History. Annie Ying, Gail Murphy, Raymond Ng, and Mark Chu-Carroll. TSE 2004. [PDF]
  • Aspect Mining Using Event Traces. Silvia Breu and Jens Krinke. ASE 2004. [PDF]
  • Aspect Mining through the Formal Concept Analysis of Execution Traces. Paolo Tonella, Mariano Ceccato. WCRE 2004. [PDF]
  • Mining Control Patterns from Java Program Corpora. Deng-Jyi Chen, Chung-Chien Hwang, Shih-Kun Huang, and David T. K. Chen. JISE 2004. [PDF]
  • Automatically Inferring Temporal Properties for Program Evolution. Jinlin Yang and David Evans. ISSRE 2004. [PDF]
  • Recovering System Specific Rules from Software Repositories. Chadd Williams and Jeffrey K. Hollingsworth. MSR 2005. [PDF]
  • Mining Aspects in Requirements. Américo Sampaio, Neil Loughran, Awais Rashid and Paul Rayson. Workshop on Early Aspects 2005. [PDF]
  • EA-Miner: A Tool for Automating Aspect-Oriented Requirements Identification. Américo Sampaio, Ruzanna Chitchyan, Awais Rashid, and Paul Rayson. ASE 2005 [PDF]
  • Timna: A Framework for Automatically Combining Aspect Mining Analyses. David Shepherd, Jeffrey Palm, Lori Pollock and Mark Chu-Carroll. ASE 2005. [PDF]
  • An Empirical Study of Code Clone Genealogies. Miryung Kim, Vibha Sazawal, David Notkin, and Gail C. Murphy. ESEC/FSE 2005. [PDF]
  • Extending Dynamic Aspect Mining with Static Information. Silvia Breu. SCAM 2005. [PDF]
  • Applying Webmining Techniques to Execution Traces to Support the Program Comprehension Process. Andy Zaidman, Toon Calders, Serge Demeyer, and Jan Paredaens. CSMR 2005. [PDF]
  • Mining Eclipse for Cross-Cutting Concerns. Silvia Breu, Thomas Zimmermann, and Christian Lindig. MSR 2006. [PDF]
  • Examining the Evolution of Code Comments in PostgreSQL. Zhen Ming Jiang and Ahmed E. Hassan. MSR 2006 [PDF]
  • Understanding Software Application Interfaces via String Analysis. Evan Martin and Tao Xie. ICSE 2006 ER. [PDF]
  • Automatic Identification of Bug-Introducing Changes. Sunghun Kim, Thomas Zimmermann, Kai Pan, E. James Whitehead, Jr. ASE 2006 [PDF]
  • GPLAG: Detection of Software Plagiarism by Procedure Dependency Graph Analysis. Chao Liu, Chen Chen, Jiawei Han, and Philip Yu. KDD 2006. [PDF]
  • Mining Email Social Networks. Christian Bird, Alex Gourley, Prem Devanbu, Michael Gertz, and Anand Swaminathan. MSR 2006 [PDF]
  • Mining Security-sensitive Operations in Legacy Code using Concept Analysis. Vinod Ganapathy, David King, Trent Jaeger and Somesh Jha. ICSE 2007. [PDF]
  • Automated Inference of Pointcuts in Aspect-Oriented Refactoring. Prasanth Anbalagan and Tao Xie. ICSE 2007. [PDF]
  • Automatic Inference of Structural Changes for Matching Across Program Versions. Miryung Kim, David Notkin, and Dan Grossman. ICSE 2007. [PDF]
  • What Can OSS Mailing Lists Tell Us? A Preliminary Psychometric Text Analysis of the Apache Developer Mailing List. Peter C. Rigby and Ahmed E. Hassan. MSR 2007 [PDF]
  • Mining the Lexicon Used by Programmers during Software Evolution. G. Antoniol, Y.-G. Guéhéneuc, E. Merlo, and Paolo Tonella. ICSM 2007 [PDF]