minecode

Improving Software Productivity and Quality via Mining Program Source Code


PROJECT SUMMARY

Since the late 1990s, various data mining techniques have been applied to software engineering data, with many notable successes. The experience and lessons accumulated from this work pose interesting challenges and open new opportunities for research and development. This project develops novel techniques and tools for mining program source code in three ways: (1) exploiting code search engines to expand the scope of mining; (2) adapting program model checking to collect static program traces that simulate runtime program behavior, without requiring a runtime system setup or system test suites; and (3) applying data mining techniques to the collected source code or its static traces to mine API usage patterns or properties. The mined patterns and properties are then used either to find bugs in software systems or to help developers write correct API client code.
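The mine-then-check loop described above can be illustrated with a deliberately simplified sketch. It assumes each static trace is already available as a list of API call names (the hypothetical `traces` below stand in for traces a model checker might produce), mines ordered call pairs that occur in at least `min_support` traces, and flags traces that call the first API of a mined pair without a later call to the second. Frequent-pair counting is a stand-in chosen for brevity, not the project's actual mining algorithms; the publications listed below describe more refined approaches, such as mining partial orders and sequence association rules.

```python
from collections import Counter
from itertools import combinations


def ordered_pairs(trace):
    """Return the set of (a, b) pairs of distinct calls where a occurs before b."""
    pairs = set()
    for i, j in combinations(range(len(trace)), 2):  # yields only i < j
        if trace[i] != trace[j]:
            pairs.add((trace[i], trace[j]))
    return pairs


def mine_frequent_pairs(traces, min_support):
    """Count each ordered pair at most once per trace; keep pairs seen in >= min_support traces."""
    counts = Counter()
    for trace in traces:
        counts.update(ordered_pairs(trace))
    return {pair for pair, n in counts.items() if n >= min_support}


def find_violations(traces, patterns):
    """Report (trace index, a, b) when a trace calls a but b never follows any call to a."""
    violations = []
    for idx, trace in enumerate(traces):
        present = ordered_pairs(trace)
        for a, b in sorted(patterns):
            if a in trace and (a, b) not in present:
                violations.append((idx, a, b))
    return violations


# Hypothetical static traces over a file-like API (illustrative data only).
traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "flush", "close"],
    ["open", "read"],  # close is never called: a likely resource-leak bug
]

patterns = mine_frequent_pairs(traces, min_support=4)
violations = find_violations(traces, patterns)
print(patterns)    # {('open', 'close')}
print(violations)  # [(4, 'open', 'close')]
```

With `min_support=4`, only `open` before `close` survives as a pattern, and the last trace is flagged because it opens a resource without closing it. Raising or lowering the threshold trades missed patterns against spurious ones, which is one reason real miners add further filtering.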

PEOPLE

Faculty

Tao Xie (Principal Investigator)

Graduate Students

Madhuri R Marri (MS Student)

Suresh Thummalapenta (PhD Student)

Undergraduate Student (REU project)

Justin W. Gorham

Collaborators

Mithun Acharya (ABB Research)

PUBLICATIONS

    1. Hao Zhong, Suresh Thummalapenta, Tao Xie, Lu Zhang, and Qing Wang. Mining API Mapping for Language Migration. In Proceedings of the 32nd International Conference on Software Engineering (ICSE 2010), Cape Town, South Africa, May 2010. [BibTeX]
    2. Suresh Thummalapenta and Tao Xie. Alattin: Mining Alternative Patterns for Detecting Neglected Conditions. To appear in Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), Auckland, New Zealand, November 2009. [BibTeX]
    3. Hao Zhong, Lu Zhang, Tao Xie, and Hong Mei. Inferring Resource Specifications from Natural Language API Documentation. To appear in Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), Auckland, New Zealand, November 2009. (Best Paper Award and ACM SIGSOFT Distinguished Paper Award) [BibTeX]
    4. Nuo Li, Tao Xie, Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte. Reggae: Automated Test Generation for Programs using Complex Regular Expressions. To appear in Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), Short Paper, Auckland, New Zealand, November 2009. [BibTeX]
    5. Suresh Thummalapenta, Tao Xie, Nikolai Tillmann, Peli de Halleux, and Wolfram Schulte. MSeqGen: Object-Oriented Unit-Test Generation via Mining Source Code. To appear in Proceedings of the 7th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2009), Amsterdam, the Netherlands, August 2009. [PDF][BibTeX]
    6. Tao Xie, Suresh Thummalapenta, David Lo, and Chao Liu. Data Mining for Software Engineering. IEEE Computer, 42(8), pp. 35-42, August 2009. [BibTeX]
    7. Hao Zhong, Tao Xie, Lu Zhang, Jian Pei, and Hong Mei. MAPO: Mining and Recommending API Usage Patterns. To appear in Proceedings of the 23rd European Conference on Object-Oriented Programming (ECOOP 2009), Genova, Italy, July 2009. [PDF][BibTeX]
    8. Tao Xie, Nikolai Tillmann, Peli de Halleux, and Wolfram Schulte. Fitness-Guided Path Exploration in Dynamic Symbolic Execution. To appear in Proceedings of the 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2009), Lisbon, Portugal, June-July 2009. [PDF][BibTeX]
    9. Suresh Thummalapenta and Tao Xie. Mining Exception-Handling Rules as Sequence Association Rules. To appear in Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), Vancouver, Canada, May 2009. [PDF][BibTeX]
    10. Xiaoyin Wang, Lu Zhang, Tao Xie, Hong Mei, and Jiasu Sun. Locating Need-to-Translate Constant Strings for Software Internationalization. To appear in Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), Vancouver, Canada, May 2009. [PDF][BibTeX]
    11. Xiaoyin Wang, Lu Zhang, Tao Xie, Hong Mei, and Jiasu Sun. TranStrL: An Automatic Need-to-Translate String Locator for Software Internationalization. To appear in Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), Formal Demonstration, Vancouver, Canada, May 2009. [PDF][BibTeX]
    12. Wujie Zheng, Michael R. Lyu, and Tao Xie. Test Selection for Result Inspection via Mining Predicate Rules. In Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), New Ideas and Emerging Results, Vancouver, Canada, pp. 219-222, May 2009.
    13. Madhuri R Marri, Suresh Thummalapenta, and Tao Xie. Improving Software Quality via Code Searching and Mining. To appear in Proceedings of the First International Workshop on Search-Driven Development – Users, Infrastructure, Tools and Evaluation (SUITE 2009), Vancouver, Canada, May 2009. [PDF][BibTeX]
    14. Tao Xie, Nikolai Tillmann, Jonathan de Halleux, and Wolfram Schulte. Mutation Analysis of Parameterized Unit Tests. In Proceedings of the 4th International Workshop on Mutation Analysis (Mutation 2009), Denver, Colorado, pp. 177-181, April 2009. [PDF][BibTeX]
    15. Mithun Acharya and Tao Xie. Mining API Error-Handling Specifications from Source Code. To appear in Proceedings of the International Conference on Fundamental Approaches to Software Engineering (FASE 2009), York, UK, March 2009. [PDF][BibTeX]
    16. Suresh Thummalapenta and Tao Xie. SpotWeb: Detecting Framework Hotspots and Coldspots via Mining Open Source Code on the Web. In Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE 2008), L'Aquila, Italy, pp. 327-336, September 2008. (A previous version appeared in Proceedings of MSR 2008 as a position paper.) [PDF][BibTeX]
    17. Suresh Thummalapenta and Tao Xie. NEGWeb: Detecting Neglected Conditions via Mining Programming Rules from Open Source Code. Presented as a Student Poster at the International Symposium on Software Testing and Analysis (ISSTA 2008), Seattle, Washington, July 2008.
    18. Christoph Csallner, Yannis Smaragdakis, and Tao Xie. DSD-Crasher: A Hybrid Analysis Tool for Bug Finding. ACM Transactions on Software Engineering and Methodology, Vol. 17, Issue 2, pp. 345-371, July 2008. [PDF][BibTeX]
    19. Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, and Jiasu Sun. An Approach to Detecting Duplicate Bug Reports using Natural Language and Execution Information. In Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), Leipzig, Germany, pp. 461-470, May 2008. [PDF][BibTeX]
    20. Tao Xie, Mithun Acharya, Suresh Thummalapenta, and Kunal Taneja. Improving Software Reliability and Productivity via Mining Program Source Code. In Proceedings of the NSF Next Generation Software Program Workshop at IPDPS 2008 (NSFNGS 2008), Miami, Florida, April 2008. [PDF][BibTeX]
    21. Suresh Thummalapenta and Tao Xie. PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007), Atlanta, Georgia, pp. 204-213, November 2007. [PDF][BibTeX]
    22. Mithun Acharya and Tao Xie. Static Detection of API Error-Handling Bugs via Mining Source Code. North Carolina State University Department of Computer Science Technical Report TR-2007-35, October 15, 2007. [PDF][BibTeX]
    23. Yoonki Song, Suresh Thummalapenta, and Tao Xie. UnitPlus: Assisting Developer Testing in Eclipse. In Proceedings of the Eclipse Technology eXchange Workshop at OOPSLA 2007 (ETX 2007), Montreal, Canada, pp. 26-30, October 2007. (Best Student Paper Award) [PDF][BibTeX]
    24. Suresh Thummalapenta. Exploiting Code Search Engines to Improve Programmer Productivity. In Proceedings of the 22nd Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2007), ACM SIGPLAN Student Research Competition, Montreal, Canada, pp. 921-922, October 2007. [PDF][PPT]
    25. Suresh Thummalapenta and Tao Xie. NEGWeb: Static Defect Detection via Searching Billions of Lines of Open Source Code. North Carolina State University Department of Computer Science Technical Report TR-2007-24, September 16, 2007. [PDF][BibTeX]
    26. Mithun Acharya, Tao Xie, Jian Pei, and Jun Xu. Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications. In Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2007), Dubrovnik, Croatia, pp. 25-34, September 2007. [PDF][BibTeX]
    27. Mithun Acharya, Tao Xie, and Jun Xu. Mining Interface Specifications for Generating Checkable Robustness Properties. In Proceedings of the 17th IEEE International Conference on Software Reliability Engineering (ISSRE 2006), Raleigh, NC, pp. 311-320, November 2006. [PDF][BibTeX]
    28. Mithun Acharya. Automatic Inference of Interface Properties from Program Source Code. In Proceedings of the 14th ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE 2006), Doctoral Symposium, Portland, Oregon, USA, November 2006. [PDF]
    29. Mithun Acharya. Automatic Generation and Inference of Interface Properties from Program Source Code. In Proceedings of the 21st Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2006), ACM SIGPLAN Student Research Competition, Portland, Oregon, USA, pp. 750-751, October 2006.
    30. Mithun Acharya, Tanu Sharma, Jun Xu, and Tao Xie. Effective Generation of Interface Robustness Properties for Static Analysis. In Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering (ASE 2006), Short Paper, Tokyo, Japan, pp. 293-296, September 2006. [PDF][BibTeX]
    31. Mithun Acharya. Automatic Generation of Robustness and Security Properties from Program Source Code. In Supplemental Proceedings of the IEEE International Conference on Dependable Systems and Networks (DSN 2006), Student Forum, Philadelphia, PA, USA, pp. 166-168, June 2006. [PDF]
    32. Tao Xie and Jian Pei. MAPO: Mining API Usages from Open Source Repositories. In Proceedings of the 3rd International Workshop on Mining Software Repositories (MSR 2006), Shanghai, China, pp. 54-57, May 2006. [PDF][BibTeX][Slides]

TUTORIALS/COURSE MODULES/ORGANIZED EVENTS

    1. Ahmed E. Hassan and Tao Xie. Mining Software Engineering Data. In Proceedings of the 32nd International Conference on Software Engineering (ICSE 2010), Tutorials, Cape Town, South Africa, May 2010. [Tutorial Web][BibTeX]
    2. Tao Xie and Ahmed E. Hassan. Mining Software Engineering Data. Presented at the 31st International Conference on Software Engineering (ICSE 2009), Tutorials, Vancouver, Canada, May 2009. [Tutorial Web][BibTeX]
    3. Ahmed E. Hassan and Tao Xie. Mining Software Engineering Data. In Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), Companion Volume, Tutorials, Leipzig, Germany, May 2008. [Tutorial Web][BibTeX]
    4. Chao Liu, Tao Xie, and Jiawei Han. Mining for Software Reliability. In Proceedings of the 2007 IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, October 2007. [Tutorial Web][BibTeX]
    5. Tao Xie, Jian Pei, and Ahmed E. Hassan. Mining Software Engineering Data. In Proceedings of the 29th International Conference on Software Engineering (ICSE 2007), Companion Volume, Tutorials, Minneapolis, MN, pp. 172-173, May 2007. [Tutorial Web][PDF][BibTeX]
    6. Tao Xie and Jian Pei. Data Mining for Software Engineering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), Tutorial, Philadelphia, Pennsylvania, August 2006. [Tutorial Web][Slides][BibTeX]
    7. Tao Xie, Co-Organizer (with Abraham Bernstein, Harald Gall, and Andreas Zeller), Dagstuhl Seminar on Mining Programs and Processes (Dagstuhl Seminar 07491).
    8. Tao Xie. Data Mining III - Text Mining course module, Master of Science in Analytics (MSA) program, Institute for Advanced Analytics, North Carolina State University, January-February 2008.

PRESENTATIONS

    1. Tao Xie. Improving Software Reliability via Automated Static and Dynamic Analysis. Invited talk, FDA Office of Science and Engineering Laboratories, Electrical and Software Engineering Group, October 2008.
    2. Tao Xie. Improving Software Productivity and Quality via Mining Program Source Code. Invited talk, Department of Computer Science, Drexel University, Philadelphia, PA, September 2008.
    3. Tao Xie. Searching and Mining Open Source Code from the Web. ASE 2008 PC Workshop presentation, Google Tech Talks, Mountain View, CA, June 2008.
    4. Tao Xie. Mining Software Engineering Data. Tutorial presentation, the 30th International Conference on Software Engineering (ICSE 2008), Leipzig, Germany, May 2008.
    5. Tao Xie. SpotWeb: Detecting Framework Hotspots via Mining Open Source Repositories on the Web. Conference presentation, the 5th Working Conference on Mining Software Repositories (MSR 2008), Leipzig, Germany, May 2008.
    6. Tao Xie. Improving Software Productivity and Quality via Mining Program Source Code. Invited talk, Department of Electrical and Computer Engineering, Clarkson University, Potsdam, NY, April 2008.
    7. Tao Xie. Improving Software Reliability and Productivity via Mining Program Source Code. Workshop presentation, the NSF Next Generation Software Program Workshop at IPDPS 2008 (NSFNGS 2008), Miami, Florida, April 2008.
    8. Tao Xie. Recommendation Systems for Code Reuse. Workshop talk, Bellairs Workshop On Software Analysis for Recommendation Systems (SARS 2008), Barbados, February 2008. [Slides]
    9. Suresh Thummalapenta. PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web. Conference presentation, the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2007), Atlanta, Georgia, November 2007.
    10. Tao Xie. Improving Software Productivity and Quality via Mining Program Source Code. Invited talk, Accenture Labs, Chicago, IL, October 2007.
    11. Tao Xie. Improving Software Productivity and Quality via Mining Program Source Code. Invited talk, Motorola Labs, Schaumburg, IL, October 2007.
    12. Tao Xie. Improving Automation in Developer Testing: Achievements and Challenges. Conference talk, International Verify Conference (Verify 2007), Arlington, VA, October 2007.
    13. Suresh Thummalapenta. Exploiting Code Search Engines to Improve Programmer Productivity. Conference ACM SIGPLAN SRC presentation, the 22nd Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2007), ACM SIGPLAN Student Research Competition, Montreal, Canada, October 2007.
    14. Tao Xie. Improving Automation in Developer Testing: Achievements and Challenges. Conference talk, Triangle Information Systems Quality Association Conference (TISQA 2007), Chapel Hill, NC, September 2007.
    15. Mithun Acharya. Mining API Patterns as Partial Orders from Source Code: From Usage Scenarios to Specifications. Conference presentation, the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2007), Dubrovnik, Croatia, September 2007.
    16. Tao Xie. Improving Software Productivity and Quality via Mining Program Source Code. Invited talk, Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, September 2007.
    17. Tao Xie. Improving Programmer Productivity via Mining Program Source Code. Invited talk, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, China, August 2007.
    18. Tao Xie. Improving Programmer Productivity via Mining Program Source Code. Invited talk, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China, August 2007.
    19. Tao Xie. Mining Software Engineering Data. Invited talk, Software Engineering Institute, Peking University, Beijing, China, July 2007.
    20. Tao Xie. Improving Programmer Productivity via Mining Program Source Code. Invited talk, Department of Computer Science, University of Calgary, Canada, May 2007.
    21. Mithun Acharya. Mining Interface Specifications for Generating Checkable Robustness Properties. Conference presentation, the 17th IEEE International Conference on Software Reliability Engineering (ISSRE 2006), Raleigh, NC, November 2006.
    22. Mithun Acharya. Automatic Inference of Interface Properties from Program Source Code. Conference doctoral symposium presentation, the 14th ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE 2006), Doctoral Symposium, Portland, Oregon, USA, November 2006.
    23. Mithun Acharya. Automatic Generation and Inference of Interface Properties from Program Source Code. Conference ACM SIGPLAN SRC presentation, the 21st Annual ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Companion) (OOPSLA 2006), ACM SIGPLAN Student Research Competition, Portland, Oregon, USA, October 2006.
    24. Mithun Acharya. Effective Generation of Interface Robustness Properties for Static Analysis. Conference poster presentation, the 21st IEEE/ACM International Conference on Automated Software Engineering (ASE 2006), Tokyo, Japan, September 2006.
    25. Mithun Acharya. Automatic Generation of Robustness and Security Properties from Program Source Code. Conference student forum presentation, the IEEE International Conference on Dependable Systems and Networks (DSN 2006), Student Forum, Philadelphia, PA, USA, June 2006.
    26. Tao Xie. Data Mining for Software Engineering. Visit talk, Fudan University, China, May 2006.
    27. Tao Xie. MAPO: Mining API Usages from Open Source Repositories. Workshop presentation, the 3rd International Workshop on Mining Software Repositories (MSR 2006), Shanghai, China, May 2006. [Slides]

SOFTWARE & EVALUATION SUBJECTS/RESULTS

LINKS

Bibliography on Mining Software Engineering Data

SPONSORS

Army Research Office Award W911NF-08-1-0443 (09/08/2008-08/30/2011)

National Science Foundation Award CNS-0720641, Computer Systems Research (CSR) Program (08/01/2007-07/31/2008)

Army Research Office Award W911NF-07-1-0431, Short Term Innovative Research (STIR) Program (06/18/2007-03/17/2008)

NCSU COE

College of Engineering and Department of Computer Science, North Carolina State University Undergraduate Research Award (01/24/2008-05/16/2008)