Text Analytics in Software Engineering

We develop novel natural language processing and text mining techniques to (1) extract security policies [1], such as access control policies, from natural language documents such as requirements documents, (2) extract risk items and risk association rules [2] from risk analysis documents, (3) identify security bug reports [3] and detect duplicate bug reports [4], and (4) extract programming rules [5] from API documents. Our team co-developed and released the NIST/NCSU Access Control Policy Test (ACPT) Tool, which has been beta-tested in various defense sectors.

Principal Investigator

PhD Students

Subprojects

Publications

1. Xusheng Xiao, Amit Paradkar, and Tao Xie. Automated Extraction and Validation of Security Policies from Natural-Language Documents. North Carolina State University Department of Computer Science Technical Report TR-2011-7, March 15, 2011. [PDF]

2. LiGuo Huang, Daniel Port, Liang Wang, Tao Xie, and Tim Menzies. Text Mining in Supporting Software Systems Risk Assurance. In Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering (ASE 2010), Short Paper, Antwerp, Belgium, pp. 163-166, September 2010. [PDF]

3. Michael Gegick, Pete Rotella, and Tao Xie. Identifying Security Bug Reports via Text Mining: An Industrial Case Study. In Proceedings of the 7th Working Conference on Mining Software Repositories (MSR 2010), Cape Town, South Africa, pp. 11-20, May 2010. [PDF]

4. Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, and Jiasu Sun. An Approach to Detecting Duplicate Bug Reports using Natural Language and Execution Information. In Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), Leipzig, Germany, pp. 461-470, May 2008. [PDF]

5. Hao Zhong, Lu Zhang, Tao Xie, and Hong Mei. Inferring Resource Specifications from Natural Language API Documentation. In Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering (ASE 2009), Auckland, New Zealand, pp. 307-318, November 2009. Best Paper Award and ACM SIGSOFT Distinguished Paper Award. [PDF]