
Text2Policy: Automated Extraction of Security Policies from Natural-Language Software Documents

Summary

Access Control Policies (ACPs) specify which principals, such as users, have access to which resources. Ensuring the correctness and consistency of ACPs is crucial to prevent security vulnerabilities. In practice, however, ACPs are commonly written in Natural Language (NL) and buried in large documents such as requirements documents, where they cannot be directly checked for correctness and consistency. Manually extracting ACPs from these NL documents, and validating NL functional requirements such as use cases against ACPs to detect inconsistencies, is tedious. To address these issues, we propose an approach, Text2Policy, that automatically extracts ACPs from NL software documents and resource-access information from NL scenario-based functional requirements. We conducted three evaluations on ACP sentences collected from publicly available sources and on use cases from both open source and proprietary projects. The results show that Text2Policy effectively identifies ACP sentences with a precision of 88.7% and a recall of 89.4%, extracts ACP rules with an accuracy of 86.3%, and extracts action steps with an accuracy of 81.9%.

People

Faculty
Tao Xie
Graduate Students
Xusheng Xiao
Researchers (IBM T.J. Watson Research Center)
Amit Paradkar, Suresh Thummalapenta

Publications

Xusheng Xiao, Amit Paradkar, Suresh Thummalapenta, Tao Xie. Automated Extraction of Security Policies from Natural-Language Software Documents. In Proceedings of the 20th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE 2012), Cary, NC, November 2012.

Approach

Our approach accepts NL software documents as input and applies linguistic analysis to parse them and annotate their sentences with the semantic meaning of words and phrases. From the annotated sentences, our approach constructs model instances. Using provided transformation rules, it then transforms the model instances into formal specifications that can be automatically checked for correctness and consistency. Figure 1 shows an overview of our approach.
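To make the three stages (annotate, build a model instance, transform) concrete, here is a minimal Python sketch that handles only one sentence pattern. A single regular expression stands in for linguistic analysis; it does not reproduce Text2Policy's shallow parsing or semantic annotation. The names ACPRule, extract_rule, and to_formal are hypothetical, and the predicate notation produced by to_formal is illustrative rather than the formal specification Text2Policy emits.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical model instance for one ACP rule (not Text2Policy's actual model).
@dataclass(frozen=True)
class ACPRule:
    subject: str   # principal, e.g. "hcp"
    action: str    # e.g. "view"
    resource: str  # e.g. "patient's account"
    effect: str    # "Permit" or "Deny"


# A single regular expression stands in for parsing plus semantic annotation.
# It only recognizes "<A|An|The> <subject> can|cannot|can not <action> <resource>."
_PATTERN = re.compile(
    r"^(?:an?|the)\s+(?P<subject>.+?)\s+(?P<modal>can\s*not|cannot|can)\s+"
    r"(?P<action>\w+)\s+(?:the\s+|a\s+|an\s+)?(?P<resource>.+?)\.?$",
    re.IGNORECASE,
)


def extract_rule(sentence: str) -> Optional[ACPRule]:
    """Annotate one sentence and build an ACPRule, or return None if it does not match."""
    m = _PATTERN.match(sentence.strip())
    if m is None:
        return None
    # "can not" and "cannot" both express a prohibition.
    modal = m.group("modal").lower().replace(" ", "")
    effect = "Deny" if modal == "cannot" else "Permit"
    return ACPRule(
        subject=m.group("subject").lower(),
        action=m.group("action").lower(),
        resource=m.group("resource").lower(),
        effect=effect,
    )


def to_formal(rule: ACPRule) -> str:
    """Illustrative transformation rule: render the model instance as a predicate."""
    return f"{rule.effect.lower()}({rule.subject}, {rule.action}, {rule.resource})"
```

Sentences outside the pattern simply yield None. A real pipeline would rely on parsing rather than one fixed template, which is why this sketch is only a stand-in for the annotation stage.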

Figure 1. Overview of Text2Policy

Figure 2 illustrates how our approach transforms an ACP rule written in NL into a policy rule written in XACML.
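The final transformation step can be sketched with Python's standard xml.etree.ElementTree module. This minimal sketch assumes that an extracted rule is a (subject, action, resource, effect) tuple and emits a simplified XACML 3.0-style Rule element. The ACPRule class and the to_xacml function are hypothetical, mapping the subject to the standard subject-id attribute is a simplifying assumption, and the output is not guaranteed to match the policy shown in Figure 2.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XACML_NS = "urn:oasis:names:tc:xacml:3.0:core:schema:wd-17"
STRING_EQUAL = "urn:oasis:names:tc:xacml:1.0:function:string-equal"
XS_STRING = "http://www.w3.org/2001/XMLSchema#string"

# (category, attribute id) used to match each field of the rule.
_ATTRS = {
    "subject": ("urn:oasis:names:tc:xacml:1.0:subject-category:access-subject",
                "urn:oasis:names:tc:xacml:1.0:subject:subject-id"),
    "action": ("urn:oasis:names:tc:xacml:3.0:attribute-category:action",
               "urn:oasis:names:tc:xacml:1.0:action:action-id"),
    "resource": ("urn:oasis:names:tc:xacml:3.0:attribute-category:resource",
                 "urn:oasis:names:tc:xacml:1.0:resource:resource-id"),
}


# Hypothetical model instance of an extracted ACP rule.
@dataclass(frozen=True)
class ACPRule:
    subject: str
    action: str
    resource: str
    effect: str  # "Permit" or "Deny"


def to_xacml(rule: ACPRule, rule_id: str) -> ET.Element:
    """Build a simplified XACML <Rule> whose Target requires all three values to match."""
    root = ET.Element("Rule", {"xmlns": XACML_NS, "RuleId": rule_id, "Effect": rule.effect})
    target = ET.SubElement(root, "Target")
    for field, (category, attr_id) in _ATTRS.items():
        # Separate AnyOf elements in a Target are combined conjunctively.
        all_of = ET.SubElement(ET.SubElement(target, "AnyOf"), "AllOf")
        match = ET.SubElement(all_of, "Match", {"MatchId": STRING_EQUAL})
        value = ET.SubElement(match, "AttributeValue", {"DataType": XS_STRING})
        value.text = getattr(rule, field)
        ET.SubElement(match, "AttributeDesignator", {
            "Category": category, "AttributeId": attr_id,
            "DataType": XS_STRING, "MustBePresent": "false"})
    return root


if __name__ == "__main__":
    rule = ACPRule("hcp", "view", "patient record", "Permit")
    print(ET.tostring(to_xacml(rule, "rule-1"), encoding="unicode"))
```

A complete XACML policy would wrap such rules in a Policy element that declares a rule-combining algorithm. The sketch stops at the individual rule.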

Figure 2. Example extraction of ACP rule

Implementation

The graphical user interface (GUI) of the current Text2Policy implementation is shown in Figure 3. The GUI has three main components: (A) the model instances of extracted ACPs and action steps; (B) the editor for use-case text; and (C) the warning/error view, which shows the detected inconsistencies between ACPs and action steps.
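One plausible way to produce the findings listed in a warning/error view is to match each extracted action step against the extracted ACP rules. The sketch below assumes exact string matching on (subject, action, resource). It also assumes two hypothetical severities: an error when a Deny rule matches a step, and a warning when no rule covers the step. The ACPRule, ActionStep, and check_steps names are illustrative, and these checks are not necessarily the ones Text2Policy performs.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical model instances for an ACP rule and a use-case action step.
@dataclass(frozen=True)
class ACPRule:
    subject: str
    action: str
    resource: str
    effect: str  # "Permit" or "Deny"


@dataclass(frozen=True)
class ActionStep:
    subject: str
    action: str
    resource: str


def check_steps(rules: List[ACPRule], steps: List[ActionStep]) -> List[Tuple[str, ActionStep]]:
    """Return (severity, step) pairs: "error" if a Deny rule matches the step,
    "warning" if no rule matches it at all."""
    findings: List[Tuple[str, ActionStep]] = []
    for step in steps:
        key = (step.subject, step.action, step.resource)
        matching = [r for r in rules if (r.subject, r.action, r.resource) == key]
        if any(r.effect == "Deny" for r in matching):
            findings.append(("error", step))
        elif not matching:
            findings.append(("warning", step))
    return findings
```

Exact matching keeps the example short. In practice, role hierarchies and synonyms (for example "LHCP" versus "HCP") would need normalization before comparison.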

Figure 3. Screenshot of Text2Policy

Evaluation Results

Subjects

First, we used 37 use cases from iTrust, an open source health-care application that provides features such as maintaining patients' medical histories, storing communications with doctors, identifying primary caregivers, and sharing satisfaction results. The requirements documents and source code of iTrust are publicly available on its website. The iTrust requirements specification has 37 use cases, 448 use-case sentences, 10 non-functional-requirement sentences, and 8 constraint sentences. It also has a section, called Glossary, that describes the roles (users) that interact with the system.

We preprocessed the iTrust use cases into a format that Text2Policy can process. In particular, we removed symbols (e.g., [E1] and [S1]) that our approach cannot parse. We replaced some names with the clarifications given in parentheses; for example, for "A user (an LHCP or patient)", we replaced "A user" with "an LHCP or patient". We split alternatives by replacing "/" with "or". We also broke down long sentences spanning more than two or three lines, since such sentences reduce the precision of shallow parsing.
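The mechanical parts of this preprocessing can be approximated with regular expressions. The sketch below is a hypothetical helper, not part of Text2Policy. It assumes that flow labels look like [E1] or [S1] and that a parenthetical clarification immediately follows an article plus a single-word name. Breaking down long sentences is left manual, since it requires human judgment.

```python
import re

# Hypothetical patterns for the three mechanical rewrites described above.
_LABEL = re.compile(r"\s*\[[A-Z]\d+\]")                            # flow labels such as [E1], [S1]
_NAME_WITH_GLOSS = re.compile(r"\b[Aa]n?\s+\w+\s+\(([^)]+)\)")  # "A user (an LHCP or patient)"
_SLASH = re.compile(r"\s*/\s*")                                    # "message/reply"


def preprocess(sentence: str) -> str:
    """Remove flow labels, substitute parenthetical clarifications, and expand slashes."""
    s = _LABEL.sub("", sentence)
    s = _NAME_WITH_GLOSS.sub(r"\1", s)  # keep only the text inside the parentheses
    s = _SLASH.sub(" or ", s)
    return re.sub(r"\s+", " ", s).strip()  # collapse whitespace left by the removals


if __name__ == "__main__":
    print(preprocess("A user (an LHCP or patient) views the message/reply [S1]."))
```

The slash rule would also rewrite URLs or dates, so a real preprocessing pass would need to skip such tokens.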

Second, we collected 100 ACP sentences from 17 sources (published articles and public websites). These ACP sentences and 117 NL ACP rules from the iTrust use cases are the subjects of our evaluation for RQ2.

Third, we used 25 use cases from a module in a proprietary IBM enterprise application. For confidentiality reasons, we refer to this application as IBMApp. This module belongs to the financial domain.

Downloadable Subjects:

iTrust Use Cases
Collected ACP sentences
Extracted Policy Examples
