Tool

Overview

In this section, we explain:

Installation        # How to install the compliance checking tool.
Concrete Policy     # How to write a concrete policy.
Bridge Predicate    # How to define a bridge predicate.
Compliance Checking # How to perform compliance checking.
Query Model         # Understanding authorization query models.

Installation

The following instructions should work on Unix-based systems (Linux or macOS). First, download the compliance checking tool, extract the archive and change into the tool directory:

$ tar xzvf tool.tar.gz
$ cd tool

This compliance checking tool is written in Python, so it needs a properly installed Python interpreter (either Python 2 or 3) and two additional libraries, PyYAML and PySMT.

The Python libraries can be installed using pip. For example, the following commands install PyYAML and PySMT from a Linux shell (with superuser privileges):

$ pip install PyYAML
$ pip install pysmt

After PySMT is installed, the SMT solver can be installed using pysmt-install:

$ pysmt-install --msat

The above command installs MathSAT; additional external libraries might be needed for the compilation process. To see the list of supported solvers, run:

$ pysmt-install --help

The PYTHONPATH environment variable should be set accordingly so that PySMT can find the installed solver. Please consult the PySMT documentation for more details.
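Before running the tool, it may help to confirm that both libraries are importable and that PySMT can see a solver. The following is a minimal sketch, assuming Python 3 and the usual import names yaml (PyYAML) and pysmt; the helpers missing_dependencies and available_solvers are our own illustration, not part of the tool. The call get_env().factory.all_solvers() is PySMT's registry of the solvers it can load.

```python
# Minimal dependency check (a sketch; assumes Python 3 and the import
# names "yaml" for PyYAML and "pysmt" for PySMT).
import importlib.util

# Hypothetical mapping: package name (as installed with pip) -> import name.
REQUIRED_MODULES = {"PyYAML": "yaml", "PySMT": "pysmt"}


def missing_dependencies(required=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found."""
    return [package for package, module in required.items()
            if importlib.util.find_spec(module) is None]


def available_solvers():
    """Return the names of the SMT solvers PySMT can load ([] without PySMT)."""
    if importlib.util.find_spec("pysmt") is None:
        return []
    from pysmt.shortcuts import get_env
    return sorted(get_env().factory.all_solvers())


if __name__ == "__main__":
    print("Missing packages:", missing_dependencies() or "none")
    print("Solvers visible to PySMT:", available_solvers() or "none")
```

If MathSAT was installed and PYTHONPATH is set correctly, "msat" should appear in the list of solvers.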

Concrete Policy

The concrete policy to be checked by the tool must be written in JSON. Here is an example of a concrete policy with two rules:

{
    "comment": "Health data processing",
    "type": "concrete",
    "rule": [
        {
            "effect": "allow",
            "subject": {
                "role": "Patient"
            },
            "action": "Read",
            "resource": {
                "type": ["medical_record", "prescription"]
            }
        },
        {
            "effect": "allow",
            "subject": {
                "role": "Hospital_Staff",
            },
            "action": ["Read", "Write", "Update"],
            "resource": {
                "type": "medical_record"
            },
            "environment": {
                "consent": true,
                "purpose": ["preventive_medicine", "medical_diagnosis", "provision_of_care"],
                "adequate_relevant_notexcessive": true,
                "accurate": true,
                "validity": true,
                "dpa_authorization": true
            }
        }
    ]
}
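The matching semantics of the tool are not spelled out in this section. As a rough mental model only, the sketch below assumes that an "allow" rule applies to a request when every attribute the rule mentions matches, where a JSON list means "any of these values". Requests use the flattened key names of reference.json (for example, subject.role becomes subject_role). Both flatten_rule and is_allowed are hypothetical helpers, not part of the tool.

```python
# A hypothetical reading of concrete-policy rules, for illustration only:
# an "allow" rule matches a request when every attribute the rule constrains
# has one of the listed values in the request.


def flatten_rule(rule):
    """Map each constrained attribute to a list of allowed values."""
    flat = {}
    for entity in ("subject", "action", "resource", "environment"):
        part = rule.get(entity)
        if part is None:
            continue
        items = part.items() if isinstance(part, dict) else [(None, part)]
        for attr, value in items:
            key = entity if attr is None else entity + "_" + attr
            flat[key] = value if isinstance(value, list) else [value]
    return flat


def is_allowed(policy, request):
    """request: flat dict, e.g. {"subject_role": "Patient", "action": "Read"}."""
    for rule in policy["rule"]:
        if rule.get("effect") != "allow":
            continue
        constraints = flatten_rule(rule)
        if all(key in request and request[key] in allowed
               for key, allowed in constraints.items()):
            return True
    return False
```

To try it on the example, load the policy with json.load and pass a request such as {"subject_role": "Patient", "action": "Read", "resource_type": "prescription"}; under this reading the first rule matches.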

All possible values of each entity attribute must be listed in the reference.json file:

{
  "comment": "Health data processing reference values",
  "type": "reference",
  "reference": {
      "subject_role": ["Health_Board", "General_Practitioner",
          "Hospital_Staff", "Patient"],
      "action": ["Read", "Write", "Update"],
      "resource_type": ["medical_record", "prescription"],
      "environment_purpose": ["preventive_medicine", "medical_diagnosis",
          "provision_of_care"],
      "environment_consent": [true, false],
      "environment_adequate_relevant_notexcessive": [true, false],
      "environment_accurate": [true, false],
      "environment_validity": [true, false],
      "environment_dpa_authorization": [true, false]
  }
}
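Because every value used in the concrete policy must appear in reference.json, a quick consistency check can catch typos before the solver runs. This sketch assumes the key convention visible in the examples: attribute attr of the subject, resource or environment entity is listed under subject_attr, resource_attr or environment_attr, while action keeps its own name. The function names are hypothetical.

```python
# Sketch: report concrete-policy values missing from reference.json.
# Assumed key convention: <entity>_<attribute>, e.g. subject.role -> "subject_role";
# "action" is a plain value and keeps its own name.
import json


def flatten_rule(rule):
    """Map each constrained attribute of a rule to a list of values."""
    flat = {}
    for entity in ("subject", "action", "resource", "environment"):
        part = rule.get(entity)
        if part is None:
            continue
        items = part.items() if isinstance(part, dict) else [(None, part)]
        for attr, value in items:
            key = entity if attr is None else entity + "_" + attr
            flat[key] = value if isinstance(value, list) else [value]
    return flat


def unknown_values(policy, reference):
    """Return (rule index, key, value) triples not listed in the reference."""
    ref = reference["reference"]
    problems = []
    for index, rule in enumerate(policy["rule"]):
        for key, values in flatten_rule(rule).items():
            for value in values:
                if value not in ref.get(key, []):
                    problems.append((index, key, value))
    return problems


def check_files(concrete_path, reference_path):
    """Load both JSON files and return the problems found."""
    with open(concrete_path) as f:
        policy = json.load(f)
    with open(reference_path) as f:
        reference = json.load(f)
    return unknown_values(policy, reference)
```

For example, check_files("health/concrete.json", "health/reference.json") returns an empty list when every value (and every attribute name) used in the policy is listed in the reference.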

All of the JSON files are placed in a subfolder of the tool directory; in this example, the subfolder is called health:

dpd.json        # Abstract policy from the EU DPD
predicate.json  # Bridge predicate definition
concrete.json   # Concrete policy to be checked
reference.json  # Reference values for entities
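A small pre-flight check can confirm that a policy folder has this layout before the tool is run. The sketch below is our own illustration: it checks that the four files exist, parse as JSON and, where this document shows it, declare the expected "type". The type field of dpd.json is not shown here, so that file is only checked for presence and valid JSON.

```python
# Sketch: sanity-check a policy folder such as "health" before running the tool.
# The expected "type" values come from the examples in this document;
# dpd.json's type is not shown, so it is not checked.
import json
from pathlib import Path

EXPECTED_TYPES = {
    "dpd.json": None,
    "predicate.json": "predicate",
    "concrete.json": "concrete",
    "reference.json": "reference",
}


def check_texts(texts):
    """texts maps file name -> file contents (None if the file is missing)."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        text = texts.get(name)
        if text is None:
            problems.append(name + ": missing")
            continue
        try:
            data = json.loads(text)
        except ValueError as err:  # json.JSONDecodeError is a ValueError
            problems.append("%s: invalid JSON (%s)" % (name, err))
            continue
        actual = data.get("type") if isinstance(data, dict) else None
        if expected is not None and actual != expected:
            problems.append("%s: type is %r, expected %r" % (name, actual, expected))
    return problems


def check_folder(folder):
    """Read the expected files from folder and check them."""
    texts = {}
    for name in EXPECTED_TYPES:
        path = Path(folder) / name
        texts[name] = path.read_text() if path.is_file() else None
    return check_texts(texts)
```

Running check_folder("health") from the tool directory returns an empty list when the folder looks complete; a stray trailing comma in any file shows up as an "invalid JSON" entry.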

Bridge Predicate

The bridge predicate is a key component of our compliance checking system: it connects the abstract terms of the regulation to the concrete terms used in the policy. A correct predicate definition is the foundation for building a fully compliant concrete policy.

Bridge predicates are also expressed in JSON. Here is an example predicate.json file:

{
    "comment": "Health data processing",
    "type": "predicate",
    "predicate": {
        "data_subject": [{
            "subject_role": "Patient"
        }],
        "health_professional_controller": [{
            "subject_role": ["Health_Board", "General_Practitioner"]
        }],
        "health_professional_processor": [{
            "subject_role": "Hospital_Staff"
        }],
        "sensitive_data": [{
            "resource_type": ["medical_record", "prescription"]
        }],
        "personal_data": [{
            "resource_type": ["medical_record", "prescription"]
        }],
        "process": [{
            "action": ["Read", "Write", "Update"]
        }],
        "access": [{
            "action": ["Read", "Write", "Update"]
        }],
        "health_purposes": [{
            "environment_purpose": ["preventive_medicine", "medical_diagnosis",
                    "provision_of_care"]
        }],
        "consent_true": [{
            "environment_consent": true
        }],
        "consent_false": [{
            "environment_consent": false
        }],
        "mandate": [
            {
                "subject_role": "Hospital_Staff",
                "action": ["Read", "Write", "Update"],
                "resource_type": "medical_record",
                "environment_purpose": ["preventive_medicine",
                    "medical_diagnosis", "provision_of_care"]
            },
            {
                "subject_role": "Hospital_Staff",
                "action": "Write",
                "resource_type": "prescription",
                "environment_purpose": ["preventive_medicine",
                    "medical_diagnosis", "provision_of_care"]
            }
        ],
        "empower": [
            {
                "subject_role": "Patient",
                "action": "Read",
                "resource_type": ["medical_record", "prescription"]
            }
        ],
        "data_quality": [
            {
                "environment_adequate_relevant_notexcessive": true,
                "environment_accurate": true,
                "environment_validity": true
            }
        ],
        "ms_requirements": [
            {
                "environment_dpa_authorization": true,
                "environment_consent": true
            }
        ]
    }
}
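The file does not state how the entries of a predicate combine. One plausible reading, which is an assumption on our part and not confirmed by the tool, is that the list of objects is a disjunction, the keys inside one object form a conjunction, and a list value means "any of these values". The sketch below renders each predicate as a readable formula under that reading, which can help review a predicate.json file by eye.

```python
# Sketch under an ASSUMED reading of predicate.json (not confirmed by the tool):
# the list of objects is a disjunction, the keys inside one object form a
# conjunction, and a list value means "any of these values".
import json


def predicate_to_formula(alternatives):
    """Render one predicate definition as a readable propositional formula."""
    disjuncts = []
    for alternative in alternatives:
        conjuncts = []
        for key, value in alternative.items():
            values = value if isinstance(value, list) else [value]
            atoms = ["%s = %s" % (key, json.dumps(v)) for v in values]
            conjuncts.append(atoms[0] if len(atoms) == 1
                             else "(" + " | ".join(atoms) + ")")
        disjuncts.append(" & ".join(conjuncts))
    if len(disjuncts) == 1:
        return disjuncts[0]
    return " | ".join("(" + d + ")" for d in disjuncts)


def print_predicates(path):
    """Print every predicate in a predicate.json file."""
    with open(path) as f:
        predicates = json.load(f)["predicate"]
    for name, alternatives in predicates.items():
        print(name + ":", predicate_to_formula(alternatives))
```

With the example above, print_predicates("health/predicate.json") would print, among others, the line `consent_true: environment_consent = true`.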

Compliance Checking

Once the concrete policy is ready, we can check its compliance with the abstract policy by running the following command:

$ python region.py health

This script checks the satisfiability of nine regions. Example output:

Region check:
{
    "R1": false,
    "R2": false,
    "R3": true,
    "R4": true,
    "R5": false,
    "R6": false,
    "R7": false,
    "R8": false,
    "R9": false
}

Formula of non-empty regions:
{
    "R3": "Pc & Not(Nc) & Pa & Not(Na)",
    "R4": "Not(Pc) & Nc & Not(Pa) & Na"
}

To interpret the result, refer to the Venn diagram of the nine regions. The goal of compliance checking is for the green regions R3 and R4 to be non-empty (SAT), while the neutral regions R7 and R8 are also allowed to be non-empty.

If any of the red regions (R1, R2, R5 and R6) is non-empty, we need the query models of these regions to find the cause.
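When the check is scripted, the printed result can be interpreted automatically. This sketch assumes the exact output layout shown above (a "Region check:" line followed by a JSON object) and uses the region classes from the text; R9 is not classified in this document, so the sketch does not judge it. The helper names are our own.

```python
# Sketch: interpret the "Region check:" output of region.py.
# Assumes the printed layout shown above; region classes follow the text
# (green: R3, R4; neutral: R7, R8; red: R1, R2, R5, R6).
import json

GREEN = ("R3", "R4")
RED = ("R1", "R2", "R5", "R6")


def parse_region_check(output):
    """Extract the JSON object that follows the 'Region check:' line."""
    start = output.index("{", output.index("Region check:"))
    regions, _end = json.JSONDecoder().raw_decode(output, start)
    return regions


def red_non_empty(regions):
    """Red regions that are satisfiable and therefore need a query model."""
    return [name for name in RED if regions.get(name)]


def green_empty(regions):
    """Green regions that turned out empty (unsatisfiable)."""
    return [name for name in GREEN if not regions.get(name)]
```

The output text can be captured with subprocess.run([sys.executable, "region.py", "health"], capture_output=True, text=True).stdout (Python 3.7 or later). For the example output above, red_non_empty returns an empty list.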

Query Model

When a region that should be empty turns out to be non-empty, the tool can produce authorization query models for that region. These models help identify which rule in the concrete policy makes the region non-empty. To obtain the query models, run this command:

$ python model.py health > health/model.csv

The file model.csv then lists the valid query models that make a region non-empty. The CSV file is easiest to inspect with a spreadsheet application, where columns can be sorted quickly. These models can pinpoint either a wrong rule or a missing rule in the concrete policy. After modifying the concrete policy, rerun the compliance check to see the effect.
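As a scripted alternative to a spreadsheet, the standard csv module can sort or group the models. This is a sketch that assumes model.csv contains only CSV data with a header row; its column names are not documented here, so the column to sort or group by is a parameter chosen by the caller.

```python
# Sketch: a scripted alternative to sorting model.csv in a spreadsheet.
# Assumes model.csv has a header row; the column names are not documented
# here, so the caller chooses the column to sort or group by.
import csv
from collections import defaultdict


def load_models(lines):
    """Parse CSV lines (e.g. an open file) into a list of dicts."""
    return list(csv.DictReader(lines))


def sort_by(rows, column):
    """Return the rows sorted by one column (stable for equal values)."""
    return sorted(rows, key=lambda row: row.get(column) or "")


def group_by(rows, column):
    """Group rows by the value of one column, keys in sorted order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(column) or ""].append(row)
    return dict(sorted(groups.items()))
```

Typical use: open health/model.csv with newline="", pass the file to load_models, then call group_by(rows, name) with whichever header name the file actually uses for the region.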

Benchmark

To evaluate the scalability of our approach, we tested it on random policies. We generated three different random policies based on three real scenarios: financial, marketing and health data processing. In the financial scenario, a company processes its employees' personal data and outsources the processing to a third-party company for salary, tax and pension computation. In the marketing scenario, customers' personal data are shared with a marketing company, based on each customer's preferences and consent to sharing. In the health scenario, the local health care provider determines how patients can release their medical records to hospital medical staff or to medical professionals at private clinics.

For these scenarios, we simulated random concrete policies against a fixed abstract policy representing the EU DPD, increased the number of concrete policy rules, and measured the evaluation time of each region.

The diagram shows the results of our experiments on the three synthetic benchmarks: the x-axis reports the number of disjuncts (called rules) in the formulae, while the y-axis shows the time needed to check the emptiness of each of the regions R1, ..., R9.

The first policy (R) is completely random and is generated from a predefined set of entity values. The second policy (PR) is random but follows a specific pattern taken from the minimal version of the policy; here, the pattern is the percentage of the subject role entity. The last policy (SR) is randomly built with additional environment entities, which scale up the size of a single rule; examples of the new environment entities are the date of access and the IP address.

The results were obtained on a PC with an Intel i5-3340M 2.7 GHz processor and 4 GB of RAM, running Debian Linux with kernel version 4.7; the Python and PySMT versions are 2.7.12 and 0.6.0, respectively, and the MathSAT version is 5.3.13. These results show the scalability of our approach: less than 5 seconds are needed for a detailed compliance evaluation of an ABAC policy with 10,000 rules, and the plot suggests that the running time of the tool grows linearly with the number of rules.
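To reproduce the flavour of the completely random (R) benchmark on one's own data, a generator can draw rules from the reference values. This is a hypothetical sketch, not the authors' benchmark code: every rule is assumed to be an "allow" rule that picks one random value for every attribute in the reference object, and reference keys are mapped back to the nested rule format (subject_role becomes subject.role).

```python
# Sketch of a "completely random" (R) concrete policy generator in the JSON
# format above. Our assumptions, not the authors' benchmark code: every rule is
# an "allow" rule that picks one random value for every reference attribute.
import random


def random_rule(reference, rng):
    """Build one rule, mapping e.g. "subject_role" back to subject.role."""
    rule = {"effect": "allow"}
    for key, values in reference.items():
        entity, _, attribute = key.partition("_")
        value = rng.choice(values)
        if attribute:
            rule.setdefault(entity, {})[attribute] = value
        else:
            rule[entity] = value  # e.g. "action" has no sub-attribute
    return rule


def random_policy(reference, n_rules, seed=0):
    """Return a concrete policy dict with n_rules random rules (reproducible)."""
    rng = random.Random(seed)
    return {
        "comment": "Random policy (R)",
        "type": "concrete",
        "rule": [random_rule(reference, rng) for _ in range(n_rules)],
    }
```

For example, passing the "reference" object of reference.json and n_rules=10000, then writing the result with json.dump into a copy of the policy folder, gives a large concrete policy whose checking time can be measured with region.py.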