Provenance-based Security and Transparent Computing

A workshop inspired by the DARPA Transparent Computing program at Provenance Week, June 6, 2016, Washington DC.

Organizing Committee

  • James Cheney (chair), University of Edinburgh
  • David W. Archer, Galois, Inc.
  • Ashish Gehani, SRI International
  • Yingbo Song, BAE Systems
  • Mukesh Dalal, BAE Systems


Transparent Computing is a DARPA research program aimed at using provenance to improve the security of computer systems. Specifically, the program aims to use pervasive provenance tracking for data and threads of execution to understand causality in both individual systems and networks of systems. As a practical use case and concrete way to demonstrate effectiveness, the program aims to use that causality to identify and combat advanced persistent threats (APTs) against such systems. APTs are often undetectable by policy-based monitoring or current event log analysis techniques for two key reasons: they act slowly (over months at a time) to gain and maintain access to target system resources; and they are careful not to violate either system security policies or user work patterns in ways that would betray their presence.

This workshop aims to provide a forum for discussing recent developments in the Transparent Computing project and their relationship to the broader field of provenance research.


Today, enterprise system and network behaviors are typically “opaque”: stakeholders lack the ability to assert causal linkages in running code, except in very simple cases. At best, event logs and audit trails can offer partial information on temporally and spatially localized events as seen from the viewpoint of individual applications. Thus current techniques give operators neither system-wide situational awareness nor a viewpoint informed by a long-term perspective. A system- (or network-) wide, longer-term view of meaningful activities is important because certain emerging classes of threats, such as APTs, adopt courses of action that are temporally dispersed and mimic benign, normal system activity: from any temporally or spatially local view, their individual actions do not appear suspicious. Such a wide, longer-term perspective can identify unwanted behavior patterns and their points of origin in system time and space.

One way to discover such broad patterns is to map out global, long-term causality in systems and networks by constructing provenance graphs for system data and control flow. However, successful and timely construction and analysis of such provenance graphs faces several challenges. The first of these is scalability: the ability to handle large volumes of local causality data at high bandwidth (many GBytes per hour over periods of weeks to months), as generated by sensors in systems and networks under observation. The second challenge is how to generate rich provenance graphs that assemble local, short-term causality evidence into global, long-horizon causal networks. The third key challenge is abstracting these graphs into smaller, more intuitive structures, and reasoning over such structures at system operational rates to characterize system activity. The fourth key challenge is to distinguish APT activity from normal background activity.
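The graph construction described above can be made concrete with a minimal sketch (the process and file names and the event schema here are hypothetical, not the output of any particular sensor): local causality events are assembled into a graph whose edges point from effects to causes, so that the provenance of an artifact becomes a backward reachability query.

```python
# Minimal illustrative sketch (not from any Transparent Computing system):
# build a provenance graph from a stream of (subject, action, object) events
# and trace the causal ancestry of an artifact backwards.
from collections import defaultdict

events = [
    ("bash",   "fork",  "curl"),         # process -> process
    ("curl",   "write", "payload.bin"),  # process -> file
    ("bash",   "fork",  "python"),
    ("python", "read",  "payload.bin"),
    ("python", "write", "exfil.zip"),
]

# Edge direction: effect -> cause, so ancestry queries walk "backwards in time".
parents = defaultdict(set)
for subj, action, obj in events:
    if action in ("write", "fork"):   # subj caused obj
        parents[obj].add(subj)
    elif action == "read":            # obj influenced subj
        parents[subj].add(obj)

def provenance(node):
    """All entities that causally precede `node` in the graph."""
    seen, stack = set(), [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

print(sorted(provenance("exfil.zip")))
# -> ['bash', 'curl', 'payload.bin', 'python']
```

At realistic data rates the challenge is doing exactly this kind of traversal over billions of events rather than five, which is what motivates the scalability and abstraction challenges above.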

Topics of Interest

We invite short talks on a number of themes relevant to the challenges described above. We encourage talks from the broader provenance community as well as talks from participants in the DARPA Transparent Computing program. The following is a partial list of themes that would fall into the scope of our workshop:

  • An overview of the Transparent Computing program

  • Overviews of active projects in the portion of the Transparent Computing program relevant to constructing and analyzing provenance graphs

  • Data models for tracking computing system and network entities, their provenance, and their relationships to other entities

  • Approaches to handling high-bandwidth, large-volume system event and entity data, and practical construction of provenance graphs at these data rates and volumes

  • Algorithms for constructing provenance graphs from such system monitoring data

  • Models for efficiently abstracting low-level system monitoring data into intuitive patterns of behavior that support reasoning about threats vs. benign system activities

  • Methods for efficiently deducing provenance of such higher-level behavior patterns from provenance of their low-level precursors

  • Approaches to identifying malicious vs. benign behavior patterns by leveraging this provenance information

Authors are strongly encouraged, where appropriate, to make an explicit link between requirements and application needs.

Workshop Format

Our workshop will be structured as follows.

  • An invited keynote talk describing current and emerging challenges in this area and approaches that address them

  • A number of short (5- to 15-minute) lightning talks grouped by the themes outlined above

  • Open discussion about promising directions to address the four key challenges described above

Important Dates

  • May 11, 2016: Deadline for submission

  • May 15, 2016: Workshop programme published

  • May 20, 2016: Registration closes

  • June 6, 2016: Workshop

Submission Procedure

Please submit plain text abstracts of about half a page to the program committee chair at jcheney@inf.ed.ac.uk. Multiple submissions for different experiences and/or requirements are welcome.



Combining computational provenance and anomaly detection to reveal cautious, persistent threats
David W. Archer (Galois, Inc.), James Cheney (University of Edinburgh), Hoda Eldardiry (PARC, a Xerox Company), Rui Filipe Lima Maranhão de Abreu (PARC, a Xerox Company), Alan Fern (Oregon State University)

Advanced persistent threats confound typical threat detection. Their “slow and low” approach specifically aims to imitate normal system operations and user practices, preventing discovery by typical intrusion detection systems and so-called policy-based security. As a result, such threats continue to be discovered, but only after causing significant harm by exfiltration of sensitive data or corruption of business processes. In this talk we describe ongoing work as part of the DARPA Transparent Computing program on a different approach to APT detection. This approach combines three technical efforts into an integrated detection system. First is the construction of graphs that represent the provenance of computation activities (such as the execution of library functions) and persistent objects (such as files or network connections). Second is the identification of cliques of system behavior that we call Segments and characterization of statistical anomalies in them. Third is recognition of abstract patterns of system activity and mapping of those pattern instances to a grammar that describes persistent threat behaviors. We describe the architecture of our system, our approaches on each of these technical efforts, and preliminary results to date.

Tagging and Tracking of Multilevel Host Events for Transparent Computing and Information Assurance
Team members: Wenke Lee, Taesoo Kim, Alex Orso, Simon Chung, Sangho Lee, Trent Brunson, Evan Downing, Mattia Fazzini, Yang Ji, Weiren Wang (Georgia Institute of Technology)

    Traditional network security defenses cannot protect against advanced persistent threats (APTs), because they lack the visibility and provenance-checking needed to authenticate user, program, and operating system activities.  To detect APTs, we propose THEIA, a security analysis system for tagging and tracking multilevel host events and data. THEIA records events at three layers: user interaction with a program, program processing of input, and program and network interactions with the operating system. THEIA achieves both high accuracy and high efficiency at runtime by keeping computation-heavy tag analyses off the critical path of the system’s execution: thorough analysis is instead performed while replaying the recorded events at various levels of detail, decoupled from the system’s execution.
    THEIA has an internal provenance model called the Action History Graph (AHG) to track the causalities between kernel objects such as processes and files. The AHG is constructed in sub-real time relative to the recording by scanning the recorded trace of system calls. Though many attacks can be detected by analyzing the AHG, we can further provide fine-grained causalities via instruction-level taint analysis to detect the finest-grained movements in an attack. In particular, we are currently focusing on two Dynamic Information Flow Tracking (DIFT)-related analyses: refinement-based DIFT and trace-based DIFT. Refinement-based DIFT quickly provides coarse-grained results, then magnifies part of them in its refinement loop. Trace-based DIFT analysis improves performance by extracting taint propagation summaries from the execution trace and then attempting to reuse them across different execution traces.
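Conceptually, a propagation summary compresses a traced segment into a reusable input-to-output taint map; the toy sketch below (illustrative only, not THEIA's implementation; all names are invented) shows how such a summary can be computed once and then applied to a new initial taint state without replaying the segment:

```python
# Toy sketch of taint-propagation summaries (illustrative only; not
# THEIA's implementation). A summary maps each value produced by a
# traced code segment to the set of raw inputs that flow into it.
def propagate(trace, taint):
    """Apply per-step taint propagation: each destination inherits the
    taints of its source operands (a source with no entry taints itself)."""
    taint = {k: set(v) for k, v in taint.items()}
    for dst, srcs in trace:
        taint[dst] = set().union(*(taint.get(s, {s}) for s in srcs))
    return taint

# A recorded segment: each step is (destination, source operands).
segment = [("t1", ("a", "b")), ("t2", ("t1", "c")), ("out", ("t2",))]

# Summarize once: which raw inputs reach each value the segment produces?
summary = propagate(segment, {})
assert summary["out"] == {"a", "b", "c"}

# Reuse across traces: instead of replaying the segment step by step,
# apply the summary directly to a new initial taint state.
initial = {"a": {"network"}, "b": {"network"}, "c": {"file"}}
out_taint = set().union(*(initial.get(s, set()) for s in summary["out"]))
print(out_taint)  # taint labels reaching "out" in this execution
```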

From DTrace to provenance: bridging the semantic gap
Lucian Carata, Ripduman Sohan, Robert Watson (University of Cambridge), Arun Thomas (BAE Systems)


A number of existing system tools, such as DTrace, SystemTap, and perf, have been developed for understanding the activity and behaviour of applications from the low-level perspective of the operating system. However, they are mostly designed to support short, targeted investigations (either performance-related or forensics-oriented) by users who are familiar with the underpinnings of their systems. In contrast, the identification and analysis of persistent threats requires long-term, comprehensive tracing of all aspects of a given system and its communications. Such analysis should ideally be possible without expertise in the internals of the operating system, with most of the effort spent on higher-level reasoning.

Our current work focuses on augmenting DTrace and our own provenance-capturing system, OPUS, to meet those demands. The main challenge addressed is that of mapping the set of events logged at the OS level onto semantically meaningful relationships between entities known in user space (such as files and paths, network transfers, and executions of binaries). Building on our previous research on the Provenance Versioning Model (PVM), low-level OS events are used to build a provenance graph describing the fine-grained relationships amongst those entities according to well-defined semantics. This graph can later be used to perform complex queries about the system, its evolution, and the flow of data as it was used and transformed by various applications.
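As a rough sketch of this event-to-relation lifting (the record schema, relation names, and values below are invented for illustration, not the actual OPUS/PVM rules), raw OS-level records can be translated into semantic provenance triples:

```python
# Hypothetical sketch of lifting raw syscall records into semantic
# provenance relations; the schema and mapping are illustrative only,
# not the actual OPUS/PVM implementation.
RAW_TRACE = [
    {"syscall": "open",    "pid": 101, "path": "/etc/passwd", "mode": "r"},
    {"syscall": "connect", "pid": 101, "addr": "10.0.0.5:443"},
    {"syscall": "write",   "pid": 101, "fd_target": "10.0.0.5:443"},
]

def lift(record):
    """Map one low-level record to a (source, relation, target) triple."""
    pid = f"process:{record['pid']}"
    if record["syscall"] == "open" and record["mode"] == "r":
        return (f"file:{record['path']}", "wasReadBy", pid)
    if record["syscall"] == "connect":
        return (pid, "opened", f"conn:{record['addr']}")
    if record["syscall"] == "write":
        return (pid, "wroteTo", f"conn:{record['fd_target']}")
    return None  # event carries no provenance-relevant relationship

triples = [t for r in RAW_TRACE if (t := lift(r))]
for t in triples:
    print(t)
```

The resulting triples are what a graph store can index and query; the hard part in practice is defining lifting rules whose semantics remain well-defined across interleaved processes, file descriptor reuse, and versioned entities.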


ProvenanceWeek 2016, June 6-9, 2016, is being hosted by The MITRE Corporation in McLean, Virginia, USA, a short metro ride from Washington D.C. The workshops IPAW and TAPP will be co-located during the week. The workshop "Provenance Based Security and Transparent Computing" will take place on the morning of June 6. Note that registration closes on May 20! All registered attendees will be listed on the workshop Web site. Registration is through the Provenance Week registration page. Participants are cordially invited to register for subsequent Provenance Week events.