Team blog‎ > ‎

Understanding Belay with Alloy

posted Mar 27, 2012, 4:29 PM by jpolitz@google.com   [ updated Mar 28, 2012, 9:15 AM ]
In a previous post, we gave an overview of routing in Belay, and how we used Alloy to get a more precise understanding of what could go wrong with Belay's launch operation.  In this post, we go in-depth into the steps we went through to build up the model, and elaborate on some of the choices we made along the way.  If you'd like to follow along in code, you should first get Alloy, and download the example code, which has separate files for most of the steps along the way, referred to as they come up.

The World and its Parts

In the informal discussion, we used a number of terms to describe the interacting pieces of Belay's routing.  We at least used the concepts of windows, the browser and Belay IFramesinstancesstrings, and identities.  We also assumed that only one thing was happening at a time, in a sequence of serialized, atomic steps.  We need to build up these notions in Alloy's description language.

Time

We'll start our exploration by defining a notion of time.  Time is a linear series of steps, which Alloy has built-in support for expressing.  The Alloy incantation we'll begin with is:

open util/ordering[Time] as ord
sig Time {}

The sig declaration adds a new kind of element to the world, so Alloy will search models with elements that belong to the Time set (in models, they will appear as Time0Time1, and so on).  The incantation on the first line imposes constraints on the elements of Time, so that there is exactly one Time in the relation called first, one in the relation called last, and all other Times make a straight line from first to last via a two-place relation called next.  We can ask Alloy to instantiate the model we've described with run:

run {} for 4

This tells Alloy to instantiate the world we've described so far and show us what it looks like.  To run, choose "Execute"->"run $1 for 4" from the application menu. With some fiddling of visualization (I had to open up the Theme pane, uncheck "Hide private sigs/relations" and choose ord from the "Projected over..." dropdown), we get a nice picture of time (just-time.als):

Windows and Instances

Time lets us talk about the state of the world changing in discrete events.  Now we need to populate the world with interesting elements.  We're going to keep the model as simple as possible, and we'll see that complexity quickly emerges from even a minimal specification.

In Belay, the notion of an instance isn't precisely defined.  When we say instance with respect to an application on the web, we mean some persistent collection of data and behavior associated with a user's interaction with the application.  This might be a single blog, a whole email inbox, or a shopping cart.  There are two important features of an instance:
  1. It has some notion of persistent identity over time;
  2. It knows some set of data that might include instance-specific secrets, be partially shared with other instances, and change over time.
We abstract away all other details, and merely focus on these two features in our model:

sig Token {}

sig Instance {
  identity: Token,
  knownTokens: Token set->set Time
}

We choose to represent all data with a single kind of element, Token.  This is an out-of-model description: We really mean that all data, all strings, all secretly-generated UUIDs, all JSON strings, are represented by sets of Tokens in the model.  This uniformity is both coarse and useful.  It will be difficult to miss a kind leak of information if all information can flow over every channel and be stored everywhere!

The definition of Instance, then, has two Token-based pieces.  An Instance has a single identity, for all Time, there is a single piece of data that describes its identity.  This is our model of the persistence of (real-world) instances.  Along with its identity, an instance knows some set of other information.  The information the instance knows isn't the same at all times, however; it can change over time.  The declaration 

knownTokens: Token set->set Time

says that an instances knownTokens are a many-to-many mapping between Tokens and Times.  That is, for a given token, an instance might know that token at many different times.  The inverse is true as well: for a given time, an instance might know a number of different tokens.

Given this definition, we can ask Alloy to instantiate examples for us:

run { } for 3 Time, exactly 1 Instance, exactly 2 Token

(tokens-and-instances-withfact.als) (Note:  You might not get this example as the first one when you click "Execute".  Try clicking "Next" in the toolbar to explore multiple examples.)

In this instantiation, we've told Alloy to only generate a single Instance, which is in the oval above.  Its identity is Token1, but the set of Tokens that it knows changes over time.  At Time0 and Time1 it knows about Token0, but at Time2 it doesn't, and has "forgotten" about Token0.  This could be a model of a instance that allowed the user to delete some data between Time1 and Time2, for example.  The Next button at the top of the frame can be used to explore other instantiations, as this isn't the only instantiation for 3 Time steps, one Instance, and two Tokens.

There's something a little odd about this instantiation, though.  According to knownTokens, the instance isn't aware of its own identity at any time.  Instances often need to report their identity in order to grant BCAPs that point back to them.  Indeed, an instance is the only entity that actually knows its true identity.  We enforce this by adding a constraint to our model:

fact IdentityIsAlwaysKnown {
  all i:Instance, t:Time | i.identity in i.knownTokens.t
}

This constrains instances in our model to always know their identity at all times.  Another way to think about it is that this fact tells Alloy that we aren't interested in exploring instantiations where instances don't know their own identity.  Framing it that way makes it clear that facts are a little scary:  they can cause Alloy to miss interesting examples if we over-constrain the model.  Any fact we add should be carefully scrutinized to make sure that it only excludes truly nonsensical instantiations.  If we run again, with this fact added, we get a different picture:

In this instantiation, the Instance always knows Token1, which is its identity.  It happens that at Time2, it also knows Token0 which we can think of it "learning" somehow between Time1 and Time2.

The arrows can get a bit overwhelming in these pictures.  We can also project the visualization over Time (the "Projected over ord/Ord" is a drop-down), and step through one Time at a time, using the << and >> buttons at the bottom.  Here's the same instantiation in that view:


Instances look reasonable for now, so we can move on to thinking about windows in the browser.  Windows actually have rich and complex behavior in browsers, with respect to navigation, events, the same origin policy, including IFrames within them, and so on.  We are going to boil all of those details down to almost nothing, and say that we only care about two features for a window:
  1. Whether or not the window is open (if it exists at a given point in time),
  2. What instance is loaded into the window.
In Alloy, we write:

sig Window {
  exists: set Time,
  atInstance: Instance lone->set Time
}

A window might exist at several different points in time (it doesn't just flash into and out of existence), and it may have a particular Instance loaded into it at certain points in time.  It doesn't ever have multiple instances loaded at the same time.  Both of these assumptions are quite simplistic, and we might want to consider relaxing them.  For example, a window might have an iframe that is running a different instance than the master instance, and this might confuse part of the routing fabric.  We acknowledge our simplistic assumptions here.  Instantiating the model with some windows yields interesting results (with-windows.als):

run { } for 3 Time, exactly 1 Instance, exactly 2 Token, exactly 2 Window

In this instantiation, Window0 is visiting Instance at Time0, but it doesn't exist!  No real-world situation corresponds to this, so it needs to be fixed:

fact InstancesVisitedByExistingWindows {
  all w:Window, t:Time | some w.atInstance.t => w in exists.t
}

This forces windows to be open when they are visiting instances (with-windows-correct-instances.als).

The Browser and Routing

Finally, we need to describe the Belay browser.  We don't get into the details of the iframe implementation of routing here, and assume that Belay-style routing is built-in to the browser.  That is, the browser itself keeps track of the map from tokens to windows:

one sig Browser {
  routeMap: (Token lone->lone Window) set->set Time
}

The "one sig" says that there is only ever one Browser in an instance.  We don't consider any multi-browser communication here.  The routeMap in the browser, at any given time, may have many functional mappings from Tokens to Windows.  Further, the same mapping is free to persist across multiple Times.

If we start to instantiate this model, we start to get some pretty weird behaviors.  The browser adds and removes mappings from its routeMap arbitrarily, Windows jump to difference instances, and Instances gain and lose arbitrary information.  We could try to constrain all this behavior with facts, but instead, we will take a more principled approach to describing the behavior of a browser over time.

State Transitions

We are going to model the browser as a state machine that nondeterministically makes one of several transitions at every time step.  If the browser starts in some acceptable initial state, each step should take the browser to some acceptable next state.  If it doesn't, and the transition is an accurate model of our implementation, then there is something we need to pay attention to in Belay's routing.

Initialization

To initialize the browser, we'll start with it in a "just opened" state.  No windows are open, no routes are mapped, and no instances are loaded (init-example.als):

pred init[t:Time] {
  Browser.routeMap.t = none->none
  exists.t = none
  Window.atInstance.t = none
}
run init for 1 Time, exactly 2 Instance, exactly 2 Token, exactly 2 Window

Note that we don't constrain the shape of Instances in an initial configuration (beyond the fact we defined above).  They are free to know any set of Tokens, and have any Token as their identity.  These sets might overlap or be shared between Instances, as they are in the above instantiation.  In fact, if we were willing to be more precise, we might be able to be less draconian with our model of browser initialization.  In this simple model, we start with a clean slate for the browser, and acknowledge that a richer model might help us explore more scenarios.

Transitions

Given an initial state, we should now define how we can update the state.  Traditionally, Alloy models describe state update with a predicate of (at least) two places, describing the state before and after the update.  This is the approach we take, and our pre- and post- states are both elements of Time.  We will consider three kinds of transitions:
  1. A trusted instance launch, such as when Belay launches Station from trusted local storage;
  2. An untrusted launch, when one window requests that Belay launch an instance in another window;
  3. A capability invocation, where one instance passes information to an instance in another window via Belay by providing that window's instance id.
Trusted Launch: We'll start at the beginning, and describe the time before and after a trusted launch operation.  Before diving in to code, we should think about what a trusted launch is:
  • Belay knows the identity of whatever trusted instance it is launching,
  • Belay can update its routeMap correctly for later invocations,
  • After launching, this addition should be the only change to the routeMap,
  • After launching, there should be only one additional Window open, which has the Instance in question loaded,
  • After launching, no Instance should know any more Tokens than it did before the launch.
These intuitions can be translated almost directly into code:

pred launchTrustedIdentity[t, t': Time, w: Window, i: Instance] {
  t' = t.next

  // this window has never been open before
  not (w in exists.(t.*prev))

  // this window is the only new open window, all others are open still
  exists.t' = (exists.t + w)

  // this window holds instance i right now, others are the same
  atInstance.t' = (atInstance.t + w->i)

  // identity isn't mapped to already (don't open same Instance twice!)
  not (i.identity in Browser.routeMap.(t.*prev).Window)

  // consistent knownTokens
  knownTokens.t' = knownTokens.t

  // this instance is now mapped to by its identity
  Browser.routeMap.t' = (Browser.routeMap.t + (i.identity)->w)
}

Here's what a such a launch looks like, if we start from an good initial state, ensured by init[t] (launch-example.als):

run {
  some t, t': Time, w: Window, i: Instance {
    init[t] && launchTrustedIdentity[t, t', w, i]
  }
} for 2 Time, exactly 1 Instance, exactly 1 Token, exactly 1 Window


Initially, no windows exist, and the only Instance only has it's identity as a knownToken.  The launch adds the mapping to the routeMap, and springs the Window into existence with the Instance loaded.  The launch is trusted because we assume that the browser has some way to get the identity "right", without being informed by one of the windows or instances.

Checking, not Constraining: An interesting thought comes up here relative to one of our facts from earlier.  We explicitly stated, via a fact constraint, that Windows couldn't be at an Instance unless they existed, like so:

fact InstancesVisitedByExistingWindows {
  all w:Window, t:Time | some w.atInstance.t => w in exists.t
}

If we're switching to a model where we talk about models that satisfy state transitions, this seems less like a fact about the world, and more like something we'd like to verify of our state transitions.  That is, we should check, not assume, that InstancesVisitedByExistingWindows holds after each transition.  We can use Alloy to describe what a trace of events looks like:

pred trace[] {
  init[first]
  all t:Time - last {
    (some w:Window, i:Instance | launchTrustedIdentity[t, t.next, w, i])
  }
}

The trace[] predicate says that at the beginning of time, init is satisfied, and at all subsequent times, the state update is the result of a trusted launch.  Given this, we can take our fact from above, and turn it into an assertion about traces:

assert InstancesVisitedByExistingWindows {
  trace[] =>
  all w:Window, t:Time | some w.atInstance.t => w in exists.t
}
check InstancesVisitedByExistingWindows for 4

No counterexample found.  Assertion may be valid.  193ms.

Now, we've removed the constraint on all Windows from before, and changed it into a property we can check of our state transition model.  This is a great improvement: we've taken one of our assumptions and turned it into something checkable.  Even better, we can continue to check this assertion as a sanity check against any new transitions we add, to help ensure that we don't send our model off the rails anywhere.

There are other useful things to assert here that are more specific to launchTrustedIdentity, as well.  For example, it should be the case that all trusted launches grow the routeMap by exactly one entry:

assert TrustedLaunchGrowsRouteMapByOne {
  all t, t': Time, w: Window, i: Instance {
    launchTrustedIdentity[t, t', w, i] =>
    #(Browser.routeMap.t') = #(Browser.routeMap.t) + 1
  }
}
check TrustedLaunchGrowsRouteMapByOne for 5

No counterexample found. Assertion may be valid. 0ms.

Checking this also results in no counterexamples.  Note that there is no requirement that trace[] hold here; this is a more general assertion over all trusted launch events.  Writing these assertions is like unit-testing the specification: they aren't the end result we're aiming for, but we can sanity-check the implementation by making sure they are never violated.  These checks, and this implementation of trace[], are in trace-and-test.als.

Invocation:  Next, we need a definition of invocation between instances.  In an invocation, an instance decides to pass some information, in the form of Tokens, to some other instance.  In a client-side invocation, this happens between windows, so an instance can try to send an invocation to any window for which it has the correct Token in the routeMap of the browser.  The invocation adds the arguments to the knownTokens of the instance in the targeted window, and doesn't change anything else:

pred invoke[t, t': Time, invoker, target: Window, targetIdentity, cap: Token, args: set Token] {
  t' = t.next

  // everything on the invoker's side is known by the invoker
  (cap + args + targetIdentity) in invoker.atInstance.t.knownTokens.t

  // the target window is mapped to by the targetIdentity
  (Browser.routeMap.t)[targetIdentity] = target

  // windows are open
  (invoker + target) in exists.t

  // Routing doesn't change at all
  Browser.routeMap.t' = Browser.routeMap.t
    
  // Windows don't change at all
  atInstance.t' = atInstance.t
  exists.t' = exists.t

  // Payload is delivered
  knownTokens.t' = (knownTokens.t + (target.atInstance.t')->(cap + args))
}

Now we've added a new kind of state transition, so we need to update trace[] to allow invokes into the trace.  To add invoke to the trace, we existentially quantify (using some), over the non-Time arguments to invoke, and add it as a case to trace[]:

pred trace[] {
  init[first]
  all t:Time - last {
    (some w:Window, i:Instance | launchTrustedIdentity[t, t.next, w, i])
    or
    (some invoker, target: Window, targetIdentity, cap: Token, args: set Token {
      invoke[t, t.next, invoker, target, targetIdentity, cap, args]
    })
  }
}
run trace for 4

When we run this, we hit a squirrelly corner of Alloy:

    (some invoker, target: Window, targetIdentity, cap: Token, args: set Token {
      invoke[t, t.next, invoker, target, targetIdentity, cap, args]
    })

A type error has occurred:
Analysis cannot be performed since it requires higher-order
quantification that could not be skolemized.

Alloy is having trouble checking a quantification over a set, due to something called "skolemization."  This technique relies on the finite nature of Alloy models to handle quantifications.  For example, consider an instantiation bounded at 3 Tokens: T1, T2, and T3.  A quantification of

some t:Token | somePredicate[t]

Really means

some t:{T1, T2, T3} | somePredicate[t]

Which is equivalent to:

somePredicate[T1] or somePredicate[T2] or somePredicate[T3]

This expansion is known as skolemization (this example is derived from page 154-155 of Jackson's excellent book on modeling with Alloy).  This doesn't work so well for set constraints, evidently, though I'm not completely sure why.  I would guess that it's because Alloy would need to generate powerset-like expansion:

somePredicate[{T1, T2, T3}] or somePredicate[{T1, T2}] or somePredicate[{T1, T3}] or somePredicate[{T2, T3}] or ...

In any event, there are several solutions that we don't address directly here.  For this exposition, it is enough to consider single-argument invocations, where we simply drop the set constraint, and allow for only one args Token:

(some invoker, target: Window, targetIdentity, cap: Token, args: Token {
  invoke[t, t.next, invoker, target, targetIdentity, cap, args]
})

We can create an example of a trace that contains an invoke:

run {
  trace &&
some t, t': Time, invoker, target: Window, targetIdentity, cap: Token, args: Token {
    invoke[t, t', invoker, target, targetIdentity, cap, args]
  }
} for 4



The state is exactly the same at Time2 and Time3!  The same Token filled all three roles:  It was the target of the invocation, the identity of the only instance, the capability that was invoked, and the argument passed for the invocation.  This actually could happen in the real system, when an Instance invokes one of its own capabilities.  So this is a perfectly valid instantiation.  If we use "Next" to find more examples, we get what we came for:

In this example, the extra arrow from Instance3 to Token1 appears, which is exactly the argument that was passed from Instance0 in Window1 to Instance3 in Window0.  Up to here is included in invoke-example.als.

General Launching:

Now we need to model launches that aren't the trusted launch from before, where the launcher gets to decide on the entry in the routeMap for the new window.  Aside from this change, this launch is almost identical to the launch from before (with the appropriate update to trace[]):

pred launchFromWindow[t, t': Time, launcher, newW: Window, i: Instance, identity: Token] {
  t' = t.next

  // The identity is just some token the launcher happens to know about
  identity in launcher.atInstance.t.knownTokens.t

  // this window has never been open before
  not (newW in exists.(t.*prev))

  // this window is the only new open window, all others are open still
  exists.t' = (exists.t + newW)

  // this window holds instance i right now, others are the same
  atInstance.t' = (atInstance.t + newW->i)

  // consistent knownTokens
  knownTokens.t' = knownTokens.t

  // this instance is now mapped to by the incoming identityToken
  Browser.routeMap.t' = (Browser.routeMap.t + identity->newW)
}

Asserting A Property

Now we can try to state the property that we want to hold of invocations.  Our assumption is that the targetIdentity of an invocation identifies the Instance that the cap and args of the invocation are meant for.  Crucially, the cap and args aren't meant for any other Instance.  To model this, we write an assertion that states that after an invocation, the targeted Instance is the only one that has access to the cap and args that didn't already:

assert NoBadInvokes {
  trace =>
  all t, t': Time, invoker, target: Window, targetIdentity, cap: Token, args: set Token {
    invoke[t, t', invoker, target, targetIdentity, cap, args] =>
    (all i: Instance {
      i in knownTokens.t'.(cap + args) =>
      (i in knownTokens.t.(cap + args) or i.identity = targetIdentity)
    })
  }
}

The statement is a little complicated, so let's insert some English.  The wrapping of "trace => all t, t'..." is just saying that we're only interested in this property for valid traces, and we're interested in all invocations in valid traces.  The inner part requires more unpacking:

    (all i: Instance {  // for all the instances
      i in knownTokens.t'.(cap + args) =>  // IF the instance is aware of anything in the cap or args AFTER the invoke
      (i in knownTokens.t.(cap + args) or i.identity = targetIdentity)
      // THEN that instance had better have known it before, or be the targetIdentity
    })

If we check this with Alloy, we get a counterexample (as we expected).  We start with two windows open, and we can see the problem lurking: Window0 is mapped to by Token3 in routeMap, but the Instance it holds has identity Token 0.  This example is in (counterexample.als).
Indeed, in the next step, Instance2 mistakenly receives Token3, which Window1 intended to go to the Instance with identity Token3.  We know this because of the $NoBadInvokes_targetIdentity annotation on Token3, which is Alloy annotating the elements of the visualization with their role in the counterexample.  The invocation was mis-routed to Instance2, which learned something it wasn't supposed to, because it was opened with the wrong mapping in routeMap.
We seem to have found our bug!  But we should still be skeptical.  The statement of NoBadInvokes is awfully complicated, even though it gave us what looks like a reasonable counterexample.  We weren't satisfied, so we came up with a different way to say the same thing:

assert NoBadInvokes2 {
  trace =>
  all t, t': Time, invoker, target: Window, targetIdentity, cap: Token, args: Token {
    invoke[t, t', invoker, target, targetIdentity, cap, args] =>
    (all c: (cap + args) {
      knownTokens.t'.c in (knownTokens.t.c + identity.targetIdentity)
    })
  }
}

Checking this yields a nearly identical counterexample.  But were our statements equivalent?  We can check that, too!  We wrap both asserts in predicates, and check that they hold in exactly the same situations:


pred NoBadInvokesP[] {
  trace =>
  ... // as before
}

assert NoBadInvokes {
  NoBadInvokesP
}

pred NoBadInvokes2P[] {
  trace =>
  ... // as before
}

assert NoBadInvokes2 {
  NoBadInvokes2P
}

assert BadInvokesSame {
  NoBadInvokesP <=> NoBadInvokes2P
}

check BadInvokesSame for 5

>> Executing "Check BadInvokesSame for 5"
   Solver=sat4j Bitwidth=0 MaxSeq=0 SkolemDepth=4 Symmetry=20
   188404 vars. 440 primary vars. 311052 clauses. 25157ms.
   No counterexample found. Assertion may be valid. 490032ms.

Note that this took a really long time (nearly 10 minutes)!  The iff operation is quite expensive, which probably has something to do with the exhaustiveness of the property.  But, we've stated the property twice, and both have the exact same meaning (up to models of size 5).  The whole model is in full-example.als.

We've gone through the process of modeling our error and convincing ourselves that our model correctly represents it.  In a future post, we'll go through the details of both the implementation and model of our fix, and what properties the fix satisfies.


ċ
counterexample.als
(3k)
jpolitz@google.com,
Mar 27, 2012, 4:29 PM
ċ
full-example.als
(4k)
jpolitz@google.com,
Mar 27, 2012, 4:29 PM
ċ
init-example.als
(1k)
jpolitz@google.com,
Mar 27, 2012, 4:29 PM
ċ
invoke-example.als
(3k)
jpolitz@google.com,
Mar 27, 2012, 4:29 PM
ċ
just-time.als
(0k)
jpolitz@google.com,
Mar 27, 2012, 4:29 PM
ċ
launch-example.als
(1k)
jpolitz@google.com,
Mar 27, 2012, 4:29 PM
ċ
tokens-and-instances-withfact.als
(0k)
jpolitz@google.com,
Mar 27, 2012, 4:29 PM
ċ
trace-and-test.als
(3k)
jpolitz@google.com,
Mar 28, 2012, 9:21 AM
ċ
jpolitz@google.com,
Mar 28, 2012, 9:22 AM
ċ
with-windows-correct-instances.als
(0k)
jpolitz@google.com,
Mar 27, 2012, 4:29 PM
ċ
with-windows.als
(0k)
jpolitz@google.com,
Mar 27, 2012, 4:29 PM
Comments