Team blog‎ > ‎

Modeling Client Communication in Belay

posted Mar 27, 2012, 4:30 PM by   [ updated Mar 28, 2012, 9:15 AM ]
The Belay project has enabled new kinds of window-to-window communication within browsers.  We've made this possible by building a mechanism for sending messages between potentially mistrusting windows at different origins.

Routing in Belay

One of the main jobs of Belay's browser presence is to define the invocation of BCAPs granted by a JavaScript CapServer in a browser window.  This client-to-client case differs from other typical scenarios of invocation: for server-to-server or browser-to-server communication, invocation maps easily onto HTTPS, for example.  This is because server-granted BCAP URLs specify a domain name, which is resolvable by the local DNS service, and is globally unique, because of how the space of domain names is set up.  In order to have invocations work between windows in the browser, we need the same properties as for HTTPS URLs:
  1. A notion of persistent, globally unique names for browser windows.
  2. An analog of DNS resolving for browser windows.

Routing by Assigned Identities

For (2), we have the Belay process (currently, the Belay IFrame) keeps a map from strings to windows.  The strings represent the identity of the CapServer running in the page.  CapServers on pages are informed of their entry in the map, and can mint new BCAPs that refer to these identities (the analog of minting a BCAP that refers to a web server by domain name).  These BCAP URLs have a slightly different scheme, shown with the identity part in bold:


To invoke such a BCAP, the page passes the URL and the serialized arguments to the Belay IFrame, which uses its map of identities to route the request to the right page, and return any response back to the invoker.

Assigning Identities

This solution assumes that identities for the windows exist, and have been communicated to the Belay IFrame.  But browser windows don't really come with a good notion of identity, so solving (1) is a little more subtle.  In our initial design, we decided to invent identities for windows by making the window's opener responsible:  When a new instance is launched in a new window, the whatever page initiated the launch picks a string to use as the identity of that window, and communicates it to the Belay process.

The launcher might pick this string completely at random, perhaps for a window that will hold a page it has never seen before.  Or, it might pick the string to provide some persistence, as in the case of Belay Station re-launching an instance.  The instance may have granted some BCAPs in the past, using a particular instance id, and the only way for it to continue serving those requests on the next launch is for the launcher to associate it with the same instance id.  For example, we want to support the following:
  1. A Buzzer instance is created and launched with identity b022
  2. The instance grants a posting BCAP using that identity, the BCAP looks like: urn:x-cap:b022:a0db
  3. That BCAP is handed to an Emote instance
  4. The windows holding the Buzzer and the Emote instance are closed, with the launch BCAPs for the instances saved to the user's Station.
  5. The next day, the user opens both the Buzzer and Emote instance from their Station.
  6. The Emote instance invokes the posting cap urn:x-cap:b022:a0db, which should be routed to the window the Buzzer instance has been reopened to.
At step 5, Station (the launcher) needs to provide the identity b022 to the Buzzer instance's window, otherwise the requests will not be successfully routed during the invoke in step 6.

What Could (Possibly) Go Wrong?

We just described a mechanism for launching windows that gives their content some persistence across sessions.  This lets Belay's notion of an "instance", a persistent, application-defined set of data and behavior, exist on top of browser-specific concerns like windows and data-only communication channels.  But does it really work?  Will BCAPs always be routed to the "right" instance?  Can any page launch new instances, or is that a special, Station-only behavior?  Could there be a mimicry attack, where one instance masquerades as another?

Potential for Sadness

Based on the our implementation above, we were pretty sure that launch was a highly-trusted operation.  A malicious or mistaken launch could assign an instance's identity to route to a window running a completely different instance.  For example, take a variation on the example above:
  1. A Buzzer instance is created and launched with identity b022
  2. Another Buzzer instance is created and launched with identity a955
  3. The first instance grants a posting BCAP using its assigned identity; the BCAP looks like: urn:x-cap:b022:a0db
  4. That BCAP is handed to an Emote instance
  5. The windows holding the Buzzer and the Emote instance are closed, with the launch BCAPs for the instances saved to the user's Station.
  6. The next day, the user opens Buzzer a955 and the Emote instance from their Station, but Station accidentally assigns identity b022 to the Buzzer's window.
  7. The Emote instance invokes the posting cap urn:x-cap:b022:a0db, which posts to the wrong Buzzer.
This is a case where Station makes a mistake, but exposing the launch operation to any page would allow malicious pages to deliberately launch instances that impersonate others.

We realized this weakness existed through Conversation and Thinking Real Hard (CaTRH).  While we consider ourselves seasoned practitioners of CaTRH, it does make our heads hurt sometimes, and we'd prefer to resort to it less often.  We don't know what other weaknesses are lurking in the system that we've missed, and we could use some help gaining confidence in their absence.  Plus, we'd like to free up our valuable Conversation time for more engaging diversions.

Reducing CaTRH's Role

Alloy is a language and tool for exploring relational models.  Given a relational specification of a system, it can generate instantiations of the system, visualize the instantiations, and check properties of the instantiations.  If we can describe Belay's routing fabric in an Alloy model, we can give a more formal semantics to our intuitions generated from CaTHR.  The rest of this post describes how we used Alloy to crisply model the weakness we described informally above, and verify a solution.  We provide a much more detailed account of the Alloy model in a separate post, and only provide a brief overview here.

The world of our launch-and-invoke model is made up of 5 kinds of things:  the Browser, Windows, Instances, Tokens, and Time.  A Token is a model of any and all serialized data, from UUIDs to URLs to postMessaged strings.  A Window holds a currently-loaded Instance, and can hold a different Instance at any point in Time.  An Instance has an identity represented by a Token, and also is aware of a set of knownTokens.  The Browser keeps track of a mapping from Tokens to open Windows, in order to route invocations to the right Window.

Why Launch Must Be Trusted, Formally

We use Alloy to model the launching operation we described above, where the launcher decides which identity (modeled as a Token), is used to route to the newly-opened Window.  We also define invocation, which allows an invoker Instance to pass any Tokens that it knows about to any other Instance open in Window for which the invoker knows the entry in the browser's routing map.  If the routing fabric is working correctly, then after an invocation, only the Instance whose identity is the Token the invoker chose to route to should know the passed Tokens.  It would be an error to pass the arguments to a different instance, for example, which is the bug in step 7 of the mistaken Station above.

We express the bad invocation and check if its possible, using Alloy:

assert NoBadInvokes {
  trace =>
  all t, t': Time, invoker, target: Window, targetIdentity, cap: Token, args: set Token {
    invoke[t, t', invoker, target, targetIdentity, cap, args] =>
    (all c: (cap + args) {
      knownTokens.t'.c in (knownTokens.t.c + identity.targetIdentity)
check NoBadInvokes for 4 but exactly 3 Instance

This is a lot of code (which is built up more slowly in the related post).  The gist is that an invoke is described by a pre and post state (t and t'); an invoker and a target window; a targetIdentity, which should route to the target window; and a cap and some args, which are the Tokens to pass to the target.  The invariant described by NoBadInvokes is that after every invocation, the instances with each of the argument Tokens in their knownTokens is either the same as before, or the same as before plus the instance with the identity of targetIdentity (the purported target instance of the invocation).  This says that no instance other than the targeted one should learn the arguments.

When we run the check command, Alloy immediately reports a counterexample.  We'll go through it step by step.

At Time1, there is a single window open, which the Browser maps to via Token1 (shown as the orange arrow labeled routeMap[Token1]).  This window, Window1, is currently navigated to Instance1, which has the identity of Token1.  The brown arrows represent the Tokens the instance knows---the set of knownTokens---which aren't labeled merely to avoid cluttering the example.  There are two other instances in the world, Instance1 and Instance2, which aren't loaded in any browser window.

If we step forward in time, we see that a new window is launched:

Now there is a new window, Window0, open, which is navigated to Instance2.  Instance2 has identity Token0, but its enclosing window, Window0, is mapped to by Token3 in routeMap.  This is the wrong identity for Instance2.  Also note that at this time, the only Token that Instance2 knows of is Token0.  If we step forward in the counterexample one more step, we can see what changes:

Here, we see that Instance2 now knows about Token3, after an invocation occurred.  But this shouldn't be possible---Instance1, the instance in the invoking window, doesn't even know Token0, the identity of Instance2.  It "tried" to invoke a capability at the instance with identity Token3 (we know this because of the $NoBadInvokes_targetIdentity annotation), but the invocation was merely routed to the window mapped to by that token, with no guarantee that the instance in the window actually had that identity.  This potential for misassignment of identity in routing is why launching must be treated as a trusted operation: malicious or buggy launchers can break important invariants of the system.

Allowing Untrusted Launches

We'd like to provide the launch operation to instances other than Station (or other trusted parties).  It's just plain useful; some sites might have built-in mini-stations that keep track of related sub-instances, like launching your Calendar from Gmail.  Others might want to turn middle-click (open-in-new-window), into a launch operation to support common web gestures.  Relegating the launch operation to a trusted whitelist of instances simply won't always work.

We have a (CaTRH-derived) proposed solution.  Instances will keep track of a secret that represents their identity.  There will be a global hash function for creating routing identities from secret identities.  The hash of the instance's secret identity can be freely shared with other instances, so they can invoke capabilities at the instance, or even launch it.  Upon launch, an instance must provide it's secret identity to the trusted browser architecture.  The browser can then apply the hashing function against the purported routing identity of the instance, and ensure that the launch is legitimate.

We change our implementation of the model, make a (slight) change to our definition of NoBadInvokes based on the change, and re-check.  Alloy doesn't report any counterexamples:

Executing "Check NoBadInvokes for 5"
  Solver=sat4j Bitwidth=0 MaxSeq=0 SkolemDepth=4 Symmetry=20
  68933 vars. 515 primary vars. 155098 clauses. 7174ms.
  No counterexample found. Assertion may be valid. 44908ms.

In a future, longer post, we'll explain the updated model, and a discussion of why this helps us justify our fix.  The related tutorial post only gets up to the point of explaining the counterexample above.

The Payoff

To understand what the model is doing for us, consider what we'd have done in a world without our Alloy excursion.  We could have built a proof-of-concept site that abused an open implementation of launch.  Then, we could have implemented the fix, and watched our proof-of-concept break.  We wouldn't have known if another malicious site could be implemented against our fix, or if our idea was even worth the effort of implementing.  We wouldn't know if the bad behavior was due to a simple bug in our implementation, or if we really were implementing a flawed idea.

With the model in hand, we know what the problem was, and that our fix is at least conceptually sound.  We still have to implement it correctly, but at least we know that what we're trying to implement is consistent with our goals.  Now we've:
  1. Understood why our intuition that launch should only be provided to trusted parties was correct;
  2. Crisply defined what a security failure meant in terms of information reaching the wrong party;
  3. Formally specified our fix on top of the old system;
  4. Gained confidence that the fix doesn't exhibit the same weakness.
Moving forward, we'll build more models, extend existing ones, and address more properties than the straightforward one we've outlined here.  We're excited about increasing our own understanding of Belay, and also for the opportunity to communicate crisp specifications of Belay's design and techniques.