The Belay project has enabled new kinds of window-to-window communication within browsers. We've made this possible by building a mechanism for sending messages between potentially mistrusting windows at different origins.
For (2), we have the Belay process (currently, the Belay IFrame) keeps a map from strings to windows. The strings represent the identity of the CapServer running in the page. CapServers on pages are informed of their entry in the map, and can mint new BCAPs that refer to these identities (the analog of minting a BCAP that refers to a web server by domain name). These BCAP URLs have a slightly different scheme, shown with the identity part in bold:
To invoke such a BCAP, the page passes the URL and the serialized arguments to the Belay IFrame, which uses its map of identities to route the request to the right page, and return any response back to the invoker.
This solution assumes that identities for the windows exist, and have been communicated to the Belay IFrame. But browser windows don't really come with a good notion of identity, so solving (1) is a little more subtle. In our initial design, we decided to invent identities for windows by making the window's opener responsible: When a new instance is launched in a new window, the whatever page initiated the launch picks a string to use as the identity of that window, and communicates it to the Belay process.
The launcher might pick this string completely at random, perhaps for a window that will hold a page it has never seen before. Or, it might pick the string to provide some persistence, as in the case of Belay Station re-launching an instance. The instance may have granted some BCAPs in the past, using a particular instance id, and the only way for it to continue serving those requests on the next launch is for the launcher to associate it with the same instance id. For example, we want to support the following:
At step 5, Station (the launcher) needs to provide the identity b022 to the Buzzer instance's window, otherwise the requests will not be successfully routed during the invoke in step 6.
We just described a mechanism for launching windows that gives their content some persistence across sessions. This lets Belay's notion of an "instance", a persistent, application-defined set of data and behavior, exist on top of browser-specific concerns like windows and data-only communication channels. But does it really work? Will BCAPs always be routed to the "right" instance? Can any page launch new instances, or is that a special, Station-only behavior? Could there be a mimicry attack, where one instance masquerades as another?
Based on the our implementation above, we were pretty sure that launch was a highly-trusted operation. A malicious or mistaken launch could assign an instance's identity to route to a window running a completely different instance. For example, take a variation on the example above:
This is a case where Station makes a mistake, but exposing the launch operation to any page would allow malicious pages to deliberately launch instances that impersonate others.
We realized this weakness existed through Conversation and Thinking Real Hard (CaTRH). While we consider ourselves seasoned practitioners of CaTRH, it does make our heads hurt sometimes, and we'd prefer to resort to it less often. We don't know what other weaknesses are lurking in the system that we've missed, and we could use some help gaining confidence in their absence. Plus, we'd like to free up our valuable Conversation time for more engaging diversions.
Alloy is a language and tool for exploring relational models. Given a relational specification of a system, it can generate instantiations of the system, visualize the instantiations, and check properties of the instantiations. If we can describe Belay's routing fabric in an Alloy model, we can give a more formal semantics to our intuitions generated from CaTHR. The rest of this post describes how we used Alloy to crisply model the weakness we described informally above, and verify a solution. We provide a much more detailed account of the Alloy model in a separate post, and only provide a brief overview here.
The world of our launch-and-invoke model is made up of 5 kinds of things: the Browser, Windows, Instances, Tokens, and Time. A Token is a model of any and all serialized data, from UUIDs to URLs to postMessaged strings. A Window holds a currently-loaded Instance, and can hold a different Instance at any point in Time. An Instance has an identity represented by a Token, and also is aware of a set of knownTokens. The Browser keeps track of a mapping from Tokens to open Windows, in order to route invocations to the right Window.
We use Alloy to model the launching operation we described above, where the launcher decides which identity (modeled as a Token), is used to route to the newly-opened Window. We also define invocation, which allows an invoker Instance to pass any Tokens that it knows about to any other Instance open in Window for which the invoker knows the entry in the browser's routing map. If the routing fabric is working correctly, then after an invocation, only the Instance whose identity is the Token the invoker chose to route to should know the passed Tokens. It would be an error to pass the arguments to a different instance, for example, which is the bug in step 7 of the mistaken Station above.
We express the bad invocation and check if its possible, using Alloy:
This is a lot of code (which is built up more slowly in the related post). The gist is that an invoke is described by a pre and post state (t and t'); an invoker and a target window; a targetIdentity, which should route to the target window; and a cap and some args, which are the Tokens to pass to the target. The invariant described by NoBadInvokes is that after every invocation, the instances with each of the argument Tokens in their knownTokens is either the same as before, or the same as before plus the instance with the identity of targetIdentity (the purported target instance of the invocation). This says that no instance other than the targeted one should learn the arguments.
When we run the check command, Alloy immediately reports a counterexample. We'll go through it step by step.
At Time1, there is a single window open, which the Browser maps to via Token1 (shown as the orange arrow labeled routeMap[Token1]). This window, Window1, is currently navigated to Instance1, which has the identity of Token1. The brown arrows represent the Tokens the instance knows---the set of knownTokens---which aren't labeled merely to avoid cluttering the example. There are two other instances in the world, Instance1 and Instance2, which aren't loaded in any browser window.If we step forward in time, we see that a new window is launched:
Now there is a new window, Window0, open, which is navigated to Instance2. Instance2 has identity Token0, but its enclosing window, Window0, is mapped to by Token3 in routeMap. This is the wrong identity for Instance2. Also note that at this time, the only Token that Instance2 knows of is Token0. If we step forward in the counterexample one more step, we can see what changes:
Here, we see that Instance2 now knows about Token3, after an invocation occurred. But this shouldn't be possible---Instance1, the instance in the invoking window, doesn't even know Token0, the identity of Instance2. It "tried" to invoke a capability at the instance with identity Token3 (we know this because of the $NoBadInvokes_targetIdentity annotation), but the invocation was merely routed to the window mapped to by that token, with no guarantee that the instance in the window actually had that identity. This potential for misassignment of identity in routing is why launching must be treated as a trusted operation: malicious or buggy launchers can break important invariants of the system.
We'd like to provide the launch operation to instances other than Station (or other trusted parties). It's just plain useful; some sites might have built-in mini-stations that keep track of related sub-instances, like launching your Calendar from Gmail. Others might want to turn middle-click (open-in-new-window), into a launch operation to support common web gestures. Relegating the launch operation to a trusted whitelist of instances simply won't always work.
We have a (CaTRH-derived) proposed solution. Instances will keep track of a secret that represents their identity. There will be a global hash function for creating routing identities from secret identities. The hash of the instance's secret identity can be freely shared with other instances, so they can invoke capabilities at the instance, or even launch it. Upon launch, an instance must provide it's secret identity to the trusted browser architecture. The browser can then apply the hashing function against the purported routing identity of the instance, and ensure that the launch is legitimate.
We change our implementation of the model, make a (slight) change to our definition of NoBadInvokes based on the change, and re-check. Alloy doesn't report any counterexamples:
In a future, longer post, we'll explain the updated model, and a discussion of why this helps us justify our fix. The related tutorial post only gets up to the point of explaining the counterexample above.
To understand what the model is doing for us, consider what we'd have done in a world without our Alloy excursion. We could have built a proof-of-concept site that abused an open implementation of launch. Then, we could have implemented the fix, and watched our proof-of-concept break. We wouldn't have known if another malicious site could be implemented against our fix, or if our idea was even worth the effort of implementing. We wouldn't know if the bad behavior was due to a simple bug in our implementation, or if we really were implementing a flawed idea.
With the model in hand, we know what the problem was, and that our fix is at least conceptually sound. We still have to implement it correctly, but at least we know that what we're trying to implement is consistent with our goals. Now we've:
Moving forward, we'll build more models, extend existing ones, and address more properties than the straightforward one we've outlined here. We're excited about increasing our own understanding of Belay, and also for the opportunity to communicate crisp specifications of Belay's design and techniques.
Team blog >