Here, I attempt to describe the fundamental problem in representing "instances" in a user interface.
In the current model of application development, an application runs with access to every resource on which it could possibly operate. For example, a word processor can open several files simultaneously; has an "Open recent" menu item (implying that it saves a history of the user's activity across all the files it manages); and can often present a customized "Open file" dialog that lets the user search, in its own way, across every file in the filesystem stored in the word processor's format.
This arrangement is problematic for one important security reason -- the centralization of excess authority -- which manifests itself in various ways:
We wish therefore to build a model where, instead of opening an application, a user constructs instances of a particular type of object (in our example, "word processing document"). Each instance is isolated except by explicit introduction.
This new model has deep ramifications for the user experience, because operations commonly available via the application's own user interface ("Search", "New document", ...) must now be handled by a neutral (and more-trusted) system component. Our security requirement, in a sense, induces us to think of our system as a framework in which "applications" are merely externally-supplied data types.
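To make the shape of this model concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the names Document, Shell, accept, introduce and quote_from are invented for illustration, and Python does not actually enforce the isolation (the underscore-prefixed fields are private by convention only). It shows only how instances hold no references except those they are handed, and how "Search" and "New document" move into a neutral component.

```python
class Document:
    """One instance of a hypothetical 'word processing document' type."""

    def __init__(self, title, text=""):
        self.title = title
        self.text = text
        self._peers = {}  # references this instance was explicitly given

    def accept(self, label, other):
        """Receive a reference to another instance (an introduction)."""
        self._peers[label] = other

    def quote_from(self, label):
        """Read another document's text, but only via a held reference."""
        if label not in self._peers:
            raise PermissionError(
                f"{self.title!r} was never introduced to {label!r}")
        return self._peers[label].text


class Shell:
    """Hypothetical neutral, more-trusted component acting for the user."""

    def __init__(self):
        self._instances = []

    def new_document(self, title, text=""):
        # "New document" is a shell operation, not an application menu item.
        doc = Document(title, text)
        self._instances.append(doc)
        return doc

    def search(self, word):
        # "Search" also lives here, not inside any one document instance.
        return [d.title for d in self._instances if word in d.text]

    def introduce(self, holder, label, target):
        # Explicit introduction: the only way one instance learns of another.
        holder.accept(label, target)


shell = Shell()
memo = shell.new_document("memo", "budget draft")
diary = shell.new_document("diary", "private budget worries")

# Before any introduction, the memo cannot reach the diary.
try:
    memo.quote_from("diary")
    isolated = False
except PermissionError:
    isolated = True

# After the user (via the shell) introduces them, it can.
shell.introduce(memo, "diary", diary)
quoted = memo.quote_from("diary")
```

Note that the shell, unlike any document, can see every instance; that is precisely the authority we are moving out of the application and into a component the user trusts more.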
Such frameworks have been around for a long time (at least since the Xerox Star), where they were justified as a means of empowering the end-user to reason via a portable paradigm across diverse data types. The last serious attempt to commercialize such a model was the OS/2 Workplace Shell, and the follow-on work by Taligent. Yet, at the same time, the Web was emerging as a zero-install, distributed computing platform, and chasing Web audiences became the new hotness.
Today, we return to the problem of building such a framework, but in the context of a significant built environment on the Web which we cannot (and, anyway, do not want to) abandon. Thus we face two problems:
The Web is made up of public resources identified by URLs, which anyone can browse to or find on a search engine (Figure 1). The public resources may include resource makers, whose function is to make new resources (Figure 2). Users invoke these makers to create new resources (Figure 3), which they may either keep private (by not publicizing their identifiers) or put into the pool of public resources (by making them reachable from some public resource -- generally, the home page of the Website at some well-known domain name). Parts of the "system", like cameras and printers, are presented to the user as private resources under their control, in a manner uniform with the resources they have created and chosen to keep private (Figure 4). A user gets work done by delegating authority between resources, drawing on their own private pool where necessary (Figure 5).
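The distinction between the public pool and a user's private resources can be sketched as a reachability question. The following Python is an illustration only: Resource, Maker and public_pool are hypothetical names, and the use of random tokens as identifiers is my assumption (the text above says only that private resources are those whose identifiers are not publicized).

```python
import secrets


class Resource:
    def __init__(self, kind):
        self.kind = kind
        # Assumption: identifiers are unguessable, so "not publicized"
        # really does mean "not reachable by outsiders".
        self.ident = secrets.token_urlsafe(16)
        self.links = []  # resources reachable from this one


class Maker(Resource):
    """A public resource whose function is to make new resources."""

    def __init__(self, makes):
        super().__init__("maker")
        self.makes = makes

    def make(self):
        return Resource(self.makes)


def public_pool(home_page):
    """Everything reachable from a public root by following links."""
    seen, stack = {}, [home_page]
    while stack:
        r = stack.pop()
        if r.ident not in seen:
            seen[r.ident] = r
            stack.extend(r.links)
    return seen


home = Resource("home page")
doc_maker = Maker("document")
home.links.append(doc_maker)   # the maker itself is public

draft = doc_maker.make()       # kept private: nothing public links to it
report = doc_maker.make()
home.links.append(report)      # published by linking it from a public page
```

Here draft stays in the user's private pool simply because no public resource refers to it, while report becomes public the moment the home page links to it.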
Web pages today expect that, when a user navigates to a public resource, the local presence of that resource (e.g., the scripts in the HTML) has the power to obtain arbitrary resources from the Internet, even if the locations of these resources are computed rather than pre-declared in the source code. To remain compatible, we would have to grant that power. Unfortunately, in our system, if we grant the local presences of resources the ability to establish connectivity to arbitrary endpoints based on data (as opposed to explicit introduction), we allow them to gain connectivity via information sharing alone. This violates the assumptions we make in the implementation of our object capability system.
In a distributed capability system, this is solved by establishing protocol constraints. Specifically, a resource from some outside source can declare as data its connectivity to other resources (Figure 6) via its wire protocol. On the incoming end, we convert the wire representation to an executing instance pre-initialized with the references declared in its representation. We subsequently ensure that, from that point on, data alone cannot be used to gain additional connectivity because we deny it the ability to convert data into references (Figure 7).
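The protocol constraint described above can be sketched as follows. This is a toy model under stated assumptions, not an implementation of any real capability protocol: WireForm, Importer and Instance are hypothetical names, and the sketch glosses over how the incoming end decides that a declared identifier is one the sender was entitled to name. The point it illustrates is narrower: identifiers become references in exactly one place, at import time, and never again.

```python
from dataclasses import dataclass


class Resource:
    def __init__(self, ident):
        self.ident = ident

    def read(self):
        return f"contents of {self.ident}"


@dataclass(frozen=True)
class WireForm:
    """Hypothetical wire representation of an incoming resource."""
    name: str
    declared: tuple  # identifiers it declares connectivity to, as data


class Instance:
    """Executing instance, pre-initialized with its declared references."""

    def __init__(self, name, refs):
        self.name = name
        self._refs = dict(refs)

    def use(self, ident):
        # Lookup is restricted to the references fixed at import time;
        # an identifier that is merely data cannot be turned into one.
        if ident not in self._refs:
            raise PermissionError(f"{ident!r} was not declared on the wire")
        return self._refs[ident].read()


class Importer:
    """The incoming end: the only place identifiers become references."""

    def __init__(self, resources):
        self._by_ident = {r.ident: r for r in resources}

    def instantiate(self, wire):
        refs = {i: self._by_ident[i] for i in wire.declared}
        return Instance(wire.name, refs)


importer = Importer([Resource("printer"), Resource("camera")])
page = importer.instantiate(WireForm("photo-page", ("camera",)))

camera_result = page.use("camera")  # declared, so it works

# The page may learn the string "printer" as data, but that alone
# does not give it a reference to the printer.
try:
    page.use("printer")
    gained = True
except PermissionError:
    gained = False
```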
To implement this scheme on the Web of today, we would have to ensure that Web pages pre-declare their connectivity. We have a number of solutions we can use in tandem:
Despite these mitigations, it remains likely that we will "break [some subset of] the Web". For example, an existing Web page may decide what image to load based on the state of a game and, not being an HTML5 page, it would not have a cache manifest. We need to think about this problem carefully.
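The game example can be made concrete with a small sketch. The DeclaredFetcher class, the level_image helper and the game.example URLs are all hypothetical, and no network access takes place; the sketch assumes a page's pre-declared connectivity is simply a set of exact URLs, which is a simplification of whatever a real manifest would allow.

```python
class DeclaredFetcher:
    """Hypothetical gate that serves only pre-declared URLs."""

    def __init__(self, manifest):
        self._declared = frozenset(manifest)

    def fetch(self, url):
        if url not in self._declared:
            raise PermissionError(f"{url} was not pre-declared")
        return f"<bytes of {url}>"  # stand-in for a real network fetch


def level_image(level):
    # The existing page computes its image URL from game state.
    return f"https://game.example/img/level{level}.png"


# Suppose only the first level's image happened to be declared.
fetcher = DeclaredFetcher(["https://game.example/img/level1.png"])

ok = fetcher.fetch(level_image(1))

try:
    fetcher.fetch(level_image(2))
    broke = False
except PermissionError:
    broke = True  # the page "breaks" once the game reaches level 2
```

The failure mode is visible here: the page works until its computed URL falls outside what was declared, and nothing in the page itself signals in advance that this will happen.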