
User-data Wavelets and Supplements

    Author: David Hearnden

    Date: 3-Dec-2010

    A wave is a container that holds a collection of wavelets. In any wave, state that is specific to a particular user is stored in a wavelet called the user-data wavelet (UDW) which only has that user as the participant. This state tracks information like what parts of the wave that user has read, private gadget state, and thread collapse state. The model that describes this collection of information is called the supplement.

    Because the current code is derived from existing Google Wave code, this wavelet stores many more pieces of information than WIAB (Wave In A Box) currently uses, e.g. folder allocations, indexing state (following and archive), and seen wavelets. They are still described here because WIAB may use them in the future.

    There are three layers to the supplement model:
    • the primitive data - just describes the data structures that are used to hold the supplement data. Think of it as simple getters and setters.
    • the semantic, context-free model - semantics of that data in terms of conversation-specific queries and actions (like marking blips as read, where it ensures read versions only increase). It is context-free in that it has no implicit knowledge of any wave data - that information must be supplied to each method. i.e. it works with blip ids and not blip objects.
    • the semantic, contextual model - binds the context-free supplement with a particular wave, in order to contextualise the supplement queries and actions to a particular wave (this is essentially just currying).

    UDW ID

    The user-data wavelet in a wave (domain = D1, id = I) for a particular user (domain = D2, id = U) has the id user+U@D2. e.g.
    is the UDW for bob from on the wave w+ABCDEFG at This id construction is defined by: address)
    The supplement model is agnostic about the lifecycle of the UDW wavelet; it is not specified how, when, or if it gets created.
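
    As an illustration of the id scheme above, the following sketch builds the user+U@D2 form from a user's address. The class and method names are hypothetical (they are not the wave code base's id utility), the address bob@example.com is only a placeholder, and the sketch covers only the part of the id stated in this section.

```java
// Hypothetical helper for illustration; not the actual wave id utility.
final class UdwIdSketch {
  /** Prefix that, per this document, marks a user-data wavelet id. */
  static final String USER_DATA_PREFIX = "user+";

  /** Builds "user+U@D2" from an address of the form "U@D2". */
  static String udwIdFor(String address) {
    int at = (address == null) ? -1 : address.indexOf('@');
    if (at <= 0 || at == address.length() - 1) {
      throw new IllegalArgumentException("Expected an address like user@domain: " + address);
    }
    return USER_DATA_PREFIX + address;
  }

  /** True if the given wavelet id has the UDW prefix. */
  static boolean isUdwId(String waveletId) {
    return waveletId.startsWith(USER_DATA_PREFIX);
  }

  public static void main(String[] args) {
    System.out.println(udwIdFor("bob@example.com")); // prints user+bob@example.com
  }
}
```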

    Primitive layer

    The data model for the supplement is built using wave's toolkit for embedding concurrent data types in wavelets. This lets the data be described using far more appropriate data types than annotated XML, and the concurrent toolkit takes care of embedding those types in XML in a manner that works well with concurrency and operational transformation. Example scenarios of concurrent writes on a user's UDW are:
    • use by two clients simultaneously;
    • use by a client and a server (e.g., marking blips as read in a client while a server is applying a filter or a remote mark-as-read request); and
    • bringing back online a client that read content while offline, where that content has been marked as read in the interim.
    This section describes the data model in terms of these abstract types, with some XML snippets to show their concrete embeddings.

    Read state

    Read state is a map of wavelet ids to 4-tuples:
    WaveletReadState = Tuple (
        blips: MonotonicMap [ String -> Int ],
        participants: MonotonicValue [ Int ],
        tags: MonotonicValue [ Int ],
        wavelet: MonotonicValue [ Int ]
    )

    ReadState = Map [ String -> WaveletReadState ]

    This structure tracks, for each conversation in a wave, the last-read version of each blip, the last-read version of that conversation's participants, the last-read version of that conversation's tags collection (tags are a Google Wave feature not currently supported in Wave-In-A-Box), and the last-read version of the conversation as a whole. The monotonicity of the data types MonotonicValue and MonotonicMap just means that values are only allowed to increase, and that the resolution of concurrent writes is the maximum value (rather than last-one-wins or any other resolution strategy).
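
    The max-wins behaviour can be sketched with plain in-memory types. This is only an illustration of the semantics: the class names reuse the abstract type names from this section, but the implementation is hypothetical and ignores the XML embedding and operational transformation that the concurrent toolkit handles.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the monotonic types described above.
final class MonotonicSketch {
  /** A value that only increases; null means "never set". */
  static final class MonotonicValue {
    private Integer value;

    Integer get() {
      return value;
    }

    /** Applies a write; a write that would decrease the value is ignored. */
    void set(int candidate) {
      if (value == null || candidate > value) {
        value = candidate;
      }
    }
  }

  /** A map whose per-key values only increase. */
  static final class MonotonicMap {
    private final Map<String, Integer> values = new HashMap<>();

    Integer get(String key) {
      return values.get(key);
    }

    /** Keeps the maximum of the existing and the new value for the key. */
    void put(String key, int candidate) {
      values.merge(key, candidate, Integer::max);
    }
  }

  public static void main(String[] args) {
    MonotonicMap blips = new MonotonicMap();
    blips.put("b+lurL8WUGA", 104); // one client marks the blip read at 104
    blips.put("b+lurL8WUGA", 90);  // a stale write from another client
    System.out.println(blips.get("b+lurL8WUGA")); // prints 104
  }
}
```

    Because the result is the maximum regardless of arrival order, two concurrent writers converge on the same value without any coordination.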

    For example, the following read state describes two conversations in a wave (a root conversation and a private reply). Three blips in the root conversation are read at versions 104, 230 and 250, the entire root conversation was marked as read at version 157, and the participants collection read at version 7. The private conversation conv+d0jfkt has similar read state.
    "!conv+root" → {
        blips: {
            "b+lurL8WUGA" → 104,
            "b+rAUrAGUGA" → 230,
            "b+fF872cQNK" → 250
        },
        participants: 7,
        wavelet: 157
    },
    "!conv+d0jfkt" → {
        blips: {
            "b+aA8LAcUGK" → 35,
            "b+cuU78GQGA" → 1440
        }
    }

    In XML, this is embedded at the document root of the m/read document in the UDW. (The supplement data model uses an m/ prefix on all the documents it uses in the UDW. This prefix was once necessary but no longer is; it is kept only for compatibility with old data.) The read state for each wavelet is a top-level element, and the four structures (the blip map, participant value, tag value, and wavelet value) are all superimposed inside those top-level elements:

    <wavelet i="!conv+root">
        <blip i="b+lurL8WUGA" v="104"/>
        <blip i="b+rAUrAGUGA" v="230"/>
        <blip i="b+fF872cQNK" v="250"/>
        <participants v="7"/>
        <all v="157"/>
    </wavelet>
    <wavelet i="!conv+d0jfkt">
        <blip i="b+aA8LAcUGK" v="35"/>
        <blip i="b+cuU78GQGA" v="1440"/>
    </wavelet>
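
    To show how the superimposed structures can be pulled apart again, here is a rough sketch that reads the per-blip versions out of such a snippet with the JDK's standard DOM parser. It is not how the concurrent toolkit reads the document; the class name, the method name, and the synthetic <doc> wrapper element are assumptions made only so that a stock XML parser accepts several top-level elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for the m/read embedding shown above.
final class ReadStateXmlSketch {
  /** Returns wavelet id -> (blip id -> last-read version). */
  static Map<String, Map<String, Integer>> parseBlipVersions(String xml) {
    try {
      // Wrap in a synthetic root so multiple top-level elements parse.
      byte[] bytes = ("<doc>" + xml + "</doc>").getBytes(StandardCharsets.UTF_8);
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(bytes)).getDocumentElement();
      Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
      NodeList wavelets = root.getElementsByTagName("wavelet");
      for (int i = 0; i < wavelets.getLength(); i++) {
        Element wavelet = (Element) wavelets.item(i);
        Map<String, Integer> blips = new LinkedHashMap<>();
        // Only the <blip> children feed the blip map; <participants>,
        // <all> and similar elements belong to the other structures.
        NodeList blipNodes = wavelet.getElementsByTagName("blip");
        for (int j = 0; j < blipNodes.getLength(); j++) {
          Element blip = (Element) blipNodes.item(j);
          blips.put(blip.getAttribute("i"), Integer.parseInt(blip.getAttribute("v")));
        }
        result.put(wavelet.getAttribute("i"), blips);
      }
      return result;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse read state", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<wavelet i=\"!conv+root\">"
        + "<blip i=\"b+lurL8WUGA\" v=\"104\"/>"
        + "<participants v=\"7\"/>"
        + "<all v=\"157\"/>"
        + "</wavelet>";
    System.out.println(parseBlipVersions(xml)); // prints {!conv+root={b+lurL8WUGA=104}}
  }
}
```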

    Thread state

    Thread state is similar to read state: it maps, for each conversation, thread ids to their presentation state. Currently this is just collapsed or expanded, but it is open to extension for other thread states in the future (like summarised, or partially expanded).

    WaveletThreadState = Map [ String -> (COLLAPSED | EXPANDED) ]
    ThreadState = Map [ String -> WaveletThreadState ]
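
    In Java terms, this nested map could look roughly like the following. The class, enum, and method names are hypothetical, as is the thread id t+example; the sketch only mirrors the shape of the abstract types above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory shape of the thread state described above.
final class ThreadStateSketch {
  /** Presentation states; an enum leaves room for future values. */
  enum PresentationState { COLLAPSED, EXPANDED }

  // wavelet id -> (thread id -> presentation state)
  private final Map<String, Map<String, PresentationState>> states = new HashMap<>();

  void setThreadState(String waveletId, String threadId, PresentationState state) {
    states.computeIfAbsent(waveletId, k -> new HashMap<>()).put(threadId, state);
  }

  /** Returns null when no state has been recorded for the thread. */
  PresentationState getThreadState(String waveletId, String threadId) {
    Map<String, PresentationState> perWavelet = states.get(waveletId);
    return perWavelet == null ? null : perWavelet.get(threadId);
  }

  public static void main(String[] args) {
    ThreadStateSketch threads = new ThreadStateSketch();
    threads.setThreadState("!conv+root", "t+example", PresentationState.COLLAPSED);
    System.out.println(threads.getThreadState("!conv+root", "t+example")); // prints COLLAPSED
  }
}
```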

    Gadget state

    Map [ String -> Map [ String -> String ] ]

    Gadget state is simply a map of gadget id to a key-value map. This allows gadgets to record private, per-user information as key/value pairs. The Gadget doodad exposes this key/value pair map as part of the Wave Gadget API.

    Google-Wave state

    These fields are leftover from use in the Google Wave product, and are not currently used by Wave In A Box. The following parts may be of use in the future for Wave In A Box.

    • Folders: Set [ Int ]
        • This is just a set of folder ids, embedded in m/folder.
    • Indexing: Tuple ( archive: MonotonicMap [ String -> Int ], following: Boolean )
        • This tracks the versions at which conversations have been archived, and an optional bit specifying whether this wave is being followed or not (that is, delivered to the user's Inbox when changes occur after archiving). These are embedded in m/archive and m/muted.
    • Seen: Tuple ( seen: MonotonicMap [ String -> HashedVersion ], notified: MonotonicMap [ String -> Int ], pending: Boolean )
        • The 'seen' map tracks the unforgeable/signed version at which a user has 'seen' conversations in a wave. Actions interpreted as 'seeing' include performing some action on the wave (like marking as read or moving to a folder), regardless of whether it has actually been rendered. Seen versions are used as proof that a user has access to a particular conversation at that version, in order to provide them access to those versions for all time in case they later lose access due to being dropped as a participant. The 'notified' and 'pending' parts are for external gateway notifications (e.g., email), and track when notifications have been sent, and whether further notifications are needed. This structure has race conditions.

    The rest of the state in the supplement (m/abuse, m/cleared) is obsolete and of no interest.


    All the structures in the data model are live and broadcast events when they change. This capability comes for free from the concurrent toolkit.


    The primitive data model is defined in PrimitiveSupplement, with the observable extension in ObservablePrimitiveSupplement. There are two canonical implementations: one embeds its state in a wavelet (WaveletBasedSupplement), and the other in POJO structures (PrimitiveSupplementImpl). The POJO version is used for testing, for snapshotting supplement state for faster server-side processing, and as a fake persistence layer when UDWs are not present and creating them on demand is not desirable.

    Context-free semantic layer

    The context-free semantic layer defines the actions and queries that the supplement model provides, defined in terms of the primitive data model and input wave state. For example, it defines the readness of a blip as:
    a blip is unread if, and only if
    • the read-version for that blip either does not exist or is less than the blip's last-modified version; and
    • the wavelet-override version either does not exist or is less than the blip's last-modified version.
    The signature of that query is:
    boolean isBlipUnread(WaveletId waveletId, String blipId, int blipVersion);
    Note that all the relevant wave state for this query (wavelet id, blip id, and blip version) is input explicitly, which is why this layer is context-free. As well as queries, this layer also defines actions like marking blips or wavelets as read, marking threads as expanded/collapsed, etc.
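
    The readness rule above can be sketched as follows. This is a simplified illustration and not the SupplementImpl code: it uses String wavelet ids instead of WaveletId, plain maps in place of the primitive supplement, and hypothetical method names for the mark-as-read actions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the context-free readness query.
final class ReadnessSketch {
  // wavelet id -> (blip id -> last-read version)
  private final Map<String, Map<String, Integer>> blipReadVersions = new HashMap<>();
  // wavelet id -> version at which the whole wavelet was marked as read
  private final Map<String, Integer> waveletReadVersions = new HashMap<>();

  /** Records a blip read version; read versions only ever increase. */
  void markBlipAsRead(String waveletId, String blipId, int blipVersion) {
    blipReadVersions.computeIfAbsent(waveletId, k -> new HashMap<>())
        .merge(blipId, blipVersion, Integer::max);
  }

  /** Records the wavelet-override version; it only ever increases. */
  void markWaveletAsRead(String waveletId, int waveletVersion) {
    waveletReadVersions.merge(waveletId, waveletVersion, Integer::max);
  }

  /** Unread iff both the blip version and the wavelet override are missing or older. */
  boolean isBlipUnread(String waveletId, String blipId, int blipVersion) {
    Integer blipRead = blipReadVersions
        .getOrDefault(waveletId, Collections.<String, Integer>emptyMap())
        .get(blipId);
    Integer waveletRead = waveletReadVersions.get(waveletId);
    boolean unreadByBlip = blipRead == null || blipRead < blipVersion;
    boolean unreadByWavelet = waveletRead == null || waveletRead < blipVersion;
    return unreadByBlip && unreadByWavelet;
  }

  public static void main(String[] args) {
    ReadnessSketch supplement = new ReadnessSketch();
    supplement.markBlipAsRead("!conv+root", "b+lurL8WUGA", 104);
    supplement.markWaveletAsRead("!conv+root", 157);
    // Last modified at 150: covered by the wavelet override at 157.
    System.out.println(supplement.isBlipUnread("!conv+root", "b+lurL8WUGA", 150)); // prints false
    // Last modified at 200: newer than both read versions.
    System.out.println(supplement.isBlipUnread("!conv+root", "b+lurL8WUGA", 200)); // prints true
  }
}
```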


    Interfaces: {Readable,Writeable,Observable}Supplement
    Implementations: SupplementImpl

    Contextual semantic layer

    The contextual semantic layer associates the context-free supplement object with a particular wave, in order to curry out all the wave state parameters from its queries and actions. For example, the signature of the blip read/unread query simply becomes:
    boolean isBlipUnread(ConversationBlip blip);
    The view of conversation state that this layer requires in order to contextualize the supplement is defined in SupplementWaveView. This interface is more restrictive than the full WaveView interface, and exposes only the relevant parts of a wave, mainly version numbers. Defining the supplement layer in terms of this smaller interface means that a supplement model can run against a wave representation that is cheaper than a full wave model. There are two implementations of this layer, one adding observability to the other.
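
    The currying step can be sketched as a thin wrapper that looks up the wave state itself and delegates to the context-free query. All types below are hypothetical stand-ins: VersionView plays the role of a narrow view like SupplementWaveView, and Blip stands in for ConversationBlip; neither reflects the real interfaces.

```java
// Hypothetical sketch of binding a context-free supplement to a wave.
final class ContextualSketch {
  /** Context-free query: all wave state is passed in explicitly. */
  interface ContextFreeSupplement {
    boolean isBlipUnread(String waveletId, String blipId, int blipVersion);
  }

  /** Narrow view of a wave that exposes only version numbers. */
  interface VersionView {
    int lastModifiedVersion(String waveletId, String blipId);
  }

  /** Minimal stand-in for a conversation blip. */
  static final class Blip {
    final String waveletId;
    final String blipId;

    Blip(String waveletId, String blipId) {
      this.waveletId = waveletId;
      this.blipId = blipId;
    }
  }

  /** Contextual supplement: supplies the wave state on the caller's behalf. */
  static final class ContextualSupplement {
    private final ContextFreeSupplement target;
    private final VersionView view;

    ContextualSupplement(ContextFreeSupplement target, VersionView view) {
      this.target = target;
      this.view = view;
    }

    boolean isBlipUnread(Blip blip) {
      int version = view.lastModifiedVersion(blip.waveletId, blip.blipId);
      return target.isBlipUnread(blip.waveletId, blip.blipId, version);
    }
  }

  public static void main(String[] args) {
    // Toy context-free rule: anything modified after version 157 is unread.
    ContextFreeSupplement contextFree = (waveletId, blipId, version) -> version > 157;
    // Toy view: every blip was last modified at version 200.
    VersionView view = (waveletId, blipId) -> 200;
    ContextualSupplement supplement = new ContextualSupplement(contextFree, view);
    System.out.println(supplement.isBlipUnread(new Blip("!conv+root", "b+lurL8WUGA"))); // prints true
  }
}
```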


    Interfaces: {Readable,Writeable,Observable}SupplementedWave
    Implementations: SupplementedWaveImpl, LiveSupplementedWaveImpl

    Example Code

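    Given a wave model wave, the participant id user of the viewing user, and the wave's conversation model conversations, the live implementation of the supplement model is created with the following snippet:

    // The UDW holds this user's private state for the wave.
    Wavelet udw = wave.getUserData();
    // Primitive layer: supplement data embedded in the UDW.
    ObservablePrimitiveSupplement data = WaveletBasedSupplement.create(udw);
    // Contextual semantic layer: binds the supplement to this wave.
    ObservableSupplementedWave supplement = new LiveSupplementedWaveImpl(
        data, wave, user, DefaultFollow.ALWAYS, conversations);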

    Future Work

    This document has described the objective state of the supplement model, without explaining its evolutionary history or trajectory. There are a number of parts of it that could or should be changed. The supplement's structure has a number of legacy concerns that are no longer relevant in Google Wave, and/or will never be relevant for Wave-In-A-Box.

    First, there is the obvious task of deleting obsolete parts that only exist to interpret old data specific to Google Wave.

    Second, the use case that drove splitting the semantic layer into two (the context-free and the contextual) no longer exists, and so code size and complexity would be reduced by merging the two semantic layers back into one (the contextual supplement).

    Third, the separation of interfaces for readers, writers, and observers of supplements, while being semantically satisfying, does add volume to the universe of supplement-related types when expressed in Java, and the utility of these role-specific views may not be enough to mitigate that complexity burden.

    Fourth, version numbers in the supplement model are only 32-bit integers, whereas in the wave model proper they are 64-bit integers. This is purely a client-specific optimization: GWT faithfully emulates 64-bit numbers in JavaScript (which lacks a native 64-bit integer type), and that emulation is slow. This sacrifice of correctness for the sake of speed is something that could be rectified.

    Fifth, although the supplement has been used successfully in production for over a year, there are still some race conditions in its logic.

    Finally, the supplement model initially started as just a model for read-state, and grew incrementally into a bag of disparate concerns related only by the property of being user-specific. Exposing all this data through the one sum interface (PrimitiveSupplement) and implementation (WaveletBasedSupplement) is not necessarily a good way to scale, and it is perhaps time to split the supplement into individual models (e.g., reading, indexing, notifications, etc).