Glorp for TOPLink Users

Copyright 2004
by Alan Knight

There are a lot of Smalltalk projects out there that use, or have used, TOPLink, and lots of developers who are familiar with TOPLink. Glorp shares a lot of basic concepts and design decisions with TOPLink, but also has significant differences, some high-level and many in the details. So TOPLink users ought to be able to pick up Glorp fairly quickly. This document tries to explain Glorp usage in terms familiar to TOPLink users. It also explains which pieces are currently missing or have issues. The explanations are quite sketchy, but should point you in the right direction for which classes to look at and which other documentation (assuming its existence) might be helpful.

Descriptors

Basic mapping will be fairly familiar to TOPLink users. Glorp has Descriptors, and it has Mappings. It’s possible to implement descriptors as class-side methods, which was the most common TOPLink practice, but it’s not the norm.
Instead, Glorp defines a DescriptorSystem which includes all of the descriptors for a given application. Rather than adding individual descriptors to a session, you set the system, and everything else is automatic.
The DescriptorSystem defines other things as well as the descriptors. Glorp descriptors are lighter weight than TOPLink ones, because there is also an explicit model of the database tables (class DatabaseTable), and an explicit model of the Smalltalk classes (class ClassModel). The DescriptorSystem is built to create these lazily as required, but to ensure that the same instances are always used. This is very important, because Glorp frequently uses identity comparison, relying on these objects being the same. For example, it almost never does string comparison of field names, but just checks if the fields are identical or not.
To set up a descriptor system you’ll need to implement the methods #allTableNames and #constructAllClasses. Then, for each class MyClass you’ll need to implement #classModelForMyClass: and #descriptorForMyClass:. For each table MY_TABLE you’ll need to implement #tableForMY_TABLE:. The easiest way to build one is to copy and paste from the examples.
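As a rough sketch of the two system-level methods, assume a hypothetical DescriptorSystem subclass called MyDescriptorSystem that maps two hypothetical classes, Person and Address, onto tables named PERSON and ADDRESS. None of these names come from Glorp itself, and details may differ between Glorp versions.

```smalltalk
"Instance-side methods of the hypothetical MyDescriptorSystem."

allTableNames
    "Answer the names of every table this system describes."
    ^#('PERSON' 'ADDRESS')

constructAllClasses
    "Answer every persistent class this system describes.
     Person and Address are illustrative classes, not part of Glorp."
    ^(super constructAllClasses)
        add: Person;
        add: Address;
        yourself
```

With these in place, the system still needs a #classModelForPerson:, #descriptorForPerson: and #tableForPERSON: method (and likewise for Address) before it is usable.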

Class Models

A class model is very simple. It defines the attributes for a class, at least the ones that are persistent, and their types. If the attribute is a collection it may define both the collection class and the type of the elements.
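A class model method might look something like the following sketch. Person, Address and EmailAddress are hypothetical classes, and the attribute-defining selectors shown (newAttributeNamed:, newAttributeNamed:type: and newAttributeNamed:collection:of:) are assumed to be available on the class model in your Glorp version, so check ClassModel before relying on them.

```smalltalk
classModelForPerson: aClassModel
    "Declare the persistent attributes of the hypothetical Person class."
    aClassModel newAttributeNamed: #id.
    aClassModel newAttributeNamed: #name type: String.
    "A reference to another mapped class."
    aClassModel newAttributeNamed: #address type: Address.
    "A collection attribute: both the collection class and element type."
    aClassModel
        newAttributeNamed: #emailAddresses
        collection: OrderedCollection
        of: EmailAddress
```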

DatabaseTables

A DatabaseTable defines a table in the database. This includes its name, fields, and foreign key constraints. Note that types are always relative to a database platform (e.g. OraclePlatform), and a descriptor system always needs to know its platform, and is normally created with one.
The foreign key constraints are extremely important. Even if your database doesn’t define constraints, it’s important to tell Glorp what the foreign key relationships are. These are used for a variety of things, including determining write order and inferring joins for relationship mappings. Fields know things like their name, their type, and whether or not they’re a primary key or an optimistic locking field.
You never use a qualified name for a database table or field. Instead, they know their “schema”. It’s also possible to define a default schema for an entire system (it’s set on the Login object), which all tables will use by default.
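A table definition along these lines illustrates fields, platform-relative types and a foreign key constraint. The PERSON and ADDRESS tables and their fields are hypothetical, and the selectors createFieldNamed:type:, bePrimaryKey and addForeignKeyFrom:to: are assumed to match your version of DatabaseTable.

```smalltalk
tableForPERSON: aTable
    "Describe the hypothetical PERSON table. The types come from the
     platform the descriptor system was created with."
    | addressId |
    (aTable createFieldNamed: 'ID' type: platform serial) bePrimaryKey.
    aTable createFieldNamed: 'NAME' type: (platform varChar: 50).
    addressId := aTable createFieldNamed: 'ADDRESS_ID' type: platform int4.
    "Declare the relationship even if the database has no such constraint."
    aTable
        addForeignKeyFrom: addressId
        to: ((self tableNamed: 'ADDRESS') fieldNamed: 'ID')
```

Because the ADDRESS table is obtained through tableNamed:, the same DatabaseTable instance is used everywhere, which is what the identity comparisons rely on.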

Field Types, DatabasePlatform

The Login object also defines the “platform”, or database that we’ll be using. This is important because it defines the database types. TOPLink used Smalltalk classes to indicate the types, and then tried to compute what that meant in database terms. Glorp doesn’t do this, but rather has an explicit representation of the types, which can vary by database. Typical examples of types are “platform int4”, “platform varChar: 50”, or “platform serial”.

Descriptors

As in TOPLink, a descriptor consists primarily of a list of mappings. There is one descriptor for each persistent class. In addition to the mappings, a descriptor knows things like its list of tables, the joins between those tables if there are multiple ones, a caching policy if it doesn’t use the system default, and inheritance information.

Mappings

Mappings are relatively similar in concept to TOPLink, although the names differ. A simple example of mapping creation is
    (aDescriptor newMapping: DirectMapping)
        from: #foo to: (table fieldNamed: 'FOO').
Note that you normally don’t have to specify the reference class of a mapping, because it’s the type of the attribute as defined in the class model.
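To show where such a mapping lives, here is a sketch of a complete descriptor method for a hypothetical Person class stored in a hypothetical PERSON table. The table: selector on the descriptor and the use of OneToOneMapping for the address are assumptions about the usual pattern rather than a definitive recipe.

```smalltalk
descriptorForPerson: aDescriptor
    "Map the hypothetical Person class onto the PERSON table."
    | table |
    table := self tableNamed: 'PERSON'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping)
        from: #id to: (table fieldNamed: 'ID').
    (aDescriptor newMapping: DirectMapping)
        from: #name to: (table fieldNamed: 'NAME').
    "No reference class is given here: it is taken from the class model."
    (aDescriptor newMapping: OneToOneMapping)
        attributeName: #address
```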

There are fewer types of relationship mapping. Any relationship mapping can choose to use a link table. So there’s no need for explicit classes for OneToMany and ManyToMany. Glorp does include classes for these for convenience, but they aren’t actually necessary and have very little code. Back-references on a to-many relationship are never required.
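As an illustration, a to-many relationship that goes through a link table might be declared as below. The attribute name #accounts is hypothetical, and useLinkTable is assumed to be the selector that switches a relationship mapping to a link table in your Glorp version.

```smalltalk
"Inside a descriptorFor...: method of a descriptor system."
(aDescriptor newMapping: ToManyMapping)
    attributeName: #accounts;
    useLinkTable
```

The joins through the link table are then inferred from the foreign key constraints declared on that table, so those constraints must be defined in the corresponding tableFor...: method.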

In many cases it’s not necessary to specify the join for a relationship mapping, since it can be calculated from the foreign key constraints between the tables. However, if the constraints are ambiguous (e.g. if there are multiple constraints between the two tables) it will be necessary to specify the join. For example,
    (aDescriptor newMapping: ToManyMapping)
        attributeName: #customers;
        join: (Join
            from: (table fieldNamed: 'ID')
            to: ((self tableNamed: 'VIDEO_CUSTOMER')
                fieldNamed: 'STORE_ID')).
The join determines three things:
- what fields to write for the relationship
- what foreign keys to use to resolve when reading the relationship
- what join to use when traversing the relationship in a query
These were stored as different things in TOPLink, and could be specified in a variety of ways (e.g. with sourceForeignKeyFieldNamed:referencesTargetPrimaryQueryKey:).

For a mapping that uses a link table, it may also be necessary to give Glorp a hint as to the other side of the relationship. Again this can often be inferred from foreign key constraints, but if it isn’t, you can tell the mapping its #relevantLinkTableFields:.

EmbeddedValueOneToOneMapping is the rough equivalent of a TOPLink aggregate mapping. When you use an embedded value, for the simple case you can map the object’s fields directly to the fields of the containing table. However if the object is embedded in multiple different places, then you specify an imaginary table for its fields, and in each place that it’s used, you specify a join to indicate the transformation between the imaginary fields and the fields actually in use.

An AdHocMapping is equivalent to a TOPLink transformation mapping. This allows you to specify two blocks and completely control the way the data is transformed, both going into Smalltalk and going out to the database. Note that you also have to specify the fields that the mapping affects. Using an ad hoc mapping will probably require some exposure to Glorp internals, as the arguments to the blocks are relatively complex.

A ConditionalMapping is one that includes two or more sub-mappings, and will use a different one of these depending on the state of memory or the database. There is no direct TOPLink equivalent.

A ConstantMapping is one that always maps to a constant. This is more useful than you might think. It can be used as part of a conditional mapping, e.g. to indicate that a value is nil under certain conditions. It’s also possible to make the constant value be the session, which can be useful if you want mapped objects to easily be able to issue queries.

A BasicDictionaryMapping maps a relationship whose result is a dictionary. It is a subclass of ToManyMapping and also allows a keyField: (DatabaseField) to be specified. The value from this field will be used as the dictionary key. This doesn’t cover nearly all the possibilities for mapping dictionaries, but is all that’s currently supported in a mapping.
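A dictionary mapping keyed by a value in a link table might be sketched as follows. The Library and Book classes, the LIBRARY_BOOK_LINK table and its fields are all hypothetical, and it is not confirmed here whether additional setup (such as explicitly requesting a link table) is needed in a given Glorp version.

```smalltalk
"Inside descriptorForLibrary: for a hypothetical Library class whose
 #booksByShelfMark attribute is a dictionary of Book objects."
| linkTable |
linkTable := self tableNamed: 'LIBRARY_BOOK_LINK'.
(aDescriptor newMapping: BasicDictionaryMapping)
    attributeName: #booksByShelfMark;
    referenceClass: Book;
    "The value in this field becomes the dictionary key."
    keyField: (linkTable fieldNamed: 'SHELF_MARK');
    "Hint for which link table field points at the other side."
    relevantLinkTableFields: (Array with: (linkTable fieldNamed: 'BOOK_ID'))
```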

TypeMappings are not used directly as mappings, but are part of the implementation of inheritance.

Note that any mapping implies a query, and you can get access to this query and modify it in order to

Any mapping can be marked readOnly. They can also be marked writeOnly. This is also more useful than you might think. This is how Glorp supports its equivalent of TOPLink query keys. Rather than a special construct, a query key is simply a mapping which neither reads nor writes. It can thus be used as an attribute in querying, but doesn’t cause anything to be read into memory or written to the database. For example, if we wanted to test if a foreign key field was null we could write that as an expression in terms of fields, but that’s inconvenient. Instead we could create a mapping for the field, but we don’t want to have to materialize it in an instance variable, and it will already be written because of the relationship it implies. So we create a mapping that neither reads nor writes.

Inheritance

Inheritance is implemented by defining a type resolver, and having each descriptor for a subclass register with that resolver. Queries against the superclass can then ask the resolver how they should be executed in order to bring back instances of the subclasses as well. See GlorpInheritanceDescriptorSystem for examples of this, notably for GlorpInventoryItem and GlorpEmployee.

Two kinds of inheritance resolvers are supported: horizontal resolvers, indicating that each concrete subclass has its own table, and filtered resolvers, indicating that all classes are stored in a single table, with a discriminator field to indicate which class a particular row represents. There is not yet a type resolver for the case where superclass data is stored in a superclass table with each subclass augmenting that in its own table.
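A filtered (single-table) hierarchy might be set up roughly as follows. The Employee and Manager classes, the EMPLOYEE table and its EMPLOYEE_TYPE discriminator field are hypothetical, and the selectors used (typeResolverFor:, register:keyedBy:field: and FilteredTypeResolver class>>forRootClass:) are believed to follow the pattern in GlorpInheritanceDescriptorSystem but should be checked against that class.

```smalltalk
typeResolverForEmployee
    "One resolver shared by the whole hypothetical Employee hierarchy."
    ^FilteredTypeResolver forRootClass: Employee

descriptorForManager: aDescriptor
    | table |
    table := self tableNamed: 'EMPLOYEE'.
    aDescriptor table: table.
    (aDescriptor newMapping: DirectMapping)
        from: #id to: (table fieldNamed: 'ID').
    "Rows whose discriminator field holds 'M' represent Managers."
    (self typeResolverFor: Employee)
        register: aDescriptor
        keyedBy: 'M'
        field: (table fieldNamed: 'EMPLOYEE_TYPE')
```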

Session

The session class is called GlorpSession and plays a similar role to the TOPLink session. It holds onto the actual database connection, to the metadata, and to the cache. Queries are executed in the context of a particular session.
The Cache fills the same role as the TOPLink identityMap. There are different cache policies available, including timed expiry, weak references, never expiring, and others. These can be set on a per-class basis, or on the system.
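Putting the pieces together, a session might be set up along the following lines. MyDescriptorSystem is a hypothetical descriptor system, the credentials and connect string are placeholders, and the exact construction protocol may vary between Glorp versions and dialects.

```smalltalk
| login session |
"The login carries the platform, which fixes the database types."
login := Login new
    database: OraclePlatform new;
    username: 'scott';
    password: 'tiger';
    connectString: 'orcl';
    yourself.
session := GlorpSession new.
"Set the whole descriptor system rather than adding descriptors one by one."
session system: (MyDescriptorSystem forPlatform: login database).
session accessor: (DatabaseAccessor forLogin: login).
"Open the actual database connection."
session accessor login
```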

Unit of Work

The unit of work protocol is similar to TOPLink. You begin a unit of work, register objects, and then commit or roll back the unit of work. On rollback, all registered objects and everything reachable from them reverts to its state when it was registered. On commit, all modified and/or new objects that were registered or are reachable from registered objects are written to the database.
There is only one unit of work at a time. They cannot (as yet) be nested, and code continues to work with the original objects.
Note that although the basic protocol is similar, the internals of the unit of work implementation are quite different. In particular, there are no write queries in Glorp. Instead it generates a “RowMap” containing all rows for registered objects, computes the set that needs writing, computes a write order based on this, and finally writes all the rows at once. This is useful in that it makes rows easy to compose from multiple objects, so back-references on one-to-many relationships aren’t required. It also makes it easier to use performance features like Oracle array binding.
It is very important to note that Glorp supports writes ONLY via unit of work. There is no equivalent to TOPLink’s #writeObject:. This is very useful in some ways, but is probably the largest point of incompatibility between TOPLink and Glorp. It may be possible to simulate a writeObject: type of operation by starting a unit of work, registering only the one object and then committing that unit of work, but this hasn’t really been tried. A related issue is that there is no current support for “private owned” relationships, which are primarily used with writeObject: and for cascaded deletes.
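The basic protocol can be sketched as follows, assuming an already connected session, a hypothetical Person class with name and name: accessors, and a person that was previously read through that session.

```smalltalk
"Modify an existing object and write the change."
session beginUnitOfWork.
session register: aPerson.
aPerson name: 'Changed'.
session commitUnitOfWork.

"Abandon changes: the registered object reverts to its registered state."
session beginUnitOfWork.
session register: aPerson.
aPerson name: 'Mistake'.
session rollbackUnitOfWork.

"Untested approximation of writeObject: for a new object, as suggested
 above: register only that object and commit."
session beginUnitOfWork.
session register: (Person new name: 'New'; yourself).
session commitUnitOfWork
```

Note that registration has to happen before the object is modified, since the unit of work detects changes relative to the state at registration time.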

Expressions

Glorp expressions are broadly similar to TOPLink 5.x expressions. They can be built up either from “query blocks” or directly. For example
    [:each | each owner address street = 'Main'
        AND: [each id ~= 3]].
This is parsed in the same basic manner as TOPLink. The block is evaluated with a doesNotUnderstand: proxy passed in as the parameter. A tree of messages is built up on top of this proxy, and an expression tree is then built out of it. Like TOPLink this does not handle inlined messages well, so it is necessary to use constructs like AND: or & rather than the system and:.
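For comparison, the same condition written with & might look like this. The parentheses matter, because binary messages are evaluated strictly left to right; the attribute names are the illustrative ones used above.

```smalltalk
"Equivalent query block using & instead of AND:."
[:each | (each owner address street = 'Main') & (each id ~= 3)]
```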
Expressions do not have to be written in terms of object relationships, but can be constructed directly from fields. For example
    query where: [:each |
        ((each getTable: linkTable) getField: linkTableField)
            = ((each getTable: entryTable) getField: idField)].
It’s also possible to construct expressions directly. An expression is always based on a BaseExpression. For example
    base := BaseExpression new.
    id := base get: #id.
    id get: #= withArguments: (Array with: (3 asGlorpExpression)).
This can be quite tedious, so it is normally used only for piecing together other expressions. All internal queries are generated in terms of these expressions, so anything the system can do can also be built as a user query.

Also note the #asOuterJoin message which is useful in an expression, e.g.
    [:each | each address asOuterJoin street = 'foo'].

Queries

One of the most important features in an object-relational mapping system is the ability to issue arbitrary queries. Like TOPLink, Glorp does not have a string-based query language, but rather constructs query objects, and uses Smalltalk blocks as one of the primary building blocks for them.
Like TOPLink there is shortcut protocol on the session for simple querying operations, but more complex operations require creating a query object and executing it against the session. The basic operations are
    Query readManyOf: aClass where: anExpression
    Query readOneOf: aClass where: anExpression
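In use, this might look like the following sketch, which assumes a connected session and a hypothetical mapped Person class with id and lastName attributes.

```smalltalk
| query people person |
"Build a query object and execute it against the session."
query := Query readManyOf: Person where: [:each | each lastName = 'Smith'].
people := session execute: query.

"The shortcut protocol on the session does the same in one step."
people := session readManyOf: Person where: [:each | each lastName = 'Smith'].
person := session readOneOf: Person where: [:each | each id = 3]
```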
Queries can specify their result collection type (only for readMany queries, obviously). The possible types include dictionaries. For example, in GlorpDictionaryMappingTest we want a collection of items as values in a dictionary keyed by a value in the link table. This is indicated as
    query
        retrieveDictionaryWithKey: [:each |
            (each getTable: linkTable)
                getField: refField]
        value: [:each | each].
The collection type can also be a GlorpCursoredStream. In this case we don’t read the entire result set at once (assuming that the database supports that) and instead return an object that responds to Smalltalk stream protocol but returns objects. This also removes duplicates if we’ve done joins that would cause an object to be returned more than once. There is also a GlorpCursoredCollection, which returns something which responds to sequenceable collection protocol, but is represented internally by a cursor. It won’t read the actual objects until they’re needed.
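Reading through a cursored stream might be sketched like this. It assumes a connected session and a hypothetical Person class, that collectionType: is the selector for choosing the result collection, and that the stream should be released once it is no longer needed.

```smalltalk
| query stream |
query := Query readManyOf: Person where: [:each | each id ~= 3].
query collectionType: GlorpCursoredStream.
stream := session execute: query.
"Objects are built from rows only as they are asked for."
[stream atEnd] whileFalse: [
    Transcript show: stream next printString; cr].
"Assumed cleanup step: free the underlying database cursor."
stream release
```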
Queries can specify an ordering, indicated by the orderBy: method. This can take either an expression block, or a symbol representing an attribute of the object. Multiple orderBy:’s indicate secondary, tertiary, etc. orderings.
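For instance, ordering by an attribute of a related object and then by a simple attribute could be written as below. Person, its address relationship and the city and lastName attributes are hypothetical.

```smalltalk
| query |
query := Query readManyOf: Person where: [:each | each id ~= 3].
"Primary ordering: an expression block traversing a relationship."
query orderBy: [:each | each address city].
"Secondary ordering: a symbol naming an attribute."
query orderBy: #lastName.
session execute: query
```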
Virtual collections exist. See GlorpVirtualCollection.
By default a query reads objects of the class it was given. It’s possible to tell the query to bring back more or less data using the #retrieve: and #alsoFetch: methods. The #retrieve: method tells the query to bring back a particular piece of data. That can be an attribute of the class, in which case we will bring back only that data rather than the whole object. It can also be a related object, in which case we will return that object as well. The two forms can be mixed arbitrarily, e.g.
    aQuery retrieve: [:each | each id].
    aQuery retrieve: [:each | each firstName].
    aQuery retrieve: [:each | each address].
This will bring back a collection of arrays. Each array will contain a number (the id), a string (the first name), and an address object.

The #alsoFetch: method is similar to #retrieve:, but doesn’t affect the returned results. It just ensures that the related object is also fetched and goes into cache. The returned result from the query remains the same, but related objects will automatically be accessible.
    aQuery alsoFetch: [:each | each address].
The contents of a retrieve: expression can also be functions, including aggregate functions. For example
    aQuery retrieve: [:each | each countStar].
or
    aQuery retrieve: [:each | each purchaseAmount max].
These two features together are roughly equivalent to TOPLink’s report query and to joined reads. Note that when traversing relationships it may be important to add an #asOuterJoin, otherwise rows may disappear unexpectedly from the results.
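As a small illustration of that last point, fetching the address of a hypothetical Person whose address may be missing could be written as follows.

```smalltalk
| query |
query := Query readManyOf: Person where: [:each | each id ~= 3].
"Without asOuterJoin, people with no address row would drop out of the result."
query alsoFetch: [:each | each address asOuterJoin].
session execute: query
```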

Another performance optimization is filtered reads. These are roughly equivalent to TOPLink’s batch reads. In a filtered read we don’t read the person and their address together. Instead we read a group of people, and if we fire the proxy for any of their addresses, the addresses of the entire group are fetched at once. This makes two database round trips instead of the one made using #retrieve:, but is much better than the hundreds we would make without either, and is nice because you don’t have to know in advance whether or not you want the related objects. This can also be set on a mapping so that filtered reads are used automatically.

Specific Features

This section lists specific areas in which, as of Glorp 0.3.6, features are notably missing relative to TOPLink. I attempt to list them roughly in order of importance, though this will obviously vary by application. I note that the difficulty of addressing these varies widely, and some would be quite simple to add.
- The descriptor protocol, while similar in principle, is not at all compatible in detail. It would probably be possible to have a reasonable compatibility layer, but nothing of this sort exists.
- There is no ability to read table descriptions from the database. These must be manually entered in code, which is quite tedious for any significant-sized system.
- Thread Safety. Glorp is currently only minimally thread safe. There is no copying unit of work equivalent to ServerSession/ClientSession. The safest technique is to use one session per process. This works well, but means that there is no sharing of read-only objects.
- There is no connection pooling yet, which can be a problem given the previous suggestion.
- Inheritance support. Currently only two of the three possible strategies are supported, and there may be issues with what’s possible relative to TOPLink.
- Exceptions – Glorp has, as yet, very few specific exceptions or validation, so errors in mapping or query execution will be harder to catch and diagnose.
- There is limited support for validation of descriptors, queries, etc.
- Support for using custom SQL is currently much more limited. The basic mechanisms are available (see e.g. SQLStringSelectCommand) but hooks are not yet provided to let descriptors or mappings use handwritten SQL for particular functions. There is no support yet for stored procedures. (At the basic level, this is just invoking a command with a specific SQL string and binding arguments.)
- There is no support for writeObject:. This is arguably a feature, but may be a problem for compatibility.
- Expressions cannot be evaluated against objects in memory. This means that queries won’t find modified or new objects in memory. This was an optional feature in TOPLink, since it could be quite expensive.
- Sequences support identity columns and sequences, but not allocation in a separate sequence table.
- Locking options. Glorp supports basic optimistic locking using a timestamp or version number field, but there is no pessimistic locking and some optimistic options are missing.
- There is no “private owned” mechanism, important for cascaded deletes.
- There is no equivalent to shallow write. This can be an issue for cyclic relationships, where there is no order of writes that can do a correct insert. However, this is really only an issue for databases with identity columns, since we otherwise do sequence pre-allocation and the primary keys should all be available before any writes are done.
- Some mappings are missing, notably Object Type for enumerations, Swapped Objects, Direct Collection, and Variable Class relationships.
- Embedded values cannot (I don’t think) be nil.
- Some of the more advanced features (e.g. weak cache, parameter binding, array binding) are only implemented for VisualWorks.
- There are no existence checking options. An object which was read from the database will be assumed to be an update. All others are assumed to be inserts.
- There are no descriptor events, although they aren’t as necessary, because things like inserting a last-modified-date can be done with a generator field type rather than using events (see e.g. GlorpDatabaseValueGenerator, VersionType).
- The set of database functions is quite limited so far, although I believe the infrastructure is there to support any function, including database-specific functions. The list just needs to be filled out.
- There is no ability to specify an error block for a query or for a session in general. I’m not sure an error block is a good idea, as it’s a practice that largely predates exception handling, but it may be a compatibility issue.
- The set of subselect operations available is limited to anySatisfy/noneSatisfy type of operations. TOPLink 5.x has more flexibility. However TOPLink 4.x has significantly less.
- There is no ability to do no-cache reads/writes.
- Partial reads of an object, where the remaining instance variables will be put in as proxies and retrieved when necessary, is not supported. However, there is a good equivalent to report query, using the retrieve: and alsoFetch: operations. This can provide similar functionality, although it retrieves just data rather than the partial objects.
- There is no equivalent to named queries and the query manager.
- There is no session broker equivalent.
- There is no equivalent to TOPLink’s query by example. However, query by example was quite weak, and there is additional flexibility in queries that can fairly easily compensate (or be used to implement a query-by-example facility).
- There are no equivalents for some of the TOPLink add-on tools, including the profiler and integrity checker. And of course there is no equivalent to the TOPLink 5.x graphical builder.