How Glorp Works
GLORP (Generic Lightweight Object-Relational Persistence) is a simple (well, relatively simple) framework for reading and writing Smalltalk objects from relational databases. Many of its design principles are inspired by The Object People/Oracle's TOPLink products, and we extend our thanks to The Object People for sponsoring this project and assisting with development. Most of the initial development was done at Camp Smalltalk 2000 in San Diego.
This document provides a quick overview of roughly what it does, the design principles, and the current implementation strategy. Hopefully this, in combination with the test cases and the code, is enough to allow someone to usefully work on Glorp without needing to consult the authors. It'd be nice if it had some pictures, but it doesn't.
Glorp currently consists of two subsystems:
Glorp - the code
GlorpTest - the test cases and associated support classes
Meta-data driven - There is a declarative description of the correspondence between an object and its database representation, and this is used to drive reads, writes, joins, and anything else we need to do. Code does not have to be generated, although at some point it may be possible to precompile operations into a static form.
Transaction-driven - There is no explicit write operation for objects. Objects become part of a transaction and are automatically written, if necessary, when that transaction commits. Currently this supports only one level of transactions, but there are pending thoughts on doing nested and parallel transactions using the same fundamental mechanisms.
Non-intrusive - It should be possible to use most of the facilities of Glorp (parallel transactions being the exception) with unmodified objects. Glorp should also be able to accommodate a wide variety of database schemas and object models.
Session - The mechanism the application uses to talk to the database.
Query - A representation of reading objects from the database
Descriptor - For each class mapped by Glorp there is a descriptor, describing in overall terms how that class is mapped.
Mapping - For each instance variable of a mapped class, there is one mapping which describes how that attribute is read and written.
To map, we need a model of the table structure and the object structure. Currently both are built up in code. We expect that in the longer term table structure might be imported from the database and there might be some mechanism for more easily describing the mapping.
The current example of mapping is shown in GlorpDemoDescriptorSystem. This class has the ability to generate descriptors and table definitions for all of the test classes and tables. It builds up each by perform:ing "descriptorFor" or "tableFor" the appropriate name. Note that identity is preserved at all levels using this system. That is, asking for the same descriptor twice will produce the identical descriptor. Asking for the same table twice will produce the identical table, and fields within the table are treated likewise. Preserving identity on these objects is fundamental to the operation of Glorp. Not only does this allow more efficient lookups, but it avoids issues of name matching, complex printing, copying, and so forth.
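The identity-preserving lookup described above amounts to building each descriptor at most once and caching it. Here is a minimal sketch of that idea in Python (Glorp itself is Smalltalk; the class and method names here are invented for illustration, not Glorp's API):

```python
class DescriptorSystem:
    """Illustrative cache: asking for the same descriptor twice
    returns the identical object, not merely an equal one."""

    def __init__(self):
        self._descriptors = {}

    def descriptor_for(self, a_class):
        # Build each descriptor at most once; later requests reuse it,
        # so identity is preserved across the whole system.
        if a_class not in self._descriptors:
            self._descriptors[a_class] = {"described_class": a_class,
                                          "mappings": []}
        return self._descriptors[a_class]

system = DescriptorSystem()
assert system.descriptor_for(int) is system.descriptor_for(int)
```

Because lookups always return the identical object, code elsewhere can compare descriptors, tables, and fields with identity checks instead of name matching.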
Tables and fields also have strong semantic representation. Fields know whether they are primary keys, what their position in a table is, their type, etc. Tables contain foreign key information which should correspond to the foreign key constraints in the database. This is used for determining write order.
Mappings come in different varieties, each of which is defined differently:
DirectMapping - A direct correspondence between an instance variable and a field (e.g. name <-> NAME)
e.g. DirectMapping from: #name to: (myTable fieldNamed: 'NAME')
OneToOneMapping - A one to one relationship in terms of foreign keys
e.g. OneToOneMapping new
    attributeName: #address;
    referenceClass: Address;
    mappingCriteria: (Join
        from: (personTable fieldNamed: 'ADDRESS_ID')
        to: (addressTable fieldNamed: 'ID')).
OneToManyMapping - A one to many relationship in terms of foreign keys, represented in the database by a foreign key from the many to the one.
e.g. OneToManyMapping new
    attributeName: #emailAddresses;
    referenceClass: EmailAddress;
    mappingCriteria: (Join
        from: (personTable fieldNamed: 'ID')
        to: (emailAddressTable fieldNamed: 'PERSON_ID')).
ManyToManyMapping - A many to many relationship, represented in the database by an intermediate link table. Note that it's also possible in general to represent a one-to-many relationship with a link table. This is not currently supported. In memory, a many-to-many will require two mappings, one for each side of the relationship.
e.g. in Customer:
ManyToManyMapping new
    attributeName: #accounts;
    referenceClass: BankAccount;
    mappingCriteria: (Join
        from: (customerTable fieldNamed: 'ID')
        to: (linkTable fieldNamed: 'CUSTOMER_ID')).
and in BankAccount:
ManyToManyMapping new
    attributeName: #accountHolders;
    referenceClass: Customer;
    mappingCriteria: (Join
        from: (accountTable fieldNamed: 'ID')
        to: (linkTable fieldNamed: 'ACCT_ID')).
EmbeddedValueOneToOneMapping - A one to one relationship, where the target object is not stored in a separate table, but rather as part of the row of the containing object. In cases where the object might be re-used in multiple places, this mapping also supports aliasing the fields from where they "normally" would be stored.
e.g. EmbeddedValueOneToOneMapping new
    attributeName: #amount;
    referenceClass: Money;
    fieldTranslation: ((Join new)
        addSource: (transactionTable fieldNamed: 'MONEY_AMT') target: (moneyTable fieldNamed: 'AMOUNT');
        addSource: (transactionTable fieldNamed: 'MONEY_CURR') target: (moneyTable fieldNamed: 'CURRENCY')).
All of the above mappings use Join to specify the foreign key relationship. This is defined as a list of source and target fields with implicit equality between them. When used in a mapping, it is expected that the source fields come from tables associated with the object containing the mapping, and the target fields come from tables associated with the other side of the relationship (which might be a table directly associated with the other object, or might be a link table).
Note that there is no distinction as to which table is the source of the value. The join simply specifies that both of the values have to be equal. When there are multiple source and target fields, the semantics are that source[i]=target[i] for all i, with each of those clauses ANDed together.
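The rendering of those semantics can be sketched in a few lines of Python (Glorp itself is Smalltalk; `join_sql` and the field names are invented for illustration): each source/target pair becomes an equality clause, and the clauses are ANDed together.

```python
def join_sql(source_fields, target_fields):
    """Render a Join: source[i] = target[i] for all i, ANDed together.

    There is no directionality; each clause just says the two
    values must be equal."""
    assert len(source_fields) == len(target_fields)
    return " AND ".join(f"{s} = {t}"
                        for s, t in zip(source_fields, target_fields))

print(join_sql(["CUSTOMER.ADDRESS_ID"], ["ADDRESS.ID"]))
# CUSTOMER.ADDRESS_ID = ADDRESS.ID
print(join_sql(["A.K1", "A.K2"], ["B.K1", "B.K2"]))
# A.K1 = B.K1 AND A.K2 = B.K2
```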
Also note that in embedded value mappings, the join can also be used to express the field translation between the "real" table of the target object and the way it's used in the table it's embedded into. The primary purpose is still the same, indicating which fields correspond to which.
Reading happens by asking the session to execute queries.
Queries need to know what kind of object they're looking for, whether they want one or many, and what the where clause should be. The complicated part of the where clause is described (briefly) below. The rest is relatively easy. First, figure out if the query is finding one object by primary key. If it is, do a cache lookup and if you find the object don't bother going to the database. Otherwise, build the SQL, execute the SQL, get back a bunch of rows, and build the objects out of the rows.
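The cache-first read path can be modeled roughly as follows (a Python sketch of the Smalltalk mechanism; the `Session` shape and method names here are simplified stand-ins, not Glorp's real API):

```python
class Session:
    """Cache-first reads: only go to the database on a cache miss."""

    def __init__(self, database_rows):
        self._database_rows = database_rows   # primary key -> row dict
        self._cache = {}                      # primary key -> built object

    def read_by_primary_key(self, pk):
        if pk in self._cache:                 # cache hit: skip the database
            return self._cache[pk]
        row = self._database_rows[pk]         # stands in for executing SQL
        obj = dict(row)                       # stands in for object building
        self._cache[pk] = obj
        return obj

session = Session({1: {"name": "Fred"}})
first = session.read_by_primary_key(1)
# The second read returns the identical object without touching the database.
assert session.read_by_primary_key(1) is first
```

Returning the identical object on repeat reads is what keeps object identity consistent within a session.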
Building the SQL is complicated by two things. First, we can have parameterized queries. The most common source of these is mappings. Every mapping implies a query, and that query takes parameters that are derived from the row we read for the parent object. In fact, when we construct a proxy we store three things: the session to execute in, the query to execute, and the parameters to execute it with. This isn't that hard; we just need to know the order of parameters and do substitution when we print the SQL.
The second thing is the "tracing". This doesn't entirely work yet, but the idea is that a query can be executed with some specification of other objects which are to be read as part of the same operation. Effectively, you can say: read employees, but while you do it, read their manager, their address, and their department. Some things are traced by default (e.g. embedded values); others are optional. Currently this is really only exercised by the embedded value mappings. Given the query and the tracing, the query is broken down into multiple SimpleQuery instances, each of which knows a collection of nodes from the tracing graph it should read. (Note: the tracing nodes are the same expression objects as used in the where clause.) Each simple query should be able to run as a single SQL statement. It will figure out which fields and which tables need to be read, construct a collection of element builders for each of those objects, read the data, then ask the element builders to build their objects.
At least with the VisualWorks drivers, query results come back as arrays, which is nicely efficient but means that we need to know how to index into them for particular fields. Fields know their position in the table, so that's easy for a single-table query. If a query might bring back multiple tables, or only partial results, we need to map that position into a position in the result set. This is done through the element builder, which maintains an array mapping each field's regular position in its table to that field's position in the actual result set.
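That position translation can be sketched like this (a Python model with invented names; a real result set would also have to disambiguate duplicate field names across tables, which is glossed over here):

```python
class ElementBuilder:
    """Translates a field's position in its own table into its position
    in the flat result row that the database driver returns."""

    def __init__(self, table_fields, result_fields):
        # translation[i] = where table field i landed in the result row
        self.translation = [result_fields.index(f) for f in table_fields]

    def field_at(self, result_row, table_position):
        return result_row[self.translation[table_position]]

# A query joining CUSTOMER and ADDRESS returns one combined row; this
# builder reads the ADDRESS portion (fields ID and CITY).
builder = ElementBuilder(["ID", "CITY"], ["CUST_ID", "NM", "ID", "CITY"])
row = (7, "Fred", 3, "Ottawa")
assert builder.field_at(row, 1) == "Ottawa"
```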
This code is not exactly clean yet, and has the fewest tests. It's also way over-general for what the tool can do right now, but it promises to be very cool once it works.
The where clause is specified through a block.
e.g. Query returningMany: Employee where: [:emp | emp name = 'Fred'].
or [:emp | emp manager manager address city = 'Ottawa'].
What we do with a block is pass in a doesNotUnderstand: handler object which is used to build up a tree representing the message sends. We then use this tree to build up a full expression tree of more complex objects (GlorpExpression and subclasses) representing the where clause. This can then be manipulated and printed as SQL.
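The message-recording trick translates naturally to Python's `__getattr__`, which plays the role of Smalltalk's doesNotUnderstand:. This is only a toy model of the idea (unary sends and a single `=` comparison; the real GlorpExpression tree is much richer):

```python
class MessageRecorder:
    """Records a chain of unary message sends and a final comparison,
    like the doesNotUnderstand: handler passed into a where block."""

    def __init__(self, path=()):
        self.path = path

    def __getattr__(self, name):
        # Each attribute access stands in for a unary message send.
        return MessageRecorder(self.path + (name,))

    def __eq__(self, literal):
        # The comparison terminates the chain, yielding an expression node.
        return ("=", self.path, literal)

# Evaluate the "block" against a recorder instead of a real employee.
clause = (lambda emp: emp.manager.address.city == "Ottawa")(MessageRecorder())
print(clause)  # ('=', ('manager', 'address', 'city'), 'Ottawa')
```

The resulting tuple is the raw message-send tree; a second pass would turn it into expression objects that know how to print themselves as SQL.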
nil or false may be used as a "no-clause", and true may be used as an "all-clause".
Writing is only possible through a transaction, currently called a UnitOfWork. The sequence of calls is:
session beginUnitOfWork.
session registerObject: Customer new.
session commitUnitOfWork.
When an object is registered with the unit of work, it and any objects reachable from it will be noted as part of the set of objects associated with this transaction. Reading an object while a unit of work is in progress implicitly registers that object. When the unit of work is committed, these objects will be written.
Writing occurs in three phases. First, rows are generated in memory for all objects which are to be written. Then, the order in which these rows are to be written is determined, based on the foreign key constraints between the tables and rows. Finally, the rows are actually written to the database.
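The ordering phase is essentially a topological sort over the foreign key graph. A minimal sketch in Python (the function and its inputs are invented for illustration; Glorp derives the constraints from its table metadata):

```python
from graphlib import TopologicalSorter   # standard library, Python 3.9+

def write_order(tables, references):
    """Order tables so that any table a foreign key points at is
    written before the table containing that foreign key.

    references maps a table name to the tables its rows point at."""
    sorter = TopologicalSorter()
    for table in tables:
        # Predecessors (referenced tables) must be written first.
        sorter.add(table, *references.get(table, ()))
    return list(sorter.static_order())

# CUSTOMER.ADDRESS_ID references ADDRESS.ID, so ADDRESS rows go first.
assert write_order(["CUSTOMER", "ADDRESS"],
                   {"CUSTOMER": ["ADDRESS"]}) == ["ADDRESS", "CUSTOMER"]
```

Deletes would need the reverse order, and cyclic constraints would need deferred updates; neither wrinkle is shown here.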
In the first phase, we generate rows. These are stored in a RowMap, which is essentially a group of identity dictionaries (one per table) mapping from objects to the rows that they are stored in. For each object, we iterate over all the mappings. Each mapping can do one of two things. It can put values into fields in a row, or it can assert an equality constraint on some fields. For example, suppose we have a Customer C1 named 'John Smith' being written. Customer has a direct mapping from the instance variable 'name' to the field 'CUSTOMER.NM', so that mapping would look up the row for the customer table indexed by C1. If there is no such row, it would be created. Now it puts the value 'John Smith' into the field 'NM' of that row.
So far so good. Now Customer has a one-to-one mapping to Address. This asserts that the field CUSTOMER.ADDRESS_ID and the field ADDRESS.ID should be equal. The mapping looks up the CUSTOMER row indexed by C1, and looks up the ADDRESS row indexed by his address A1. Again, if either row doesn't exist it will be created. Now the mapping asserts that these two fields must be equal. It does this by storing a wrapper (FieldValueWrapper) into the row, rather than a literal value. If multiple fields should have the same value, we ensure that their rows contain the same wrapper, using the FieldUnifier. We call it a unifier because this is sort of like Prolog unification, only without the need to backtrack.
So far we have two rows in the RowMap. One for Address A1, which contains no values yet, and one for Customer C1 which contains a name. Further, the row for A1's ID is constrained to have the same value as the row for C1's ADDRESS_ID. Neither has a value yet. Eventually, when we execute the mapping for ADDRESS.ID, a value will be written into A1's row and will automatically appear in C1's row.
This is a very cool mechanism, because it means that these rows support only two operations, objects never need to get values out of other objects, and it doesn't matter at all in which order things are written into the RowMap. At the end of it all, we have all the values that matter, and things that should be the same are.
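The wrapper-sharing idea can be shown in a few lines (a Python model; the real FieldValueWrapper and row objects are Smalltalk, and the unification is done by the FieldUnifier rather than by hand as here):

```python
class FieldValueWrapper:
    """One shared holder for every field constrained to the same value."""

    def __init__(self):
        self.value = None

customer_row = {"NM": "John Smith"}
address_row = {}

# Unify CUSTOMER.ADDRESS_ID with ADDRESS.ID by storing one shared
# wrapper in both rows instead of a literal value.
shared = FieldValueWrapper()
customer_row["ADDRESS_ID"] = shared
address_row["ID"] = shared

# Later, whichever mapping assigns the ID makes it appear in both rows.
shared.value = 42
assert customer_row["ADDRESS_ID"].value == 42
assert address_row["ID"].value == 42
```

Because both rows hold the identical wrapper, no code ever has to copy a value from one row to the other, and assignment order stops mattering.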
There are a couple of complications that arise with writes.
With many-to-many mappings there is a table which has no corresponding object. In this case we use a synthetic object as the key (RowMapKey), which has two instance variables, key1 and key2. This provides an index into the dictionary, and is easily reconstructed from either object. It is also compared by equality rather than identity.
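Equality-based keying is what lets a freshly built key find the existing link-table row. A hypothetical sketch of that (Python here; the real RowMapKey is a Smalltalk class):

```python
class RowMapKey:
    """Equality-based two-part key for link-table rows that have no
    owning object of their own."""

    def __init__(self, key1, key2):
        self.key1, self.key2 = key1, key2

    def __eq__(self, other):
        return (isinstance(other, RowMapKey)
                and self.key1 == other.key1
                and self.key2 == other.key2)

    def __hash__(self):
        return hash((self.key1, self.key2))

rows = {}
rows[RowMapKey("customer1", "account1")] = {"CUSTOMER_ID": 1, "ACCT_ID": 9}

# A brand-new key built from the same pair finds the same row.
assert RowMapKey("customer1", "account1") in rows
```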
With embedded value mappings, the fields have to be translated when doing writes. It would be possible to have the mappings be aware of whether or not the object containing them was part of an embedded value mapping and take that into account when writing fields. That would be one way to do it. However, a simpler mechanism is to allow all the writes to take place normally, and to use the fieldTranslation expression to unify the "normal" fields and the embedded fields. This ensures that the correct values are placed in the embedded fields, but leaves an additional row which must be marked as not to be written.
Mapping is there, but it all has to be done manually.
Writes work pretty well. There's no comparison of changes; everything in a unit of work is always written.
Reads work reasonably well.
There's no exception handling at all, the entire library is not at all thread-safe, and there are lots of other things missing.
For more information, see the list of tasks, which gives a pretty good indication of what doesn't work yet.
Last edited: Tuesday, October 7, 2003 7:42 AM