Derived Variables

By Sandra Schloen, August 2017

Updated February 2018; January 2019; June 2020; July 2021

Derived Variables, introduced in the spring of 2017, provide additional property-based features that have a variety of uses. There are many official derivation Types that can be applied to variables – Aggregation, Calculation, Concatenation, Conversion, Count, Definition, Maximum, Minimum, Selection, Semantic Web, and Substitution – and they can be used in combination to create powerful effects. They are often used in conjunction with the auto-label feature of Predefinitions to derive names of new items. They might also be used as additional columns in Table View (see the Format Specification of a Set) to derive table values based on other properties (which themselves do not need to be included in the table).

Derived variables have a data Type specified in the usual way. Typically they will be decimal, integer, or alphanumeric types. The Derivation type is chosen from the pick-list provided. Note that not all derivation types apply for all data types. OCHRE will ensure that an appropriate selection is made.

Derived variables are often organized by a project within a special-purpose hierarchy of the Property variables category. This makes them easy to find and update as needed.

A derived variable is integrated within the Taxonomy like any other variable, within the context in which it is needed.

Another consideration is whether the derived property is Dynamic. Turn on this option when there is still active data entry that might impact the derived value of the property or when dependent values are likely to change for any reason. Otherwise, to avoid constant recalculation, derived properties will be assumed to be static, by default.

Note that derived properties can be used in tables and graphs even if they have not been explicitly assigned to items. In these cases the derived values are dynamic by definition and are calculated on the fly for the affected items, so the user should expect additional computational delay.

For a default static derivation, the derived value will be calculated when the property is applied to an item and saved. Thereafter the saved value will be used. Use this default option when the initially assigned value is not likely to change (e.g., the weight of pottery long since collected; the number of words in a Shakespeare speech; the value of a coin hoard).
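The difference between static and dynamic derivation can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions, not OCHRE's internal implementation: the DerivedValue class, the total_weight function, and the sample weights are all hypothetical.

```python
class DerivedValue:
    """A derived property applied to one item (hypothetical model, not OCHRE's internals)."""

    def __init__(self, item, compute, dynamic=False):
        self.item = item
        self.compute = compute
        self.dynamic = dynamic
        # A static value is calculated once, when the property is applied and saved.
        self.saved = None if dynamic else compute(item)

    def value(self):
        # Dynamic: recalculate on every request. Static: reuse the saved value.
        if self.dynamic:
            return self.compute(self.item)
        return self.saved

    def recalculate(self):
        # Stands in for the Recalculate option: refresh a static saved value.
        if not self.dynamic:
            self.saved = self.compute(self.item)


def total_weight(item):
    # Hypothetical derivation: sum the weights recorded on the item.
    return sum(item["weights"])


pail = {"weights": [120.0, 80.5]}
static_total = DerivedValue(pail, total_weight)
dynamic_total = DerivedValue(pail, total_weight, dynamic=True)

pail["weights"].append(40.0)      # further data entry after the property was saved
print(static_total.value())       # still the value saved earlier
print(dynamic_total.value())      # reflects the new entry
static_total.recalculate()        # the static value now catches up
```

The static value trades freshness for speed, which is why it suits data that is no longer changing.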

To prime the static derived values en masse, add the relevant items to a Set, include the derived property as a Table Column in the Set, and check ON the Recalculate option as shown above.

A few examples will illustrate some of the many possible uses of derived variables.

Selection

The simplest derivation type is that of Selection which allows you to use OCHRE's usual linking mechanism to create an ordered list of variables. OCHRE will search the item on which the derived property is applied for any of the variables, starting with the first one in the list and working down. When a value of a listed variable is found, OCHRE returns that value as the value of the derived variable. The value will be returned in the data type of the original value and formatted appropriately; e.g. as a link, or a date, etc.
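The search order described above amounts to a "first value found wins" lookup. Here is a minimal Python sketch of that idea; the function name select_first, the dictionary representation of an item's properties, and the variable names in the example are assumptions made for illustration only.

```python
from typing import Any, Optional


def select_first(item_properties: dict, ordered_variables: list) -> Optional[Any]:
    """Return the value of the first listed variable that has a value on the item."""
    for variable in ordered_variables:
        value = item_properties.get(variable)
        if value is not None and value != "":
            return value  # returned as-is, keeping its original data type
    return None


# Hypothetical item and variable names
coin = {"Date (estimated)": "ca. 350 BC", "Mint": "Athens"}
result = select_first(coin, ["Date (exact)", "Date (estimated)", "Date (range)"])
print(result)
```

Because the list is ordered, the most preferred variable should be linked first and fallbacks after it.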

Concatenation

Concatenation is a derivation type which can be used with Alphanumeric (String) Variables. This option lets you specify new character strings by joining together (concatenating) a sequence of existing strings. Strings that can participate include any combination of:

  • Hard-coded character strings

  • A few standard template values ([YY] or [YYYY], representing the current year as two or four digits)

  • Intrinsic field values of the ‘self’ item, in particular [Name], [Abbreviation], and [Code]; the caret creates a reference to the immediate ‘parent’ item instead (e.g. [^Name] for the parent item's Name, [^Abbreviation] for the parent's Abbreviation).

  • Any number of property values of any properties on either the ‘self’ item or any of its ancestors

  • The + (plus-sign) which indicates that any hard-coded text preceding the plus-sign is only present if the value following the plus-sign is not blank

A recent excavation project collected faunal remains in paper bags to which a barcode label was affixed. The item was designated by a Location or object type of Faunal remains collection – a name much too long to print on a barcode label. These “bone bags” as they were known were represented by OCHRE database items inserted within the appropriate unit of excavation but they were not uniquely identified by the project. Rather, a derived variable was created which used the Concatenation technique to simply assign the hard-coded string “Bone”; this became the item’s Name when applied via an appropriate Predefinition. Simple enough, but worth it when you get it essentially for free with the Predefinition’s auto-label feature.

A Concatenation Formula may contain any of several templates that refer to intrinsic data of the item to which the formula is applied. These templates are entered in square brackets, valid options for which are [Name], [Abbreviation], and [Code].

A Concatenation Formula may also contain any number of variables linked in (using the hyperlink operator) from the Taxonomy. The value of the linked-in variable, as found on the item to which the formula is applied, or as found on any of its ancestors (if not found on the item itself), is substituted in for the linked-in variable.

Here, for example, the "C" is hard-coded (representing a Ceramic item). The [YY] is an allowable template item representing the current 2-digit year ("17"). The Grid and Square are properties on an ancestor item within which this item is in context. They are joined by a hard-coded dot and followed by a hashtag symbol. The Registered C no. is a serial#-variable, a next-unique-value of which will be assigned to the item.

This Variable, called Auto-label C#, is then specified as an auto-label option on a Predefinition. When the Predefinition is applied to an item, it will be auto-labeled with a Name generated by the formula, in this case something like: C17-65.42#1.

Here is a final Concatenation example of a derived variable used to auto-abbreviate an item using a Predefinition. Again, the Grid, Square, and Finegrid are properties on an ancestor item within which this item is in context. Note that they are joined by a dot and the plus-sign. This indicates, for example, that if the Finegrid is absent then omit the dot following the Square. The hard-coded U is a prefix for the property value of the variable Unit, found on the current item. The [Name] template item represents the Name of the item itself. Thus if this derived variable's Formula is applied to the Abbreviation of an item whose Name is MC123456, the derived Abbreviation would be something like: 92.33.U20 MC123456.
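The Concatenation rules described in this section (hard-coded text, templates, property values found on the item or its ancestors, and the plus-sign) can be sketched in Python as follows. This is a simplified model under stated assumptions, not OCHRE's formula engine: the Item class, the token tuples, the PLUS marker, and the sample values are all hypothetical, and serial-number variables such as Registered C no. are left out.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Item:
    name: str = ""
    abbreviation: str = ""
    code: str = ""
    properties: dict = field(default_factory=dict)
    parent: Optional["Item"] = None


PLUS = object()  # hypothetical marker standing in for the + sign in a formula


def lookup_property(item, variable):
    """Find a property value on the item or, failing that, on its nearest ancestor."""
    node = item
    while node is not None:
        value = node.properties.get(variable)
        if value not in (None, ""):
            return str(value)
        node = node.parent
    return ""


def resolve(token, item, today):
    kind, arg = token
    if kind == "text":
        return arg
    if kind == "property":
        return lookup_property(item, arg)
    # kind == "template"
    if arg == "YYYY":
        return f"{today.year:04d}"
    if arg == "YY":
        return f"{today.year % 100:02d}"
    # [Name], [Abbreviation], [Code]; a leading caret targets the parent item
    target = item.parent if arg.startswith("^") else item
    if target is None:
        return ""
    return getattr(target, arg.lstrip("^").lower(), "")


def concatenate(formula, item, today=None):
    today = today or date.today()
    out = []
    i = 0
    while i < len(formula):
        token = formula[i]
        is_text = token is not PLUS and token[0] == "text"
        if is_text and i + 2 < len(formula) and formula[i + 1] is PLUS:
            # Text before a plus-sign is kept only if the following value is not blank.
            value = resolve(formula[i + 2], item, today)
            if value:
                out.append(token[1] + value)
            i += 3
            continue
        if token is not PLUS:
            out.append(resolve(token, item, today))
        i += 1
    return "".join(out)


square = Item(name="Square 42", properties={"Grid": 65, "Square": 42})
sherd = Item(name="MC123456", properties={"Unit": 20}, parent=square)

label = [("text", "C"), ("template", "YY"), ("text", "-"),
         ("property", "Grid"), ("text", "."), ("property", "Square")]
abbreviation = [("property", "Grid"), ("text", "."), PLUS, ("property", "Square"),
                ("text", "."), PLUS, ("property", "Finegrid"),
                ("text", " U"), ("property", "Unit"),
                ("text", " "), ("template", "Name")]

print(concatenate(label, sherd, today=date(2017, 6, 1)))
print(concatenate(abbreviation, sherd))
```

In the second formula no Finegrid is recorded, so the dot that precedes it is dropped along with the blank value.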

The remaining derivation types apply primarily to numeric-Type variables, either Decimal or Integer.

Aggregation

One of our archaeology projects counts and weighs the Pottery body sherds according to common ware types found at the site. Each ware type that is tracked is inserted as a sub-item within the Pottery Pail item listing the Quantity and Weight of the representative pot sherds. A separate sub-item is used to tally and weigh the Diagnostic sherds.

But in this case we let OCHRE calculate the Total sherd count for the Pail. Simply adding this derived Variable to the Properties of the Pottery Pail item triggers OCHRE calculation of its value. OCHRE checks the sub-items for instances of the listed Based on variables, and aggregates their values.
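As a rough Python sketch of this aggregation, assuming that aggregating means summing and that each sub-item is represented by a plain dictionary of its properties (the aggregate function, the ware names, and the numbers are hypothetical):

```python
def aggregate(sub_items, based_on):
    """Sum the numeric values of the Based on variables found on the sub-items."""
    total = 0
    for properties in sub_items:
        for variable in based_on:
            value = properties.get(variable)
            if isinstance(value, (int, float)):
                total += value
    return total


# Hypothetical sub-items of one Pottery Pail
pail_sub_items = [
    {"Ware": "Cooking ware", "Quantity": 12, "Weight": 340.5},
    {"Ware": "Fine ware", "Quantity": 5, "Weight": 61.0},
    {"Ware": "Diagnostic sherds", "Quantity": 3, "Weight": 95.5},
]

total_sherds = aggregate(pail_sub_items, ["Quantity"])
print(total_sherds)
```

Sub-items that lack the listed variable simply contribute nothing to the total.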

Notice that the resulting aggregate Value is displayed in a highlight color and is read-only; this is because the value is derived and therefore not editable.

Count

Related to the aggregation option is a simpler form of derivation that merely counts the number of contained items based on the current context. No property specification is needed if you specify the type of item to be counted from the Based on Category pick list.

If a Based-on-variables option is supplied, only those sub-items which have been tagged with any of the given properties are considered in the Count. (The values of these properties are ignored since the matching items will simply be tallied.)
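The two ways of narrowing a Count, by item category and by tagged properties, can be sketched in Python like this. The count_items function and the dictionary layout of the sub-items are assumptions for illustration; OCHRE's own matching rules may be more detailed.

```python
def count_items(sub_items, category=None, based_on=()):
    """Tally sub-items, optionally limited by category and by tagged properties."""
    count = 0
    for sub in sub_items:
        if category is not None and sub["category"] != category:
            continue
        # Only the presence of a listed property matters; its value is ignored.
        if based_on and not any(v in sub["properties"] for v in based_on):
            continue
        count += 1
    return count


# Hypothetical sub-items of one item
sub_items = [
    {"category": "Discourse unit", "properties": {"Part of speech": "noun"}},
    {"category": "Discourse unit", "properties": {}},
    {"category": "Epigraphic unit", "properties": {"Part of speech": "verb"}},
]

print(count_items(sub_items, category="Discourse unit"))
print(count_items(sub_items, based_on=["Part of speech"]))
```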

Applying the "Count words" property to a Speech from The Taming of the Shrew in the CEDAR Shakespeare project, we get a count of all words contextualized within that designated speech item. When Discourse units are requested as the items to be counted, only items tagged as "word" are considered, and any items whose transcription is entirely blank are ignored. The "Count characters" property also illustrated above counts word-based Epigraphic units instead of Discourse units.

Minimum/Maximum

The Minimum and Maximum options work on the same principle as Aggregation and Count, but pick out the minimum or maximum value, respectively. In addition, these options apply to Coordinate Type properties as well as numeric ones.

Minimum and Maximum allow a Qualifier which includes the [SELF] template, so that the current item and all its sub-items are considered, not just the sub-items. Take, for example, the case where a Locus excavation unit has been given coordinates, but within that unit are Pails or Baskets, which themselves may contain small finds, any of which may have been assigned coordinates. The [SELF] option will consider coordinates on the Locus as well as all of its sub-items when determining the minimum or maximum value.

For Coordinate properties you can also specify the templates [X], [Y], or [Z] (the default) to target one of the coordinate points. A point label matching string, optionally ending in the '*' wildcard, is allowed. In the example above, the derived value will be determined as the maximum z-value among the coordinates assigned to the item or any of its sub-items whose point label starts with "T" (e.g. "Top").
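A minimal Python sketch of Minimum/Maximum over coordinates follows. It assumes that sub-items are searched recursively and that the matching string is compared against each point's label; the Point and Unit classes, the extreme function, its parameter names, and the coordinate values are hypothetical and only model the behavior described here.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Point:
    label: str
    x: float
    y: float
    z: float


@dataclass
class Unit:
    name: str
    points: list = field(default_factory=list)
    sub_items: list = field(default_factory=list)


def descendants(unit):
    """Yield every sub-item below the unit, at any depth."""
    for sub in unit.sub_items:
        yield sub
        yield from descendants(sub)


def extreme(unit, axis="z", include_self=False, label_pattern="*", pick=max):
    """Minimum or maximum coordinate value; include_self models the [SELF] template."""
    candidates = list(descendants(unit))
    if include_self:
        candidates.insert(0, unit)
    values = [
        getattr(point, axis)
        for candidate in candidates
        for point in candidate.points
        if fnmatchcase(point.label, label_pattern)
    ]
    return pick(values) if values else None


find = Unit("Small find", points=[Point("Top", 11, 21, 101.8), Point("Base", 11, 21, 101.2)])
pail = Unit("Pail", sub_items=[find])
locus = Unit("Locus", points=[Point("Top", 10, 20, 101.5), Point("Bottom", 10, 20, 100.9)],
             sub_items=[pail])

print(extreme(locus, include_self=True, label_pattern="T*"))
print(extreme(locus, include_self=True, label_pattern="B*", pick=min))
```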

Calculation

Use the Add-hyperlink-with-name option of the string-link tools (as shown for Concatenation above) to target selected variables from the Taxonomy to construct a new value based on a simple formula. OCHRE calculations support addition, subtraction, multiplication, and division, along with parentheses for grouping, exponents (e.g. 2^3), square root (e.g. sqrt(9)) and a few trigonometry functions (sin, cos, tan).

Within the constraints of the available operators, constant values (e.g. pi) can be used in calculations like in the calculation of "Circumference" shown below.

Here is another example that calculates the percentage of the area of unheated halls relative to the overall bathing area in ancient bathhouses.
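The arithmetic behind these two examples is simple enough to write out. The Python sketch below assumes the circumference is computed as pi times a diameter and that the percentage is the unheated area divided by the total area, times 100; the function names and sample numbers are illustrative and do not reproduce the actual OCHRE formulas.

```python
import math


def circumference(diameter):
    # Assumed formula: pi multiplied by the value of a Diameter variable
    return math.pi * diameter


def percent_of_total(part, total):
    # Assumed formula: (area of unheated halls / overall bathing area) * 100
    return 100 * part / total


print(circumference(2.0))
print(percent_of_total(45.0, 180.0))
```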

Display format mask

On the options pane of a numeric variable, use the Display format mask to format the value.

Note the single quotes around the percent sign (to add a literal character). Add zeros after the decimal point to indicate the number of decimal places to include (e.g. 0.00'%' to show 2 digits after the decimal point).
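For readers more familiar with Python, the effect of a mask such as 0.00'%' is comparable to fixed-point formatting with a literal percent sign appended. The format_percent function below is a hypothetical analogy only and is not the mask syntax OCHRE itself uses.

```python
def format_percent(value, decimals=2):
    """Format a number with a fixed count of decimal places and a literal % sign."""
    return f"{value:.{decimals}f}%"


print(format_percent(25))          # prints 25.00%
print(format_percent(33.3333, 1))  # prints 33.3%
```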

Caution: Do not use any special characters in the name of a Variable used in a Calculation that could be construed as having a mathematical purpose. For example, do not use the asterisk, plus/minus signs, slash, parentheses, etc., in the name of a Variable. E.g. "Weight (kg)" would NOT be valid, as the parentheses would be interpreted mathematically.

Conversion

This option works just like Selection, described above, but then converts the resulting value to the units specified. Decimal or integer variables are expected here.
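Conversion can be pictured as a Selection followed by a change of units. The Python sketch below assumes each stored value carries its own unit and uses a small hypothetical table of length units; the convert function, the unit table, and the variable names are assumptions, not OCHRE's system of Measurements.

```python
# Hypothetical unit table: how many millimeters are in one of each unit
MILLIMETERS_PER_UNIT = {"mm": 1, "cm": 10, "m": 1000}


def convert(item_properties, ordered_variables, target_unit):
    """Take the first listed variable found on the item and express it in target_unit."""
    for variable in ordered_variables:
        entry = item_properties.get(variable)
        if entry is not None:
            value, unit = entry
            return value * MILLIMETERS_PER_UNIT[unit] / MILLIMETERS_PER_UNIT[target_unit]
    return None


wall = {"Length": (250, "cm")}
print(convert(wall, ["Height", "Length"], "m"))
```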

Substitution

"Quantity, qualified" is a Variable used to describe Greek coin hoards when they are cataloged as having "a few", "some", or "a great many" coins in the hoard, for example. This type of qualified data would not normally be available for computational methods. But OCHRE gives the option of imputing a value for each of the qualifications and thus converts them to numeric equivalents which can be used, say, as input to a statistical process. By using the Substitution derivation option, the user can specify a numeric value to substitute in for each of the nominal/ordinal values of the property. Note that you need to own the values in your own project in order to give them imputed values using Substitution.

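In effect, Substitution is a lookup table from qualified values to imputed numbers. A minimal Python sketch follows; the imputed numbers are invented for illustration and are not values recommended by OCHRE or used by any project.

```python
# Hypothetical imputed values for the "Quantity, qualified" variable
IMPUTED_QUANTITY = {"a few": 3, "some": 10, "a great many": 100}


def substitute(qualified_value):
    """Return the numeric stand-in for a nominal/ordinal value, or None if none is defined."""
    return IMPUTED_QUANTITY.get(qualified_value)


hoards = ["some", "a few", "a great many", "some"]
numeric = [substitute(value) for value in hoards]
print(numeric)
```

Once every qualification has a numeric equivalent, the list can be passed to ordinary statistical routines.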
Definition

The Definition derivation option provides a mechanism for quantifying phrases or clauses in a Text based on a chain of information that relates quantities and their measures, and provides OCHRE with the means to evaluate them. OCHRE will look up the definition of an implicated word -- one that represents a measure (e.g. "kurru" in the phrase "15 kurru") -- in the project Dictionary. On the Properties of its Dictionary entry, the predefined variable Measure will link to an appropriately defined Concept item that provides a Conversion factor reconciling the measure with the specified standard unit.

From the phrase or clause of a Text where the derived variable, here Measurement (l), is applied, OCHRE will search its subitems for a word that represents a measure (based on its dictionary definition) and a word that represents a number (based on its Type), and will calculate the appropriate value given in the specified standard units.
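The chain of lookups (word, Dictionary entry, Measure, Concept item, Conversion factor) can be sketched in Python as below. Everything here is a simplified assumption: the dictionary structures, the measurement function, and especially the conversion factor of 180 are placeholders for illustration and do not come from the project's data.

```python
# Hypothetical Concept items: Conversion factor to the project's standard unit
CONVERSION_FACTORS = {"LB kurru": 180.0}  # placeholder factor, not a project value

# Hypothetical Dictionary: entries that are measures link to a Concept via Measure
DICTIONARY = {"kurru": {"Measure": "LB kurru"}}


def measurement(phrase_words):
    """Multiply the number found in a phrase by the factor of the measure found in it."""
    number = None
    factor = None
    for word in phrase_words:
        if word["type"] == "number":
            number = float(word["text"])
        else:
            concept = DICTIONARY.get(word["text"], {}).get("Measure")
            if concept in CONVERSION_FACTORS:
                factor = CONVERSION_FACTORS[concept]
    if number is None or factor is None:
        return None
    return number * factor


phrase = [{"text": "15", "type": "number"}, {"text": "kurru", "type": "word"}]
print(measurement(phrase))
```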

Here, for example, is the system of measures (Concept items) on which the derived variable is based ...

... where "kurru" is defined as a Concept item with a Conversion factor that indicates how it relates to the standard measure (LB volumetric measure) defined by this project:

(Note that the hierarchy LB Measures needs to be linked to OCHRE's master system of Measurements; contact the OCHRE Data Service to achieve this.)

Here is the Dictionary entry that is, by definition, a Measure (where Measure is the OCHRE predefined variable -- link this into the project's own Taxonomy). That is, by definition it is linked to an appropriate Concept item via the property Measure.

Finally, here is the derived variable itself:

This, the derived variable Measurement (l), is the property that is applied to a phrase from a cuneiform Text. For an example of this see Summarizing Complex Units.

Semantic Web

The Semantic Web derivation option provides a lookup and linking mechanism for interacting with the world of the Semantic Web. See OCHRE and the Semantic Web for details.