Visualization: CEDAR-DV, The Taming of the Shrew

By Sandra Schloen, June 2020

Charting for Speaker Analysis: The Taming of the Shrew (CEDAR-DV)

Data being generated by the CEDAR Digital Variorum (CEDAR-DV) project presents interesting opportunities for visualization, which we highlight here by demonstrating the Visualization Wizard (VizWiz).

Disclaimer: This is presented as sample data only; we make no comment on its interpretation.

Quantifying data using OCHRE Properties

Producing interesting charts typically requires a set of quantitative data, which is not how we normally think of textual data. But those familiar with OCHRE's strategy for managing texts know that we represent a text using an epigraphic hierarchy of pages, lines, and signs or character strings, and/or a discourse hierarchy of words, phrases, sentences, and so on.

For Shrew, the CEDAR team has broken down each Page of the Bodleian Folio to identify stage directions, character identifications, and the speeches by each character. The words of each speech unit are collected within a parent discourse unit, which is labeled with the speaker and the page, column, and line where the speech occurs; for example, "Hostess 208.1.5" represents the speech by the Hostess character on page 208, column 1, line 5. This item is tagged as a Speech and identifies the character using the character's Person item.
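To make that structure concrete, here is a minimal Python sketch of a speech unit. The `SpeechUnit` class, its fields, and the `parse_label` helper are hypothetical illustrations, not OCHRE's data model; the sketch only assumes that labels follow the "Speaker page.column.line" pattern of the "Hostess 208.1.5" example.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechUnit:
    """Hypothetical stand-in for a Speech discourse unit (not OCHRE's model)."""
    speaker: str  # the character, i.e. the associated Person item
    page: int     # Folio page number
    column: int   # column on the page
    line: int     # line number within the column
    words: list = field(default_factory=list)  # the child word units


def parse_label(label: str) -> tuple:
    """Split a label like "Hostess 208.1.5" into (speaker, page, column, line).

    Splitting on the last space keeps multi-word speaker names intact.
    """
    speaker, locator = label.rsplit(" ", 1)
    page, column, line = (int(part) for part in locator.split("."))
    return speaker, page, column, line


unit = SpeechUnit(*parse_label("Hostess 208.1.5"))
print(unit)  # SpeechUnit(speaker='Hostess', page=208, column=1, line=5, words=[])
```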

Fine-grained data is important for careful analysis, but it is often desirable to aggregate quantitative data to determine proportions, distributions, or other kinds of trends. For this purpose we created two derived variables: one counts the number of words within each speech unit, and the other counts the number of characters. These properties automatically determine the appropriate counts and save them as property values for each Speech item.
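Conceptually, each derived variable is a function evaluated over a speech's child word units. The Python sketch below is a hypothetical illustration, not OCHRE's implementation; in particular, counting the characters of each word while excluding the spaces between words is our assumption about what "number of characters" means.

```python
def word_count(words: list) -> int:
    """Number of word units in the speech."""
    return len(words)


def character_count(words: list) -> int:
    """Number of characters in the speech's words.

    Assumption: the spaces between words are not counted.
    """
    return sum(len(w) for w in words)


# Hypothetical speech, stored as its child word units
speech = ["Good", "morrow", "to", "you"]
properties = {
    "Word count": word_count(speech),
    "Character count": character_count(speech),
}
print(properties)  # {'Word count': 4, 'Character count': 15}
```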

Pie chart

The simplest chart is the pie chart, which illustrates the relative proportions of numeric data. The Format tab of the VizWiz's Content selection pane examines the data included in the Set and determines which of the Variables represented in it are numeric, and therefore appropriate as Data options to chart. Variables that are nominal or ordinal are presented as options for the other dimensions of the chart.
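The split between numeric Data options and nominal or ordinal options can be sketched as follows. The record layout and the rule used here (a variable counts as numeric only if every value it takes in the Set is a number) are assumptions for illustration, not a description of how VizWiz actually decides.

```python
from numbers import Number


def partition_variables(items: list) -> tuple:
    """Return (data_options, other_options) for the variables in a Set.

    Assumption: a variable is numeric when every value it takes in the
    Set is a number (booleans excluded).
    """
    values_by_var = {}
    for item in items:
        for var, value in item.items():
            values_by_var.setdefault(var, []).append(value)

    data_options, other_options = [], []
    for var, values in values_by_var.items():
        if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
            data_options.append(var)   # candidate Data option
        else:
            other_options.append(var)  # candidate Series / Group-by option
    return data_options, other_options


# Hypothetical Speech items with invented values
speech_set = [
    {"Character": "Hostess", "Word count": 12, "Character count": 50},
    {"Character": "Lord", "Word count": 30, "Character count": 121},
]
print(partition_variables(speech_set))
# (['Word count', 'Character count'], ['Character'])
```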

For a quick view of the speaking roles in The Taming of the Shrew, we created a Set containing all the Speech items. For this example we chose the Word Count variable as the source of the numeric data for the chart, and we selected Character as the other dimension, here called the Series, across which the data will be broken out into percentages. Selecting the Pie chart type gives the default graph: OCHRE sums the numeric data to compile the percentages for the pie chart and sorts the slices of the resulting chart in descending numeric order.

Click the Draw Chart button to produce the chart.
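The default pie chart boils down to a group-and-sum followed by a descending sort. The sketch below assumes each Speech item is a simple dictionary of variable values; the function name `pie_slices` and the toy records (generic speakers "A", "B", "C" with invented counts, not CEDAR data) are hypothetical.

```python
from collections import defaultdict


def pie_slices(items, data_var, series_var):
    """Sum data_var for each value of series_var and return
    (label, total, percent) slices in descending order of total."""
    totals = defaultdict(int)
    for item in items:
        totals[item[series_var]] += item[data_var]

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # nothing to draw
    slices = [(label, total, 100 * total / grand_total)
              for label, total in totals.items()]
    # Largest slice first, matching the default descending sort
    return sorted(slices, key=lambda s: s[1], reverse=True)


speeches = [  # toy records, invented for illustration
    {"Character": "A", "Word count": 30},
    {"Character": "B", "Word count": 10},
    {"Character": "A", "Word count": 20},
    {"Character": "C", "Word count": 40},
]
for label, total, pct in pie_slices(speeches, "Word count", "Character"):
    print(f"{label}: {total} words ({pct:.1f}%)")
```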

Customizing the chart

With Petruchio emerging as the clear winner, we explore other ways to customize the chart.

From the Chart Options panel, we chose "Value" instead of "Percentage" as the Data option to display the actual data value, here the number of words spoken by each character. Since this is a whole number, we set the # decimals to zero. To focus on the main characters, we asked OCHRE to consider only the Top-10 characters. Setting the Legend placement to "Bottom" results in a more vertically oriented display when we refresh by clicking Draw Chart. (Note that instead of choosing Top-N, you could also consolidate items below a specified threshold value into an "Other" class; see the Consolidate below spinner on the Chart Options panel.)
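Top-N and Consolidate below are two different ways of trimming the series, while # decimals only affects how values are displayed. A minimal sketch follows, operating on per-character totals like those summed for the pie chart. The function names are hypothetical; treating "below" as strictly less than the threshold follows the "fewer than 100 words" example later on, and having Top-N simply drop the remaining characters (rather than folding them into "Other") is an assumption.

```python
def top_n(totals, n):
    """Keep only the n largest totals, largest first; the rest are dropped."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


def consolidate_below(totals, threshold, other_label="Other"):
    """Fold every total strictly below threshold into a single class."""
    kept = {k: v for k, v in totals.items() if v >= threshold}
    below = [v for v in totals.values() if v < threshold]
    if below:
        kept[other_label] = sum(below)
    return kept


def format_value(value, decimals):
    """Render a value with a fixed number of decimals (the # decimals setting)."""
    return f"{value:.{decimals}f}"


totals = {"A": 500, "B": 80, "C": 300, "D": 15}  # invented word counts
print(top_n(totals, 2))                # {'A': 500, 'C': 300}
print(consolidate_below(totals, 100))  # {'A': 500, 'C': 300, 'Other': 95}
print(format_value(500, 0))            # 500
```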


XY-Series Charts

An XY-Series chart lets you plot data across multiple dimensions. OCHRE supports a variety of XY-Series chart types; shown here is a simple Bar chart.

On an XY-Series chart, the numeric Data variable is plotted along the vertical axis (the y-axis) and the Series variable is plotted along the horizontal axis (the x-axis).

Statistics options

OCHRE's charting support offers a basic set of Statistics options; the default behavior is to Sum the values of the Data variable(s). Other options include Maximum, Minimum, Mean, Median, and Mode. For clarity, if the Statistics option is not the default, it is added to the y-axis label, e.g. "Word count (mean)".
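Each Statistics option amounts to a different function applied to the values collected for each Series value. The sketch below uses Python's standard `statistics` module; the `STATISTICS` mapping, the helpers, and the toy records are hypothetical, and the label rule simply mirrors the "Word count (mean)" example. Note that `statistics.mode` returns the first mode encountered when there is a tie (Python 3.8 and later), which may not match how OCHRE resolves ties.

```python
import statistics

# Statistics option name -> function applied to each group's values
STATISTICS = {
    "Sum": sum,
    "Maximum": max,
    "Minimum": min,
    "Mean": statistics.mean,
    "Median": statistics.median,
    "Mode": statistics.mode,
}


def aggregate(items, data_var, series_var, stat="Sum"):
    """Group data_var values by series_var and reduce each group with stat."""
    groups = {}
    for item in items:
        groups.setdefault(item[series_var], []).append(item[data_var])
    statistic = STATISTICS[stat]
    return {label: statistic(values) for label, values in groups.items()}


def y_axis_label(data_var, stat="Sum"):
    """Append the Statistics option to the label unless it is the default."""
    return data_var if stat == "Sum" else f"{data_var} ({stat.lower()})"


speeches = [  # toy records, invented for illustration
    {"Character": "A", "Word count": 10},
    {"Character": "A", "Word count": 20},
    {"Character": "A", "Word count": 20},
    {"Character": "A", "Word count": 30},
    {"Character": "B", "Word count": 5},
    {"Character": "B", "Word count": 7},
]
print(y_axis_label("Word count", "Mean"))
print(aggregate(speeches, "Word count", "Character", "Mean"))
```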

Speeches tallied by Character, simply by counting the number of items tagged as a Speech (a nominal property).

Simple counting

If the Statistics option is set to Count, there is no need for a quantitative property to Sum or a derived variable to calculate. Instead, choose a Series option for the x-axis, and OCHRE will simply tally and plot the number of occurrences of each Value of the selected Variable. If the Data option is set to Percentage, the counts are displayed as percentages rather than as absolute values.

(The multi-line graph is illustrated here, but the Bar (series) or Stacked bar chart types would also be appropriate options.)
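Counting needs only a nominal Series variable, since each item contributes one occurrence. A minimal sketch, with a hypothetical `count_by` helper and invented records; note that no numeric property is read at all.

```python
from collections import Counter


def count_by(items, series_var, as_percentage=False):
    """Tally occurrences of each value of series_var, optionally as percentages."""
    counts = Counter(item[series_var] for item in items)
    if not as_percentage:
        return dict(counts)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Toy Speech items: only the nominal Character value is needed
speeches = [{"Character": c} for c in ["A", "B", "A", "A"]]
print(count_by(speeches, "Character"))                      # {'A': 3, 'B': 1}
print(count_by(speeches, "Character", as_percentage=True))  # {'A': 75.0, 'B': 25.0}
```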

Multiple data options

Let's say that, instead of plotting a single numeric variable, you want to see several Data options plotted separately, for example both the word count and the character count of each speech. Multiple Data options can be selected (instead of a Group-by option) to plot along the additional dimension (e.g. the multiple line, bar, or area charts, default or stacked). In this case we select multiple Variables for Data, a single Series option, and no Group-by option.

Multiple Variables for Data with a single Series option ("Character").
Here is the same data as above, but as a bar chart, showing multiple Variables for Data as separate bars.
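With several Data variables and one Series, each Series value gets one total per variable, which the chart draws as side-by-side (or stacked) bars. This sketch assumes the two derived variables described earlier, here named "Word count" and "Character count", as the Data options; the helper name and the records are hypothetical.

```python
def multi_data(items, data_vars, series_var):
    """Return {series value: {data variable: sum}} for several Data variables."""
    table = {}
    for item in items:
        key = item[series_var]
        if key not in table:
            table[key] = {var: 0 for var in data_vars}
        for var in data_vars:
            table[key][var] += item[var]
    return table


speeches = [  # toy records, invented for illustration
    {"Character": "A", "Word count": 10, "Character count": 40},
    {"Character": "A", "Word count": 5, "Character count": 22},
    {"Character": "B", "Word count": 8, "Character count": 30},
]
table = multi_data(speeches, ["Word count", "Character count"], "Character")
for character, row in table.items():
    # One bar per Data variable within each Series value
    print(character, row)
```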

Adding the dimension of Time

OCHRE was designed to handle the dimension of time using the Periods category, and these Periods become a factor in the charting process. An XY-Series chart lets you include an additional dimension, a Group-by option based on a nominal or ordinal variable, to create an outer-level breakdown of the property values. Time is often a natural choice for the Group-by option.

Charting across time is straightforward when the items being charted are themselves assigned to a time Period, as is the case here: each Speech is assigned to the Act/Scene (a Period described by the property Scene#) to which it belongs.

The time Periods (that is, the Acts and Scenes) are itemized in OCHRE's Periods category and organized hierarchically, with Scenes nested within Acts. Each Period is also described by a "Period Type" of Act or Scene, allowing us to filter by just Acts or just Scenes. Each Scene is also assigned the (alphanumeric) property "Scene#", giving us a descriptive variable to use for charting.

With an itemized list of Acts and Scenes, and with each Speech assigned to its appropriate Scene, we can chart speaking roles across the time dimension of the play. Note that any Series or Group-by option based on an OCHRE Period will automatically be sorted in time sequence based on the Period hierarchy.
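The automatic time ordering follows from the Period hierarchy itself: a depth-first walk of the Acts and their Scenes visits the Periods in sequence. The sketch below models that hierarchy with a hypothetical `Period` class, filters by Period Type, and groups word counts by Scene (outer level) and Character (inner level). The class, the `"Period"` field on each Speech record, and the scene labels are all illustrative assumptions, not OCHRE structures.

```python
from dataclasses import dataclass, field


@dataclass
class Period:
    """Hypothetical Period item: an Act or a Scene, with nested sub-Periods."""
    name: str
    period_type: str  # "Act" or "Scene"
    children: list = field(default_factory=list)  # nested Period objects


def in_time_order(period, period_type=None):
    """Yield Period names depth-first (i.e. in hierarchy order),
    optionally keeping only one Period Type."""
    if period_type is None or period.period_type == period_type:
        yield period.name
    for child in period.children:
        yield from in_time_order(child, period_type)


def group_by_period(items, acts, data_var, series_var):
    """Sum data_var per Scene and per series value, with Scenes ordered
    by the hierarchy rather than by the order of the items."""
    scenes = [name for act in acts for name in in_time_order(act, "Scene")]
    grouped = {scene: {} for scene in scenes}
    for item in items:
        bucket = grouped[item["Period"]]
        key = item[series_var]
        bucket[key] = bucket.get(key, 0) + item[data_var]
    return grouped


acts = [
    Period("Act 1", "Act", [Period("1.1", "Scene"), Period("1.2", "Scene")]),
    Period("Act 2", "Act", [Period("2.1", "Scene")]),
]
speeches = [  # toy records, deliberately out of time order
    {"Period": "2.1", "Character": "A", "Word count": 20},
    {"Period": "1.1", "Character": "A", "Word count": 5},
    {"Period": "1.1", "Character": "B", "Word count": 7},
    {"Period": "1.2", "Character": "B", "Word count": 3},
]
result = group_by_period(speeches, acts, "Word count", "Character")
for scene, totals in result.items():
    print(scene, totals)
```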

Variations on the Theme

Try some variations using different chart types and statistics.

Here, for example, is the same graph as above, but with all characters who speak fewer than 100 words consolidated into an "Other" category.

And here is the Mean speech length by Character. "Lord" is apparently the most long-winded!

Use the Save button on the chart menu to save any chart as an image file (PNG format).