Correlation machines

The contribution of psychologists, sociologists and the like to mechanical computing

Correlation is a statistical concept that is used in many scientific areas. Two or more quantities are correlated if there is a “connection” between the pairwise values they can assume. Correlation is usually quantified with the “Pearson coefficient of correlation” r, spanning −1 to 1.

In this presentation we will use two quantities: X and Y. Figure 1 shows a number of data sets with these quantities, and the corresponding Pearson coefficient of correlation r.

The interpretation of correlation coefficients is tricky. A high (absolute) coefficient of correlation does not necessarily indicate a linear relation between X and Y. The four data sets in the lower row of Figure 1 all have the same r, but look quite different. Data with a low (absolute) coefficient of correlation can still show a clear pattern. The data sets in the third row of Figure 1 have a zero coefficient of correlation, but still seem to have some connection between X and Y.

The Pearson coefficient of correlation was introduced in 1895 by Karl Pearson and came into use on a large scale at the beginning of the 20th century among economists, psychologists, agricultural experts and brewers, among others.

Figure 1: Pearson r value for various data sets (Source: Wikipedia)

How is r calculated?

If there are n “subjects” in the data set, and subject i has a value xi for X and yi for Y, then r is given by:

r = (n Σxi yi − Σxi Σyi) / √[(n Σxi² − (Σxi)²) (n Σyi² − (Σyi)²)]

So one needs the following sums: Σxi, Σyi, Σxi², Σyi² and Σxi yi. The first four sums are also needed for the calculation of averages and standard deviations. If there are more quantities (X, Y, Z, ...) one will need all primary sums, square sums and (at least) pairwise cross sums. And one would like to do all these calculations by entering the data only once…
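As an illustration of the "enter the data only once" requirement, here is a minimal Python sketch that accumulates the five sums in a single pass and then combines them using the standard computational form of Pearson's r. The function name `pearson_from_sums` and the sample data are made up for this example.

```python
from math import sqrt

def pearson_from_sums(pairs):
    """Accumulate n and the five sums in one pass, then compute r."""
    n = sx = sy = sxx = syy = sxy = 0
    for x, y in pairs:
        n += 1
        sx += x          # sum of x
        sy += y          # sum of y
        sxx += x * x     # sum of x squared
        syy += y * y     # sum of y squared
        sxy += x * y     # cross sum
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return numerator / denominator

# Illustrative data only (not taken from any historical source)
data = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
r = pearson_from_sums(data)
print(round(r, 2))
```

The mechanical machines described later stop after the accumulation loop: they deliver the sums, and the last two lines of the function were left to a slide rule or log table.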

By 1900 Hollerith tabulators and punch cards could be used for this purpose, but this equipment was large and expensive. Another method consisted of employing human “computers” who, without any knowledge of statistics, performed calculations using pre-printed forms. A large variety of these forms were offered commercially, with small variations of the formula for r and built-in error checking (Figure 2).

The first step in the calculation of a correlation coefficient usually consisted in “transmutation” of the data: the range of the values was normalized to (generally) integer numbers between 1 and 20. After this step one could use a printed multiplication table, and mental addition to calculate the correlation. This calculation could also be performed with a “normal” mechanical calculating machine, preferably one with a very wide result register. The appendix describes how these calculations were done.

It is clear that this was a time-consuming, error-prone job. That’s why a need arose for dedicated, inexpensive correlation calculators.

Figure 2: Correlation forms, for method using (xi + yi)

top: [Toops1925]; bottom: [Cureton1929]

Figure 3: Cover of Barlow's tables, 1840

Figure 4: Statistical “abacus” [Toops1926]

A Simple Solution

An early statistical tool looked like an “abacus” with 25 piano wires, one for each “transmuted” value of xi. Each wire carried 50 platelets with xi on one side and Ni, ΣX = Ni × xi and ΣX² = Ni × xi² on the other. After entering all X-data, the device showed a histogram, and by adding the data on the last visible platelets on all wires one got n, Σxi and Σxi². For the cross term Σxi yi one had to calculate xi yi separately.
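The read-out of such an abacus can be sketched in Python. This is only a model under stated assumptions: the wires are taken to be numbered 1 to 25, and the k-th platelet on wire x is taken to be printed with k, k × x and k × x². The function name `read_abacus` is hypothetical.

```python
def read_abacus(values, n_wires=25, n_platelets=50):
    """Model the abacus: one wire per transmuted value, one platelet per entry.

    Assumption: wires are numbered 1..n_wires and the k-th platelet on
    wire x carries the pre-printed numbers k, k*x and k*x*x.
    """
    counts = [0] * (n_wires + 1)          # counts[x] = platelets moved on wire x
    for x in values:
        if not 1 <= x <= n_wires:
            raise ValueError(f"value {x} has no wire")
        if counts[x] >= n_platelets:
            raise OverflowError(f"wire {x} is full")
        counts[x] += 1                    # entering a datum moves one platelet

    n = sum_x = sum_x2 = 0
    for x in range(1, n_wires + 1):
        k = counts[x]                     # last visible platelet on this wire
        n += k                            # printed N
        sum_x += k * x                    # printed N * x
        sum_x2 += k * x * x               # printed N * x^2
    return n, sum_x, sum_x2

print(read_abacus([3, 3, 5, 20]))
```

The operator never multiplies: every product is already printed on a platelet, which is why each of them has to be unique.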

Note that each platelet is unique!


Dedicated Machines

Hull

By 1921 Clark Hull, a psychologist from Wisconsin, had constructed a purely mechanical machine (Figure 5) for calculating the sums mentioned above [Hull1925]. The data, having integer values between 0 and 999, was punched into paper tape that had to be fed multiple times through the machine. Hull could then calculate correlations between a large number of quantities. The resulting averages, standard deviations and correlation values could be registered in thin metal strips for use in correlation-based predictions. The machine was used to give occupational advice for 40 professions based on 60 psychological parameters [Mechanix 1929]. Two versions of this machine were built, one for the Wisconsin Psychological Laboratory and one for the National Research Council. Because some researchers started sending Hull data for processing, he proposed to establish a Central Correlation Bureau that would compute correlations on demand. At the time, it was thought that two machines would be enough for all the correlation needs in the United States.

Clark Hull

Figure 5: Hull's machine

Dodd

Around 1925 Stuart C. Dodd, a psychologist at Princeton University, made a simpler correlation calculator [Dodd1926]. This machine (Figure 6) contained drums on which square numbers were represented by pins of different lengths (separate pins for units and tens). These drums are the square-number equivalent of the multiplication bodies used in the Millionaire calculator.

Dodd designed different versions of this device. Later development was continued by the Cambridge Instrument Co. Inc., New York, who sold these correlation machines to the universities of Harvard, Berkeley, and Chicago.

Figure 6: Prototype and scheme of Dodd's machine

Seashore

A third correlation machine (Figure 7) carries the name of Carl Emil Seashore (Sjöstrand), again a psychologist. This machine was sold around 1930 by the C.H. Stoelting Co. from Chicago, for $550. The machine calculated Σxi, Σyi, Σxi², Σyi² and Σ(xi−yi)², using a slightly different formula for r.
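The source does not reproduce the formula the machine was designed around. Assuming the fifth sum is Σ(xi−yi)², the cross sum needed for r can be recovered from the machine's counters without any multiplication of x by y, as this short derivation shows:

```latex
\begin{align*}
\sum_i (x_i - y_i)^2 &= \sum_i x_i^2 - 2\sum_i x_i y_i + \sum_i y_i^2 \\
\Longrightarrow\quad
\sum_i x_i y_i &= \tfrac{1}{2}\Bigl(\sum_i x_i^2 + \sum_i y_i^2 - \sum_i (x_i - y_i)^2\Bigr)
\end{align*}
```

A machine that can only form squares can therefore still deliver everything needed for r.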

Interestingly, this machine calculates the square of xi by taking the sum 1 + 3 + 5 + … + (2xi−1) using a shifting stack of odd-toothed gears. After the operator has set a pair of xi and yi values using the two dials and hit the motor key, the machine rotates and shifts the gear stacks until xi, yi, xi², yi² and (xi−yi)² are added to their respective counters. Because this requires a lot of rotations per data point, the machine is provided with an electrical motor.
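The odd-number trick is easy to check in a few lines of Python. The sketch below only mirrors the arithmetic, not the gear mechanism; the function name `square_by_odd_sums` is made up for this example.

```python
def square_by_odd_sums(x):
    """Return x squared by adding the first x odd numbers: 1 + 3 + ... + (2x - 1)."""
    total = 0
    for k in range(1, x + 1):
        total += 2 * k - 1    # one "gear" with 2k - 1 teeth per step
    return total

# The identity holds for every non-negative integer; check a small range.
assert all(square_by_odd_sums(x) == x * x for x in range(0, 21))
print(square_by_odd_sums(7))
```

Squaring a value x costs x additions this way, which illustrates why a motor was a sensible addition.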

Figure 7: Seashore's machine and principle.

Henry J. Burt

In 1931, Henry J. Burt, Assistant Professor of Rural Sociology at the University of Missouri, published a report on "The Analysis of Social Data" in which he describes his Selecto-Meter (Figure 8). This was a giant machine located in the attic of University of Missouri's Mumford Hall [UMC2022]. Data were entered by inserting brass plugs in a 7½ feet high and 60 feet long tabulation surface with 135,360 holes. The machine had 10 electrical counters mounted on a moving frame that was driven along the tabulation surface by an electrical motor.

Figure 8: Selecto-Meter

Analog methods

If X and Y are regarded as coordinates of equal point masses in a two-dimensional space, then 1/n Σxi and 1/n Σyi give the center of mass, and Σxi² and Σyi² give the moments of inertia about the Y and X axes, respectively.

Price

In 1935 the psychologist Bronson Price [Price1935] proposed to use these properties for the calculation of r: construct a bed of nails on which the data is attached using thin rings, and determine the center of mass and the moments of inertia of the whole shebang. The practical problem was that the bed and the nails had to be very light, or the rings had to be very heavy. Price never tried it himself...

Harsh and Stevens

C.M. Harsh and Stanley Smith Stevens, psychologists at Harvard, created an analog correlation machine working with small balls (Figure 9). The value of r was estimated from the angle between the X-Y and Y-X regression lines.
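Why the angle between the two regression lines carries r can be seen from a short derivation. It assumes both variables are standardized (zero mean, unit standard deviation); whether the instrument worked on standardized scales is not stated in the source. In that case the regression of Y on X has slope r, and the regression of X on Y, drawn in the same plane, has slope 1/r:

```latex
\begin{align*}
\tan\theta &= \frac{\tfrac{1}{r} - r}{1 + \tfrac{1}{r}\cdot r} = \frac{1 - r^2}{2r}, \\
r &= \sec\theta - \tan\theta = \frac{1 - \sin\theta}{\cos\theta}
\qquad (0 < r \le 1,\; 0 \le \theta < 90^\circ).
\end{align*}
```

The lines coincide (θ = 0) for r = 1 and approach a right angle as r goes to 0.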

Stanley S. Stevens

Figure 9: Harsh and Stevens’ analog “instrument” [Harsh1938]

Platt

Later John R. Platt revived the idea [Platt1943]. Platt was a biophysicist from Michigan who ended up in sociology. He used a flat metal sieve in which the data was set with lead pins (Figure 10). The center of mass was determined by hanging the thing twice, from different vertices, getting the two plumb lines from these vertices, and then obtaining their intersection. For the determination of the moments of inertia the thing was regarded as a torsion pendulum: the whole thing was given a rotation around a vertically placed X-, Y- or diagonal-axis and then released, and the time needed for at least 20 to-and-fro rotations was measured. Platt claimed that he could determine r with an accuracy of 0.01.
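The role of the diagonal axis can be made explicit with a short derivation. It is a sketch under assumptions not spelled out in the source: unit masses, all three axes passing through a common origin, the diagonal taken as the line y = x, and the contribution of the empty sieve already subtracted. A torsion pendulum with torsion constant κ has period T = 2π√(I/κ), so each moment of inertia I is proportional to the square of the measured period:

```latex
\begin{align*}
I_Y &= \sum_i x_i^2, \qquad I_X = \sum_i y_i^2, \qquad
I_D = \sum_i \frac{(x_i - y_i)^2}{2} = \tfrac{1}{2}\,(I_X + I_Y) - \sum_i x_i y_i, \\
\sum_i x_i y_i &= \tfrac{1}{2}\,(I_X + I_Y) - I_D, \qquad I \propto T^2 .
\end{align*}
```

Three timing measurements therefore yield the two square sums and the cross sum, while the plumb-line construction yields the two averages.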

John R. Platt

Figure 10: Platt’s analog “instrument” [Platt1943]

The hydraulic device of Schumann

A somewhat more complicated, and more dangerous, device was made by the South African meteorologist Theodor Eberhardt Werner Schumann [Schumann1940]. This device contained a trough filled with mercury in which suspended iron rods were floating (Figure 11). The device was able to calculate multiple correlation coefficients (between X and Y, and between X and Z, ...) in one go. However, the averages of yi and zi should be (made) zero. The correlation coefficients were determined from the rotation of the horizontal “Christmas trees.”

Schumann also used the iron-rod-in-mercury principle to solve sets of linear equations. A trained human computer would need 5m² + ¼ m³ minutes to solve a set of m equations with m variables, while Schumann’s machine could do that in 6m minutes.
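To get a feel for these two estimates, the following Python snippet simply evaluates both expressions for a few values of m. The function names and the chosen values of m are illustrative only.

```python
def human_minutes(m):
    """Time for a trained human computer: 5*m^2 + m^3/4 minutes."""
    return 5 * m**2 + 0.25 * m**3

def machine_minutes(m):
    """Time quoted for Schumann's machine: 6*m minutes."""
    return 6 * m

for m in (4, 10, 20):
    print(m, human_minutes(m), machine_minutes(m))
```

For m = 10, for example, the formulas give 750 minutes by hand against 60 minutes on the machine.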

T.E.W. Schumann

Figure 11: Schumann’s device [Schumann1940]

The electrical device of Ford

Adelbert Ford, a psychologist at the University of Michigan, built an electrical correlation machine in 1931 [Ford1931]. The data was entered on a panel with 100 potentiometers (Figure 12). The position of a potentiometer on the panel corresponds to the (transmuted) X and Y value of a data point. For each data point the corresponding potentiometer is turned 1 unit. The actual rotation angle for one unit changes with the position of the potentiometer, and this way squares and cross products are “calculated”. Each potentiometer is connected to its own coil, and all these coils together form the secondary side of a transformer. To the primary side a voltmeter is attached which is used to read Σxi yi. Since the transmuted data is required to average to zero, for the calculation of r Ford only needs Σxi yi and √(Σxi² Σyi²), with the latter being set with a complicated calibration procedure.
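That only two quantities are needed follows directly from the standard computational form of Pearson's r once both averages are zero, that is, once Σxi = Σyi = 0:

```latex
\begin{align*}
r &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}
          {\sqrt{\bigl(n\sum x_i^2 - (\sum x_i)^2\bigr)\bigl(n\sum y_i^2 - (\sum y_i)^2\bigr)}}
   \;\xrightarrow{\;\sum x_i \,=\, \sum y_i \,=\, 0\;}\;
   \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \,\sum y_i^2}} .
\end{align*}
```

The factor n cancels between numerator and denominator, so the voltmeter reading only has to be divided by the calibrated value of √(Σxi² Σyi²).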

Figure 12: Ford’s Correlator: a quarter section of the panel with potentiometers for entering data

Post processing

The mechanical correlation machines only calculated sums. To calculate r, square roots and quotients had to be computed using a slide rule or log table (or a mechanical calculating machine). Fortunately, the required accuracy for r was usually only one decimal (in a −1 to +1 range).

Warning

Correlation is not causation. Check out some spurious correlations.

Epilogue

The correlation machines present a unique contribution of psychologists, sociologists and the like to the development of computing devices. This was clearly brought about by practical needs and a contemporary inclination towards technology among psychologists [Draaisma1992].

I have been unable to find correlation machines of European origin. On the contrary: in 1929 the German mathematician Wilhelm Cauer attempted to buy an American Hull machine for Göttingen University [Petzold2000].

Of the mentioned machines no surviving specimens are known to me, except for the Hull machine that was rediscovered in 1997 by Hartmut Petzold in a depot of the National Museum of American History [Petzold2000].

References

A first version of this article was presented at the IM2016 in Trento, Italy