SUT - Automation

There are many tasks in SUT compilation which can be facilitated by automation. We will give some examples here.

Automatic balancing with RAS

Final small discrepancies between supply and use (by product and by industry) can be removed with a RAS procedure which tries to minimize the discrepancies by both rows and columns. This procedure is carried out in NA Builder by using (or “playing”) the RAS rule. “Rules” in NA Builder are stand-alone programs in themselves which can be invoked at any time to carry out particular tasks. Here the task is the RAS procedure. Before it can be used, the general RAS procedure needs to be made specific for the example at hand. This is done by specifying its “arguments”. In this particular example the specified RAS rule is given in the rule editor as follows:

Similar to the case of formulas addresses, ranges in rules are specified in special NA Builder format (containing the 4 elements: sheet, partition, row classification code, column classification code), or as Excel ranges for easy reference, as shown here. Note that the three main arguments to be specified are ranges: the range with the table to be adjusted with RAS and ranges with given row and column totals. The other arguments are optional (“OPT”), e.g. to specify the number of iterations. If no optional arguments are specified, defaults are used (10 for the number of iterations). As is the case for formulas, when classifications change, e.g. by adding more product details, the RAS rule (and any other rule) will remain intact and need not be specified again.

What is the RAS method? The RAS method allows one to adjust the cells of a table in such a way that the row and column totals for the range approximate as closely as possible a set of row and column constraints. The method is best illustrated using a simple example. Let the following table be given:

Let the totals across the rows and columns be constrained to the following:

Let us first try to adjust the values of the table in such a way that the row sums equal the values of the row constraint. RAS updates the table as follows:

The sums across the rows now indeed equal the given constraint.

Next, let us adjust the values of the table in such a way that the column sums equal the values of the other constraint. RAS now updates the table as follows:

The sums across the columns now equal the constraint. But as we see from the table, the sums across rows do not equal the other constraint anymore. So RAS starts with another round of updating, first across rows, then across columns. It will keep updating this way until the difference between the row/column sums and the constraints is “small enough” (smaller than the so-called convergence limit, to be specified in advance). In practice one should implement the RAS procedure in such a way that cells, row and columns can be “fixed”, i.e. not part of the proportional adjustment.

The RAS-algorithm (developed in the early sixties of the last century by R. Stone) iteratively adjusts a matrix A (of size m rows and n columns), with row sums u0 (equal to A.i, with i a vector having only values 1) and column sums v0 (equal to i.A), to a new matrix X that satisfies a new set of given row sums u and column sums v (here i.u must equal i.v, otherwise convergence cannot be achieved). The RAS-algorithm produces the new matrix X with the required row and column sums such that:

in which r ̂ and s ̂ are diagonal matrices with positive entries on the main diagonal (the right-hand side of this formula is the reason that the procedure is commonly called “RAS”). A limitation of this algorithm is that the entries of the matrix A must be non-negative. The RAS procedure can therefore not be used on a table containing changes in inventories and net exports, which can all be negative.

The article “The Solution of Updating or Regionalizing a Matrix with both Positive and Negative Entries”, by Theo Junius & Jan Oosterhaven (Economic Systems Research, Vol. 15, No. 1, 2003) presents a generalization of the RAS algorithm, named Generalized RAS (GRAS), which also works for tables with negative entries.

A full description of this method can be found in this article. To see how GRAS (and RAS, which is a special case of GRAS for a table with no negative entries) works, define a matrix Z with entries z(i,j) so that

The solution z(i,j) can be found by finding the optimum for the following Lagrange function (see article):

with P the set of pairs of indices (i, j) for which a(i,j)≥0 and N the set of pairs of indices (i, j) for which a(i,j)<0 (and “ln” is the natural logarithm). Here the n parameters λ(i) and the m parameters τ(j) are called “multipliers”, and they determine the extent to which the constraints are included in determining the optimal value of L.

The solution can be written as:

The vectors r and s with elements r(i) and s(j) can be found as follows. Define

Here p(i,j) are the elements of the matrix P, containing the positive elements of A. Also define

with n(i,j) the elements of the matrix N, containing the negative elements of A. Similarly define

The solution vectors r and s can be found using the following formulas:

(with “e” being the natural exponent function with power 1)

The procedure to find the solution is now as follows:

  • Select an initial vector r(0) = i’ (a vector containing only the values 1; the reason for this choice is that, in this case, the equilibrium row multipliers r may be interpreted economically as “substitution effects” and the column multipliers s as “fabrication effects”, see article)

  • Calculate s(1)=σ(r(0)) using equation (5)

  • Calculate r(1)=ρ(s(1)) using equation (4)

  • Repeat this until the r and s vectors converge

  • Calculate the matrix Z using equations (2) and (3)

  • Calculate the matrix X using equation (1)

The GRAS algorithm has been added to NA Builder as an option to the available RAS rule. This rule uses the following arguments:

  • Arg1: <range to RAS>

  • Arg2: <row range with (column) totals>

  • Arg3: <column range with (row) totals>

  • Arg4: <OPT: range with exceptions (non-zero cells)>

  • Arg5: <OPT: range for results>

  • Arg6: <OPT: # iterations (-1=only rows -2=only columns; even=rows, columns; odd=columns, rows)>

  • Arg7: <OPT: conv. limit>

  • Arg8: <OPT: 1 = write if not converged>

  • Arg9: <OPT: 1 = write changes with respect to original>

  • Arg10: <OPT: 1 = GRAS; SHOWSCRT>

  • Arg11: <OPT: range extensions>

Arg10 facilitates the GRAS method.

It should be noted that the differences of the initial table and the constraints should not be too big. Also, there can be more than one optimum, in which case it is possible that the prorate and GRAS algorithms give different results. This will become less likely if the differences of the initial table and the constraints become smaller.

The example used in the paper is:

The table after applying GRAS is:

A few more comments on the RAS rule:

  • One can control the order of the procedure by using arg.6 which contains the number of iterations; if set to -1 only a rows adjustment is carried out and if set to -2 only a columns adjustment; if this number is even then first the rows will be adjusted and then the columns, if odd the procedure is reversed

  • One can exclude single cells or ranges of cells from the RAS procedure by using arg.4 which may optionally contain a table with the same dimensions as arg.1 and where non zero entries are interpreted as "fixing" the corresponding entries in the arg.1 table (i.e. these will not be changed by RAS)

  • The ranges in arg.1 etc. are normally restricted to partitions or parts of partitions, but in the SUT one may extend them over partition boundaries; this can be accomplished by arg11 can be used to extend the ranges with extra columns and / or rows. E.g. arg11 = [c2-c3],[r2-r2] will add the second and third columns after the range and the second row below the range. There can be any number of such extensions.

Scripts

Although individual rules can be carried out (“played”) at any time, many tasks are complex and require more than one rule to be carried out. An example is the “constraining” procedure referred to earlier, setting up a number of different formulas in various ranges. Although these formulas can be set up directly from the formula list, it is easier to use the FORMULA rule for this, which is a simple automated way to set up any formula specification prepared earlier. Now, for this particular example we would need to set up five different formula specifications which would require five different FORMULA rules, each of which would need to be played separately. “Scripts” as containers for rules can make this easier by “packaging” the separate rules into one container. The following “constrain” script contains the five rules in the correct order:

Once the script has been specified, it can be used (or “played”) at any time, carrying out the constraining task with one click, irrespective of what particular classifications have been attached to the templates.

Rules of different kinds can also be combined in script. E.g. the following script automates the task of reading in “exogeneous” data for the preparation of the SUT for 2014:

In this script there are FILEREAD, LABEL and PARAM rules, each of which needs to be specified before they can be collected in the script. The FILEREAD rule carries out the file import task, using a “file specification”, which can be viewed (and changed) in the data source editor:

Note that the path “..\” is shorthand for the particular folder where NA Builder is located. Also note that this particular source is a table. The alternative to tabular data storage are data lists given as combinations of sheet, partition, row classification code, column classification code and value, with one value per record, e.g.:

Again, the point of such rules is that they need to be specified only once. Any changes in the file specification can be made directly in the files list, by e.g. changing the Excel file name.

Rounding

Another example is a script for preparing SDMX tables from the balanced SUT:

The concept of rounding as used in this script is straightforward. Given the number of decimals that have to remain after rounding the other decimals are rounded by a normal rounding procedure.

Thus:

  • 6.75 rounded to 1 decimals equals 6.8

  • 6.75 rounded to 0 decimals equals 7

  • 6.75 rounded to 2 decimals equals 6.75

By way of convention a negative value for the number of decimals that have to remain, indicates rounding to multiples of 10. Thus,

  • 6.75 rounded to -1 decimals equals 10

  • 6.75 rounded to -2 decimals equals 0

And,

  • 675 rounded to -2 decimals equals 700

  • 675 rounded to -3 decimals equals 1000

  • 675 rounded to -4 decimals equals 0

Another concept of rounding is cumulative rounding - also supported in NA Builder - which is perhaps introduced best by way of example.

Let the following 2 x 2 table be given:

Rounding this table to zero decimals will give the following result:

Although each cell is rounded correctly, the row totals, column totals and the grand total of the table are all rounded off incorrectly. Using Cumulative Rounding to zero decimals will give the following result:

Some of the individual cell entries are now rounded off incorrectly, but row totals, column totals and grand total are rounded correctly.

Cumulative rounding takes place as follows. Start with rounding cell (1,1). The result will be the value 1 as in the normal rounding case. The difference before and after rounding 0.4 is added to cell (1,2) and cell (2,1) so that the totals for row 1 and column 1 remain the same. The difference is also subtracted from cell (2,2) so that the totals for row 2 and column 2 also remain unaffected by the rounding of cell (1,1). Thus, after the cumulative rounding of cell (1,1) the table looks as follows:

Proceeding now to cell (1,2) it is rounded to the value 3. From cell (2,2) is subtracted the value 0.2 The table now looks as follows:

Proceeding with cell (2,1) it is rounded to the value 4 with cell (2,2) being set to the value 3.6 to remain the balance. The table now looks as:

Finally cell (2,2) is rounded to the value of 4.

Cumulative rounding is especially useful in the context of pro-rating. Say a column C is pro-rated over a table T. Suppose that the values of C are small in relation to the values of T for some of the columns of T. Then it is possible that there will be a systematic rounding off to below. But then the difference between the original T and the T after pro-rating might be different then the original column C, a situation clearly not desirable. The process of cumulative rounding on the other hand will always assure that the difference is precisely equal to C.

Automation with Scripts

As the examples above illustrate complex compilation tasks can be carried out with scripts containing sequences of appropriate rules in the correct order.

Some rules are quite powerful, such as the above RAS and IO (to derive input-output tables, more on this here).

Other rules are convenient during balancing such as ABSORB, DISAGGR, PRORATE, RECODE (more information on these here) and ROUND.

There are also rules specific to working with time-series, such as CROSSCUT, DENTON, FOLD, UNFOLD (more information here).

Rules useful when working with quarterly NA are: COMPARE, INDEX, DISTRIBUTE (more information here).

Other rules not mentioned elsewhere are:

  • Rules for elementary tasks: COPY, EMPTY, EMPTYSHEET, SHEETCOPY

  • Rules for elementary operations: ADD (adds / subtract ranges), DIAG (place row or column entries on matrix diagonal), DIVIDE, MULTIPLY, PROJECT (e.g. matrix to row / column / cell), RAISE, TRANSPOSE

  • Rules for automation of formulas: CALCULATE, FORMULA

  • Rules for working with sheets and views: SHEET, VIEW

  • Rules for saving data: NORMALISE, RANGESAVE, SHEETSAVE

  • Rules for loading data: DATALOAD, FILEREAD

  • Rules for assigning labels in tables: LABEL

  • Rules for assigning program parameters: PARAM.

It is possible to change file, sheet and rule specifications with the PARAM rule, thereby carrying the level of automation a step further. E.g. a compilation script can be made for any year T using a file specification reading in different data files for different years contained in the same folder. This script could contain a PARAM rule with which the file name can be changed. There is a way to specify such a rule so that it uses the input value from the cell "Message" or "Captured range" from the "SB" sheet. So if a particular filename is entered here the PARAM rule will use it to update the file specification which is used in the script.