Page Action: Add Data Aggregator Pipeline

An Aggregated Processor Pipeline performs simple calculations as data is imported, shortening the time from raw data to insights. As with the other pipelines, Aggregated Processor Pipelines are added on the Manage Pipeline page. Clicking "Add New Processor Pipeline" and then selecting "New Aggregated Processor Pipeline" displays the first page of the specification flow:

As with the Data Processor Pipeline, users select the source, the first file, the file type, and the pattern the pipeline should match to find the next file in the sequence.
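The file-matching step can be pictured with a short sketch. The glob-style pattern syntax and the file names here are assumptions for illustration, not the application's actual matching rules:

```python
from fnmatch import fnmatch

# Hypothetical illustration: match candidate files in a source folder
# against a glob-style pattern, then take the earliest match as the
# next file in the sequence.
pattern = "sales_*.csv"
candidates = ["sales_001.csv", "sales_002.csv", "notes.txt"]

matches = sorted(f for f in candidates if fnmatch(f, pattern))
next_file = matches[0] if matches else None
```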

There is an option to delete the Static Data file once it has been imported. Selecting it keeps the aggregated file in the application but removes the original Static Data file from the source storage folder.

Users can also rate limit these pipelines to prevent mistakenly importing too many files at once.
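The idea behind the rate limit can be sketched as a per-run cap on how many matched files are picked up; the cap value and variable names are hypothetical, not the application's actual setting:

```python
# Hypothetical sketch: only the first N pending files are imported in
# one pass; the remainder wait for the next run of the pipeline.
max_files_per_run = 2
pending = ["sales_001.csv", "sales_002.csv", "sales_003.csv"]

batch = pending[:max_files_per_run]      # imported this run
remaining = pending[max_files_per_run:]  # deferred to the next run
```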

In Stage 2 of the pipeline specification, users select the variables required to perform the aggregation and any calculations.

On the next screen, users select the combinations within which to sum data. Each combination produces a separate output file for each source file processed. In this example the application will produce one file for STOREID * CATEGORY, one for STOREID, and one for CATEGORY.
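A minimal sketch of how each selected combination yields its own aggregated output. The row data and field names mirror the STOREID/CATEGORY example above but are otherwise invented for illustration:

```python
# Hypothetical sketch: three selected combinations produce three
# aggregated outputs per source file, one per combination of keys.
rows = [
    {"STOREID": "S1", "CATEGORY": "Drinks", "QTY": 2},
    {"STOREID": "S1", "CATEGORY": "Snacks", "QTY": 3},
    {"STOREID": "S2", "CATEGORY": "Drinks", "QTY": 5},
]
combinations = [("STOREID", "CATEGORY"), ("STOREID",), ("CATEGORY",)]

outputs = {}
for combo in combinations:
    agg = {}
    for row in rows:
        key = tuple(row[field] for field in combo)
        agg[key] = agg.get(key, 0) + row["QTY"]
    outputs[combo] = agg  # conceptually, one output file per combination
```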

Once the combinations are selected, users can then specify how to aggregate the other selected variables. At present there are two options: Sum and Presence. Sum adds together the values of that field at the specified level of aggregation. For example, aggregating at a Store * Category level, the Sum of QTY will calculate the total units for each category in each store ID. This is the right calculation for numeric variables.

The Presence calculation will provide a count of the number of unique values in the variable QTY and save that count for each Store and Category combination. This is the right calculation for multi-dimensional (categorical) data.
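The two options can be contrasted with a short sketch at the Store * Category level described above; the rows and values are invented for illustration:

```python
# Hypothetical sketch: Sum totals a numeric field per key, while
# Presence counts the distinct values seen for that field per key.
rows = [
    {"STOREID": "S1", "CATEGORY": "Drinks", "QTY": 2},
    {"STOREID": "S1", "CATEGORY": "Drinks", "QTY": 2},
    {"STOREID": "S1", "CATEGORY": "Drinks", "QTY": 7},
]

sums, seen = {}, {}
for row in rows:
    key = (row["STOREID"], row["CATEGORY"])
    sums[key] = sums.get(key, 0) + row["QTY"]          # Sum
    seen.setdefault(key, set()).add(row["QTY"])        # Presence (distinct values)

presence = {key: len(values) for key, values in seen.items()}
```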

The last step is to specify any filtering of the data to restrict the aggregation to a sample of the data files. IMPORTANT: Because filtering is performed downstream, the field being filtered must be present in the final aggregated file.
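The downstream caveat can be illustrated with a sketch: the filter runs against the aggregated output, so a field that was dropped during aggregation would leave the filter with nothing to match. The records and filter criterion here are assumptions for illustration:

```python
# Hypothetical sketch: filtering happens on the aggregated output,
# so STOREID must have survived aggregation to be filterable here.
aggregated = [
    {"STOREID": "S1", "QTY": 5},
    {"STOREID": "S2", "QTY": 5},
]
allowed_stores = {"S1"}  # assumed filter criterion

filtered = [r for r in aggregated if r["STOREID"] in allowed_stores]
```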

Specify any cloud storage locations to deliver the aggregated files to via the ADD DELIVERY function.

Specify an additional suffix for the file to make it easier to find in the STATIC DATA table. Click "SAVE" to create the pipeline.


You are now returned to the Manage Pipelines Page where you can start the pipeline running.