Materialized Views

This technique is really more about relational databases, but as it helped me solve a performance issue I encountered in my KNIME Climate Data Access, I will include it here.

I wanted to create some KNIME Data Apps on extreme rainfall events.

I have a table in my Postgres database called daily_rainfall, which has one row for each weather station for each day for a period of 10 years, so the table has over 800 million rows. And as I am running this on a small 8 GByte Ubuntu platform, it is giving me performance issues - long waiting times for users of the Data Apps.

Materialized views came to the rescue. I first ran into them working with Oracle databases many years ago. Postgres also has them and I suppose most mainstream RDBMSs will have them as well.

Materialized views are kind of an evolution of 'normal' RDBMS views. Technically, a 'normal' view is just a stored SELECT statement. For my use case, let us suppose that I want to make a Data App about the rainfall data, but just for the year 2024. For this, I could create a view like this (pseudocode) :

CREATE VIEW daily_rain_2024 as

SELECT "station id",

date,

rainfall

FROM daily_rainfall

WHERE year = 2024;

In KNIMEs DB Table Selector this would show up like on the screenshot below, in the View folder

OK so far.

But, as I mentioned before, a 'normal' view is actually just a stored SELECT statement. This means that if I were to use the daily_rain_2024 view instead of the original daily_rainfall table, I would not get any performance benefit, because 'behind the scenes', the database engine will just execute the SELECT statement against the underlying (daily_rainfall) table.

When I create a Materialized view, however, the result of the SELECT statement will be stored physically. This means that when afterwards I execute a query against the materialized view, I WILL get a performance benefit, because the query will be executed against a smaller table (in our example with only 2024s data)

The syntax to create a materialized view is very similar to that for a 'normal' view, basically you just replace CREATE VIEW with CREATE MATERIALIZED VIEW.

Below you see how the materialized view(s) appear in the DB Table selector node :

One drawback of materialized views is that they are static: this means that if new rows are added to the daily_rainfall table, they will not automatically appear in the mv_rain_2024 view. The solution to this is to REFRESH the materialized view, with a SQL statement like REFRESH MATERIALIZED VIEW mv_rain_2024.

(NOTE: in my example case, if now near the end of 2025 I am sure that no rows for 2024 will be added to the base table , I would not need to refresh my mv_rain_2024, but that is a design decision.)

An easy way to automate this would be to add a DB SQL Executor node to the end of the ETL workflow that loads the base table, as seen in the screenshot below.

Page updated

Google Sites

Report abuse