In this topic we will look at (near) real-time Business Intelligence, for which there are many possible use cases. The idea is to load data from one or more data sources in near real time. We will develop an example workflow on this page.
In this example we will be loading weather data. There are many APIs for this available on the internet; we will be using the one from Open-meteo. Below is the workflow that reads the elevation, temperature, wind speed and wind direction for a place near Valencia, Spain, and writes these data to a table in PostgreSQL.
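For reference, the Open-meteo call is a plain HTTP GET. A minimal sketch of such a request, with approximate coordinates for Valencia that are my own assumption rather than necessarily the exact ones used in the workflow:

https://api.open-meteo.com/v1/forecast?latitude=39.47&longitude=-0.38&current_weather=true

At the time of writing, the JSON response includes the elevation of the location and a current_weather object with temperature, windspeed and winddirection, which matches the fields we write to Postgres.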
Here is the resulting table in Postgres
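If you want to reproduce the table yourself, a minimal sketch of the DDL could look like the following. The column names and types here are my assumptions, derived from the fields above; the trigger code further down only requires that the table is called "OpenMeteo" and has a timestamp column.

CREATE TABLE "OpenMeteo" (
    "timestamp"   TIMESTAMP,
    elevation     REAL,
    temperature   REAL,
    windspeed     REAL,
    winddirection REAL
);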
OK, fine. But this gives us just one row of data. In the next workflow below, we will make KNIME loop to continuously repeat this process. This is controlled by the 'Generic Loop Start' and 'Variable Condition Loop End' nodes. The 'Wait' node creates a 30-second pause between loop iterations so as not to flood the API with calls.
When we let this workflow run for a few minutes, we get the following in our Postgres table:
Notice that there is a bug in my workflow: the timestamp is identical for all rows. The cause is that the 'Date&Time Range Creator' node is executed only once, not on every iteration of the loop. I solved this by using a database trigger to set the timestamp instead.
UPDATE: Thanks to feedback from Markus Lauber (mlauber71) I managed to solve the bug without the need for the database trigger. The solution was as simple as adding a Flow Variable connection between the Loop Start and Date&Time Range Creator nodes. Works perfectly. (I have put the updated workflow in my Google Drive folder under the name Get_data_from_web_loop_V2.)
In case you're interested, below is the code for implementing the trigger in PostgreSQL:
-- Create a function to set the timestamp on insert
CREATE OR REPLACE FUNCTION set_timestamp()
RETURNS TRIGGER AS $$
BEGIN
    NEW.timestamp = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Create the trigger that fires before each insert on the table
CREATE TRIGGER set_timestamp
BEFORE INSERT ON "OpenMeteo"
FOR EACH ROW
EXECUTE PROCEDURE set_timestamp();
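To sanity-check the trigger, you can insert a row without supplying a timestamp and confirm that NOW() was filled in. The values below are dummies, and the column names follow the sketch of the table given earlier:

INSERT INTO "OpenMeteo" (elevation, temperature, windspeed, winddirection)
VALUES (30.0, 21.5, 12.3, 180.0);

SELECT * FROM "OpenMeteo" ORDER BY "timestamp" DESC LIMIT 1;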
Now all that remains is to execute our workflow as a background (batch) process.
The system I am running is Linux (Linux Mint, to be specific). What I do is open a terminal window, navigate to the directory where KNIME is installed, and execute the following command:
./knime -nosave -consoleLog -noexit -nosplash -reset -application org.knime.product.KNIME_BATCH_APPLICATION -workflowDir="/home/marc/knime-workspace/RTBI/Get_data_from_web_loop"
where workflowDir is the folder in which my workflow resides.
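Note that the command above ties up the terminal until the workflow finishes. If you want the process to survive closing the terminal, one common option is plain shell job control with nohup and & (nothing KNIME-specific; the log file name knime_batch.log is just an example I picked):

nohup ./knime -nosave -consoleLog -noexit -nosplash -reset -application org.knime.product.KNIME_BATCH_APPLICATION -workflowDir="/home/marc/knime-workspace/RTBI/Get_data_from_web_loop" > knime_batch.log 2>&1 &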
On the next page we will see how to create a graph that updates automatically.