Machine Learning

Forecast of Electricity Consumption

Introduction
Electricity is the only commodity that is produced and consumed simultaneously; therefore, a perfect balance between supply and consumption must be maintained in the electric power market at all times. Forecasting electricity consumption is of national interest to any country, since electricity is a key source of energy. A reliable forecast of energy consumption, production, and distribution supports stable, long-term energy policy. The presence of economies of scale, a focus on environmental concerns, regulatory requirements, and a favorable public image, coupled with inflation, rapidly rising energy prices, the emergence of alternative fuels and technologies, changes in lifestyles, and so on, has generated the need for modeling techniques that capture the effect of factors such as prices, income, population, technology, and other economic, demographic, policy, and technological variables. Underestimation could lead to under-capacity utilization, which would result in poor quality of service, including localized brownouts or even blackouts. An overestimation, on the other hand, could lead to the authorization of a plant that may not be needed for several years. The requirement is to ensure optimal phasing of investments and rational pricing structures over the long term, and to design demand-side management programs that meet short- or medium-term needs. The forecast further drives various plans and decisions on investment, construction, and conservation.

Getting ready
In order to carry out forecasting of electricity consumption, we shall be using a dataset of smart meter data, with the time series aggregated for four industries.

Step 1 - collecting and describing data
The dataset titled DT_4_ind shall be used. The numeric variable is as follows:
value
The non-numeric variables are as follows:
date_time
week
date
type

How to do it...
Let's get into the details.

Step 2 - exploring data
The following packages need to be loaded as a first step to be carried out:
> install.packages("feather")
> install.packages("data.table")
> install.packages("ggplot2")
> install.packages("plotly")
> install.packages("animation")
> library(feather)
> library(data.table)
> library(ggplot2)
> library(plotly)
> library(animation)
Version info: Code for this page was tested in R version 3.2.2
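The install.packages() calls only need to be run the first time. As an optional alternative, not part of the original text, the following sketch installs only the packages that are missing and then loads them all:
> pkgs <- c("feather", "data.table", "ggplot2", "plotly", "animation")
> missing_pkgs <- pkgs[!pkgs %in% rownames(installed.packages())]
> if (length(missing_pkgs) > 0) install.packages(missing_pkgs)   # install only what is absent
> invisible(lapply(pkgs, library, character.only = TRUE))        # load all required packages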

Let's explore the data and understand the relationships between the variables. Reading the data and converting it to a data.table: feather provides fast, binary columnar serialization for data frames, and is used to share, read, and write data easily across data analysis languages. The read_feather() function reads feather files, and as.data.table() converts the result into a data.table. We'll begin by importing the DT_4_ind dataset and saving the data to the AggData data frame:
> AggData <- as.data.table(read_feather("d:/DT_4_ind"))
Exploring the internal structure of the AggData data frame: the str() function displays the internal structure of the data frame. AggData is passed as an R object to the str() function:
> str(AggData)
The result is as follows:
Printing the first rows of the AggData data frame: the head() function returns the first part of the data frame. The AggData data frame is passed as an input parameter:
> head(AggData)
The result is as follows:
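Before plotting, it can be useful to confirm the size and coverage of the imported data. This is a minimal optional check, not part of the original recipe, and assumes the column names listed in Step 1:
> dim(AggData)                 # number of rows and columns
> AggData[, .N, by = type]     # number of records per industry
> range(AggData[, date])       # first and last date in the dataset
> summary(AggData[, value])    # distribution of the load values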

Plotting the aggregated time series data of electricity consumption by industry. The ggplot() function declares the input data frame for a graphic and specifies the set of plot aesthetics intended to be common throughout. data = AggData is the dataset to be used for plotting while aes() describes how variables in the data are mapped to visual properties. geom_line() produces the single line that connects all the observations:
> ggplot(data = AggData, aes(x = date, y = value)) +
+   geom_line() +
+   facet_grid(type ~ ., scales = "free_y") +
+   theme(panel.border = element_blank(),
+         panel.background = element_blank(),
+         panel.grid.minor = element_line(colour = "grey90"),
+         panel.grid.major = element_line(colour = "green"),
+         panel.grid.major.x = element_line(colour = "red"),
+         axis.text = element_text(size = 10),
+         axis.title = element_text(size = 12, face = "bold"),
+         strip.text = element_text(size = 9, face = "bold")) +
+   labs(title = "Electricity Consumption - Industry", x = "Date", y = "Load (kW)")
The result is as follows:
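The plotly package loaded earlier is not used in the code above. As an optional sketch, the same ggplot2 graphic can be made interactive by storing it in an object (gg_cons is a hypothetical name, not from the original text) and passing it to ggplotly():
> gg_cons <- ggplot(data = AggData, aes(x = date, y = value)) +
+   geom_line() +
+   facet_grid(type ~ ., scales = "free_y") +
+   labs(title = "Electricity Consumption - Industry", x = "Date", y = "Load (kW)")
> ggplotly(gg_cons)   # renders an interactive version of the faceted plot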

It is important to note that the consumption of the Food Sales & Storage industry does not change much during holidays compared to the other industries.
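The dataset has no explicit holiday flag, but a rough, optional check in the same spirit is to compare weekend and workday load by industry using the week column. This sketch assumes the weekday names are stored as full English names such as "Saturday" and "Sunday":
> AggData[, .(median_load = median(value)),
+         by = .(type, weekend = week %in% c("Saturday", "Sunday"))]   # median load, weekend vs. workday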


Step 3 - time series - regression analysis
The regression model is formulated as follows:

y_i = β_1 D_1,i + ... + β_48 D_48,i + β_49 W_1,i + ... + β_55 W_7,i + ε_i

The variables (inputs) are of two types of seasonal dummy variables: the daily dummies D_1,i, ..., D_48,i and the weekly dummies W_1,i, ..., W_7,i. y_i is the electricity consumption at the time i, β_1, ..., β_55 are the regression coefficients to be estimated, and ε_i is the error term.
Printing the contents of the AggData data frame:
> AggData
The result is as follows:
Transforming the characters of weekdays to integers: the as.factor() function encodes a vector as a factor, and as.integer() converts the factor levels to integers, which are stored in the new week_num column:
> AggData[, week_num := as.integer(as.factor(AggData[, week]))]
Printing the contents of the AggData data frame after the change:
> AggData
The result is as follows:
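To see how R will expand such factor variables into 0/1 dummy columns when the model is fitted, here is a small optional sketch on a toy data.table (the toy object and its values are made up for illustration only):
> toy <- data.table(Daily = as.factor(c(1, 2, 3, 1, 2, 3)),
+                   Weekly = as.factor(c(1, 1, 1, 2, 2, 2)))
> model.matrix(~ 0 + ., data = toy)   # one 0/1 column per Daily level, plus the Weekly contrasts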

Extracting unique industry types from the AggData data frame using the following:
> n_type <- unique(AggData[, type])
Printing the contents of n_type after the change:
> n_type
The result is as follows:
Extracting unique dates from the AggData data frame using the following:
> n_date <- unique(AggData[, date])
Extracting unique weekdays from the AggData data frame using the following:
> n_weekdays <- unique(AggData[, week])
Setting the period value to 48, the number of half-hourly measurements per day:
> period <- 48
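As an optional sanity check, not part of the original recipe, one can verify that the data really contain period measurements per industry and day:
> AggData[, .N, by = .(type, date)][, table(N)]   # counts of records per type/date pair; 48 is expected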

Performing regression analysis on a sample dataset. We extract the education (school) buildings over a period of 2 weeks. The result is stored in the data_reg data frame. n_type[2] represents the education buildings and n_date[57:70] denotes a 2-week period:
> data_reg <- AggData[(type == n_type[2] & date %in% n_date[57:70])]
Printing the contents of the data_reg data frame after the change:
> data_reg
The result is as follows:
Plotting the sample dataset of education (school) buildings over a period of 2 weeks (February 27 to March 12). The ggplot() function declares the input data frame for a graphic and specifies the set of plot aesthetics intended to be common throughout. data_reg is the dataset to be used for plotting while aes() describes how variables in the data are mapped to visual properties. geom_line() produces the single line that connects all the observations:
> ggplot(data_reg, aes(date_time, value)) +
+   geom_line() +
+   theme(panel.border = element_blank(),
+         panel.background = element_blank(),
+         panel.grid.minor = element_line(colour = "grey90"),
+         panel.grid.major = element_line(colour = "green"),
+         panel.grid.major.x = element_line(colour = "red"),
+         axis.text = element_text(size = 10),
+         axis.title = element_text(size = 12, face = "bold")) +
+   labs(title = "Regression Analysis - Education Buildings", x = "Date", y = "Load (kW)")
The result is as follows:
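A small optional check of the extracted window; the 672-row figure below assumes 14 days of 48 half-hourly records each:
> range(data_reg[, date])   # should span the two selected weeks
> nrow(data_reg)            # 14 days x 48 records = 672 rows expected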

Extracting the number of rows from the data_reg data frame:
> N <- nrow(data_reg)
Calculating the number of days in the training set:
> trainset_window <- N / period
Creating the independent seasonal dummy variables, daily and weekly. The daily seasonal values repeat the sequence 1, ..., period once for each day of the training window, giving the 48 levels of the daily variable. The weekly values are taken from week_num. The results are stored, together with the load, in matrix_train:
> matrix_train <- data.table(Load = data_reg[, value],
+                            Daily = as.factor(rep(1:period, trainset_window)),
+                            Weekly = as.factor(data_reg[, week_num]))
Printing the contents of the matrix_train data frame after the change:
> matrix_train
The result is as follows:
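An optional check, not in the original recipe, to confirm that the training matrix has the expected structure:
> str(matrix_train)                 # Load numeric, Daily and Weekly factors
> nlevels(matrix_train[, Daily])    # 48 daily levels expected
> nlevels(matrix_train[, Weekly])   # 7 weekday levels expected for a full 2-week window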

Creating a linear model: the lm() function fits linear models. Load ~ 0 + . is the formula; since lm() automatically adds an intercept to the linear model, we suppress it by specifying 0 in the formula. data = matrix_train defines the data frame that contains the data:
> linear_model_1 <- lm(Load ~ 0 + ., data = matrix_train)
Printing the contents of the linear_model_1 model object after the change:
> linear_model_1
The result is as follows:
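A quick way to judge the fit visually, as an optional addition to the recipe, is to plot the fitted values of linear_model_1 against the observed load, reusing the data_reg timestamps (fit_df is a hypothetical name):
> fit_df <- data.table(date_time = data_reg[, date_time],
+                      Real = data_reg[, value],
+                      Fitted = linear_model_1$fitted.values)
> ggplot(fit_df, aes(date_time)) +
+   geom_line(aes(y = Real), colour = "black") +
+   geom_line(aes(y = Fitted), colour = "blue") +
+   labs(title = "Fitted vs. Real Load - linear_model_1", x = "Date", y = "Load (kW)")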

Producing result summaries of the model linear_model_1:
> summary_1 <- summary(linear_model_1)
Printing the contents of the summary_1 object after the change:
> summary_1
The result is as follows:
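The summary object also allows individual diagnostics to be pulled out programmatically; a brief optional sketch:
> summary_1$r.squared              # coefficient of determination of the fit
> summary_1$sigma                  # residual standard error
> head(summary_1$coefficients)     # estimates, standard errors, t-values, and p-values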