A wavelet‐nearest neighbor model for short‐term load forecasting

Load forecasts with short lead times, ranging from an hour to a day ahead, are essential for improving the economic efficiency and reliability of power systems. This paper proposes a hybrid model based on the wavelet transform (WT) and the weighted nearest neighbor (WNN) techniques to predict the day-ahead electrical load. The WT is used to decompose the load series into a deterministic series and a fluctuation series that reflect the changing dynamics of the data. The two subseries are then forecast separately using appropriately fitted WNN models. The final forecast is obtained by composing the predicted results of each subseries. The hourly electrical loads of the California and Spanish energy markets are taken as experimental data, and the mean absolute percentage error (MAPE), weekly MAPE (WMAPE) and monthly MAPE (MMAPE) are computed to evaluate the next‐day load forecasts. The forecasting efficiency of the proposed model is evaluated using db2, db4, db5 and bior 3.1 wavelets. The results demonstrate the forecasting accuracy of the proposed hybrid model.


Introduction
Load forecasting in the current, increasingly liberalized electricity markets is of crucial importance as a means for producers to optimize and rationalize energy supply [1]. Short-term load forecasts with lead times ranging from an hour to several days ahead are essential for improving the economic efficiency and raising the reliability of power system operations [2]. This importance has led to the development of a wide variety of models and techniques differing in complexity, flexibility and data requirements [3]. Load forecasting models can be broadly classified into conventional (statistical), intelligent, and hybrid models. Surveys of models and methods for load forecasting can be found in [4-8]. The conventional models are linear and are known to show some weakness in the presence of special events and nonlinearity [9,10]. Intelligent models designed using artificial intelligence techniques are found to be particularly useful in modeling the uncertain and nonlinear patterns within the load data. Among all intelligent techniques, Artificial Neural Network (ANN) models are the most popular, as they perform better in dealing with the nonlinear relationships among the input variables [6,11]. However, in spite of the widespread use of neural networks, the technique suffers from a number of limitations, including difficulty in determining the optimum network topology and training parameters [12].
The efficiency of these methods usually depends on the correct tuning of their adjustable parameters, for example, the number of hidden nodes of the neural network (NN) [13]. This has led to the development of hybrid/combination models that blend different techniques to enhance the performance and overcome the limitations of existing models [3]. The basic idea of combining different models in forecasting is to use each model's unique features to capture different patterns within the data, thereby enhancing the accuracy of the forecasts [14]. Both theoretical and empirical findings in the literature show that combining different methods is an effective and efficient way to improve forecasts [15]. Hybrid models that combine the wavelet transform (WT) with other techniques have been extensively used for short-term load forecasting (STLF) [16]. The WT is known to provide a sound mathematical technique for designing and deploying filters, which facilitates interpretation and understanding of the data and of the analysis methodology [17]. WT has been successfully integrated with other techniques such as the Kalman filter [18], Kohonen neural network [19], neural networks [17,20-24], ARIMA models [16,25], and exponential smoothing [26,27] for STLF. Studies on methods that combine WT with other techniques have revealed that wavelet-based filtering techniques produce more accurate and acceptable results than nonwavelet methods. The proposed model is a univariate model and the forecast horizon considered is 24 h. For forecasting load with lead times ranging from an hour to a day ahead, univariate models are frequently considered [12,28,29]. It is further argued that the weather variables tend to change smoothly over short time frames and that this change will be captured in the demand series itself [30]; that is, the load series embodies all the information necessary to model the underlying generating process.
Recently, a learning technique based on Weighted Nearest Neighbors (WNN) has been applied to forecast the next-day hourly energy consumption and hourly energy price, reporting promising results [31,32]. The nearest neighbor method is capable of accounting for both the nonlinearities and the nonstationarity of the given time-series data [33], and WNN can be used to find and weight similar load data to predict the day-ahead load. The forecast accuracy of the WNN technique depends on two critical parameters, namely the embedding dimension and the number of nearest neighbors. The fast-changing load dynamics will influence these two parameters differently from the slower-changing dynamics. It is observed that if the disturbance "oscillates" faster than the trend, then it is possible to synthesize a digital filter that attenuates the disturbance while preserving the local trend. This situation often occurs in practice, since the local trend varies slowly compared with the load disturbance [34]. The idea of using smoothing filters/techniques to extract the deterministic and stochastic components of a power signal is not new [35]. However, WT has some unique features that make the decomposition easy and help capture system dynamics at different scales.
In time series prediction, hybrid models involving WT can help learn fast dynamics and decrease noise fluctuations simultaneously, thereby addressing the underfitting/overfitting trade-off [36]. The combination of WT and WNN presented in this paper helps capture the load dynamics at different scales and provides a prediction for each constitutive series of the load. An attempt has been made in this paper to separate the faster varying load dynamics (fluctuations) from the slowly varying load data (deterministic) utilizing the Discrete Wavelet Transform (DWT). The wavelet denoising method is used here to decompose the load series into deterministic and fluctuation components. The choice of the wavelet function and the decomposition scale are two crucial factors in capturing the inherent features of load data, so both must be chosen with care. Daubechies wavelets of order 2, 4 and 5 and a biorthogonal wavelet (bior 3.1) have been investigated in this paper for their ability to extract the trend and fluctuation components of load data for subsequent prediction by WNN. The model is tested in the electricity markets of California and Spain to forecast the next-day load. The results demonstrate that the method has better feasibility and efficiency than the nonwavelet model.
The paper is organized as follows: The Theory section outlines the features of the DWT and the WNN technique [32]. The next section presents the Proposed Methodology. In the Numerical Results and Discussion section, the results of applying the proposed model to the energy markets of California and Spain are presented and discussed. Finally, some concluding remarks are given.

Theory
Discrete wavelet transform
The DWT, whose main idea is the process of multiresolution analysis [37], is a technique that enables a joint time-frequency analysis of discrete-time signals. It represents a signal in terms of shifted and dilated versions of a scaling function $\phi(t)$ and a wavelet function $\psi(t)$. The sets of functions $\{\phi_{j,k}(t)\}_{j,k\in\mathbb{Z}}$ (based on the father wavelet $\phi(t)$) and $\{\psi_{j,k}(t)\}_{j,k\in\mathbb{Z}}$ (based on the mother wavelet $\psi(t)$) span $L^2(\mathbb{R})$. The wavelet and scaling functions defined by
$$\psi_{m,k}(t) = 2^{m/2}\,\psi(2^m t - k), \qquad \phi_{m,k}(t) = 2^{m/2}\,\phi(2^m t - k)$$
form an orthonormal basis with compact support. The variables m and k dilate and translate the mother wavelet function to generate wavelets such as the Daubechies family [38]. A signal f(t) can then be represented as
$$f(t) = \sum_{k} c_{0,k}\,\phi_{0,k}(t) + \sum_{m \ge 0}\sum_{k} d_{m,k}\,\psi_{m,k}(t).$$
It is assumed here that the initial resolution level corresponds to m = 0. For any arbitrary value $j_0$, the representation is
$$f(t) = \sum_{k} c_{j_0,k}\,\phi_{j_0,k}(t) + \sum_{m \ge j_0}\sum_{k} d_{m,k}\,\psi_{m,k}(t).$$
The first term in the expansion gives the general trend of the function and the second sum restores, scale by scale, the detail lost in that approximation. The totality of the coefficients $\{c_{j_0,k}, d_{m,k}\}_{m \ge j_0,\,k\in\mathbb{Z}}$ gives the DWT of the signal f(t). The coefficients $c_{m,k}$ and $d_{m,k}$ are obtained using the algorithm of [37] through the use of quadrature mirror filters $h_k$, $g_k$ related to the scaling and wavelet basis functions by
$$\phi(t) = \sqrt{2}\sum_{k} h_k\,\phi(2t - k), \qquad \psi(t) = \sqrt{2}\sum_{k} g_k\,\phi(2t - k).$$
These can be determined recursively by
$$c_{m,k} = \sum_{n} h_{n-2k}\,c_{m+1,n}, \qquad d_{m,k} = \sum_{n} g_{n-2k}\,c_{m+1,n}.$$
The WT is used here to decompose a time series into a linear combination of components at different frequencies, and it helps quantify the influence on the load of a pattern with a certain feature at a certain time.
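The filter-bank recursion described above can be illustrated with a minimal sketch. It assumes the Haar (db1) filters for brevity, not the db2, db4, db5 or bior 3.1 wavelets studied in this paper, and the function names (`analysis_step`, `synthesis_step`, `dwt`, `idwt`) are illustrative rather than taken from any library.

```python
import math

# Haar (db1) quadrature mirror filters, chosen only to keep the sketch short.
H = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # low-pass (scaling) filter h_k
G = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # high-pass (wavelet) filter g_k


def analysis_step(c):
    """Split finer coefficients c_{m+1} into approximation c_m and detail d_m."""
    if len(c) % 2:
        raise ValueError("input length must be even")
    half = len(c) // 2
    approx = [H[0] * c[2 * k] + H[1] * c[2 * k + 1] for k in range(half)]
    detail = [G[0] * c[2 * k] + G[1] * c[2 * k + 1] for k in range(half)]
    return approx, detail


def synthesis_step(approx, detail):
    """Invert analysis_step: rebuild c_{m+1} from c_m and d_m."""
    out = []
    for a, d in zip(approx, detail):
        out.append(H[0] * a + G[0] * d)
        out.append(H[1] * a + G[1] * d)
    return out


def dwt(signal, levels):
    """Return [a_L, d_L, ..., d_1] for a signal whose length is divisible by 2**levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = analysis_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def idwt(coeffs):
    """Reconstruct the signal from [a_L, d_L, ..., d_1]."""
    approx = coeffs[0]
    for d in coeffs[1:]:
        approx = synthesis_step(approx, d)
    return approx
```

Calling `dwt(x, 3)` on a series of length 8 returns one approximation and three detail vectors, and `idwt` recovers `x` up to rounding, which is the perfect-reconstruction property the decomposition relies on.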

WNN technique
The observed hourly data are considered up to day "d", and the prediction of the 24 hourly values of day "d + 1" by the WNN technique [32] is described in this section. Let $L_i \in \mathbb{R}^{24}$ be a vector composed of the 24 hourly data corresponding to an arbitrary day "i", that is, $L_i = [l_1, l_2, \ldots, l_{24}]$. The associated vector $LL_i \in \mathbb{R}^{24m}$ is the hourly data contained in a window of m consecutive days preceding day "i", where m is a parameter to be determined. Using the Euclidean norm, the distance for any couple of days i and j can be defined as
$$\mathrm{dist}(i, j) = \lVert LL_i - LL_j \rVert.$$
The k nearest neighbors of day "d" are found using the metric "dist" and, ordered by closeness, form the neighbor set $N = \{q_1, q_2, \ldots, q_k\}$, where $q_1$ and $q_k$ are the first and kth neighbors in order of distance.
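The prediction is given by
$$\hat{L}_{d+1} = \frac{1}{\sum_{j \in N} \alpha_j} \sum_{j \in N} \alpha_j\, L_{j+1}, \qquad \alpha_j = \frac{\mathrm{dist}(q_k, d) - \mathrm{dist}(j, d)}{\mathrm{dist}(q_k, d) - \mathrm{dist}(q_1, d)}, \tag{9}$$
following [32], so that closer neighbors receive larger weights. The WNN is applied here to the constitutive series obtained after the wavelet decomposition.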
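A minimal sketch of the WNN forecast follows. It rests on two assumptions not fixed by the text above: the window $LL_i$ is taken to end at day i (inclusive), and the weights fall linearly with distance as in [32], so the kth neighbor gets weight zero. The names `window`, `dist` and `wnn_forecast` are illustrative.

```python
import math


def window(days, i, m):
    """Concatenate the m daily vectors ending at day index i (assumed form of LL_i)."""
    return [v for day in days[i - m + 1 : i + 1] for v in day]


def dist(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def wnn_forecast(days, m, k):
    """Forecast the day after the last entry of `days` (a list of daily vectors)."""
    d = len(days) - 1
    target = window(days, d, m)
    # Candidates need a full window behind them and a known following day.
    cands = [(dist(window(days, j, m), target), j) for j in range(m - 1, d)]
    if len(cands) < k:
        raise ValueError("not enough history for the requested m and k")
    nearest = sorted(cands)[:k]
    d1, dk = nearest[0][0], nearest[-1][0]
    if dk == d1:
        weights = [1.0] * k  # all neighbors equidistant: plain average
    else:
        # Assumed linear weighting: 1 for the closest neighbor, 0 for the kth.
        weights = [(dk - dj) / (dk - d1) for dj, _ in nearest]
    total = sum(weights)
    hours = len(days[0])
    return [
        sum(w * days[j + 1][h] for w, (_, j) in zip(weights, nearest)) / total
        for h in range(hours)
    ]
```

With real data each entry of `days` would hold 24 hourly values; the function only requires that all daily vectors have the same length.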

Proposed Methodology
The objective of the present work is to simplify the mathematical structure of the load series and separate the deterministic component (trend) from the fluctuating component so that both subseries can be forecast with less complicated models. Data denoising is a technique that gives a clear picture of the actual pattern and helps improve the interpretation and forecast of data [39]. Wavelet denoising methods attain near-optimal solutions, allow discontinuities and spatial variation in the trend, and do not require a prior assumption about the trend's structure [40]. This helps separate the data into a deterministic smoothed version of the series and a rapidly varying component. Moreover, applying the wavelet filters to the data at each scale transfers the noise characteristics of the data into a set of coefficients whose absolute values are smaller than the rest of the coefficients [39]. By wavelet thresholding, the noisy coefficients can be separated to recover the trend from the data [40]. The basic idea here is to utilize the DWT to decompose, threshold and reconstruct the load series into two constitutive series, deterministic and fluctuation, which are then forecast by the WNN technique. The proposed forecast strategy can be summarized in the following step-by-step algorithm:
1. Decompose the original load signal (X) via the DWT into two, three and four levels of decomposition. The most suitable resolution level is chosen based upon the approximation signal's ability to capture the general pattern of its original. At level k, the load series is decomposed into one approximation series $a_k$ and k detail series $d_1, d_2, \ldots, d_k$.
2. The threshold for each scale is defined adaptively using Stein's Unbiased Risk Estimate (SURE) [41,42]. The method introduces a threshold $t_k$ which minimizes equation (10) for the kth scale:
$$\mathrm{SURE}(t; x) = n - 2\,\#\{i : |x_i| \le t\} + \sum_{i=1}^{n} \min(|x_i|, t)^2, \tag{10}$$
where x is the vector of wavelet coefficients at the kth scale, n is its length and #S is the cardinality of the set S. In case of extreme sparsity of wavelet coefficients in a level, the hybrid threshold is used [42].
3. After thresholding the coefficients, wavelet reconstruction of the decomposed levels is carried out to obtain the deterministic/trend series ($X_t$) from the data.
4. The fluctuation series ($X_f$) is obtained using the relation $X_f = X - X_t$.
5. The series $X_f$ and $X_t$ are used by WNN to find the optimal window length (m) and the optimum number of neighbors (k) for each of the two series by sliding a window of m days along the training data and minimizing the forecast error over the training days, where $\hat{X}_f$ and $\hat{X}_t$ are the forecast values of fluctuation and trend, respectively, for day d + 1 and $X_{d+1}$ are the actual load values of day d + 1. The optimum values for m and k are then used in equation (9) to obtain the 24 h ahead trend ($\hat{X}_t$) and fluctuation ($\hat{X}_f$) values.
6. The next-day (24 h ahead) load is obtained using $\hat{X} = \hat{X}_t + \hat{X}_f$.
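Steps 1 to 4 of the algorithm can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a Haar transform instead of the wavelets studied here, searches the SURE threshold only over 0, the coefficient magnitudes and the universal bound $\sqrt{2\ln n}$, applies soft thresholding, assumes coefficients already scaled to unit noise variance, and omits the hybrid rule for sparse levels. All function names are hypothetical.

```python
import math


def sure_risk(coeffs, t):
    """SURE for soft thresholding at t, assuming unit noise variance."""
    n = len(coeffs)
    small = sum(1 for c in coeffs if abs(c) <= t)
    return n - 2 * small + sum(min(abs(c), t) ** 2 for c in coeffs)


def sure_threshold(coeffs):
    """Candidate threshold in [0, sqrt(2 ln n)] with the smallest SURE."""
    n = len(coeffs)
    if n == 0:
        return 0.0
    bound = math.sqrt(2 * math.log(n))
    candidates = {0.0, bound} | {abs(c) for c in coeffs if abs(c) <= bound}
    return min(sorted(candidates), key=lambda t: sure_risk(coeffs, t))


def soft_threshold(coeffs, t):
    """Shrink every coefficient toward zero by t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_split(x):
    s = math.sqrt(2)
    a = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return a, d


def haar_merge(a, d):
    s = math.sqrt(2)
    out = []
    for ai, di in zip(a, d):
        out += [(ai + di) / s, (ai - di) / s]
    return out


def trend_and_fluctuation(x, levels):
    """Return (X_t, X_f): thresholded reconstruction and the residual X - X_t."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_split(approx)
        details.append(soft_threshold(d, sure_threshold(d)))
    for d in reversed(details):
        approx = haar_merge(approx, d)
    trend = approx
    fluct = [xi - ti for xi, ti in zip(x, trend)]
    return trend, fluct
```

Each returned series would then be forecast separately (steps 5 and 6) and the two forecasts summed. In practice the detail coefficients would first be divided by a noise-level estimate before `sure_threshold` is applied, a step this sketch leaves out.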

Numerical Results and Discussion
The data
California was the first U.S. state to restructure its electricity market, a process that started at the beginning of 1998. This experience with electricity industry restructuring has earned it a reputation as an incubator of bad policy ideas [43]. The year 2000 was a crucial year in the electricity market of California and is considered in the present work. Developing a forecasting model on the data of a single region makes it hard to judge the accuracy of the model; hence, the proposed model is also tested on the load data of the Spanish energy market. The Spanish electricity market, started in 1998, relies heavily on a pool where energy is traded through an auction process [32]. Spain joined the European Union in 1986 and adopted the Euro as its currency in 2002, and in the same year introduced a number of measures to support the use of renewable energy [44]. Therefore, the load data of the year 2002 is also considered in the present study to evaluate the proposed method. The load data are decomposed and reconstructed at different resolution levels, using orthogonal (db2, db4, db5) and biorthogonal (bior 3.1) wavelets. To build the forecasting model for day-ahead prediction, the information available to the method is the hourly load data of the 14 days (2 weeks) preceding the day whose load is to be forecast. The best resolution level is determined by testing. It has been found that three levels of decomposition are most appropriate, because the approximation signal at level three describes the general pattern of the load series more meaningfully than the others. In addition, the trend and seasonality are revealed in the wavelet decomposition. A four-level decomposition of the series is performed and the capability of the WT to identify the seasonal factors of the series is illustrated in Figure 1 through the autocorrelation function (ACF) of the approximations and details.
The seasonality cycle is clearly seen up to three levels of decomposition but is lost in the fourth level. The ability of the third level of decomposition to capture the general pattern of the load series is further illustrated in Figure 2 using the bior 3.1 wavelet.

Forecasting process
The proposed methodology is applied to the training data and the 24 h ahead forecast is obtained. The 2-week training window for a particular day is then shifted 1 day ahead, and the forecasts for the next 24 h are obtained. In this way, the forecast for the entire year is obtained. This has been carried out to illustrate the accuracy of the proposed model across all seasons and days of the whole year. The performance of the model is evaluated using the mean absolute percentage error (MAPE) metric. The MAPE is computed as follows:
$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \frac{\lvert l_i - \hat{l}_i \rvert}{l_i},$$
where $l_i$ is the actual load, $\hat{l}_i$ is the predicted load and N is the number of predictions. The two variations of MAPE [45], namely WMAPE (weekly mean absolute percentage error) and MMAPE (monthly mean absolute percentage error), are also used here to evaluate the performance of the proposed model across all months of the year (Tables 3 and 4).
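The error measure above can be written as a short helper. The sketch assumes the standard MAPE with the actual load in the denominator; treating WMAPE and MMAPE as the same formula evaluated over the hourly values of one week or one month is an assumption, since the text does not spell out their definitions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, over paired values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 / len(actual) * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    )


def period_mape(actual_days, predicted_days):
    """MAPE over all hours of a block of days (assumed form of WMAPE/MMAPE)."""
    flat_actual = [v for day in actual_days for v in day]
    flat_pred = [v for day in predicted_days for v in day]
    return mape(flat_actual, flat_pred)
```

Passing seven daily vectors to `period_mape` would give a weekly figure, and a month of daily vectors a monthly one, under the assumption stated above.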
There is no obvious standard model to which the performance of a proposed load forecasting method could be compared [46]. However, to gain some insight into the performance of the model, the results are compared with the naïve forecasts (NF) and with the direct application of the WNN model [32] to the load data. The NF is useful in correctly capturing the shapes of forecasted profiles, but not their scaling [46]. The NF for the loads on day "d" is given by the loads on the same day of the week before, d − 7:
$$L(d, h) = L(d - 7, h), \qquad h = 1, 2, \ldots, 24.$$
The one-day-ahead MMAPE for all the months of the California market in the year 2000 is given in Table 3. The first column corresponds to the NF, the second column to the nonwavelet application of WNN (DWNN), and the remaining columns to the proposed model with the wavelets db2, db4, db5 and bior 3.1. From the results, one can observe that bior 3.1 gives the best MMAPE for all the months except March. In March, the data are more skewed, and in such a case db5 or db2 gives better results. The corresponding one-day-ahead MMAPE for all the months of the Spanish market in the year 2002 is given in Table 4. The wide variations in the climatic conditions of Spain and the significant outliers have resulted in MMAPE values slightly higher than those of California. Here too, bior 3.1 gives the best results for almost all the months of the year. The proposed technique has been evaluated across all the days of the year, irrespective of seasons and special events/days, to determine the adaptability of the wavelet preprocessing technique to these variations. The results show that the proposed wavelet-based model increases the forecast accuracy by 15% to 30% compared with the corresponding nonwavelet model [31,32].
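The naïve benchmark and the comparison against it can be sketched as below. The helper names are hypothetical, and reading "increase in forecast accuracy" as the relative reduction in MAPE is an assumption about how the percentages above were derived.

```python
def naive_forecast(days, d):
    """Naive forecast for day index d: the load profile of day d - 7."""
    if d < 7:
        raise ValueError("need at least one week of history")
    return list(days[d - 7])


def relative_improvement(baseline_error, model_error):
    """Percentage reduction of an error measure relative to a baseline (assumed metric)."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error
```

For instance, under this reading a baseline MMAPE of 4.0 reduced to 3.0 would count as a 25% improvement.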
The prediction behavior of the proposed technique in its best forecast month of the year is shown in Figure 4 for the California market and in Figure 5 for the Spanish market. The figures show how closely the predicted load follows the actual load over the entire month. From the results presented in Figures 3-5 and Tables 1-4, it can be concluded that the forecast accuracy of the proposed technique is superior to that of the WNN technique applied directly. In other words, the forecast accuracy of the WNN technique can be considerably improved by wavelet preprocessing.

Conclusion
A new load forecasting model based on WT combined with the WNN technique is presented in this paper. The wavelets considered in this study are db2, db4, db5 and bior 3.1. The analysis of the results indicates that the basic idea of the proposed model, reducing the complex structure of electrical load data by splitting it into deterministic and fluctuation components, works well, as it considerably improves performance over the nonwavelet model. Moreover, the model is seen to perform well irrespective of seasons and holidays. The bior 3.1 wavelet performs well across all the months of the year in the energy markets of California and Spain for the years 2000 and 2002, respectively. The accuracy of the WNN model increases by around 25% when WT is integrated with it.