On the tails of the wind ramp distributions

We analyzed several multiyear wind speed datasets from 4 different geographical locations. The probability density functions of wind ramps from all these sites revealed remarkably similar shapes. The tails of the probability density functions are much heavier than a Gaussian distribution, and they also systematically depend on time increments. Quite interestingly, from a purely statistical standpoint, the characteristics of the extreme ramp-up and ramp-down events are found to be almost identical. With the aid of extreme value theory, we describe several other inherent features of extreme wind ramps in this paper.

Probability density functions of wind ramps ( u) from 4 tall-tower sites (FINO1, Høvsøre, Cabauw, and NWTC). Multiyear, 10-min averaged wind data measured by the topmost sensors on these towers are used here. Further details are provided in Section 2. The wind increment values are normalized by the corresponding standard deviations ( u ). The left and right panels represent time-increments ( ) of 10 and 60 min, respectively. A Gaussian pdf is overlaid (dashed line) as a reference. NWTC, National Wind Technology Center; pdf, probability density function [Colour figure can be viewed at wileyonlinelibrary.com] • Do the pdf tails corresponding to the ramp-up and ramp-down events behave differently?
• How do the tails depend on the height (above ground level)?
• What is the impact of aggregation (filtering) on the tails?
• Can the dependence of tail properties on the sample size be quantified?
To the best of our knowledge, none of these questions have been answered in the literature in a comprehensive manner. We do point out that a handful of studies [10][11][12][13][14][15][16] provided important building blocks for our research. Unfortunately, several of these papers focused on wind gusts, and thus, their findings cannot be very relevant for mesoscale wind ramps. More critically, most of these studies used very limited amount of observational data (often with durations of a few hours to merely a few days) and came up with conflicting results. For example, by analyzing only a few days worth of data, Liu et al 13 concluded that wind ramps follow truncated stable distributions. In a follow-up study, however, Liu and Hu 12 arrived at an opposite conclusion when they made use of a slightly larger dataset. We argue that, in lieu of converged statistics, the results from these past studies cannot be faithfully generalized.
The present study differs from the others in 2 areas. First, it primarily relies on rigorous statistical analyses for pdf characterization instead of qualitative visual inspection. Second, it uses multiyear wind datasets from 4 tall-tower sites: FINO 1 (North Sea), Høvsøre (Denmark), Cabauw (the Netherlands), and National Wind Technology Center (NWTC; USA). Since these sites are quite diverse in nature, we have more confidence in generalizing the outcomes. In the following section, we briefly describe these datasets.

DESCRIPTION OF WIND DATASETS
Over the past decades, the wind energy and boundary layer meteorology communities have invested significant resources in installing and operating a few tall-towers around the world. A multitude of research-grade sensors (eg, high-fidelity cup anemometers) are mounted on these towers at various heights. Owing to their periodic calibration and regular maintenance, the meteorological datasets (including wind speed time series) collected by these sensors are deemed to be of the highest quality. Thus, it is not surprising that these datasets have been heavily used to advance our understanding of the lower part of the atmospheric boundary layer. For example, very recently, Kiliyanpilakkil et al 17,18 conducted rigorous scaling analyses of wind datasets from 3 of these prominent tall-towers: FINO 1, Cabauw, and NWTC. We also leverage on the same datasets, supplemented by measurements from Høvsøre, to demonstrate the statistical characterization of wind ramps. Table 1, along with the following subsections, provides more details into these locations.

FINO 1
It is an offshore platform in the North Sea. 19

Høvsøre
This meteorological tower is situated in a rural area close to the west coast of Jutland, Denmark and played a pivotal role in numerous wind energy studies. 22,23 We analyze 10-min average wind data from 6 levels: 10, 40, 60, 80, 100, and 116 m collected during the years 2005-2015. In this case, each time series consists of approximately 567 000 samples.

Cabauw
The Cabauw Experimental Site for Atmospheric Research tower is located in the western part of the Netherlands. [24][25][26] We use 170 months of

NWTC
We analyze multiyear (2004-2014) wind data from a 80-m tall tower (called M2) located at the foothills of the Colorado Rocky near Boulder, Colorado and maintained by the National Renewable Energy Laboratory NWTC. This location represents complex terrain and is prone to various wind flows and disturbances. 27 The NWTC dataset includes 1-min averaged, cup anemometer-based, wind speed time series from 4 heights: 10, 20, 50, and 80 m. Each time series is made up of approximately 5.78 million points with virtually no data gaps.

METHODOLOGY
To investigate the tail features of the wind ramp events, we have borrowed a well-established methodology, called the Hill plot, 37 from the extreme value (EV) theory. [28][29][30] In this section, we explain this approach in detail by using synthetically generated random variates from 2 heavy-tailed distributions.
By definition, a heavy-tailed distribution (F) satisfies 31 : where F is the so-called complementary cumulative distribution function (ccdf), is a positive constant, and is known as the tail-index (or shape parameter). In principle, can be estimated from the slope, − d log F(x) d log x . 31 However, in practice, the most common approach is to invoke the concept of order statistics. 32 The rank-ordered values (in decreasing order) of x can be written as follows: , it is expected to exhibit the following power-law behavior (aka Zipf law 33 ): Over the years, several estimators for have been proposed in the literature, including (but not limited to) Pickand's estimator, 34 Hill estimator, 35 and the Dekkers-Einmahl-de Haan estimator. 36 In this work, we use the popular Hill estimator ( H ): where k = 1, … , N − 1. When H is plotted against k, it is known as the Hill plot. 37 For EV distributions (eg, Pareto), estimated H is supposed to stabilize with increasing values of k. In Figure 2, we show an illustrative example using generalized Pareto (GP) distributed variates.
The pdf of the GP distribution can be written as follows: where a, b, c are the parameters of GP. By integrating this equation, one can derive the ccdf of GP as follows: Thus, the ccdf of GP is expected to decline as a power-law with tail-index = 1∕c.
In Figure 2 (left panel), the rank-order plots for the GP distribution are shown for 3 values of c. For each case, the sample size is 10 7 and the parameters a and b are assumed to be equal to 1 and 0, respectively. By construction, only positive random variates are generated in this case. The tail indices are determined via the Hill plot in the right panel of Figure 2. Clearly, the H values rapidly stabilize towards 1 c for all the 3 cases, as would be desired. This example attests to the prowess of the Hill plot in estimating the tail indices from a rather simple EV distribution. Next, we investigate the usefulness of the Hill plot using a far more complicated distribution with 2 distinct tail behaviors.
The generalized hyperbolic skew student's t (GHSST) distribution is often used in financial modeling and risk management. 38,39 It has the innate ability to fit pdfs with heavy tails and significant asymmetry. A brief overview of this distribution is provided in Appendix A. A realization of the GHSST variates is shown in the top panel of Figure 3. Large positive values, signifying a heavy right tail, can be readily observed in this plot.
The pdf of the generated GHSST variates is shown in the middle-left panel of Figure 3. For comparison, a Gaussian pdf is overlaid on this plot.
Clearly, both the left and right tails of the GHSST variates are much heavier than the Gaussian pdf. They also show different decaying behaviors as evident in the middle-right panel. For large values of x, the right tail portrays a quasi-linear appearance in this log-log representation. In other words, the right tail is characterized by a power-law distribution, which is in-line with the asymptotic limit discussed in Appendix A. In contrast, the left tail strongly departs from linearity highlighting its mixed-exponential-power-law behavior. The rank-order plot, shown in the bottom-left panel of At this point, we would like to point out that Equation 2 has limited applicability in real-world scenarios. For such cases, this equation should be generalized as follows: where, L is a slowly varying function (eg, exponential). For large x, L(x) may be approximated as a constant .
The tail indices from the GHSST variates are estimated via the Hill plot and shown in the bottom-right panel of Figure 3. These results should be interpreted carefully by taking into consideration Equation 7. In the case of right tail, the H values stabilize rapidly as in the case of GP distributed variates. For the left tail, the H values are significantly higher; also, the stabilization is slightly slower (difficult to detect in this figure without zooming in). In this case, the exponential term modulates the power-law tail. Please note that the estimated H values are very high for the random Gaussian variates and they never stabilize as the tails simply follow exponential behavior. In Figure 4, we document the influence of sample size on H estimation using the GHSST variates. In the case of the right tail (exhibiting power-law behavior), H values are more or less insensitive to sample size (N). In contrast, for the left tail, H keeps increasing as N increases. This nonconvergence essentially corroborates the fact that the left tail of the GHSST pdf does not exhibit a purely power-law behavior; rather, it follows a mixed-exponential-power-law.* In summary, with the aid of randomly generated variates, we have demonstrated that the Hill plots can be very effective in characterizing different types of tail behaviors. It is also computationally very efficient. For these reasons, in the following section, we will invoke this methodology to address the science questions posed in Section 1. The idealized examples shown in Figures 2, 3, and 4 will provide guidance for interpreting the wind ramp characteristics observed within various observational datasets. *In a recent paper, 40 we reported a similar trend for the normal inverse Gaussian distribution, which also possesses mixed-exponential-power-law tails.

RESULTS
Given that the NWTC dataset offers the largest sample size, we select it first for comprehensive analysis. The original granularity of the wind time series is 1 min. Henceforth, we refer to this series as NWTC 1min . To study the effects of aggregation on the tail characteristics, we created a 10-min average series (sample size: approximately 578 000) by simple moving averaging (followed by downsampling) of the NWTC 1min series. This new series will be identified as NWTC 10min . In addition to the results reported in the current section, these series are also used in Appendices B and C to investigate the issues of nonstationarity and correlation.
The ccdf (F) for both the NWTC 1min and NWTC 10min time series are shown in Figure 5. The left and right panels represent = 10 min and = 60 min, respectively. On these plots, F for a Gaussian distribution is also shown for comparison. In this log-log representation, we only focus on the right tail  First of all, both the NWTC 1min and NWTC 10min cases clearly portray non-Gaussian tails. The implication of this heavy-tail behavior is rather crucial for the wind energy community. For example, the exceedance probability of a strong ramp-up event of magnitude 5 u is very small (much less than 10 −6 ) if one assumes Gaussianity. According to observations, however, the exceedance probability is almost 10 −2 . In other words, the assumption of Gaussianity leads to severe underestimation of extreme wind ramp events.
In Figure 1, we reported that the tails of the wind ramp pdfs systematically depend on . Thus, it is not surprising that the same dependence is also evident from the ccdfs. From Figure 5, one can discern that, in comparison with = 10 min, the right tail decays faster in the case of = 60 min.
Later on, we will establish that this trend is actually monotonic in the range of = 10 to 360 min.
According to Figure 5, the agreement between NWTC 1min -and NWTC 10min -based F curves are excellent up to u ≈ 5 u . Beyond that point, the NWTC 1min -based F curve starts to decline faster. With the aid of the Hill plots, we will further investigate if this discrepancy is because of the disparity in sample sizes, or it is an artifact of aggregation.
The Hill plots for NWTC 1min are shown in Figure 6. The values of H are found to be noticeably higher for larger values. In other words, the wind ramp pdf tails decay faster for larger values, which is in-line with our earlier finding. Both the ramp-up (noted as u + ) and ramp-down ( u − ) cases follow similar trends. However, the H curves never fully stabilize for either case in the considered range (1 < k ≤ 3000). Thus, we can deduce that the wind ramp pdf tails do not obey power-laws. Later on, we will explore further if these NWTC data-based results also hold for other datasets. To quantify the effect of sample size on the H values, we adopted a Monte-Carlo-type strategy. From the NWTC 1min time series (sample size 5.78 million), we extract 100 contiguous subsets from random locations. Each subset is called NWTC i sub and contains 578 000 samples. The index i varies from 1 to 100 to demarcate each subset. We then perform Hill plot analysis on each subset separately and compute ensemble statistics. These ensemble Hill plots are shown in Figure 7. The ramp-up and ramp-down cases are shown separately in the left and right panels, respectively.
The overall trends of the H values reported in Figures 6 and 7 are qualitatively very similar. However, the magnitudes of H for the NWTC i sub cases are significantly lower than the NWTC 1 min series. We would like to remind the readers that a similar sample size dependency was reported earlier in the case of the left tail (depicting a mixed-exponential-power-law behavior) of the GHSST pdf (refer to bottom-left panel of Figure 4). Thus, on the basis of the Hill plot analyses and ccdf plots, we can confidently claim that the wind ramp distributions do not exhibit power-law tails.
To further bolster our claim, we have computed H for 3 other locations: FINO 1, Høvsøre, and Cabauw. Wind data from the topmost sensor levels are used. Figures 8, 9, and 10 show the Hill plots for 3 different values of , 10, 60, and 180 min, respectively. In each of these figures, we also included the Hill plot for the NWTC 10min series. Based on these figures, several assertions can be made. First of all, the Hill plots from all the locations look remarkably similar. For (almost) all the cases, the differences between the ramp-up and ramp-down events are marginal. At the same time, for  all plots, the values of H do not stabilize and continue to decrease with increasing k. Thus, we can safely rule out power-law being a viable candidate for wind ramp distributions. In-line with our earlier finding, the H values exhibit dependence on . Thus, stable distributions 32 should not be used to characterize wind ramp distributions.
Earlier, we have concluded that the H values strongly depend on sample size. Now, in order to probe the impact of aggregation, we compare the H values for the NWTC 10min series (bottom-right panels of Figures 8, 9, and 10) against the corresponding values from the NWTC i sub series (reported in Figure 7). Please note that these series have identical sample size; albeit, they have different granularity. From the plots, it is quite evident that the H values from the NWTC 10min and NWTC i sub series are very much comparable. Thus, within the limited filtering range of 1 to 10 min, the aggregation effect is negligible. However, one should not downplay the effects of sample size.
Thus far, we have only focused on wind data from the topmost sensors of all the 4 meteorological towers. It would be interesting to find out if/how the H values depend on sensor height. Instead of plotting several individual Hill plots, we opt for plotting averaged H values so as to report all the results succinctly in Figures 11 and 12. The values of ⟨ H ⟩ are computed for 2000 ≤ k ≤ 3000. We intentionally (and incorrectly) assume that over this range the values of H have fully stabilized. Despite this ad hoc assumption, the results are quite revealing. For both the ramp-up and ramp-down cases, ⟨ H ⟩ increase monotonically from approximately 4 (at = 10 min) to approximately 6 − 7 (at = 360 min).
The diversity in ⟨ H ⟩ values across various heights is rather small (especially for < 180 min). Among all the locations, the spread of ⟨ H ⟩ is the most significant at Cabauw; the ⟨ H ⟩ curves from the top two sensors, located at heights of 140 and 200 m, seem to branch out from others for > 180 min. We speculate that certain meteorological processes (eg, low-level jets) influence the wind ramp statistics at higher altitudes. However, we need more observational datasets from higher altitudes (possibly collected by lidars and/or sodars) to shed further light on this intriguing finding.

CONCLUSIONS
In this study, we analyzed several long-term wind speed datasets composed of 4 different geographical locations, from offshore to complex terrain.
We showed that the wind ramp pdfs from all the sites reveal amazingly similar shape characteristics. Most interestingly, the tails of the wind ramp pdfs are much heavier than Gaussian and decay faster as time increments increase. With the aid of the Hill plots, we showed that the extreme ramp-up and ramp-down events behave similarly from a statistical point of view. Moreover, the tail-index statistics exhibited minimal dependence with respect to height above the ground.
Another important aspect of these results showed that the tails of the wind ramp distributions do not follow a power-law distribution, rather modulated by a slowly varying function. We speculate this function to be an exponential. Therefore, in future work, we will use several types of pdfs from the generalized hyperbolic distribution family (eg, GHSST, normal inverse Gaussian) to determine the ideal candidate for capturing the tail characteristics of wind ramp distributions.
As the wind energy industry continues to flourish, and wind ramp prediction becomes increasingly important, the results from this study should be used for model validations and improvement. It would be critical to find out if the state-of-the-art numerical weather prediction models and time-series forecasting tools are able to capture the extreme ramp behaviors accurately. It is also envisaged that the contemporary synthetic wind speed generators (eg, D' Amico et al 41 and Negra et al 42 ), heavily relying on statistical information, will tremendously benefit from our findings.