Functional clustering analysis of Chinese provincial wind power generation

China covers a vast territory, and terrain, climate, and other environmental factors differ significantly between provinces, which affects wind power generation. To better analyze wind power generation in Chinese provinces, this paper applies functional clustering analysis to classify monthly wind power generation data for 30 Chinese provinces from March 2013 to October 2019. The empirical results show that the provinces can be divided into three categories according to their wind power generation, and the results are consistent with the actual situation. Functional clustering analysis is applied here to monthly data, whereas traditional clustering analysis is applied to annual data obtained by accumulating the monthly data. Working with the higher-dimensional monthly data reduces information loss. Moreover, the data can be viewed as functions, so more information can be extracted, for example by analyzing derivative functions. The analysis of wind power generation offers guidance for the development and utilization of renewable energy.


Introduction
With the growing size of the world economy, world energy consumption continues to increase. Meeting the challenges to economic development and environmental problems, the realization of sustainable economic development is getting more and more common analysis, while the latter focuses on applied research. At present, many traditional statistical analysis methods have been improved and extended to the FDA framework.
Cluster analysis is an important method in statistical research. It simplifies the data structure through unsupervised classification, relying only on the characteristics of the data and requiring no prior knowledge. With so much data being collected, methods to identify homogeneous groups in the data are increasingly needed. The purpose of cluster analysis is to identify such groups without using any prior knowledge of group labels. Functional clustering analysis combines FDA with clustering analysis. It divides functional objects into classes so that objects within a class have similar curve patterns and objects in different classes have different curve patterns. The method is used to explore the latent class structure in a functional data set. In recent years, functional clustering analysis has developed rapidly. Abraham et al. (2003) used B-spline basis functions to reconstruct functional data and then performed k-means clustering on the basis coefficients. James and Sugar (2003) constructed a mixture model based on mixed effects, which is therefore suitable for clustering sparse functional data. Compared with traditional clustering methods, functional clustering can analyze high-dimensional data, which uses more of the information in the data and reduces information loss, so the clustering results can be more reliable.
This paper uses functional clustering analysis to study wind power generation in different Chinese provinces and to classify the provinces accordingly. FDA can work with high-frequency data, which reduces information loss compared with the traditional approach of analyzing aggregated data. In addition, FDA treats the data as functions, so derivative functions can be computed, which makes it possible to analyze the rate of change of the data and extract more useful information. For these reasons, we use FDA to analyze wind power generation in Chinese provinces more effectively and to provide suggestions for the development and utilization of wind power resources in China.

Functional principal component analysis (FPCA)
By selecting an appropriate basis function system and smoothing coefficient, the discrete points are transformed into a functional data object. When analyzing a functional data object, FPCA should be considered first to reduce the infinite-dimensional problem to a finite one. FPCA transforms the infinite-dimensional functional data into a finite number of functional principal component scores while retaining as much information as possible.
We view a random curve $X(t)$ as a random element of the separable Hilbert space $H = L^2[0, T]$ satisfying $\|X(t)\|^2 = \langle X(t), X(t) \rangle = \int_I X^2(t)\,dt < \infty$, with $I = [0, T]$. In applications, we observe a sample consisting of $N$ curves $X_i(t)$ ($i = 1, 2, \ldots, N$, $t \in [0, T]$). We view each curve as a realization of a random function $X(t)$. We define the mean function $\mu(t) = E[X(t)]$ and the covariance function $G(s, t) = \mathrm{cov}[X(s), X(t)]$.
According to Mercer's lemma, the covariance function can be decomposed into
$$G(s, t) = \sum_{k=1}^{\infty} \lambda_k \phi_k(s) \phi_k(t),$$
where $\phi_k(t)$ is the kth orthogonal eigenfunction (principal component function) and $\lambda_k$ is the corresponding kth eigenvalue, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$. Based on the Karhunen-Loève theorem, the stochastic process can be decomposed into the mean function and a sum of products of the principal component functions and the principal component scores:
$$X_i(t) = \mu(t) + \sum_{k=1}^{\infty} \beta_{ik} \phi_k(t),$$
where $\mu(t)$ is the mean function and $\beta_{ik}$ is the kth principal component score of the ith curve, which is the projection of $[X_i(t) - \mu(t)]$ onto the kth eigenfunction. The first $K$ principal components are used to approximate the original data:
$$X_i(t) = \mu(t) + \sum_{k=1}^{K} \beta_{ik} \phi_k(t) + e(t),$$
where $e(t)$ is the error term that cannot be explained by the first $K$ principal components. In this paper, the cumulative contribution rate is used to select the number of principal components $K$: the first $K$ principal components whose cumulative contribution rate exceeds 95% are extracted.
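The decomposition and the 95% selection rule can be illustrated with a small numerical sketch. It assumes curves sampled on a common, equally spaced grid and works directly on the discretised covariance matrix rather than on a basis expansion, so it is only an approximation of the procedure described here. The function name `fpca`, the grid spacing `dt`, and the synthetic curves are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def fpca(curves, dt=1.0, threshold=0.95):
    """Discretised FPCA. `curves` has shape (N, T): one sampled curve per row."""
    n_curves = curves.shape[0]
    mean = curves.mean(axis=0)                     # estimate of mu(t)
    centered = curves - mean
    cov = centered.T @ centered / n_curves         # estimate of G(s, t) on the grid
    # Eigen-decomposition of the discretised covariance operator.
    eigvals, eigvecs = np.linalg.eigh(cov * dt)
    order = np.argsort(eigvals)[::-1]              # sort eigenvalues in decreasing order
    eigvals = np.clip(eigvals[order], 0.0, None)   # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()     # cumulative contribution rate
    # Smallest K whose cumulative contribution rate exceeds the threshold.
    n_comp = min(int(np.searchsorted(ratio, threshold, side="right")) + 1, len(ratio))
    phi = eigvecs[:, :n_comp] / np.sqrt(dt)        # scaled so that sum(phi_k**2) * dt = 1
    scores = centered @ phi * dt                   # beta_ik: projection onto phi_k
    return mean, phi, scores, ratio[:n_comp]

# Synthetic stand-in for 30 provincial curves over 80 months (not real data).
rng = np.random.default_rng(0)
t = np.arange(80)
level = rng.uniform(1.0, 20.0, size=(30, 1))
season = np.sin(2 * np.pi * t / 12.0)
curves = level * (1.0 + 0.02 * t + 0.2 * season) + rng.normal(0.0, 0.1, size=(30, 80))

mean, phi, scores, ratio = fpca(curves)
approx = mean + scores @ phi.T                     # truncated Karhunen-Loeve reconstruction
print("components kept:", phi.shape[1])
print("cumulative contribution rate:", ratio)
```

The rows of `scores` are the finite-dimensional summaries that a clustering step would then operate on.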

Functional clustering method
Based on FPCA, the infinite-dimensional data are truncated to a finite number of functional principal component scores, which achieves dimension reduction. The functional clustering method then performs traditional clustering analysis on the functional principal component scores.
Cluster analysis is an important data mining technique. It classifies data according to their characteristics without knowing the classification in advance. The basic idea is to construct a matrix of variables, assign individuals with similar properties to the same category, and assign individuals with very different properties to different categories, so that individuals within a category are highly homogeneous and individuals in different categories are highly heterogeneous. Many mature clustering methods are available, including density-based, hierarchy-based, partition-based, grid-based, and model-based methods.
This paper uses the standard k-means approach. The k-means algorithm is an iterative clustering algorithm. The specific steps are as follows: (i) Input the initial data set and specify the number of clusters k; (ii) Arbitrarily select k data objects as the initial cluster centers; (iii) Assign each data object to the most similar cluster according to the mean value of the objects in the cluster; (iv) Update the cluster means; (v) Calculate the clustering criterion function SSE; (vi) Repeat steps (iii)-(v) until the SSE value no longer changes; (vii) Output the k clusters that satisfy the convergence of the squared error criterion function.
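The steps above can be sketched in Python as follows. This is a minimal illustration on a toy data set, not the implementation used in the paper; the function name `kmeans`, the `seed` and `max_iter` parameters, and the toy points are assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means with Euclidean distance; `points` has shape (n, d)."""
    rng = np.random.default_rng(seed)
    # (ii) choose k distinct data objects as the initial cluster centers
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    prev_sse = np.inf
    for _ in range(max_iter):
        # (iii) assign each object to the nearest center
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # (iv) update each center to the mean of its members
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:           # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
        # (v) criterion function: sum of squared errors
        sse = ((points - centers[labels]) ** 2).sum()
        # (vi) stop once the SSE no longer changes
        if np.isclose(sse, prev_sse):
            break
        prev_sse = sse
    # (vii) output the clusters
    return labels, centers, sse

# Toy stand-in for two-dimensional principal component scores (not real data).
points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0],
                   [10.0, 10.0], [11.0, 11.0], [12.0, 12.0]])
labels, centers, sse = kmeans(points, k=2)
print(labels, centers, sse)
```

On less well separated data the result depends on the initial centers, so in practice the algorithm is usually run from several starting points and the solution with the smallest SSE is kept.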
There are several ways of defining distance. In this paper, the Euclidean distance is used, and the sum of squared errors (SSE) is used as the clustering objective function:
$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(c_i, x)^2,$$
where $K$ is the number of clusters (k in the steps above), $C_i$ is the ith cluster, $c_i$ is the ith center, and dist denotes the Euclidean distance.
By minimizing the objective function SSE, we can get
$$c_i = \frac{1}{m_i} \sum_{x \in C_i} x,$$
where $m_i$ is the number of objects in $C_i$. To sum up, the center that minimizes the SSE is the cluster mean.
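The minimization step can be written out explicitly. The following is a sketch for one-dimensional data, where the squared Euclidean distance reduces to $(c_i - x)^2$; the multivariate case follows coordinate by coordinate. The symbols $C_k$ (the kth cluster) and $m_k$ (the number of objects in $C_k$) are notation introduced here for illustration.

```latex
\begin{aligned}
\frac{\partial}{\partial c_k}\,\mathrm{SSE}
  &= \frac{\partial}{\partial c_k} \sum_{i=1}^{K} \sum_{x \in C_i} (c_i - x)^2
   = \sum_{x \in C_k} 2\,(c_k - x) = 0 \\
\Longrightarrow\quad m_k\, c_k &= \sum_{x \in C_k} x
\quad\Longrightarrow\quad
c_k = \frac{1}{m_k} \sum_{x \in C_k} x .
\end{aligned}
```

Only the terms of cluster $C_k$ depend on $c_k$, which is why the outer sum drops out after differentiation.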

Empirical study
Cluster analysis helps to understand the characteristics of variables and identify their types, and so to extract more useful information. This paper uses provincial-level monthly data on wind power generation in China from March 2013 to October 2019 and applies functional cluster analysis to identify the main characteristics of wind power generation in China.
The cumulative annual wind power generation of each Chinese province from 2014 to 2018 is shown in Table 1, and the national total is shown in Figure 1. In terms of the total amount, China's wind power generation is increasing every year and growing rapidly. The total wind power generation in 2017 is about twice that of 2014. Generation in Sichuan province increased most rapidly: the generation in 2018 is nearly 22 times more than in 2014.
The distribution maps of wind power generation in each Chinese province in 2014, 2016, and 2018 are shown in Figures 2 to 4, respectively. Wind power generation in China has the following characteristics. First, wind power generation in Inner Mongolia has always been the largest in China. Its unique geographical features make it the most important wind power generation region. Second, Ningxia, Hebei, Gansu, Xinjiang, Heilongjiang, Jilin, Liaoning, and other provinces have large amounts of wind power generation, which is geographically concentrated in northern China. Third, the east coast is another significant wind power generation region. However, compared with the rapid development of wind power generation in other provinces, generation on the east coast has not increased markedly. In addition, Yunnan, located in southwest China, has seen a rapid increase in wind power generation in recent years, making it the largest wind power generation province in southern China.
Annual data obtained by accumulating the monthly data lose information. To reduce this loss, this paper analyzes the monthly wind power generation data of 30 Chinese provinces from March 2013 to October 2019 from the perspective of FDA. First, the monthly data are fitted with a B-spline basis function system, which transforms the monthly discrete data points into a functional data object. The functional data objects of the monthly wind power generation data, which comprise 30 curves, are shown in Figure 5. Taking the first derivative of the functional data object gives the first derivative function data object, which is illustrated in Figure 6. When the data are analyzed as functions, the data-generating process behind the observations is regarded as a function, so derivatives can be used to analyze the rate of change and extract more information from the data. From Figures 5 and 6, we can draw the following conclusions. First, although the amount of wind power generation differs between provinces, it shows the same cyclical changes. Wind is a climate element linked to the seasons, so it shows an annual pattern. Second, wind power generation is similar in most Chinese provinces; apart from a few provinces with large wind power generation, the curves are concentrated at the bottom of the figure. Third, over time, wind power generation in all provinces shows a continuous, fluctuating growth trend, and the growth rate is accelerating. This indicates that the utilization of wind energy resources is increasing in all Chinese provinces. Further analysis requires reducing the dimension of the functional data objects, and the main tool for this is FPCA.
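The smoothing and differentiation step can be sketched with SciPy's B-spline tools. This is an illustration under stated assumptions, not the paper's procedure: it uses an unpenalized least-squares cubic B-spline fit (`scipy.interpolate.make_lsq_spline`) instead of a fit with a smoothing coefficient, and the number of interior knots, the helper name `fit_bspline`, and the synthetic series are all hypothetical.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def fit_bspline(t, y, n_interior=20, degree=3):
    """Least-squares B-spline fit of one monthly series (no roughness penalty)."""
    # Equally spaced interior knots; the number of knots is an assumption.
    interior = np.linspace(t[0], t[-1], n_interior + 2)[1:-1]
    # Repeat the boundary knots degree + 1 times, as a clamped B-spline requires.
    knots = np.concatenate(([t[0]] * (degree + 1), interior, [t[-1]] * (degree + 1)))
    return make_lsq_spline(t, y, knots, k=degree)

# 80 monthly observations, matching March 2013 to October 2019.
t = np.arange(80, dtype=float)
# Synthetic series with a trend and a 12-month cycle (not real data).
y = 5.0 + 0.1 * t + 2.0 * np.sin(2 * np.pi * t / 12.0)

curve = fit_bspline(t, y)      # functional data object for one province
rate = curve.derivative()      # first derivative function (rate of change)

print(curve(t[:3]))            # smoothed values for the first three months
print(rate(t[:3]))             # estimated monthly rate of change
```

Repeating the fit for each province would give one curve and one derivative curve per province, which is the kind of object plotted in Figures 5 and 6.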
FPCA was performed on the functional data object of monthly wind power generation. Two functional principal component functions are selected here, which explain 94.38% and 3.16% of the variation, respectively (more than 95% cumulatively). The two functional principal component functions are shown in Figure 7. The first represents the overall trend of wind power generation: the curve fluctuates constantly and shows an overall upward trend. The second represents local adjustments in wind power generation and shows an oscillating downward trend. The scores of the first two functional principal components were then calculated for each province, as shown in Table 2.
Functional clustering analysis applies clustering analysis to the functional principal component scores. The results are shown in Table 3. According to the results, only Inner Mongolia belongs to category 1. The provinces in category 2 are Hebei, Shanxi, Liaoning, Jiangsu, Shandong, Yunnan, Gansu, Xinjiang, and Ningxia. The remaining provinces fall into category 3. These results are similar to the pattern in the wind power generation distribution maps. The monthly wind power generation in category 1 is greater than that in category 2, and that in category 2 is greater than that in category 3. The provinces in the first two categories are the main wind power generation regions in China and are mainly located in northern China. Topographically, these provinces are mainly distributed in plateau regions, which are rich in wind resources and have great potential for wind power development. Based on the FDA, we can see that monthly wind power generation is periodic, which indicates that it is related to climate and weather changes. As wind power generation has increased in recent years, the periodic changes have become more pronounced. This indicates that in months with abundant wind, more of the wind power generation potential is being exploited. For comparison, k-means clustering analysis was performed with annual data. The results are shown in Table 4. Comparing the results of the functional clustering analysis with those of the traditional clustering analysis of annual data shows that the two are consistent, which suggests that the results of functional clustering analysis are reliable. Functional clustering analysis can also make use of multi-dimensional data.
Compared with analyzing the accumulated annual data, using functional clustering analysis on the monthly data has several advantages: information loss is reduced, and the results are more intuitive and easier to understand.
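The information lost through annual accumulation can be seen in a small constructed example. The two series below are hypothetical and are not taken from the paper's data: they have the same annual total but different seasonal shapes, so they are indistinguishable in annual data and clearly different in monthly data.

```python
import numpy as np

months = np.arange(12)
# Two hypothetical provinces with the same annual total but different seasonality.
flat = np.full(12, 10.0)
seasonal = 10.0 + 5.0 * np.cos(2 * np.pi * months / 12.0)

# Distance between the provinces when only annual totals are available.
annual_gap = abs(flat.sum() - seasonal.sum())
# Euclidean distance between the provinces when monthly values are used.
monthly_gap = np.sqrt(((flat - seasonal) ** 2).sum())

print("gap in annual totals:", annual_gap)
print("gap in monthly profiles:", monthly_gap)
```

A clustering based on annual totals would treat these two series as identical, while one based on monthly curves would not.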

Conclusion
Energy plays an important role in a country's economic development. As China is one of the world's major energy consumers, its energy reserves are of great significance to its economic development. Wind energy is an important renewable energy source; it can effectively alleviate energy shortages, reduce damage to the environment, and contribute to sustainable economic development. China is shifting from high-speed growth to high-quality development, paying greater attention to the quality of growth and the balance between economy and environment. As an important new energy source, wind power can effectively reduce excessive dependence on fossil fuels and improve the ecological environment. It is of profound significance to vigorously develop clean energy such as wind energy and to adopt advanced science and technology to improve energy production and energy efficiency.
In this paper, the monthly data of wind power generation in 30 provinces of China from March 2013 to October 2019 were analyzed using functional cluster analysis. The conclusions are as follows. First, the overall growth of wind power generation in China is accelerating. Second, geographically, wind power generation in northern China is greater than in southern China, although in recent years wind power generation in southern provinces such as Yunnan has soared. Third, from the perspective of monthly data, wind power generation is cyclical and related to seasonal climate. In the future development of wind energy resources, regional and climatic factors should be taken into account, and advanced technology should be adopted to increase the output of wind power generation. Fourth, the results of the functional clustering analysis of monthly data and of the traditional clustering analysis of annual data are consistent. Compared with the traditional cluster analysis method, functional cluster analysis can be used to analyze high-frequency data, which reduces information loss and makes the results more intuitive and easier to understand. At the same time, when the data are analyzed from the functional data perspective, the data-generating process behind the data is regarded as a function, so the derivative function can be calculated to extract more information. For example, the derivative function shows that monthly wind power generation is both accelerating and cyclical. If the monthly data are simply summed to obtain yearly data, the information about changes between months is lost and this result cannot be obtained. Functional clustering analysis uses more information, so the result is more reliable.
Based on the empirical results of this paper, we give some suggestions for the future development of provincial wind power generation in China. First, northern China and the plateau regions have abundant wind resources, and further developing the wind resources of these regions can increase wind power generation. Second, provincial monthly wind power generation is periodic and related to climate and weather changes, so taking weather changes into account makes it possible to estimate the potential generation.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.