Research on information security of users’ electricity data including electric vehicle based on elliptic curve encryption

In the smart grid and big data environment, the wide application of non-intrusive load monitoring technology makes it possible to obtain large amounts of accurate power load data for users. As customer information is studied, protecting the security of users’ electricity data has become an urgent research topic. This article proposes a new load decomposition method for electric vehicle load information and compares it with the hidden Markov model algorithm to verify its accuracy. On this basis, the elliptic curve encryption algorithm is used to encrypt the users’ electricity data, and the function and effectiveness of the encryption algorithm are verified by comparing the electric vehicle load decomposition of the encrypted data with that of the unencrypted data.


Introduction
With the rapid development of the smart grid and electric vehicles, machine learning methods have been widely used for smart grid information security in various fields, 1,2 such as non-intrusive load monitoring technology, 3 information inference attack and preservation, 4,5 and Internet of Things technology. 6 Research on users' electricity load identification has mainly focused on identifying users' load equipment. [7][8][9] Yet it is precisely in the smart grid and big data environment that users' electricity data face serious information security threats. When big data is studied, privacy protection is also a major problem to be solved. [10][11][12] Previous studies [13][14][15] proposed k-anonymity and its improved algorithms, which generalize and compress sensitive data in order to hide it. Another study 16 proposed blind signature, anonymous processing, and Paillier homomorphic encryption to realize anonymous data transmission and to protect users' electricity data from being leaked.
In this article, a new load decomposition method is applied to aggregated load signals that include electric vehicle charging, and it is compared with the hidden Markov model (HMM) algorithm to demonstrate its superiority. The security of users' electricity data during transmission is also considered. The elliptic curve encryption (ECC) algorithm is used to encrypt the users' electricity data, and the new load decomposition method is then used to decompose the electric vehicle loads. The feasibility of the ECC algorithm is verified by comparing the decomposition results of the encrypted and unencrypted data.

New load decomposition algorithm
The new load decomposition algorithm separates electric vehicle charging signals from the aggregated household power signal. It effectively suppresses interference from household appliances other than electric vehicles, especially from the air-conditioning (AC) system. In this way, electric vehicle charging can be detected and its power estimated accurately despite interference from the AC power signal.

New load decomposition algorithm implementation steps
Threshold processing of aggregated signals. A threshold $\lambda$ is first applied to the given aggregated signal $x(t)$: portions whose amplitude is below the threshold are deleted, and portions above the threshold are retained. In the threshold processing formula (1), $\lambda \triangleq \max\{2.5, \ldots\}$, $j$ refers to the sampling points that exceed 2 kW, and the symbol $\triangleq$ denotes equality by definition.
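The thresholding step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes that "deleting" a sample means setting it to zero, that samples equal to the threshold are kept, and that the threshold value is supplied by the caller. The function name `threshold_signal` and the sample readings are hypothetical.

```python
def threshold_signal(x, lam):
    """Zero out samples below the threshold lam and keep the rest unchanged.

    Assumption: 'deleted' portions are represented by 0.0.
    """
    return [v if v >= lam else 0.0 for v in x]


# Hypothetical power readings in watts, one sample per minute.
aggregated = [300.0, 1200.0, 2600.0, 5800.0, 6100.0, 2400.0, 900.0]
print(threshold_signal(aggregated, lam=2500.0))
```

Only the three samples at or above 2500 W survive; everything else becomes zero, which is what later makes the individual waveform segments easy to locate.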
Filter AC pulse sequence. After threshold processing, the AC pulse sequences still need to be removed from the aggregated signal. Assuming that the duration of an AC pulse does not exceed $T$ minutes, every pulse segment whose duration does not exceed $T$ minutes is marked as a "segment to be deleted." One such segment is taken as the initial segment, and each adjacent "segment to be deleted" is checked against the two conditions in formulas (2) and (3), where $T_{filter}$ indicates the duration of a "segment to be deleted" adjacent to the initial segment, $D_{cur}$ indicates the duration of the initial segment, $\eta$ represents the duration extension parameter, $D$ is a parameter determined by $D_{cur}$ and $\eta$, and $G_{ap}$ indicates the distance between the initial segment and the adjacent "segment to be deleted." An adjacent segment that satisfies both conditions is marked as the new initial segment, and the check is repeated. If no adjacent segment satisfies the conditions, the search continues from the next pulse segment. Once the entire aggregated signal has been searched, all periods marked as "segments to be deleted" are deleted from $x(t)$. To avoid accidentally deleting a long waveform segment (which generally contains the power signal of an appliance such as an electric car, dryer, or oven), a threshold time $T_1$ can be set so that the duration of every deleted segment does not exceed $T_1$.
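The first part of this filter, marking and deleting short pulses, can be sketched as follows. This is a simplified illustration under stated assumptions: durations are measured in samples, a segment is a maximal run of non-zero samples in the thresholded signal, and the neighbour-merging conditions and the $T_1$ guard are deliberately left out. The helper names `find_segments` and `remove_short_pulses` are hypothetical.

```python
def find_segments(x):
    """Return (start, end) pairs of maximal non-zero runs; end is exclusive."""
    segments, start = [], None
    for i, v in enumerate(x):
        if v > 0 and start is None:
            start = i
        elif v <= 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(x)))
    return segments


def remove_short_pulses(x, max_len):
    """Zero out every non-zero run lasting at most max_len samples.

    Assumption: short runs are treated as AC pulses; adjacency checks are omitted.
    """
    y = list(x)
    for s, e in find_segments(x):
        if e - s <= max_len:
            for i in range(s, e):
                y[i] = 0.0
    return y


# Hypothetical thresholded signal: a 2-sample pulse followed by a 15-sample segment.
signal = [0.0, 3000.0, 3000.0, 0.0] + [6000.0] * 15 + [0.0]
filtered = remove_short_pulses(signal, max_len=10)
print(find_segments(filtered))
```

With `max_len=10`, the 2-sample pulse is removed while the 15-sample segment, which could be an electric vehicle charging event, is retained.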
Removal of residual interference. The aggregated signal still contains residual interference, which arises from fluctuations in the power signals of electrical appliances, power line losses, and the mixing of power signals from low-amplitude appliances. After threshold processing, the position of each waveform segment is known, so the residual interference amplitude around each segment can be estimated.
For each waveform segment, two reference points are set: the minimum value around point A before the segment and the minimum value around point B after the segment. The interference amplitude of the segment is estimated as the average of these two minimum values, and the residual interference is removed by subtracting this local estimate from the segment.
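A minimal sketch of this baseline subtraction is given below. It rests on an interpretation that the text does not state explicitly: A and B are taken to be the number of samples inspected immediately before and after the segment, and the input is the signal before samples outside the segment were zeroed. The function name `remove_residual` and the readings are hypothetical.

```python
def remove_residual(x, start, end, a=5, b=5):
    """Subtract a local baseline from x[start:end] and return (signal, baseline).

    Assumption: the baseline is the average of the minimum of the `a` samples
    before the segment and the minimum of the `b` samples after it.
    """
    before = x[max(0, start - a):start]
    after = x[end:end + b]
    refs = [min(w) for w in (before, after) if w]
    baseline = sum(refs) / len(refs) if refs else 0.0
    y = list(x)
    for i in range(start, end):
        y[i] = max(0.0, x[i] - baseline)
    return y, baseline


# Hypothetical readings in watts; the segment occupies indices 5..7.
raw = [400.0, 350.0, 300.0, 320.0, 310.0,
       3800.0, 3900.0, 3850.0,
       330.0, 300.0, 280.0, 310.0, 340.0]
cleaned, baseline = remove_residual(raw, start=5, end=8)
print(baseline, cleaned[5:8])
```

Here the minima before and after the segment are 300 W and 280 W, so a baseline of 290 W is subtracted from each sample of the segment.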
Classification of segmented signals. After the above processing, only a few segments remain in the aggregated signal, and they can be divided into three categories.
First, the cumulative counting function is defined as in equation (4),

$$f(r) = \sum_{t} \mathbb{1}\{H(t) > r\},$$

where $H(t)$ represents the aggregated signal after the first three processing steps and $r$ is an amplitude threshold ranging from 0 to the maximum value of $H(t)$. The indicator $\mathbb{1}\{H(t) > r\}$ counts the sampling points of $H(t)$ whose amplitude is greater than $r$. Second, the gradient function $g(r)$ of the cumulative function $f(r)$ is calculated, and the type of the segmented signal is initially judged from the number of peaks of the gradient function. A segmented signal whose gradient function shows two peaks belongs to the third type. A segmented signal whose gradient function shows one prominent peak belongs to the second type. A segmented signal of the first type shows no prominent peak.
Third, if the gradient function of a segmented signal has two or more prominent peaks, a normalized gradient function $g_n$ is defined as in equation (5), and the areas $S_n$ and $S_q$ under the curve $g_n$ are calculated. If the condition in equation (6) is satisfied, the segmented waveform is classified as the first type; otherwise, it is classified as the third type.
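The first two classification steps can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the authors' code: $f(r)$ is evaluated on a fixed grid of amplitude levels, $g(r)$ is taken as the drop of $f$ between neighbouring levels, and a "prominent peak" is any run of levels where that drop reaches a quarter of the segment length. The grid step, the prominence rule, and all function names are hypothetical, and the normalized-gradient test of the third step is not covered.

```python
def cumulative_count(h, levels):
    """f(r): number of samples in h whose amplitude exceeds r, for each level r."""
    return [sum(1 for v in h if v > r) for r in levels]


def gradient(f):
    """g(r): drop of f between consecutive levels (f never increases with r)."""
    return [f[i] - f[i + 1] for i in range(len(f) - 1)]


def count_peaks(g, min_height):
    """Count maximal runs of consecutive levels where g reaches min_height."""
    peaks, inside = 0, False
    for v in g:
        above = v >= min_height
        if above and not inside:
            peaks += 1
        inside = above
    return peaks


def peak_count_of_segment(h, step=500):
    """Number of prominent gradient peaks of one segment (assumed rule)."""
    levels = list(range(0, int(max(h)) + step + 1, step))
    g = gradient(cumulative_count(h, levels))
    return count_peaks(g, min_height=0.25 * len(h))


flat_top = [3300.0] * 30                    # single flat-topped segment
two_level = [3300.0] * 20 + [6600.0] * 20   # two stacked amplitude levels
print(peak_count_of_segment(flat_top), peak_count_of_segment(two_level))
```

A flat-topped segment produces one gradient peak, which corresponds to the second type in the text, while a segment with two stacked amplitude levels produces two peaks and becomes a candidate for the third type.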
Reconstruction of various types of electric vehicle charging load waveform. After the segmented waveforms have been classified in the first four steps, one more step is needed to extract the electric vehicle charging load waveform from each sub-class in Figure 1. The effective width $W_{valid}$ of a waveform segment is defined as the width (duration) at the bottom of the segment. The effective height $H_{valid}$ is defined as the height at which the width of the segment is 80% of the bottom width.
The sub-class identification of the three types and the reconstruction of the electric vehicle charging load waveform are shown in Figure 2.
When the electric vehicle charging load overlaps with other equipment loads, the electric vehicle charging load waveform needs to be reconstructed. Because the electric vehicle waveform has a constant and stable amplitude, its height can be estimated from the same day or from another day. Therefore, this article uses the actual height of the electric vehicle charging load together with the calculated effective width of the segmented signal to reconstruct the square waveform of the electric vehicle.
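The effective width, the effective height, and the square-wave reconstruction can be sketched as below. The sketch assumes a single-humped segment, so that the width at a given level equals the number of samples at or above that level, and it measures width in samples. The function names, the `percent` parameter, and the readings are hypothetical.

```python
def effective_dimensions(segment, percent=80):
    """Return (W_valid, H_valid) for one isolated segment.

    W_valid: number of non-zero samples (bottom width).
    H_valid: largest level still reached by at least `percent`% of those samples.
    """
    values = sorted((v for v in segment if v > 0), reverse=True)
    width = len(values)
    if width == 0:
        return 0, 0.0
    k = max(1, (width * percent + 99) // 100)  # ceiling of percent% of width
    return width, values[k - 1]


def square_wave(length, start, width, height):
    """Rectangular waveform of the given height over [start, start + width)."""
    return [height if start <= i < start + width else 0.0 for i in range(length)]


# Hypothetical segment with ramped edges, in watts.
segment = [0.0, 1000.0, 3200.0, 3300.0, 3300.0, 3300.0,
           3300.0, 3300.0, 3300.0, 3250.0, 1500.0, 0.0]
w_valid, h_valid = effective_dimensions(segment)
rebuilt = square_wave(len(segment), start=1, width=w_valid, height=h_valid)
print(w_valid, h_valid)
```

Taking the height at 80% of the bottom width ignores the ramped edges of the segment, so the rebuilt rectangle uses a height close to the stable charging level rather than the lowest edge samples.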

Evaluation index for new load decomposition algorithm effect
Figure 1. Three types of segmentation signals.
In order to evaluate the efficacy of the decomposition algorithm, three evaluation indicators are defined as follows:
1. Average estimated error percentage $ER_1$ of monthly electricity usage
2. Average estimated error $ER_2$ of monthly electricity consumption
3. Mean square error (MSE) of electric vehicle charging load signals
where $E^{i,j}_t$ indicates the actual charging load of the electric vehicle in the $j$th month of the $i$th year, $E^{i,j}_e$ indicates the estimated charging load of the electric vehicle in the $j$th month of the $i$th year, and $N$ is the total number of months.
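One plausible reading of the three indicators is sketched below. The exact formulas are not reproduced in the text, so the definitions here are assumptions: $ER_1$ is taken as the mean absolute percentage error of monthly energy, $ER_2$ as the mean absolute error of monthly energy, and MSE as the mean squared difference between the actual and estimated charging signals. The function names and numbers are hypothetical.

```python
def er1(actual, estimated):
    """Assumed ER_1: mean absolute percentage error over N months, in percent."""
    n = len(actual)
    return 100.0 * sum(abs(e - t) / t for t, e in zip(actual, estimated)) / n


def er2(actual, estimated):
    """Assumed ER_2: mean absolute error over N months, in energy units."""
    n = len(actual)
    return sum(abs(e - t) for t, e in zip(actual, estimated)) / n


def mse(actual_signal, estimated_signal):
    """Mean squared error between two equally long load signals."""
    n = len(actual_signal)
    return sum((e - t) ** 2 for t, e in zip(actual_signal, estimated_signal)) / n


# Hypothetical monthly charging energy (actual vs. estimated).
monthly_actual = [100.0, 200.0]
monthly_estimated = [110.0, 180.0]
print(er1(monthly_actual, monthly_estimated), er2(monthly_actual, monthly_estimated))
```

Under these assumed definitions, $ER_1$ is scale-free and can be compared across users, whereas $ER_2$ keeps the units of energy.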

ECC algorithm
The ECC algorithm is an asymmetric cryptosystem based on the elliptic curve discrete logarithm problem. It provides strong security per bit of key length and is widely used because of its short keys. Because it is built on finite field arithmetic, the ECC algorithm is easy to implement on a computer. 17,18

Prime field elliptic curve
Let $p$ be a prime number. The set of residues $\{0, 1, \ldots, p-1\}$ modulo $p$, with addition and multiplication modulo $p$, constitutes a finite field $F_p$ of prime order $p$. An elliptic curve can be defined over the prime field $F_p$ as

$$y^2 = x^3 + ax + b \pmod{p} \quad (10)$$

where $a, b \in F_p$, $(4a^3 + 27b^2) \bmod p \neq 0$, and $x, y \in F_p$. Every pair $(x, y)$ satisfying equation (10) is a point on the curve, and an additionally defined point at infinity $O_\infty$ is also a point on the curve. The set $E_p(F_p)$ of points on the elliptic curve in equation (10) over the prime field $F_p$ is therefore

$$E_p(F_p) = \{(x, y) \mid x, y \in F_p,\ y^2 = x^3 + ax + b \pmod{p}\} \cup \{O_\infty\}$$

If $P$ is a point on the elliptic curve and there exists a smallest positive integer $n_0$ such that $n_0 P = O_\infty$, then $n_0$ is called the order of $P$; if no such $n_0$ exists, $P$ is of infinite order.
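The definition of $E_p(F_p)$ can be checked by brute force on a toy curve. The sketch below uses the small textbook parameters $p = 17$, $a = 2$, $b = 2$, which are chosen only for illustration and are far too small for real use; the function names are hypothetical.

```python
def is_nonsingular(a, b, p):
    """Check the condition (4a^3 + 27b^2) mod p != 0."""
    return (4 * a**3 + 27 * b**2) % p != 0


def affine_points(a, b, p):
    """All pairs (x, y) in F_p x F_p satisfying y^2 = x^3 + ax + b (mod p)."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x**3 + a * x + b)) % p == 0]


# Toy parameters for illustration only.
p, a, b = 17, 2, 2
points = affine_points(a, b, p)
print(is_nonsingular(a, b, p), len(points) + 1)  # +1 for the point at infinity
```

This toy curve has 18 affine points, so $E_p(F_p)$ contains 19 points once the point at infinity is included. Exhaustive enumeration is only feasible for tiny primes; for primes of practical size the set is never listed explicitly.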

ECC encryption and decryption principle
An elliptic curve over $F_p$ is usually described by the parameter tuple $T_m = (p, a, b, G, n, h)$. The curve is determined by $p$, $a$, and $b$; $G$ is the base point; $n$ is the order of $G$; $m$ is the number of all points on the elliptic curve; and $h$ is the integer part of the quotient $m/n$. As an example, consider an electric power user DJ who uses elliptic curve encryption to send household load information in plain text $X$ to the electric power company DM. 19,20 The design flow is shown in Figure 3.
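Since the design flow itself is only given as a figure, the following sketch shows one common way such an exchange can work: an ElGamal-style scheme in which the sender computes $C_1 = rG$ and $C_2 = M + rK$, and the receiver recovers $M = C_2 - kC_1$. This is an assumed scheme, not necessarily the one in Figure 3. The toy curve ($p = 17$, $a = 2$, $b = 2$, $G = (5, 1)$, $n = 19$), the fixed random value, and the message point are hypothetical, and the mapping of load data $X$ to a curve point is not covered. Modular inverses via `pow(x, -1, p)` require Python 3.8 or later.

```python
P, A, B = 17, 2, 2      # toy curve y^2 = x^3 + 2x + 2 (mod 17)
G, N = (5, 1), 19       # base point and its order
INF = None              # point at infinity


def add(p1, p2):
    """Group law on the curve, including doubling and the point at infinity."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        s = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (s * s - x1 - x2) % P
    return (x3, (s * (x1 - x3) - y1) % P)


def mul(k, pt):
    """Scalar multiplication k * pt by double-and-add."""
    result = INF
    while k > 0:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result


def neg(pt):
    return INF if pt is INF else (pt[0], (-pt[1]) % P)


def encrypt(m_point, public_key, r):
    """Sender side: returns (C1, C2) = (rG, M + rK)."""
    return mul(r, G), add(m_point, mul(r, public_key))


def decrypt(c1, c2, private_key):
    """Receiver side: M = C2 - k * C1."""
    return add(c2, neg(mul(private_key, c1)))


k = 7                        # receiver's private key (hypothetical)
K = mul(k, G)                # receiver's public key K = kG
M = mul(3, G)                # hypothetical message already encoded as a point
c1, c2 = encrypt(M, K, r=5)  # r would normally be drawn at random per message
print(decrypt(c1, c2, k) == M)
```

Decryption works because $C_2 - kC_1 = M + rkG - krG = M$; an observer who sees only $C_1$, $C_2$, and $K$ would have to solve the elliptic curve discrete logarithm problem to recover $k$ or $r$.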

Case analysis
This section uses the aggregated power signal data of a certain day from the Hickory Street Database of the Americas. The process of decomposing the electric vehicle charging load waveform for 1 day with the new load decomposition algorithm is described in detail, and the result is compared with the HMM method proposed in Zheng et al. 21 to verify its accuracy.

Implementation of new load decomposition algorithm
Threshold processing of aggregated signals. The threshold is set to $\lambda = 2500$. The effect of the initial threshold processing of the aggregated power signal of a certain day, according to formula (1), is shown in Figure 4.
Comparing the aggregated signal before and after threshold processing shows that the burrs at the bottom of the waveform are removed and the baseline appears flat.
Filter AC pulse sequence. The aggregated signal after threshold processing still contains many AC pulse sequences, which need to be filtered out.
The parameters are set to $T = 10$, $\eta = 1$, and $T_1 = 90$, and the filter is built according to formula (2), formula (3), and the anti-false-deletion parameter $T_1$. The filtering effect is shown in Figure 5.
After the AC pulse sequences are filtered out, only a few segmented waveforms remain in the aggregated signal (two segmented waveforms in Figure 5). The filtering effect is significant.
Removal of residual interference. The reference points are set to $A = 5$ and $B = 5$, and the interference amplitude is removed using the residual interference removal method described above. The effect after removing the residual interference is shown in Figure 6.
After that, threshold processing and AC pulse sequence filtering can be applied again to the signal from which the residual interference has been removed. The procedure then moves to the next step.
Classification of segmented signals. As can be seen from Figure 5, two segmented waveforms remain after threshold processing and filtering, and they need to be classified.
According to equation (4), the cumulative counting functions $f_1(r)$ and $f_2(r)$ corresponding to the two segmented waveforms $x_1(t)$ and $x_2(t)$, and their gradient functions $g_1(r)$ and $g_2(r)$, are calculated. The curves are shown in Figures 7 and 8, respectively.
The waveform $g_1(r)$ in Figure 7 has one peak, so the first segment is classified as the second type. The waveform $g_2(r)$ in Figure 8 has two peaks, so the second segment is provisionally classified as the third type. Converting $g_2$ in Figure 8 to a normalized gradient function based on equation (5) gives $S_n = 30.65\% \times S_q < 35\% S_q$ based on equation (6). The second segment is therefore judged to be of the third type.
Reconstruction of various types of electric vehicle charging load waveform. The segmented waveform of Figure 7 is of the second type. Hence, the effective height $H_{valid-1}$ and the effective width $W_{valid-1}$ of the waveform in Figure 9 are calculated first.
The segmented waveform of Figure 9 is of the third type. First, it must be determined whether the electric vehicle charging load waveform is at the top or at the bottom of the segmented waveform.
The threshold is set to $\lambda_2 = \lambda + 3000 = 5500$, and the calculated effective time width is 257 min, which is less than 300 min. Consequently, the sub-segment waveform needs to be analyzed in the next step. Figure 10 shows the sub-segment waveform after the segmented waveform is filtered again. Figure 11 shows the sub-segment waveform after a capacity filling process.
The filtering does not delete the sub-segment waveform. Therefore, the top of the sub-segment waveform is the electric vehicle charging load. The effective height and effective width of the top of the sub-segment waveform in Figure 11 are then obtained in the same way as shown in Figure 9. Finally, the electric vehicle charging load waveform in the aggregated signal is reconstructed, as shown in Figure 12.
Based on the aggregated signal, three decomposition effect evaluation indexes are calculated respectively

Comparisons between new load decomposition algorithm and HMM algorithm
The new load decomposition algorithm proposed in this article and the HMM algorithm proposed in Zeadally et al. 22 are applied under the same conditions to decompose users' electric vehicle load waveforms from the Hickory Street Database of the Americas. The results are given in Table 1.
Table 1 compares the average estimated errors of monthly electricity consumption. The errors of the new algorithm are smaller than those of the HMM algorithm. The average estimation error of electric energy consumption is 8.44% for the new algorithm, whereas it is as high as 64.4% for the HMM algorithm, which is therefore not suitable for energy decomposition of signals containing electric vehicle charging. The new algorithm thus performs better than the HMM algorithm.

ECC-based encrypted signal decomposition
The ECC algorithm is used to encrypt the aggregated signal of a certain day. [21][22][23][24] The parameters of the ECC algorithm are set as follows:
1. The curve $y^2 = x^3 + ax + b \pmod{p}$ in equation (10) is selected as the encryption elliptic curve.
2. Parameter $p$: the larger the value of $p$, the higher the security, but the slower the calculation. To meet the security requirements, a prime number of about 200 bits is generally selected.
3. Parameters $a$ and $b$: $a$ can be a randomly generated positive integer less than $p - 1$. After $a$ is determined, a positive integer $b$ less than $p - 1$ is randomly generated so that the condition in equation (13) is satisfied.
4. Base point: once $a$, $b$, and $p$ are determined, the curve $y^2 = x^3 + ax + b \pmod{p}$ is determined. The $x$ coordinate of the base point $G$ can be an arbitrary integer from 0 to $p - 1$. Once the $x$ coordinate is chosen, the $y$ coordinate of $G$ is obtained by solving $y^2 = x^3 + ax + b \pmod{p}$ for $y$.
5. Private key $k$: a prime number from 0 to $p - 1$ is randomly selected as the private key $k$.
6. Public key $K$: the private key $k$ is multiplied by the base point $G$ to obtain the public key, that is, $K = kG$.
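Steps 3 and 4 can be sketched as follows. The sketch makes several assumptions that are not in the text: the condition on $a$ and $b$ is taken to be $(4a^3 + 27b^2) \bmod p \neq 0$, the prime satisfies $p \equiv 3 \pmod 4$ so that a square root can be computed with a single modular exponentiation, and a candidate $x$ is redrawn whenever the right-hand side has no square root. The small prime 10007, the seeded generator, and the function names are hypothetical; a real deployment would use a prime of cryptographic size, a cryptographically secure random source, and would also verify the order of $G$.

```python
import random


def generate_curve(p, rng):
    """Draw a and b below p - 1 until (4a^3 + 27b^2) mod p != 0 (assumed condition)."""
    while True:
        a = rng.randrange(1, p - 1)
        b = rng.randrange(1, p - 1)
        if (4 * a**3 + 27 * b**2) % p != 0:
            return a, b


def base_point(a, b, p, rng):
    """Pick x at random and solve y^2 = x^3 + ax + b (mod p) for y.

    Requires p % 4 == 3, so that rhs^((p + 1) / 4) is a square root of rhs
    whenever rhs is a quadratic residue.
    """
    assert p % 4 == 3
    while True:
        x = rng.randrange(0, p)
        rhs = (x**3 + a * x + b) % p
        y = pow(rhs, (p + 1) // 4, p)
        if (y * y) % p == rhs:  # otherwise rhs has no square root; try another x
            return x, y


p = 10007                 # toy prime with p % 4 == 3, for illustration only
rng = random.Random(0)    # seeded only to make the example repeatable
a, b = generate_curve(p, rng)
gx, gy = base_point(a, b, p, rng)
print((gy * gy - (gx**3 + a * gx + b)) % p == 0)
```

Roughly half of all $x$ values yield a right-hand side without a square root modulo $p$, which is why the loop in `base_point` may need a few attempts before a valid base point is found.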
The process of decomposing the electric vehicle charging load waveform is as described above. After encryption, an eavesdropper can obtain at most the encrypted signal shown in Figure 13.
Compared with the unencrypted signal, the encrypted signal is larger by an order of magnitude and fluctuates violently. The new decomposition algorithm is then applied to the encrypted signal to check whether the electric vehicle load waveform can still be decomposed, and thereby to verify the security of the encryption. The decomposition results are shown in Figure 14.
As can be seen from Figure 14, the new decomposition method cannot extract the electric vehicle load waveform from the encrypted signal, which ensures the security of the signal during transmission.

Conclusion
This article proposes a new load decomposition algorithm designed for electric vehicle users. Comparison with the HMM algorithm verifies the superiority of the new load decomposition algorithm in terms of three evaluation indicators, $ER_1$, $ER_2$, and MSE. On this basis, an ECC-based data encryption scheme is put forward, and the proposed load decomposition algorithm is applied to the encrypted aggregated user power signal. The experimental result shows that the new load decomposition algorithm cannot decompose the electric vehicle load waveform in the encrypted aggregated signal, which verifies the security of the ECC-based data encryption scheme. In the future, a growing number of terminals will communicate with electric vehicles, [25][26][27][28] and research on data security and privacy protection will receive increasing attention. Future work will explore other methods of data aggregation encryption and propose communication encryption for owners of electric vehicles.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Natural Science