An algorithm for nighttime pedestrian detection in intelligent surveillance for renewable energy power stations

Intelligent surveillance is an important management method for the construction and operation of power stations such as wind and solar power plants. The identification and detection of equipment, facilities, personnel, and personnel behavior are key technologies for the ubiquitous electric Internet of Things. This paper proposes a video solution based on the support vector machine (SVM) and histogram of oriented gradients (HOG) methods for the pedestrian safety problems that are common in night driving. First, a series of image preprocessing methods is used to optimize night images and detect lane lines. Second, the image is divided into intelligent regions to adapt to different road environments. Finally, the HOG and SVM methods are used to detect pedestrians on a Linux system, which reduces the number of false alarms in pedestrian detection and the workload of the pedestrian detection algorithm. The test results show that the system can successfully detect pedestrians at night. With image preprocessing optimization, the correct rate of nighttime pedestrian detection is significantly improved and reaches 92.4%. After the division of regions is optimized, the number of false alarms decreases significantly, and the average frame rate of the optimized video reaches 28 frames per second.


Introduction
The development of renewable energy to reduce reliance on traditional thermal, coal, or nuclear power generation has been a goal of governments in recent years (Johansson, 1975). China, the world's most important country for wind and solar power, has the world's largest wind (169 GW) and solar (77.7 GW) capacities along with highly competitive wind and solar photovoltaic (PV) manufacturing. Wind and solar power facilities are widely distributed in deserted or sparsely populated areas, where external communication conditions are poor and people, materials, and machines are dynamically concentrated, leading to many safety risks during construction and operation. These risks affect the normal operation of a power plant and cause serious losses to the enterprise (Aki et al., 2016; Schmarzo, 2015; Zeid and Davis, 2014). Much experience shows that regulatory difficulty is one of the key factors causing problems in the management of wind and solar companies. An intelligent monitoring system is the main technical means of current power plant supervision. It automatically provides intelligent warnings through built-in computer vision algorithms, reduces personnel participation, and greatly improves the efficiency of the monitoring system (Chen et al., 2017, 2018; Choudhury et al., 2017). This paper takes the typical application demand of nocturnal pedestrian detection as its research object, which is of high economic value and practical significance (Han et al., 2019; Pease, 2015).
Deep learning technology is an effective method for improving the performance of pedestrian detection algorithms. This kind of method is based on the statistical regularities of a large number of pedestrian and non-pedestrian samples in the appearance mode and mainly includes two technical aspects: feature extraction and classifier design. Currently, pedestrian feature descriptors applied to pedestrian detection include Haar-like features, local binary pattern (LBP) features, edgelet features (Luo et al., 2019), the scale-invariant feature transform (Chen et al., 2018), histogram of oriented gradients (HOG) features, and multifeature methods based on Hough forests (Peng et al., 2014). The classification algorithms commonly used in the field of pedestrian detection mainly include artificial neural networks, various boosting methods, and the support vector machine (SVM) (Su et al., 2012). The disadvantages of this kind of method are as follows: deep learning requires a large number of data samples for training to obtain the pedestrian classifier model and maintain sufficiently high accuracy. At the same time, because a large amount of data must be processed by a hardware system, the performance of the hardware system is very important. To improve hardware performance, most pedestrian detection systems run on a PC platform, but the volume and power consumption of such platforms are relatively large. PCs are not suitable for specific application scenarios, such as vehicle-mounted scenes (Kim and Kim, 2018; Liem and Gavrila, 2014; Na et al., 2018).
To meet the demand for intelligent pedestrian detection at night, this paper proposes a vehicle-mounted nighttime pedestrian detection algorithm based on SVM and HOG. In the vehicle illumination recognition system, a camera captures image data of the road ahead, and the RGB image is converted into a hue-saturation-lightness (HSL) image. The image edges and erosion centers are obtained by introducing mathematical morphology methods from image processing. All irregular reflectors are identified according to the symmetrical shape characteristics of vehicle lighting. By dividing the image into upper and lower parts, the position of headlights in the lower road area of the image can be identified. In addition, the upper and lower boundary lines can be dynamically adjusted according to the farthest position of the lane line to adapt to actual road conditions, such as ramps and downhill roads. Then, the lighting direction of the matrix headlamp can be controlled according to the lighting environment. The proposed system is configured with a liquid crystal display and wireless Bluetooth, and Bluetooth communication technology and a mobile client interface are used to achieve image adjustments. A recognition calibration line added to the image display is used to solve the shooting-angle offset problem and optimize the recognition effect. In short, the system can reduce the number of recognition errors and effectively improve the correct rate (Qazi et al., 2011).

System design principle
The problem of pedestrian detection at night includes, in a broad sense, the process of feature extraction and, in a narrow sense, the matching method, that is, the method of determining whether images match. Traditional matching methods are mostly based on simple distance or correlation operations, which have the advantages of good generality and high computational efficiency. However, for this particular style of image, it is difficult to achieve high precision. This paper mainly discusses the problem of selecting a measurement strategy and proposes transforming the nighttime pedestrian detection problem into a classification problem to overcome the shortcomings of traditional matching methods (Liu and Zhang, 2019; Su et al., 2012).

Image preprocessing
Before feature extraction and classification of video images, chroma transformation, pixel edge acquisition, image segmentation, and other image preprocessing steps should be carried out to optimize the acquired nighttime images.
Chromaticity conversion. The initial image acquired by the system may show changes in all three RGB parameters under different lighting conditions (Pan et al., 2015), whereas only the H parameter of the corresponding HSL representation is affected. To improve the stability of the acquired image, the RGB values are converted into HSL values during preprocessing. With max and min denoting the maximum and minimum of the normalized r, g, and b values, respectively, the conversion is given by formulas (1) to (3):

h = 60° × (g − b)/(max − min), if max = r; h = 60° × (b − r)/(max − min) + 120°, if max = g; h = 60° × (r − g)/(max − min) + 240°, if max = b (1)

s = (max − min)/(max + min), if l ≤ 1/2; s = (max − min)/(2 − max − min), if l > 1/2 (2)

l = (max + min)/2 (3)

In formulas (1) to (3), h represents hue, s represents saturation, and l represents lightness. After converting the image, we record the coordinates of the pixel with maximum lightness according to formula (3) and set it as the central target point (X, Y) to facilitate the mask operation on the image.
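As a concrete illustration, the standard RGB-to-HSL conversion of formulas (1) to (3) can be sketched in Python. The function name `rgb_to_hsl` is illustrative and not part of the paper's implementation.

```python
def rgb_to_hsl(r, g, b):
    """Convert 8-bit RGB values to (h, s, l): h in degrees, s and l in [0, 1]."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    mx, mn = max(r, g, b), min(r, g, b)
    l = (mx + mn) / 2.0                                       # lightness, formula (3)
    if mx == mn:                                              # achromatic: no hue/saturation
        return 0.0, 0.0, l
    d = mx - mn
    s = d / (mx + mn) if l <= 0.5 else d / (2.0 - mx - mn)    # saturation, formula (2)
    if mx == r:                                               # hue, formula (1)
        h = (60.0 * (g - b) / d) % 360.0
    elif mx == g:
        h = 60.0 * (b - r) / d + 120.0
    else:
        h = 60.0 * (r - g) / d + 240.0
    return h, s, l
```

For example, pure red (255, 0, 0) maps to hue 0° with full saturation and mid lightness, so only h shifts when the lighting color changes, which is the stability property the preprocessing step relies on.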
Pixel edge acquisition. After the chroma conversion of the image is completed, the brightness of the pedestrians and the background differ significantly. The areas representing pedestrians in the image are white with high brightness, while the background is dark. The purpose of this step is to determine the position of the pedestrian area (the highlighted area) in the image. To determine the upper, lower, left, and right boundaries of the highlighted area, the image is masked to smooth it. We use the central target point (X, Y) as the structural element to erode the image, create a 3 × 3 convolution kernel as the mask, and use the HSL threshold to create a mask. The original image and the mask perform bitwise operations starting from (X, Y). For the source image data F(x, y), the convolution operator is 3 × 3, and the average mask over the 4-neighborhood is calculated by formula (4):

F(x, y) = [Z(x − 1, y) + Z(x + 1, y) + Z(x, y − 1) + Z(x, y + 1)] / 4 (4)

In formula (4), x and y represent the horizontal and vertical coordinates of the image, respectively. By the above method, the center coordinates and boundary coordinates of the highlighted area in the image, as well as the distances between the center and the boundaries, can be determined preliminarily. To further distinguish pedestrians from other highlighted objects, the pedestrian shape is analyzed in this paper. Ordinarily, pedestrians stand upright. A cuboid was used as the model to set up the basic conditions, and the height-to-width ratio of the cuboid was set in the range of 2:1 to 5:1, which gives the condition

5 min(n1 + n2, n3 + n4) > max(n1 + n2, n3 + n4) > 2 min(n1 + n2, n3 + n4) (5)

where min and max denote the minimum and maximum of n1 + n2 and n3 + n4, the sums of the center-to-boundary distances in the two directions.
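The 4-neighborhood average of formula (4) and the upright-shape condition of formula (5) can be sketched as follows; both function names are illustrative, and the shape check takes the two side sums directly as its arguments.

```python
def four_neighbor_mean(Z, x, y):
    """Average mask over the 4-neighborhood of pixel (x, y), as in formula (4)."""
    return (Z[y][x - 1] + Z[y][x + 1] + Z[y - 1][x] + Z[y + 1][x]) / 4.0

def is_pedestrian_shaped(width, height):
    """Upright-cuboid condition of formula (5): the larger side sum must exceed
    twice, but stay under five times, the smaller one (ratio 2:1 to 5:1)."""
    lo, hi = min(width, height), max(width, height)
    return 2 * lo < hi < 5 * lo
```

For instance, a highlighted region 30 pixels wide and 90 pixels tall passes the check (ratio 3:1), while a square street-light halo does not, which is how non-pedestrian highlights are filtered out at this stage.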
Image segmentation. The image area is roughly divided into upper and lower sections to avoid interference from the sky, street lights, and trees. However, under actual road conditions, such as uphill and downhill roads, the camera cannot be adjusted upward and downward in real time, so the image division is inaccurate, and the recognition and non-recognition areas cannot be divided correctly. This system proposes a method of optimizing the effective region division. When it is necessary to divide the image into sky and road regions, the boundary line is dynamically adjusted according to the actual situation of the road. When there is no lane reference line on the road, Line1 is set by default to a boundary line at 1/2 of the image height. By default, the distance from Line1 to Line2 is set to 1/10 of the image height (the height of a pedestrian in the image at the corresponding actual distance), and Line2 is aligned with the actual boundary line when the camera is tested and installed.
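A minimal sketch of the default boundary placement, assuming Line1 lies above Line2, row indices increase downward, and the offsets are fractions of the image height; the function name and signature are illustrative.

```python
def region_boundaries(image_height, lane_vanish_row=None):
    """Return (line1_row, line2_row) dividing sky from road.

    With no lane reference, Line1 defaults to half the image height and Line2
    sits 1/10 of the image height below it; with a detected lane, Line2 snaps
    to the farthest lane point and Line1 keeps the fixed pedestrian-height offset.
    """
    offset = image_height // 10          # approximate pedestrian height in the image
    if lane_vanish_row is None:
        line1 = image_height // 2
        line2 = line1 + offset
    else:
        line2 = lane_vanish_row
        line1 = max(0, line2 - offset)
    return line1, line2
```

On a 480-pixel-high frame with no lane line this places Line1 at row 240 and Line2 at row 288; when a lane vanishing point is found at row 200, Line2 moves there and Line1 follows with the fixed offset.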
In Figure 1, the division of the effective area in the case of an uphill road is introduced. As shown in Figure 1(a), the original image shows that there are tag lines on the road, and the farthest point of the lane is used as the reference point of the horizontal dividing line, Line2, in the system. The reference points for Line2 are shown in Figure 1(b), which is the brightened image after the chroma conversion. The relative distance between Line1 and Line2 is the approximate height of a pedestrian, as shown in Figure 1(c) and (d). The relative distance between Line1 and Line2 is fixed, and the lines are dynamically adjusted upward and downward according to the actual boundary between the road and the sky. Then, the image is processed by the normalized segmentation (NCut) method. The image is represented by an undirected weighted graph, the pixels in the image are regarded as nodes in the graph, and a segmentation criterion based on graph theory is used to perform image segmentation. The graph G = (V, E) consists of nodes V and edges E, and each edge corresponds to a weight W_ij of the two nodes it connects. If each pixel is regarded as a node, then W_ij is the similarity between the two pixels. The degree of dissimilarity between regions A and B is defined as the cut

cut(A, B) = Σ_{u∈A, v∈B} w(u, v) (7)

The degree of association between region A and the whole node set V is defined as

assoc(A, V) = Σ_{u∈A, t∈V} w(u, t) (8)

Then, image segmentation can be defined as minimizing the normalized cut

Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V) (9)

To find the optimal segmentation outcome, the minimum value of Ncut must be determined. It is very difficult to solve this problem directly, and the amount of computation required is very large. Therefore, the problem is transformed into a Rayleigh quotient minimization problem

min_x Ncut(x) = min_y [y^T (D − W) y] / [y^T D y] (10)

where W is the similarity matrix, D is the diagonal matrix with D_ii = Σ_j W_ij, and min_y denotes minimization over the relaxed indicator vector y.
After the constraint conditions are relaxed, the solution can be further converted into the generalized eigenvalue problem

(D − W) y = λ D y (11)

The graph is divided according to the eigenvector corresponding to the second smallest eigenvalue, which gives the optimal NCut value, so the two-way segmentation image can be obtained. After the two-way segmentation result is obtained, each sub-region is divided into two parts by the recursive method, and finally the multi-way segmentation result is obtained.
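The two-way NCut step can be sketched with NumPy, assuming a precomputed similarity matrix W; this is a minimal illustration on a four-node graph, not the paper's pixel-level implementation.

```python
import numpy as np

def ncut_bipartition(W):
    """Two-way normalized cut: solve (D - W) y = lambda * D y and split nodes
    by the sign of the eigenvector for the second smallest eigenvalue."""
    d = W.sum(axis=1)
    D = np.diag(d)
    # Symmetric normalization turns the generalized problem into a standard one:
    # D^{-1/2} (D - W) D^{-1/2} z = lambda z, with y = D^{-1/2} z.
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = d_inv_sqrt @ (D - W) @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L_sym)      # ascending eigenvalues
    y = d_inv_sqrt @ eigvecs[:, 1]                # second smallest eigenvalue
    return y >= 0                                  # boolean partition labels

# Two dense clusters (nodes 0-1 and 2-3) joined by weak cross edges.
W = np.array([[0.0, 1.0, 0.05, 0.0],
              [1.0, 0.0, 0.0, 0.05],
              [0.05, 0.0, 0.0, 1.0],
              [0.0, 0.05, 1.0, 0.0]])
labels = ncut_bipartition(W)
```

On this toy graph the second eigenvector separates the two tightly connected pairs, which is exactly the cut that minimizes the Ncut criterion; recursing on each part yields the multi-way segmentation.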

HOG feature extraction
To accurately distinguish pedestrians from other non-pedestrian objects in highlighted areas, this paper uses the HOG feature and a machine learning method (Aleena and Anamik, 2016). The HOG descriptor is a feature descriptor used for object detection in computer vision and image processing. It computes statistical histograms over local areas of the image to form the feature. Compared with other feature descriptors, HOG has many advantages. First, since HOG operates on local grid cells of an image, it maintains a favorable level of invariance to geometric and photometric deformations, as such deformations appear only over larger spatial regions. Second, with coarse spatial sampling, fine orientation sampling, and strong local photometric normalization, pedestrians are allowed some subtle limb movements as long as they maintain a generally upright posture; these subtle movements can be ignored without affecting detection. Therefore, HOG features are particularly suitable for the human detection in this paper. The HOG feature extraction process is as follows:

Gamma normalization. The image is normalized to reduce the influence of local shadows and the background:

H(x, y) = I(x, y)^γ (12)

where I(x, y) is the input pixel value and γ is the compression factor (commonly γ = 1/2).

Computing the image gradient. The gradients along the horizontal and vertical coordinates of the image are calculated. The pedestrian contour and edge information can be captured during derivation according to the gradient direction value at each pixel position.
The gradients at pixel (x, y) in the image are

G_x(x, y) = H(x + 1, y) − H(x − 1, y) (13)

G_y(x, y) = H(x, y + 1) − H(x, y − 1) (14)

where G_x(x, y), G_y(x, y), and H(x, y) represent the horizontal gradient, the vertical gradient, and the pixel value at pixel (x, y) of the input image, respectively. The gradient magnitude and gradient direction at pixel (x, y) are given by equations (15) and (16), respectively:

G(x, y) = sqrt(G_x(x, y)² + G_y(x, y)²) (15)

α(x, y) = arctan(G_y(x, y) / G_x(x, y)) (16)

In this paper, the [−1, 0, 1] gradient operator is used to convolve the original image to obtain the gradient component in the horizontal direction x. Similarly, the [−1, 0, 1]^T gradient operator is used to convolve the original image to obtain the gradient component in the vertical direction y. Then, the gradient magnitude and direction of each pixel are calculated by combining the above formulas.
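Equations (13) to (16) can be sketched with NumPy as follows; the function name and the H[y, x] array layout are assumptions for illustration, and image borders are simply left at zero.

```python
import numpy as np

def hog_gradients(H):
    """Central-difference gradients, magnitude, and orientation per (13)-(16).
    H is a 2-D float array indexed as H[y, x]."""
    Gx = np.zeros_like(H)
    Gy = np.zeros_like(H)
    Gx[:, 1:-1] = H[:, 2:] - H[:, :-2]             # (13): H(x+1, y) - H(x-1, y)
    Gy[1:-1, :] = H[2:, :] - H[:-2, :]             # (14): H(x, y+1) - H(x, y-1)
    mag = np.sqrt(Gx**2 + Gy**2)                   # (15): gradient magnitude
    ang = np.degrees(np.arctan2(Gy, Gx)) % 180.0   # (16): unsigned angle, 0-180 deg
    return mag, ang

# Simple horizontal ramp: intensity increases left to right.
H = np.tile(np.arange(5.0), (5, 1))
mag, ang = hog_gradients(H)
```

On the ramp, every interior pixel has magnitude 2 and orientation 0°, i.e. a purely horizontal gradient, which matches the central-difference definition above.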
Cell division. The histogram of oriented gradients of each cell is constructed. The image is divided into several "cells"; in a 64 × 128 image, each 8 × 8 pixel region constitutes a cell, and each 2 × 2 group of cells constitutes a block. Using 8 pixels as the step size, there are 7 scanning windows in the horizontal direction and 15 in the vertical direction, for a total of 36 × 7 × 15 = 3780 features.
Normalized gradient histogram. The directional histogram corresponding to a single cell is transformed into a one-dimensional vector; the number of gradients in each direction is counted according to the prescribed bin spacing (e.g., 8, 10, 6, 12, 4, 5, 8, 6, 14), and nine features of a single cell are obtained. Each block (scanning window) contains 2 × 2 cells, that is, 2 × 2 × 9 = 36 features. The final feature count for an image of size 64 × 128 is 36 × 7 × 15 = 3780. In this way, an intuitive gradient map is decomposed and extracted into a feature vector easily decoded by a computer.
Collecting HOG features. The last step involves collecting the HOG features from all overlapping blocks in the detection window and combining them into the final feature vector for classification.
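The cell and block arithmetic above can be verified with a short sketch; the function and its defaults mirror the paper's 64 × 128 detection window but are otherwise illustrative.

```python
def hog_feature_dim(img_w=64, img_h=128, cell=8, block_cells=2, nbins=9, stride=8):
    """Count HOG features for a detection window with the paper's layout."""
    block_px = cell * block_cells                   # 16-pixel blocks
    blocks_x = (img_w - block_px) // stride + 1     # 7 horizontal block positions
    blocks_y = (img_h - block_px) // stride + 1     # 15 vertical block positions
    per_block = block_cells * block_cells * nbins   # 2 x 2 cells x 9 bins = 36
    return blocks_x * blocks_y * per_block
```

With the defaults this reproduces the 36 × 7 × 15 = 3780-dimensional feature vector quoted in the text.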

Image classification based on SVM
The above HOG features can be classified by an SVM to distinguish pedestrians from non-pedestrians. The equation of a hyperplane is

w · x + b = 0 (17)

where x is the eigenvector, w is the weight vector, and b is the bias. The optimal classification surface needs to satisfy

min (1/2)||w||² (18)

subject to the constraints

y_i (w · x_i + b) ≥ 1, i = 1, 2, ..., n (19)

where (x_i, y_i), i = 1, 2, ..., n is the data sample set obtained by training and y_i ∈ {+1, −1}, with +1 for a positive sample and −1 for a negative sample. Using the Lagrange multipliers a_i, the problem is converted to

max_a Σ_i a_i − (1/2) Σ_i Σ_j a_i a_j y_i y_j (x_i · x_j) (20)

To solve for a_i*, the following conditions are used:

Σ_i a_i y_i = 0, a_i ≥ 0 (21)

If the samples are not linearly separable, a slack variable ξ_i needs to be added, which corresponds to the allowed deviation of the ith data point from the margin. If ξ_i is arbitrarily large, any hyperplane satisfies the constraints. Therefore, on the basis of the original objective, the total amount of slack should be as small as possible, and the objective function becomes

min (1/2)||w||² + C Σ_i ξ_i (22)

Equation (22) can be converted into the dual problem with the box constraint

0 ≤ a_i ≤ C, Σ_i a_i y_i = 0 (23)

An SVM can also map features into a high-dimensional space by a kernel function to solve nonlinear problems. The optimal classification function is

f(x) = sgn(Σ_i a_i* y_i K(x_i, x) + b*) (24)

where K(x_i, x_j) is the kernel function; for the eigenvector x, the above formula outputs the positive or negative category of the target.
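As a minimal sketch of the soft-margin objective (22), a linear SVM can be trained by sub-gradient descent on the primal problem; this is an illustration on toy data standing in for HOG vectors, not the paper's exact solver, and all names are assumptions.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Linear soft-margin SVM via sub-gradient descent on
    (1/2)||w||^2 + C * sum of hinge losses max(0, 1 - y(w.x + b))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # points violating y(w.x + b) >= 1
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_predict(X, w, b):
    return np.sign(X @ w + b)

# Tiny linearly separable toy set standing in for HOG feature vectors.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
pred = svm_predict(X, w, b)
```

After training, the sign of w · x + b reproduces the labels of the toy set, mirroring the decision rule of equation (24) with a linear kernel.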
Experimental results and discussion

System architecture

In this paper, the Raspberry Pi hardware platform and a PC platform are used for analysis. As shown in Table 1, the frame rate of the video display can reach more than 30 frames per second. However, in terms of volume, the hardware of the setup in this paper is much smaller than a desktop PC, which improves the portability of the system. At the same time, input/output (IO) ports exist as peripheral ports and improve the compatibility of peripheral devices. In addition, the Raspberry Pi platform is much cheaper than a PC (Liu Q, 2016). In conclusion, the hardware used in this paper is more suitable for vehicle scenes.

Lane line detection
When the image area is divided and a lane line is present, the adjustment of the dividing line is based on the farthest point in front of the lane line. To obtain the farthest point, the system carries out image edge detection and a series of image processing steps. The process of line detection is as follows:

• The image is converted to grayscale, and Canny edge detection with double thresholds is applied; a pixel whose gradient falls below the low threshold is not a real boundary point. The threshold range of this system is 50 to 120, and the edge image is obtained by edges = cv2.Canny(gray, 50, 120). The effect of edge extraction is shown in Figure 4.

• Several straight lines are obtained by the probabilistic Hough transform of the image after the edge detection step. The edge image is processed using the OpenCV function cv2.HoughLinesP(dst, lines, rho, theta, threshold, minLineLength, maxLineGap), where dst is the binarized input image in Figure 5(b); lines stores the detected line segments; rho is the resolution of the radius ρ of the polar coordinates, in pixels; theta is the resolution of the angle θ in polar coordinates; threshold is the minimum number of curve intersections required for a straight line; minLineLength is the minimum number of points that can form a line, so lines with too few points are excluded; and maxLineGap is the maximum allowed gap between points on the same line. The system was tested to select suitable parameter values: lines = cv2.HoughLinesP(edges, 1, np.pi/180, 100, 10, 10, 30). The result is shown in Figure 6, indicated by green lines.

• The lane tag lines have diagonal slopes, so the system sets a discrimination condition on the slope of each line.
The conditional procedure is as follows:

    for x1, y1, x2, y2 in lines1:
        if abs(x1 - x2) < abs(y1 - y2):        # closer to vertical: lane candidate, drawn green
            cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        elif abs(x1 - x2) > abs(y1 - y2) or x1 == x2:   # near-horizontal or degenerate, drawn red
            cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)

The result is shown in Figure 7.
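Once the lane candidates are filtered, the reference point for Line2 is the endpoint farthest up the road. A minimal sketch, assuming HoughLinesP-style (x1, y1, x2, y2) segments with the y axis increasing downward; the function name is illustrative.

```python
def farthest_lane_point(segments):
    """Among detected lane segments, return the endpoint with the smallest y,
    i.e. the point farthest up the road, used as the reference for Line2."""
    best = None
    for x1, y1, x2, y2 in segments:
        for x, y in ((x1, y1), (x2, y2)):
            if best is None or y < best[1]:
                best = (x, y)
    return best
```

For two detected segments ending at rows 280 and 300, the row-280 endpoint wins and the horizontal dividing line is anchored there.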

HOG feature extraction
The HOG descriptor is a very effective feature descriptor for target detection. Some local shape features of an image can be represented by the gradient of the image and the density distribution of edge directions, which describe pedestrian shape features well in static images and videos.
In this system, 64 × 128 pixel images are taken as samples and divided into 8 × 8 pixel cell units. A gradient and edge histogram is computed for each cell unit, and the histograms of the cell units are combined to form the image features. To improve the detection performance of the features, four adjacent cells are combined into one block. Blocks are generated by sliding over the cell units, moving eight pixels (one cell width) at a time, so that a sample yields a 7 × 15 grid of 105 blocks in total. Each block consists of 16 × 16 pixels, and the relationship between cells and blocks is shown in Figure 8.
For each cell unit of a block, the gradient directions are projected to obtain a gradient direction histogram for the individual cell unit. The gradient direction is divided into nine intervals over the range 0°–180°, with each 20° forming one interval. The gradient direction histograms of the four cell units in each block are concatenated to form the feature of the block, namely a 36-dimensional feature. By traversing the entire sample, a feature vector of 36 × 105 = 3780 dimensions is obtained.

Nighttime pedestrian classification based on SVM
In this paper, SVM is adopted for classification decisions and is combined with the HOG descriptor information. Here, the SVM faces a binary classification problem. It was found in the experiments that the accuracies of the linear and nonlinear classification methods are nearly identical. In the end, the linear classification method was selected, as it is faster than the nonlinear method. Assuming that a joint probability distribution F(x, y) relates input x and output y, the process of classifier learning involves obtaining an optimal function based on n independent samples so that the expected risk of prediction is minimized. The basic model of classifier learning is shown in Figure 9, and the training process of the classifier is shown in Figure 10.
In the pedestrian detection test of the system, the detection results are obtained in an actual night environment. As shown in Figure 11(a), the light under the street lamp is sufficient, the pedestrian contour is obvious, the objects on the road are clearly visible, and the system detects the pedestrian well. In Figure 11(b), with weak light, the pedestrian contour is still visible, and the system can still detect the pedestrian. However, due to the contrast between the light column generated by the street light and the surrounding dark environment, the system detects the street light as a pedestrian. As shown in Figure 11(c), in a dark environment the infrared camera operates; the two pedestrians on the right are barely visible, but the person on the left is blurred, so the system does not detect that pedestrian.
To verify the above results, the system performs data statistics in similar nighttime scenes. As shown in Table 2, under good light conditions, the accuracy of detection is the highest, reaching 91.8%. The second condition was weak illumination, with an accuracy of 90.1%. The lowest accuracy was in the case with no light, with an accuracy rate of 81.2%. Therefore, better nighttime lighting conditions lead to higher detection performance of the system. Regarding the false alarm rate, the highest rate occurred under weak illumination, which indicates that street lights have a large impact on the detection results in a weak illumination environment and that the system needs further optimization. In actual scenarios, most roads have street lights and car lights, and actual lighting conditions mostly fall between weak and good lighting. Therefore, the system has high adaptability in a night environment.
To further improve the performance of the system, the image area is divided into sky and road in the image preprocessing step, the interference of the area above the image is eliminated, and frame separation processing is carried out to optimize the fluency of the video. As shown in Figure 12(a), before optimization, pedestrians are correctly detected, but a street light is detected incorrectly. After area division optimization, in Figure 12(b), the pedestrian is correctly detected and street light interference is reduced. The method before optimization is method 1, and the method after optimization is method 2. As shown in Table 3, the correct rate of method 2 is slightly higher than that of method 1, reaching 92.4%, and its false positive rate is markedly lower, at only 3.8%. The system maintains a high correct rate of pedestrian detection and effectively reduces the false positive rate. In addition, the average frame rate of the video of method 2 reaches 28 frames/s, which is significantly better than that of method 1. In summary, method 2 is clearly superior to method 1: while maintaining a high accuracy rate, the false alarm rate and video fluency are also improved.

Conclusion
The nighttime pedestrian detection system proposed in this paper can be applied to automobile driving, which is of great significance for night driving and provides important research value for the future development of unmanned driving systems. At present, there are still many deficiencies in this system, and improvements are needed in the following aspects:

1. In the design of the pedestrian detection classifier, the correct detection rate of the SVM is influenced by the number of training samples, and this relationship is not ideal at present. To further improve the detection rate and reduce the number of false alarms, a sufficient number of training samples is needed. At the same time, as the number of training samples increases, the processing time of the algorithm increases, which reduces the real-time performance of the system. In future research, deep learning and SVM can be combined to improve the performance of the system.

2. The image region division method proposed in this paper has not fully adapted to actual road conditions. If a road without a lane line persists while uphill and downhill situations occur at the same time, the effectiveness of this region division method is halved. A future area division method could add a gyroscope to determine whether the vehicle is on an uphill or downhill slope and, according to the offset angle from the horizontal direction, adjust the image division area to improve adaptability to the road.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.