A novel semi-empirical supervised model of vortex-induced vertical force on a flat closed-box bridge deck

This study presents a novel single-degree-of-freedom model of vortex-induced vertical force based on supervised learning. The modeling process has three steps. First, a hypothesis function based on the Taylor expansion is applied to describe the complicated vortex-induced vertical force. Second, this hypothesis function is optimized by spectrum and correlation analysis. A term is deleted from the function when it meets any of the following conditions: its frequency amplitude is close to 0; its correlation coefficient with the vortex-induced vertical force is less than 0.3; or its correlation coefficient with a lower-order term is more than 0.8. Third, the validity and reliability of the optimized function are verified by comparative and residual analysis. The optimization keeps the proposed model simple while still describing the main characteristics of vortex-induced vertical forces well. Moreover, the maximum displacement is accurately predicted by the proposed model. Simulation results show that the proposed model has a higher coefficient of determination ($R^2$) than Scanlan's and Zhu's models, which means that the proposed model is more suitable for describing vortex-induced vertical forces.


Introduction
Vortex-induced vibration (VIV) of long-span bridges is one of the most common aerodynamic phenomena caused by wind-structure interaction at low wind speeds. 1 If the frequency of vortex shedding approaches one of the natural frequencies of a long-span bridge, a "lock-in" phenomenon occurs. 2 Under lock-in conditions, VIV causes a large-amplitude response of long-span bridges. 3 Although VIV does not always result in catastrophic events, it seriously shortens the fatigue life and reduces the usability of the structure. The need for countermeasures against VIV has led many researchers to study the phenomenon. [4-13] However, VIV is so complex that a purely theoretical model cannot yet be established. A convenient alternative is to form a semi-empirical model based on the main phenomenology of VIV observed in wind tunnel tests.
In the past few decades, many models have been proposed. [14-19] Because the vortex-induced force is difficult to measure accurately in wind tunnel tests owing to the limitations of experimental technology, most of the models are unsupervised. These unsupervised models 1 are built on the Van der Pol oscillator or on the professional knowledge of researchers. Ehsan and Scanlan 3 proposed a nonlinear model based on the Van der Pol oscillator, and the model has been widely applied in engineering. However, it fails to describe the nonlinear vortex-induced vertical force (VIVF). [20-22] Wu and Kareem 20 used the Volterra series to present a nonlinear model for long-span bridges. This model can capture typical characteristics of VIV such as the limit cycle oscillation, but its kernels are very difficult to obtain because of the complexity of the multi-input nonlinear system. Marra et al. 22 analyzed experimental data with the Griffin plot and found that the relationship between the Scruton number and the linear damping coefficient of Scanlan's model is linear, whereas the relationship between the Scruton number and the nonlinear damping coefficient is quadratic. They proposed a model based on Scanlan's model to fit the experimental data, but at least three wind tunnel tests are required to identify the parameters of the model. Although these models enrich the knowledge of VIV, they are to some extent ambiguous, because they are built on and validated by the measured displacement rather than the measured force. 21 With advances in experimental technology, the time history of the VIVF can now be accurately measured in wind tunnel tests. 21,23 It is therefore possible to use the measured VIVF to supervise and evaluate a VIVF model. Zhu et al. 21 proposed an improved model by adding a quadratic term of velocity and displacement to Scanlan's model and modifying the nonlinear damping term of the model. Compared with Scanlan's model, this improved model describes VIV phenomena well. Zhu et al. 23 simplified this improved model by deleting the quadratic term, the linear stiffness, and the instantaneous lift term. However, the information embedded in the measured data, which may benefit modeling, is not fully mined.
A simple and convenient model built with new techniques is therefore urgently needed. Supervised learning is a good modeling method, and its primary task is to infer a function from training data that include labeled data. 24 The essence of learning is to find relationships between the feature data and the labeled data. Supervised learning is used for regression 25 and classification. 26 In this study, the measured VIVF is regarded as the labeled data, and a functional relationship between the labeled data and the other measured data is sought by supervised learning. According to this functional relationship, a novel semi-empirical supervised model is proposed. The proposed model has few parameters and low computational complexity. Importantly, the maximum oscillation amplitude of bluff bodies can be accurately estimated at different wind speeds by the proposed model. The rest of the study is organized as follows. In the "Related work" section, Scanlan's model and Zhu's model are briefly described. In the "Proposed model" section, a new model is proposed, the modeling is discussed in detail, and the parameters of the proposed model are identified. In the "Simulation results" section, the validity and reliability of the proposed model are verified by simulation. The "Conclusion and future works" section summarizes the study.

Related work
The study focuses on the vertical vibration of bluff bodies under uniform flow conditions and ignores inflow variables and the torsional response. Therefore, the general form of the equation of motion that describes the across-flow response of a single-degree-of-freedom system is given by equation (1), where $m$ is the mass per unit span length; $\xi$ is the mechanical damping ratio; $\omega$ is the natural frequency of the bluff body; $y$, $\dot{y}$, and $\ddot{y}$ are displacement, velocity, and acceleration, respectively; and $F(y, \dot{y}, t)$ is a function of $y$, $\dot{y}$, and time $t$. Ehsan and Scanlan 3 proposed the empirical nonlinear model of equation (2), where $\rho$ is the air density, $U$ is the wind speed, $D$ is the characteristic length of the bluff body, $\omega_s$ is the circular frequency of vortex shedding, and $\phi$ is the phase difference between the vortex shedding and the displacement response. In the lock-in region, $Y_1$, $\varepsilon$, $Y_2$, $C_L$, $\omega_s$, and $\phi$ are parameters that need to be identified, and the term $\frac{1}{2} C_L \sin(\omega_s t + \phi)$ is negligibly small. 27 Zhu et al. 21 put forward the improved nonlinear model of equation (3). Compared with Scanlan's model, Zhu's model has a new term $Y_3\, y\dot{y}/(DU)$ and a modified term $\varepsilon\, \dot{y}^3/U^3$. It has seven parameters $\{Y_1, \varepsilon, Y_2, Y_3, C_L, \omega_s, \phi\}$; thus, Zhu's model is more complex than Scanlan's model.
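For orientation, the equation of motion and Scanlan's model are commonly written in the form sketched below. This is assembled from the symbol definitions in this section (with $\xi$ for the damping ratio, $\omega$ for the natural frequency, $\rho$ for the air density, and $\phi$ for the phase) and from the widely published form of the Ehsan-Scanlan model; it is an assumption that equations (1) and (2) use exactly this notation, and Zhu's model is not reproduced here.

```latex
% Sketch of the single-degree-of-freedom equation of motion
m\left(\ddot{y} + 2\xi\omega\dot{y} + \omega^{2} y\right) = F(y, \dot{y}, t)

% Sketch of the Ehsan-Scanlan nonlinear model of the vortex-induced force
F(y, \dot{y}, t) = \frac{1}{2}\rho U^{2}(2D)\left[
    Y_{1}\left(1 - \varepsilon\frac{y^{2}}{D^{2}}\right)\frac{\dot{y}}{U}
  + Y_{2}\frac{y}{D}
  + \frac{1}{2} C_{L}\sin(\omega_{s} t + \phi)
\right]
```

In this form, the $Y_1$ term acts as a displacement-dependent aerodynamic damping, the $Y_2$ term as an aerodynamic stiffness, and the last term as the direct vortex-shedding force that is neglected in the lock-in region.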

Proposed model
Many studies on VIVF are based on measured vertical displacements. The validity of a VIVF model is verified by comparing the reconstructed displacement with the measured displacement. 3,13 Therefore, there is an unknown gap between the measured VIVF and the VIVF reconstructed by these models.
To better describe the characteristics of VIVF, the study uses supervised learning to mine relationships in the data and build a good VIVF model, with the measured VIVF regarded as the labeled data. As shown in Figure 1, a hypothesis function is set according to the Taylor expansion. Then the information embedded in the measured VIVF is mined by spectrum and correlation analysis to optimize this hypothesis function. The validity and reliability of the optimized hypothesis function are verified by comparative and residual analysis. Finally, the proposed model is obtained.

The modeling process
The VIVF signal is a superposition of many high-order harmonics. The Taylor expansion is used to approximate a nonlinear VIVF function. $F_h(y, \dot{y}, t)$ is the hypothesis function that approximately describes the VIVF on a flat closed-box bridge deck. It is given by equation (4), where $F_h(y_0, \dot{y}_0, t_0)$ is assumed to be a constant; the linear terms are $y$ and $\dot{y}$; the quadratic terms are $y^2$, $y\dot{y}$, and $\dot{y}^2$; the cubic terms are $y^3$, $y^2\dot{y}$, $y\dot{y}^2$, and $\dot{y}^3$; and $R_n$ is the remainder of the Taylor polynomial.
Equation (4) can also be written as equation (5). Equation (5) can theoretically describe the VIVF on a bluff body, but it contains too many high-order nonlinear terms. A desirable model should have few parameters that need to be identified and should describe the physical behavior with easily measurable quantities. 28 Therefore, equation (5) needs to be optimized according to spectrum and correlation analysis.
First, the measured VIVF is analyzed in the frequency domain. In Figure 2, there are a green box and three red boxes. The green box represents a constant, and the red boxes represent three prominent frequency points $f$, $2f$, and $3f$, respectively, where $f$ is one of the natural frequencies of the structure. Figure 2 shows that only a few of the terms in equation (5) are enough to capture the main characteristics of VIVF, so it becomes equation (6)
$$F_h(y, \dot{y}, t) \approx \frac{1}{2}\rho U^2 (2D)\left[a_0 + a_1 y + a_2 \dot{y} + a_3 y\dot{y} + a_4 y^2 + a_5 \dot{y}^2 + a_6 y^2\dot{y} + a_7 y\dot{y}^2 + a_8 y^3 + a_9 \dot{y}^3\right] \tag{6}$$
Second, different terms make different contributions to the VIVF. If a term can capture the actual characteristics of the measured VIVF, it has a high correlation coefficient with the measured VIVF. Because moderate and strong correlations are taken into account, the threshold is set to 0.3. 29 If the absolute value of a coefficient is larger than the threshold, there is a certain degree of correlation between the force variable and its column variable. Table 1 shows the correlation analysis of the measured VIVF and the independent variables. Every red number denotes a high correlation between the measured VIVF ($F$) and its column variable. Column variables corresponding to a red number are kept, and the other variables are omitted. Therefore, the variables $\{y^2, \dot{y}^2, y\dot{y}^2\}$ are deleted from equation (6), and equation (6) is simplified as follows
$$F_h(y, \dot{y}, t) \approx \frac{1}{2}\rho U^2 (2D)\left[a_0 + a_1 y + a_2 \dot{y} + a_3 y\dot{y} + a_6 y^2\dot{y} + a_8 y^3 + a_9 \dot{y}^3\right] \tag{7}$$
In addition, correlations between the independent variables in equation (7) need to be analyzed to keep the variables independent. Since a strong correlation between variables is harmful to the model, the threshold is set to 0.8. 29 If the absolute value of a correlation coefficient is larger than the threshold, there is a strong linear correlation between the two variables. Table 2 shows the correlations between the independent variables. The correlation coefficient between $y$ and $y^3$ is 0.95, and the coefficient between $\dot{y}$ and $\dot{y}^3$ is also 0.95. A simple and effective remedy is to delete $y^3$ and $\dot{y}^3$ from equation (7), which is thus simplified again
$$F_h(y, \dot{y}, t) \approx \frac{1}{2}\rho U^2 (2D)\left[a_0 + a_1 y + a_2 \dot{y} + a_3 y\dot{y} + a_6 y^2\dot{y}\right] \tag{8}$$
Through the optimization process, the proposed model can be expressed as equation (9), where $\theta_1 y$ represents the aerodynamic stiffness term, $\theta_2 \dot{y}$ is related to the linear component of the aerodynamic damping, and $\theta_3 y\dot{y}$ and $\theta_4 y^2\dot{y}$ are responsible for the nonlinearity of VIVF.
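The two-stage correlation screening described above can be sketched in a few lines. The example below is a minimal illustration on synthetic data, not the measured wind tunnel records: the displacement is assumed to be a unit-amplitude sinusoid, the force coefficients (0.36, 0.74, 0.8, 0.4) are hypothetical, and the function names `pearson` and `screen_terms` are introduced only for this sketch.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def screen_terms(force, terms, keep_thr=0.3, drop_thr=0.8):
    """terms maps a term name to its samples, ordered from low to high order."""
    # Stage 1: keep terms whose |correlation| with the force reaches keep_thr.
    kept = [name for name, x in terms.items()
            if abs(pearson(x, force)) >= keep_thr]
    # Stage 2: drop a term that is strongly correlated (|r| > drop_thr)
    # with a lower-order term that has already been selected.
    selected = []
    for name in kept:
        if all(abs(pearson(terms[name], terms[s])) <= drop_thr for s in selected):
            selected.append(name)
    return selected

# Synthetic (hypothetical) data: 8 full cycles of a unit-amplitude response.
n, cycles = 1024, 8
t = [2 * math.pi * cycles * i / n for i in range(n)]
y = [math.cos(ti) for ti in t]       # displacement
v = [-math.sin(ti) for ti in t]      # velocity
terms = {
    "y": y,
    "v": v,
    "y*v": [a * b for a, b in zip(y, v)],
    "y^2": [a * a for a in y],
    "v^2": [b * b for b in v],
    "y^2*v": [a * a * b for a, b in zip(y, v)],
    "y*v^2": [a * b * b for a, b in zip(y, v)],
    "y^3": [a ** 3 for a in y],
    "v^3": [b ** 3 for b in v],
}
# Hypothetical force built from the four terms of the final model.
force = [0.36 * a + 0.74 * b + 0.8 * a * b + 0.4 * a * a * b
         for a, b in zip(y, v)]

selected = screen_terms(force, terms)
print(selected)
```

For a purely sinusoidal response the correlation between $y$ and $y^3$ evaluates to about 0.95, the same value reported in Table 2, so on this synthetic record the second stage removes the two cubic terms and leaves `y`, `v`, `y*v`, and `y^2*v`.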
The dimensionless form of equation (9) is given by equation (10), where $m_r = \rho D^2/m$, $s = tU/D$ is dimensionless time, $\eta(s) = y(t)/D$ is dimensionless displacement, $\eta'(s) = \dot{y}(t)/U$ is dimensionless velocity, and $F(\eta(s), \eta'(s), s)$ is the dimensionless VIVF.
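One way to see why the retained terms can reproduce the components at 0, $f$, $2f$, and $3f$ found in the spectrum analysis is to evaluate each term for a sinusoidal response. The sketch below assumes a unit-amplitude harmonic displacement, which is an idealization of the lock-in response, and computes the amplitude of each term at the first three harmonics with a direct discrete Fourier sum; the helper name `harmonic_amplitudes` is hypothetical.

```python
import math

def harmonic_amplitudes(x, cycles, max_harmonic=3):
    """Single-sided amplitudes of x at 0, f, 2f, ... for a record
    that spans exactly `cycles` periods of the fundamental f."""
    n = len(x)
    amps = []
    for h in range(max_harmonic + 1):
        k = h * cycles  # DFT bin of the h-th harmonic
        re = sum(val * math.cos(2 * math.pi * k * i / n) for i, val in enumerate(x))
        im = sum(val * math.sin(2 * math.pi * k * i / n) for i, val in enumerate(x))
        scale = 1.0 if h == 0 else 2.0  # fold negative frequencies for h > 0
        amps.append(scale * math.hypot(re, im) / n)
    return amps

# Assumed idealized response: y = cos(t), velocity v = -sin(t), 8 full cycles.
n, cycles = 1024, 8
t = [2 * math.pi * cycles * i / n for i in range(n)]
y = [math.cos(ti) for ti in t]
v = [-math.sin(ti) for ti in t]

spectra = {
    "1": harmonic_amplitudes([1.0] * n, cycles),
    "y": harmonic_amplitudes(y, cycles),
    "v": harmonic_amplitudes(v, cycles),
    "y*v": harmonic_amplitudes([a * b for a, b in zip(y, v)], cycles),
    "y^2*v": harmonic_amplitudes([a * a * b for a, b in zip(y, v)], cycles),
}
for name, amps in spectra.items():
    print(name, [round(a, 3) for a in amps])
```

Under this assumption the constant contributes only at zero frequency, the linear terms only at $f$, the product $y\dot{y}$ only at $2f$, and $y^2\dot{y}$ at both $f$ and $3f$, so the five retained terms together cover the four prominent frequency points.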

Parameter identification
Parameter identification is important because the accuracy of a predicted prototype response depends directly on the accuracy of the parameters identified from the experiments. Equation (10) is regarded as a linear combination of the nonlinear basis functions $\eta(s)$, $\eta'(s)$, $\eta(s)\eta'(s)$, and $\eta^2(s)\eta'(s)$. Let $x = (m_r, m_r\eta(s), m_r\eta'(s), m_r\eta(s)\eta'(s), m_r\eta^2(s)\eta'(s))$, and let $\theta = (\theta_0, \theta_1, \theta_2, \theta_3, \theta_4)^T$ be the regression parameter vector; then equation (10) can be written as equation (11), which is a linear function of $x$. The measured VIVF can be expressed as equation (12), whose error vector is normally distributed with mean $\mu$ and variance $\sigma^2$. $F$ and $\hat{F}_\theta(x)$ are the values of the dimensionless VIVF at the dimensionless time $s$ measured in the test and reconstructed by equation (11), respectively. A cost function based on least squares theory, equation (13), is defined to describe the distance between the measured and reconstructed VIVFs. The minimum of equation (13) is found by setting its derivatives with respect to the parameters to zero. Since the proposed model contains five parameters, there are five derivative equations, and the parameters are obtained by solving this system, equation (14).
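The identification step can be sketched as an ordinary least-squares fit. The example below builds the five-component feature vector $x$ from a synthetic dimensionless response, forms the normal equations that result from setting the derivatives of the squared-error cost to zero, and solves them by Gaussian elimination. All numerical values (the mass ratio 0.02, the response amplitude 0.5, and the "true" parameters) are hypothetical, and the function names are introduced only for this sketch.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_parameters(features, force):
    """Least squares via the normal equations (X^T X) theta = X^T F."""
    p = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(p)]
           for i in range(p)]
    xtf = [sum(row[i] * f for row, f in zip(features, force)) for i in range(p)]
    return solve_linear(xtx, xtf)

# Hypothetical synthetic record: eta = 0.5 cos(s), eta' = -0.5 sin(s).
m_r = 0.02
theta_true = [0.1, -0.5, 2.0, 1.5, -3.0]
n, cycles = 1024, 8
features = []
for i in range(n):
    s = 2 * math.pi * cycles * i / n
    eta, eta_p = 0.5 * math.cos(s), -0.5 * math.sin(s)
    features.append([m_r, m_r * eta, m_r * eta_p,
                     m_r * eta * eta_p, m_r * eta * eta * eta_p])
# Noise-free "measured" force generated from the assumed parameters.
force = [sum(x * th for x, th in zip(row, theta_true)) for row in features]

theta_hat = fit_parameters(features, force)
print([round(th, 6) for th in theta_hat])
```

Because the synthetic force is noise-free, the fit returns the assumed parameters to within rounding error; with measured data the same equations give the least-squares estimate. For larger or poorly conditioned problems, a QR- or SVD-based solver would usually be preferred over the explicit normal equations.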

Simulation results
A training set includes 1024 data points from the transient part of the steady stage of the VIV time history at the wind speed $U = 9.1$ m/s for a damping ratio of $\xi = 0.5\%$. Figure 3(a) displays the transient part of the steady state of the measured VIVF time history, and Figure 3(b) and (c) show the displacement and the velocity, respectively. The parameters of the proposed model are identified by solving equation (14) and are shown in Table 3. The reconstructed VIVF is then obtained by equation (11). Figure 4 shows the weak nonlinear characteristics of the reconstructed VIVF. The proposed model is used to estimate the contribution of every term to the reconstructed VIVF. The percentage of each term of the reconstructed VIVF is shown in Figure 4. The nonlinear part, comprising the $\theta_3$ and $\theta_4$ terms, makes up 31.43% of the reconstructed VIVF.

Comparison of VIVFs
In this section, the performance of the proposed model is compared with that of Scanlan's nonlinear model and Zhu's model. A testing set of 5000 data points from the transient part of the steady stage of the VIV time history is used to test the performance of each of the three models. The parameters of Scanlan's model and Zhu's model are shown in Table 4.
Figure 5 displays the spectrum difference between the measured and reconstructed VIVFs. As shown in Figure 5(a), the reconstructed VIVF has a significant frequency component at $f$ and an unremarkable one at $3f$. In Figure 5(b), the reconstructed VIVF has frequency components similar to those of the measured VIVF. The amplitude at $f$ in the reconstructed VIVF is higher than in the measured VIVF, whereas the $2f$ and $3f$ amplitudes of the reconstructed VIVF are smaller than those of the measured VIVF. As shown in Figure 5(c), the measured and reconstructed VIVFs have the same frequency amplitude at $f$. The $2f$ and $3f$ amplitudes of the reconstructed VIVF are smaller than those of the measured VIVF. In addition, owing to the constant term, the reconstructed VIVF has the same frequency amplitude as the measured VIVF at the origin. Therefore, the proposed model suitably describes the measured VIVF.
Figure 6(a) displays the relationship among displacement, velocity, and the measured VIVF. As can be seen in Figure 6(b), the reconstructed VIVF based on Scanlan's model forms an ellipse with displacement and velocity, and the shape of the curve is obviously different from that in Figure 6(a). The shape of the curve in Figure 6(c) is also different from that in Figure 6(a). Thus, the proposed model well describes the VIVF.
Figure 7 shows the fitting effect of the three models. The coefficient of determination, denoted as $R^2$, indicates the degree to which a model fits the data. If the model fits the raw data well, $R^2$ is close to one. Ten testing sets taken from different transient parts of the steady stage of the VIV time history are used to verify the reliability of the three models. Figure 7 shows that the proposed model has the highest value of $R^2$; thus, the proposed model is the most suitable for describing the VIV phenomena.
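For reference, the coefficient of determination used in this comparison can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses this standard definition; the four sample values are made up for illustration and are not taken from the tests.

```python
def r_squared(measured, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - f) ** 2 for m, f in zip(measured, fitted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Illustrative (made-up) values: SS_res = 0.10, SS_tot = 5.0.
measured = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(measured, fitted)
print(round(r2, 4))  # 0.98
```

A perfect reconstruction gives $R^2 = 1$, and a model that does no better than the mean of the measured force gives $R^2 = 0$, which is why values closer to one indicate a better fit.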
Residual analysis is a method to evaluate the validity of a model. 30,31 The ordinate in Figure 8 is the normalized residual of VIVF, and the abscissa is the fitted value (the reconstructed VIVF). A blue circle denotes a data point, and the red curve shows the fitted relationship between the normalized residuals and the fitted values. If the residual sequence of a model has a regular distribution or tendency, the model needs to be improved. 30,32 In Figure 8(a), the red parabolic curve indicates that Scanlan's model lacks quadratic terms. The lack of the $2f$ frequency component may be responsible for the invalidity of Scanlan's model. Figure 8(b) shows that the red line is monotonically decreasing, which indicates that the residual sequence has a certain tendency. According to Table 2, there is a strong correlation between $\dot{y}$ and $\dot{y}^3$ in Zhu's model. Thus, the strong correlation may distort Zhu's model.
In Figure 8(c), data points are randomly scattered and the red line is almost parallel to the abscissa, which indicates that the residual sequence is irregular. Compared with the other models, the proposed model is valid. Figure 9 shows the validity of the proposed model from the perspective of residuals. The blue bar denotes the frequency of the residuals, the red curve represents the kernel density estimation, and the green curve is the density diagram of the normal distribution. It can be seen in Figure 9 that the residual sequence has a normal distribution. By residual analysis, the validity of the proposed model is verified.
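The trend check behind this kind of residual plot can be sketched as follows: normalize the residuals, fit a low-order curve of residual against fitted value, and inspect whether the curve is flat. The example below is a minimal illustration with made-up data, in which one set of residuals deliberately contains a quadratic dependence on the fitted value and the other is a small oscillation unrelated to it; the function names and the choice of a quadratic trend are assumptions of this sketch.

```python
import math

def normalize(residuals):
    """Subtract the mean and divide by the standard deviation."""
    n = len(residuals)
    mean = sum(residuals) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)
    return [(r - mean) / std for r in residuals]

def quadratic_trend(x, y):
    """Least-squares fit y ~ c0 + c1*x + c2*x**2; returns (c0, c1, c2)."""
    s = [sum(v ** p for v in x) for p in range(5)]           # sums of x^p
    t = [sum((v ** p) * w for v, w in zip(x, y)) for p in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):                                     # elimination
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    coef = [0.0] * 3
    for r in (2, 1, 0):                                      # back-substitution
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, 3))
        coef[r] = (a[r][3] - tail) / a[r][r]
    return tuple(coef)

# Made-up fitted values on [-1, 1] and two made-up residual sequences.
fitted = [-1.0 + i / 100 for i in range(201)]
resid_missing_term = [0.5 * f * f for f in fitted]           # quadratic pattern
resid_unstructured = [0.01 * math.sin(37 * i) for i in range(201)]

c_bad = quadratic_trend(fitted, normalize(resid_missing_term))
c_ok = quadratic_trend(fitted, normalize(resid_unstructured))
print("curvature with missing term:", round(c_bad[2], 2))
print("curvature without structure:", round(c_ok[2], 2))
```

A clearly nonzero curvature or slope in the fitted trend signals structure that the model has not captured, whereas coefficients near zero correspond to a trend line that is almost parallel to the abscissa.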
The proposed model contains important terms corresponding to significant frequencies of the measured VIVF, and these terms have high correlations with the measured VIVF and low correlations with the other terms. Therefore, the proposed model is the most suitable model to depict the VIVF on a flat closed-box bridge.

Comparison of displacements
The maximum vertical displacement response is important in engineering applications. Thus, the validity of the proposed model needs to be further verified by comparing the reconstructed and measured displacements. The Newmark-β method and the reconstructed VIVF are used to reconstruct the displacement. Figure 10 shows that the reconstructed displacement has the same frequency spectrum as the measured displacement. Figure 11 illustrates how the maximum amplitude varies with the wind speed in the lock-in range for $\xi = 0.5\%$. The two curves are quite close to each other. Before the reduced wind speed reaches 19, both the measured and reconstructed maximum displacements are approximately proportional to the reduced wind speed. When the reduced wind speed exceeds 20, the maximum displacements drop quickly. The peak of each curve marks the strongest vibration. This comparison of maximum displacements at different reduced wind speeds further verifies the validity and reliability of the proposed model.
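A minimal sketch of the Newmark-β time integration for a linear single-degree-of-freedom system is given below. It uses the constant-average-acceleration variant ($\beta = 1/4$, $\gamma = 1/2$) and treats the force as a prescribed time series, which is a simplification: a force that depends on the instantaneous displacement and velocity would have to be updated within each step. The structural values and the step load in the check are hypothetical and are chosen only because the response has a known analytical peak.

```python
import math

def newmark_beta(m, c, k, force, dt, y0=0.0, v0=0.0, beta=0.25, gamma=0.5):
    """Integrate m*a + c*v + k*y = force[i] with the incremental Newmark scheme.
    Returns the displacement history (same length as force)."""
    y, v = y0, v0
    a = (force[0] - c * v - k * y) / m              # initial acceleration
    k_hat = k + gamma / (beta * dt) * c + m / (beta * dt ** 2)
    coef_v = m / (beta * dt) + gamma / beta * c
    coef_a = m / (2 * beta) + dt * (gamma / (2 * beta) - 1) * c
    ys = [y]
    for i in range(len(force) - 1):
        dp = force[i + 1] - force[i] + coef_v * v + coef_a * a
        dy = dp / k_hat
        dv = (gamma / (beta * dt) * dy - gamma / beta * v
              + dt * (1 - gamma / (2 * beta)) * a)
        da = dy / (beta * dt ** 2) - v / (beta * dt) - a / (2 * beta)
        y, v, a = y + dy, v + dv, a + da
        ys.append(y)
    return ys

# Hypothetical check: unit step load on an oscillator with a 1 s period
# and a damping ratio of 0.5%.
m = 1.0
omega = 2 * math.pi
xi = 0.005
c = 2 * m * xi * omega
k = m * omega ** 2
dt = 0.01
force = [1.0] * 201                                  # two periods of loading

ys = newmark_beta(m, c, k, force, dt)
peak_ratio = max(ys) / (1.0 / k)                     # peak over static deflection
print(round(peak_ratio, 3))
```

For a lightly damped oscillator under a step load, the first peak is close to $1 + e^{-\xi\pi}$ times the static deflection, about 1.98 here, which gives a quick sanity check of the integrator before it is driven by a reconstructed force history.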

Conclusion and future works
To better understand the VIVF, supervised learning is used to build a model by deeply mining the information hidden in the experimental data. First, a hypothesis function based on the Taylor series describes the VIVF. Then spectrum and correlation analysis are used to optimize this function. The essence of the VIVF signal is obtained by spectrum analysis, and correlation analysis improves the ability to depict VIVFs and keeps the independent variables independent of one another. Finally, the validity of the proposed model is verified by comparative and residual analysis.
The proposed model has five parameters and is a linear combination of nonlinear functions. The proposed model fits the measured VIVF better than Scanlan's model and Zhu's model. Owing to its simplicity and low computational complexity, the proposed model is easy to apply in engineering.
In China, the construction of long-span highway bridges and long-span high-speed rail bridges has developed rapidly. Machine learning is conducive to further study of the nonlinear effects of VIVF on long-span bridges, so more wind tunnel tests on different cross-sectional shapes (including coupled motion and arbitrary motion forms) will help to extend the theoretical model.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the State Natural Science