Investigating the severity of expressway crashes based on the random parameter logit model accounting for unobserved heterogeneity

The present study used a random parameter logit (RPL) model to explore the nonlinear relationship between explanatory variables and the likelihood of each level of expressway crash severity. The potential unobserved heterogeneity in the data arising from China's road traffic characteristics was fully considered. A total of 1154 crashes that occurred on the Hang-Jin-Qu Expressway from 2013 to 2018 were analyzed. In addition to the conventional impact factors considered in past studies, variables related to road geometry were introduced and found to contribute significantly to expressway crash severity. The overall significance of the model was examined with a likelihood ratio test, and the average elasticities of the significant factors at each severity level were then calculated. Several factors that significantly increase the probability of a fatal crash were highlighted: rainy/snowy/cloudy weather, low visibility (≤100 m), night without light, wet-skid road surface, female driver, driver aged 41+ years, collision with a rigid barrier or other obstacles, radius and length of the horizontal curve, and longitudinal gradient. The parameters of four factors were random and normally distributed: night without light, female driver, driving experience of 10+ years, and large responsible vehicle. These findings provide insights for a better understanding of expressway crash severity. Countermeasures are proposed for driver education, traffic law enforcement, vehicle and road design, environmental improvement, and other areas.


Introduction
The high mortality caused by road traffic accidents imposes heavy economic and emotional burdens on families and society, and tremendous efforts have been made to reduce the frequency and severity of accidents. 1 Statistical models are usually built on crash data to explore the relationship between accident severity and factors related to traffic occupants, vehicles, road infrastructure, and environmental conditions, and countermeasures have been proposed based on a comprehensive understanding of the factors affecting crash risk and severity.
Expressways have a lower crash rate than other types of roads, such as urban roads, 2 but they usually rank first among all road types 1 in terms of mortality. 3 According to statistics from the Transportation Administration of China, expressway crashes account for about 5% of road traffic crashes but 10% of traffic deaths. Expressway traffic safety has been a focus of management agencies and researchers for many decades. To the best of our knowledge, however, most studies have focused on crash rates, and crash severity has not received enough attention. It is therefore valuable to conduct a comprehensive analysis of the factors affecting expressway crash severity.
However, because countries are at different stages of development, the statistical definitions (calibers) of traffic accident data differ, and data completeness needs to be improved, especially in developing countries. Injury severity conditional on crash occurrence can depend on numerous factors, not all of which are observed in crash databases. These unobserved factors can moderate the influence of the observed covariates in the model, leading to variation in parameter effects across observations. Such variation is referred to as "unobserved heterogeneity" and is of considerable importance in injury severity analysis, so more attention should be paid to heterogeneity in the modeling process. Among the modeling methods used in the literature, the random parameter logit (RPL) model remains the most widely used.
The current study extends the traditional fixed-parameter discrete choice model to an RPL model with some random parameters, in order to analyze traffic accidents quantitatively while accounting for unobserved heterogeneity among the predictor variables. Factors related to the driver, vehicle, road, and environment are considered as independent variables. The reliability and accuracy of the model are verified, and the average elasticities of the significant factors at each crash severity level are then calculated. Corresponding improvement strategies are proposed based on the findings.

Literature review
Several researchers have studied crash severity in relation to drivers, vehicles, roads, and environmental conditions in recent years. Severe weather and unfavorable segments were found to increase crash severity at freeway merging and diverging locations. 4 Multiple logistic regression models, used to predict injury severity levels and quantify the extent of each factor's influence, identified many significant factors, including the average speed of the road section, average daily traffic volume, time period, weather conditions, physical characteristics of the accident area, and cause of the accident. 5,6 The findings showed significant differences in the impact of driving behaviors, environmental conditions, and other factors on the accident rate across severity levels. As this review shows, many scholars have studied crash severity from various perspectives, especially its influencing factors, but the effects of some factors, such as road alignment, remain unknown or controversial.
Logistic regression is a promising statistical model that provides meaningful interpretations for improving future safety performance based on the analysis of accident-related data. 7 Discrete choice models (e.g., logit and probit) are well suited to the discrete dependent variable of crash severity and have been widely used. 8 The multinomial logit (MNL) model was effective in exploring the risk factors affecting the injury severity of 11,771 traffic accidents over 6 years in Turkey. 9 A joint Poisson regression model with multivariate normal heterogeneity was proposed for crash frequency by severity level on freeway sections, accounting for the common unobserved factors that influence crash frequencies at different severity levels. 10 Random parameter models, 11-13 the Markov switching approach, 14 ordered logit models, and latent class cluster models have also been applied to address unobserved heterogeneity. An RPL model was developed to explore driver injury severity in single-vehicle crashes, with the regression coefficients of age and gender set as random parameters. 15 Another study divided accidents into serious and non-serious accidents and compared the fit of a fixed-parameter logit model, an RPL model, and a random forest model. 16 The RPL and random forest models fit better, showing that it is necessary to consider the nonlinear relationships between variables and the individual heterogeneity ignored by the traditional logit model. In a study of traffic accidents in remote mountainous areas, three random parameter models (RPNB, RPNB-L, and RPNB-GE) were constructed to address the heterogeneity caused by missing variables, and the RPNB-L model performed best in both predictive ability and goodness of fit. 17
Numerous studies have shown that the combination of steep slopes, horizontal curves, and sharp bends with insufficient visibility in rainy weather increases accidents. Exploiting the ordinal nature of injury severity, Rezapour et al. 18 used ordered logit models to investigate the impact of various factors on injury severity in single- and multiple-vehicle downgrade crashes, and found significant differences between the two crash types. To assess the impact of driver age on accident severity while accounting for unobserved heterogeneity and age group differences, Osman et al. 19 constructed a mixed generalized ordered response probit model of the injury severity of commercially licensed drivers involved in single-vehicle crashes, considering the discrete ordinal nature of injury severity data.
In addition to statistical and econometric methods, data-driven methods, such as those based on data mining, artificial intelligence, machine learning, neural networks, and support vector machines, have also been widely used to analyze traffic accident data. Such methods can handle extremely large amounts of data and provide high prediction accuracy, but they may not reveal the effects of specific factors on the resulting injury probabilities. Meanwhile, heterogeneity in modeling has received increasing attention in recent years, and some scholars have accounted for potential unobserved heterogeneity by extending traditional statistical and econometric models.
Samples have been grouped and latent class cluster analysis performed to identify homogeneous subgroups within a specific crash type, pedestrian crashes. 20 The factors influencing single-vehicle crash severity were investigated with a latent class logit model, as an alternative to the frequently used random parameter models, to account for unobserved heterogeneity across observations. 21 Heterogeneity may be attributed to interactions between different types of factors, 22 including roadway conditions, 23 demographic and behavioral attributes of drivers, 24 and environmental conditions. Ignoring heterogeneity in modeling may lead to inaccurate parameter estimates and biased analysis of causal factors.
Each of these methods involves an implicit trade-off between practical prediction accuracy and the ability to uncover underlying causal relationships. In particular, methods such as RPL models generally predict less accurately than machine learning techniques but have the advantage of quantifying the impact of specific factors. In the literature, potential unobserved heterogeneity has been identified as an important issue, but studies in this area remain rare. These findings support the application of RPL models in injury severity research and the examination of individual heterogeneity associated with specific factors.

Data preparation
Data pre-processing
The data used in this study were crashes on the Hang-Jin-Qu Expressway in Zhejiang Province, China, which has a design speed of 120 km/h and six lanes in two directions. A total of 1443 traffic accidents that occurred on the 290-km section from 2013 to 2018 were recorded in the road maintenance management system of Zhejiang Transportation Group. After accidents at the ramp entrances of service areas and interchanges were excluded, the remaining 1154 accidents with complete information were taken as the sample. In the original data, crash severity had four levels according to the definition of the China Ministry of Public Security: no injury (property damage only), minor injury, serious injury, and fatal. Crash severity was defined by the injury level of the most severely injured vehicle occupant. 25 Although there are many ways to classify crash severity, 26 the most common is to group it into three categories: fatal, injury, and property damage only (PDO). 5 Therefore, this study combines minor injury and serious injury accidents into injury accidents for statistical analysis. The additional information recorded in the system includes driver factors, vehicle factors, road conditions, and environmental conditions. Road design data, namely horizontal curve properties (radius and length of the horizontal curve) and longitudinal profile alignment (longitudinal gradient), were extracted from the construction drawing design documents. The other information on driver and vehicle factors and on road and environmental conditions was obtained from expressway traffic police reports.
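The two coding rules above (crash severity taken from the most severely injured occupant, then minor and serious injury merged into one "injury" category) can be sketched as follows. This is a minimal illustration assuming occupant injury levels are available as plain text labels; the label strings, function names, and toy crashes are hypothetical, not fields of the Zhejiang Transportation Group system.

```python
from collections import Counter

# Official four levels, ordered from least to most severe.
LEVELS = ["no injury", "minor injury", "serious injury", "fatal"]

# Mapping to the three categories modeled in this study.
TO_MODEL_CATEGORY = {
    "no injury": "PDO",          # property damage only
    "minor injury": "injury",
    "serious injury": "injury",  # merged with minor injury
    "fatal": "fatal",
}


def crash_severity(occupant_levels):
    """Crash-level severity = level of the most severely injured occupant."""
    return max(occupant_levels, key=LEVELS.index)


def recode_severity(levels):
    """Collapse four-level labels into PDO / injury / fatal.

    An unknown label raises KeyError, which surfaces inconsistent records
    instead of silently dropping them.
    """
    return [TO_MODEL_CATEGORY[level] for level in levels]


# Toy crashes, each a list of occupant injury labels (hypothetical data).
crashes = [
    ["no injury", "no injury"],
    ["minor injury", "no injury"],
    ["serious injury", "minor injury"],
    ["fatal", "serious injury"],
    ["minor injury"],
]
recoded = recode_severity(crash_severity(c) for c in crashes)
print(Counter(recoded))
```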
Because of recording errors by the police when collecting accident information and operational errors by modeling personnel, abnormal values are inevitable. Therefore, a distance-based outlier detection algorithm was applied to preprocess the data, handling missing values, outliers, and inconsistencies, to improve model accuracy. 27
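The paper does not state which distance-based algorithm was used. As one possible instance, the sketch below implements the classic DB(p, D) rule of Knorr and Ng on z-scored numeric fields: a record is flagged when at least a fraction p of the other records lie farther than distance d from it. The choice of rule, the thresholds p = 0.95 and d = 2.0, and the toy records are all assumptions for illustration, and the sketch covers only outlier flagging, not the missing-value or consistency checks.

```python
import math


def standardize(rows):
    """Z-score each column so fields in different units (m, %) are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population SD; a constant column gets SD 1.0 to avoid division by zero.
    sds = [math.sqrt(sum((v - mu) ** 2 for v in c) / len(c)) or 1.0
           for c, mu in zip(cols, means)]
    return [[(v - mu) / sd for v, mu, sd in zip(r, means, sds)] for r in rows]


def db_outliers(rows, p=0.95, d=2.0):
    """Indices of DB(p, d) outliers: records for which at least a fraction p
    of the other records lie farther than Euclidean distance d."""
    z = standardize(rows)
    n = len(z)
    flagged = []
    for i, a in enumerate(z):
        far = sum(1 for j, b in enumerate(z) if j != i and math.dist(a, b) > d)
        if far >= p * (n - 1):
            flagged.append(i)
    return flagged


# Hypothetical records: (horizontal curve radius in m, longitudinal gradient in %)
rows = [(1000, 1.0), (1100, 1.2), (950, 0.8), (1050, 1.1), (1000, 0.9), (9000, 9.5)]
print(db_outliers(rows))  # [5]
```

Flagged records would then be checked against the source reports rather than deleted automatically.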

Data description
The dependent variable was the crash severity level. The independent variables were divided into discrete and continuous variables according to their attributes. Table 1 shows the explanatory variables selected in the study and their descriptive statistics. The 1154 accidents fell into three categories: 424 PDO accidents (36.7%), 456 injury accidents (39.5%), and 274 fatal accidents (23.7%). Regarding weather, most accidents (982, or 85.1%) occurred on sunny days, and most fatal accidents (237 of 274) also occurred on sunny days. Regarding visibility, the largest number of accidents (968) occurred under high visibility (200+ m). Furthermore, most crashes (901) occurred on dry road surfaces.
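The reported percentages can be checked directly against the counts. The sketch below recomputes them and also the shares implied by the other counts in this paragraph; the 86.5%, 83.9%, and 78.1% figures are derived here and are not stated in the text.

```python
TOTAL = 1154  # crashes in the sample

# Counts reported in the text
severity_counts = {"PDO": 424, "injury": 456, "fatal": 274}


def pct(part, whole):
    """Percentage rounded to one decimal place, as in the text."""
    return round(100 * part / whole, 1)


assert sum(severity_counts.values()) == TOTAL  # the three categories are exhaustive

severity_shares = {k: pct(v, TOTAL) for k, v in severity_counts.items()}
print(severity_shares)                   # {'PDO': 36.7, 'injury': 39.5, 'fatal': 23.7}
print(pct(982, TOTAL))                   # sunny-day crashes: 85.1
print(pct(237, 274))                     # fatal crashes on sunny days, share of all fatal: 86.5
print(pct(968, TOTAL), pct(901, TOTAL))  # high visibility (200+ m): 83.9, dry surface: 78.1
```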

Model formulation
The RPL model
Discrete choice models are well suited to modeling injury severity because it is a discrete outcome. 28,29 Among discrete choice models, the MNL model is the basic one: it has a simple form and requires a relatively small sample size, and several other types of logit models have evolved from it. Continuous variables can be incorporated directly into the model, whereas discrete variables must be preprocessed. Dichotomous variables can be coded directly as 0 or 1, and variables with more than two categories are represented by dummy variables. When the number of severity categories is I (I ≥ 3), the probability can be derived as follows:
$$ P_n(i) = P\left(U_{in} \ge U_{jn},\ \forall\, j \ne i,\ j = 1, \dots, I\right) \quad (1) $$
where $P_n(i)$ is the probability that crash n has severity category i, and $U_{in}$ is a linear function that determines the severity of crash n. As usual, $U_{in}$ takes the linear form in equation (2).
$$ U_{in} = \beta_i X_n + \varepsilon_{in} \quad (2) $$
where $X_n$ is a vector of explanatory variables (risk factors) that affect the severity level, $\beta_i$ is a vector of estimable parameters, and $\varepsilon_{in}$ is a disturbance term that accounts for unobserved effects. Assuming that the disturbances $\varepsilon_{in}$ are independently and identically distributed following the type I generalized extreme value (Gumbel) distribution, the MNL model gives the probability of crash severity category i as equation (3):
$$ P_n(i) = \frac{\exp(\beta_i X_n)}{\sum_{j=1}^{I} \exp(\beta_j X_n)} \quad (3) $$
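Equation (3) can be sketched in a few lines of Python. The coefficients below are hypothetical placeholders, not the Table 2 estimates; the PDO category is given an all-zero coefficient vector, which is one common way to implement the baseline normalization used later in the estimation.

```python
import math


def mnl_probabilities(x, betas):
    """Equation (3): P_n(i) = exp(beta_i X_n) / sum_j exp(beta_j X_n).

    `betas` maps each severity category to a coefficient list aligned with x.
    The baseline category (PDO) has all-zero coefficients, so its utility is 0.
    """
    utils = {k: sum(b * v for b, v in zip(beta, x)) for k, beta in betas.items()}
    m = max(utils.values())  # subtracting the max avoids overflow without changing the result
    expu = {k: math.exp(u - m) for k, u in utils.items()}
    total = sum(expu.values())
    return {k: e / total for k, e in expu.items()}


# Hypothetical coefficients on [constant, female, night_without_light].
betas = {
    "PDO": [0.0, 0.0, 0.0],
    "injury": [0.2, 0.5, 0.3],
    "fatal": [-0.6, 0.4, 0.9],
}
p = mnl_probabilities([1, 1, 0], betas)  # female driver, crash in daylight
print({k: round(v, 3) for k, v in p.items()})
```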
In the MNL model, the error terms $\varepsilon_{in}$ are thus assumed to be independent across severity categories. The coefficients $\beta_i$ in equation (3) can be estimated by standard maximum likelihood, and a reference category must be specified in advance. However, the MNL model is restricted by the independence of irrelevant alternatives (IIA) property, which assumes that the disturbances of different severity categories are uncorrelated. Under IIA, adding or removing one category of the dependent variable does not change the ratio of the probabilities of any two other categories. In practice, this assumption is often violated. To test whether some severity categories share unobserved effects (i.e., have correlated disturbances), the most commonly used tests are the Hausman and McFadden (HM) test and the Small and Hsiao (SH) test.
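The IIA property follows directly from equation (3): the ratio of the probabilities of categories i and k equals exp(β_i X_n − β_k X_n), which involves no other category. The sketch below, using hypothetical utilities, illustrates this by dropping the fatal category. It shows only the restriction that the HM and SH tests probe; the tests themselves are not implemented here.

```python
import math


def mnl(utilities):
    """MNL probabilities from a dict of systematic utilities."""
    m = max(utilities.values())
    expu = {k: math.exp(v - m) for k, v in utilities.items()}
    s = sum(expu.values())
    return {k: e / s for k, e in expu.items()}


v = {"PDO": 0.0, "injury": 0.7, "fatal": -0.2}       # hypothetical utilities
full = mnl(v)
reduced = mnl({k: v[k] for k in ("PDO", "injury")})  # fatal category removed

# Individual probabilities change, but the injury/PDO odds stay at exp(0.7).
print(full["PDO"], reduced["PDO"])
print(full["injury"] / full["PDO"], reduced["injury"] / reduced["PDO"], math.exp(0.7))
```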
In addition to the IIA assumption, the MNL model assumes that there are no differences between individuals in the sample; that is, the effects of the explanatory variables on the dependent variable are fixed. To overcome these two limitations of the MNL model (its neglect of individual differences and its reliance on IIA), RPL models have been widely applied in various fields, including research on traffic accident injury severity. By allowing the parameters of all or some of the independent variables to vary across observations, the RPL model can account for unobserved heterogeneity in the data and is not limited by the IIA assumption. According to random utility theory, the utility function that determines the injury severity of an accident can be expressed as equation (2). 22 To capture the effects of unobserved heterogeneity arising from randomness in some of the factors that determine injury severity, the RPL model is obtained by extending the MNL model as in equation (4).
$$ P_n(i \mid \varphi) = \int \frac{\exp(\beta_i X_n)}{\sum_{j=1}^{I} \exp(\beta_j X_n)}\, f(\beta \mid \varphi)\, d\beta \quad (4) $$
where $P_n(i \mid \varphi)$ is the probability of injury severity level i conditional on $f(\beta \mid \varphi)$, $f(\beta \mid \varphi)$ is the density function of $\beta$, and $\varphi$ is a vector of parameters describing that density (for a normal distribution, its mean and variance). The probability is thus a weighted average of the MNL probabilities over the different values that $\beta_i$ takes across observations. Typically, some elements of $\beta_i$ are fixed and others are randomly distributed with a specified statistical distribution. If the variance of a random parameter is statistically significant, the effect of the corresponding variable in $X_n$ on injury severity varies across observations. For randomly distributed parameters, the weights are determined by the density function $f(\beta_i \mid \varphi)$; if all parameters are fixed, the RPL model reduces to the standard MNL model. The parameters in equation (4) can be estimated by simulated maximum likelihood. Commonly considered density functions include the normal, lognormal, triangular, and uniform distributions, and previous studies have found the normal distribution to be the most suitable for accident injury severity data. 30,31 In this study, maximum likelihood estimation is performed with a simulation-based approach to overcome the computational complexity of estimating the parameters $\beta_i$ of the RPL model.
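The simulation idea behind equation (4) can be sketched for a single observation: draw the random coefficient from its normal density many times, compute the MNL probability for each draw, and average. The sketch below assumes one random coefficient entering one utility function, uses hypothetical fixed-part utilities, and uses pseudo-random draws (estimation software often uses Halton draws instead). Simulated maximum likelihood would maximize the sum over crashes of the log of these simulated probabilities, which is not implemented here.

```python
import math
import random


def mnl_prob(utilities, i):
    """MNL probability of category i given a dict of systematic utilities."""
    m = max(utilities.values())  # shift for numerical stability
    expu = {k: math.exp(v - m) for k, v in utilities.items()}
    return expu[i] / sum(expu.values())


def simulated_rpl_prob(i, base_utils, category, x, mean, sd, draws=2000, seed=1):
    """Approximate equation (4) for one observation.

    base_utils: utilities from all fixed-parameter terms (hypothetical values).
    A random coefficient beta ~ N(mean, sd^2) multiplies indicator x in the
    utility of `category`; the integral over f(beta | mean, sd) is replaced by
    an average of MNL probabilities over the draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        beta = rng.gauss(mean, sd)
        utils = dict(base_utils)
        utils[category] += beta * x
        total += mnl_prob(utils, i)
    return total / draws


base = {"PDO": 0.0, "injury": -0.5, "fatal": -1.0}  # hypothetical fixed-part utilities
# Female-driver coefficient in the injury function: mean 1.82, SD 1.23 (values from the text)
p_injury = simulated_rpl_prob("injury", base, "injury", 1, 1.82, 1.23)
p_fixed = simulated_rpl_prob("injury", base, "injury", 1, 1.82, 0.0)  # SD 0 reduces to the MNL
print(round(p_injury, 3), round(p_fixed, 3))
```

Setting the standard deviation to zero reproduces the fixed-parameter MNL probability, mirroring the statement that the RPL model reduces to the MNL model when all parameters are fixed.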

Model test and evaluation
In regression analysis, it is usually necessary to test the validity of the model. The most common method is the likelihood ratio (chi-square) test, whose statistic is calculated by equation (5):
$$ \chi^2 = -2\left[LL(0) - LL(\beta)\right] \quad (5) $$
where LL(0) is the initial log-likelihood, i.e., the log-likelihood of the model containing only constant terms and no independent variables; LL(β) is the log-likelihood at convergence, i.e., the log-likelihood when all significant independent variables and constant terms are included; K is the number of parameters in the model; and $\chi^2(K)$ is the critical chi-square value with K degrees of freedom at a given significance level. If $\chi^2$ exceeds $\chi^2(K)$, the model with the added parameters is better than the constant-only model, i.e., the model is valid. The Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and McFadden's pseudo $R^2$ are commonly used to evaluate model fit. They are calculated as follows:
$$ AIC = -2LL(\beta) + 2K \quad (6) $$
$$ BIC = -2LL(\beta) + K \ln N \quad (7) $$
$$ \rho^2 = 1 - \frac{LL(\beta)}{LL(0)} \quad (8) $$
where N is the number of crashes. Smaller AIC and BIC values indicate better model fit. McFadden's pseudo $R^2$ ranges from 0 to 1, and values closer to 1 indicate better fit.

Interpretation of parameters
The regression parameters of the MNL model reflect the impact of the significant independent variables on crash severity only qualitatively, not quantitatively; that is, they cannot be interpreted directly as effect sizes. To quantify the impact of each independent variable on injury severity, elasticities (normalized derivatives of the injury severity probabilities) are calculated. For a continuous observed factor $X_{nm}$, the elasticity of the probability that crash n has severity i is calculated as equation (9):
$$ E^{P_n(i)}_{X_{nm}} = \frac{\partial P_n(i)}{\partial X_{nm}} \times \frac{X_{nm}}{P_n(i)} \quad (9) $$
It gives the percentage change in $P_n(i)$ for a 1% change in $X_{nm}$.
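Equation (9) can be evaluated numerically, and for the MNL model it also has a closed form, X_nm (β_im − Σ_j P_n(j) β_jm), which follows from differentiating equation (3). The sketch below computes both and checks that they agree. The coefficients and the continuous variable (a curve radius in km) are hypothetical. For an RPL model the probabilities would additionally be averaged over draws of the random parameters, and reported figures are averages over crashes; both steps are omitted here.

```python
import math


def probs(x, betas):
    """MNL probabilities (equation (3)); betas maps category -> coefficient list."""
    utils = {k: sum(b * v for b, v in zip(beta, x)) for k, beta in betas.items()}
    m = max(utils.values())
    expu = {k: math.exp(u - m) for k, u in utils.items()}
    s = sum(expu.values())
    return {k: e / s for k, e in expu.items()}


def elasticity_numeric(x, betas, i, m, h=1e-6):
    """Equation (9) with the derivative taken by central finite differences."""
    up, down = list(x), list(x)
    up[m] += h
    down[m] -= h
    dp = (probs(up, betas)[i] - probs(down, betas)[i]) / (2 * h)
    return dp * x[m] / probs(x, betas)[i]


def elasticity_mnl(x, betas, i, m):
    """Closed form for the MNL: X_nm * (beta_im - sum_j P_n(j) * beta_jm)."""
    p = probs(x, betas)
    return x[m] * (betas[i][m] - sum(p[j] * betas[j][m] for j in betas))


# Hypothetical coefficients on [constant, curve radius (km)]; not Table 2 estimates.
betas = {"PDO": [0.0, 0.0], "injury": [-0.3, 0.4], "fatal": [-1.0, 0.7]}
x = [1.0, 1.2]
print(elasticity_numeric(x, betas, "fatal", 1), elasticity_mnl(x, betas, "fatal", 1))
```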
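For dichotomous (indicator) explanatory variables, a 1% change is not meaningful, so pseudo-elasticities are used to assess the effect of individual parameter estimates on crash severity probabilities. The direct pseudo-elasticity, i.e., the relative change in $P_n(i)$ when $X_{nm}$ switches from zero to one, is computed as equation (10):
$$ E^{P_n(i)}_{X_{nm}} = \frac{\exp[\Delta(ASC_i + \beta_i X_n)] \sum_{j=1}^{I} \exp(ASC_j + \beta_j X_n)}{\exp(ASC_i + \beta_i X_n) \sum_{j=1}^{I} \exp[\Delta(ASC_j + \beta_j X_n)]} - 1 \quad (10) $$
where $E^{P_n(i)}_{X_{nm}}$ is the direct pseudo-elasticity of the mth variable in the vector $X_n$, I is the number of possible severity categories, $ASC_i + \beta_i X_n$ is the value of the function determining severity category i when $X_{nm}$ equals zero, and $\Delta(ASC_i + \beta_i X_n)$ is its value after $X_{nm}$ has been changed from zero to one.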
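A minimal sketch of the pseudo-elasticity calculation is shown below, assuming an MNL model with hypothetical alternative-specific constants and coefficients (not the Table 2 estimates). It evaluates the ratio form above, which equals P_n(i | X_nm = 1) / P_n(i | X_nm = 0) − 1, and averages it over crashes, the kind of figure reported per severity level. For the RPL model the probabilities would also be averaged over draws of the random parameters; that step is omitted.

```python
import math


def utilities(x, asc, betas):
    """ASC_j + beta_j X_n for every severity category j."""
    return {j: asc[j] + sum(b * v for b, v in zip(betas[j], x)) for j in asc}


def pseudo_elasticity(x, asc, betas, i, m):
    """Equation (10): relative change in P_n(i) when indicator X_nm goes from 0 to 1."""
    x0, x1 = list(x), list(x)
    x0[m], x1[m] = 0, 1
    v0, v1 = utilities(x0, asc, betas), utilities(x1, asc, betas)  # v1 holds the Delta(...) values
    num = math.exp(v1[i]) * sum(math.exp(v) for v in v0.values())
    den = math.exp(v0[i]) * sum(math.exp(v) for v in v1.values())
    return num / den - 1


def average_pseudo_elasticity(X, asc, betas, i, m):
    """Average of equation (10) over observations."""
    return sum(pseudo_elasticity(x, asc, betas, i, m) for x in X) / len(X)


# Hypothetical model: variables [night_without_light, wet_skid]; PDO is the baseline.
asc = {"PDO": 0.0, "injury": 0.1, "fatal": -0.8}
betas = {"PDO": [0.0, 0.0], "injury": [0.2, 0.3], "fatal": [0.9, 0.1]}
X = [[0, 1], [1, 0], [0, 0]]
print(average_pseudo_elasticity(X, asc, betas, "fatal", 0))
```

The exponentials are not shifted here, which is fine for utilities of moderate size like these.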

Model parameter estimation
Insignificant variables were removed one at a time from each utility function at a significance level of 0.10. Multicollinearity tests based on variance inflation factors (VIF) indicated that the remaining variables could all be treated as independent variables. The parameters were estimated with PDO crashes as the baseline category, and a reference item was specified for each multi-category explanatory variable; the reference items are marked in Table 1. To obtain more accurate calibration results, separate utility functions were estimated for the different severity levels. To test the fit of the RPL model, a fixed-parameter MNL model was constructed for comparison, and likelihood ratio statistics were computed to test the significance of each model.
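The paper does not report how the VIFs were computed. The usual definition is VIF_j = 1 / (1 − R_j²), where R_j² comes from an ordinary least squares regression of variable j on all the other explanatory variables. The sketch below implements that definition in plain Python for small data sets; the toy columns are hypothetical, and the common rule of thumb that a VIF above about 10 signals problematic collinearity is a convention, not a threshold stated in the paper.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def r_squared(y, predictors):
    """R^2 of an OLS regression of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [col[k] for col in predictors] for k in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    ss_tot = sum((yk - ybar) ** 2 for yk in y)
    return 1 - ss_res / ss_tot


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on all the other columns."""
    result = []
    for j, y in enumerate(columns):
        others = [c for k, c in enumerate(columns) if k != j]
        result.append(1 / (1 - r_squared(y, others)))
    return result


# Hypothetical columns: an uncorrelated pair, then a nearly collinear pair.
a, b = [1, -1, 1, -1], [1, 1, -1, -1]
c1, c2 = [1, 2, 3, 4, 5], [2, 4.1, 5.9, 8.2, 9.8]
print(vif([a, b]), vif([c1, c2]))
```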
The $\chi^2$ value of the MNL model was 1113.256, greater than the critical value of 73.683 at the 0.01 significance level. The RPL model was then constructed and its significance examined. Its utility functions were the same as those of the MNL model, except that some fixed parameters were changed into random parameters. The $\chi^2$ value of the RPL model was 1903.308, greater than the critical value of 78.616 at the 0.01 significance level. Four variables with random parameters were identified in the RPL model, which is why the RPL model had 52 degrees of freedom versus 48 for the MNL model. The constant-only log-likelihoods of the two models were the same, but the log-likelihood at convergence increased from -1438.453 for the MNL model to -1043.427 for the RPL model. McFadden's pseudo $R^2$, computed from the log-likelihood at convergence and the constant-only log-likelihood, was 0.477 for the RPL model, higher than the 0.279 of the MNL model. This indicates that the RPL model fit better than the MNL model, consistent with previous traffic safety studies. The parameter estimates and model comparison are summarized in Table 2; all parameters included in the models were statistically significant at the 0.10 level or better. The values outside the parentheses are the posterior means of the parameters, and those inside the parentheses are their posterior standard deviations. Although the MNL model has fewer parameters than the RPL model, the RPL model still had a lower AIC (2190.854 versus 2972.906 for the MNL model). This difference in AIC is substantial, indicating that the RPL model is preferable to the MNL model.
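The reported fit statistics are mutually consistent and can be reproduced from the log-likelihoods and parameter counts. The constant-only log-likelihood LL(0) is not stated in the text; the sketch back-calculates it from the reported chi-square statistic as LL(0) = LL(β) − χ²/2, which gives -1995.081 for both models. BIC is not reported in the text, so the printed BIC values are only what the standard formula would give for these inputs.

```python
import math

N = 1154  # crashes in the sample
CRITICAL = {48: 73.683, 52: 78.616}  # chi-square critical values at 0.01 quoted in the text

# Constant-only log-likelihood, back-calculated: -1438.453 - 1113.256 / 2 = -1995.081
LL0 = -1995.081


def fit_stats(ll, k, ll0=LL0, n=N):
    """Likelihood ratio statistic, AIC, BIC, and McFadden's pseudo R^2."""
    return {
        "LR": -2 * (ll0 - ll),
        "AIC": -2 * ll + 2 * k,
        "BIC": -2 * ll + k * math.log(n),
        "rho2": 1 - ll / ll0,
    }


for name, ll, k in [("MNL", -1438.453, 48), ("RPL", -1043.427, 52)]:
    s = fit_stats(ll, k)
    print(name, {key: round(val, 3) for key, val in s.items()}, s["LR"] > CRITICAL[k])
```

The LR, AIC, and pseudo R² values printed for both models match the figures reported above.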
As shown in the last three rows of Table 2, the better fit of the RPL model was further confirmed by its higher classification accuracy for each severity level and for the entire dataset. The significant variables that contributed to injury and fatal crashes in the MNL model remained significant in the RPL model with relatively similar coefficients, which supports the consistency between the two models to some extent.
Parameters whose estimated standard deviations were statistically significant under the assumed distribution were treated as random. These were the indicator variables for driver gender (female) and responsible vehicle type (large vehicle) in the injury utility function, and the indicator variables for driving experience (10+ years) and lighting conditions (night without light) in the fatal utility function. When the estimated standard deviations were not statistically different from zero, the parameters were fixed across observations. The normal distribution provided the best statistical fit for these random parameters; their means and standard deviations are shown in Table 2. The average elasticities of the significant factors in the RPL model are listed in Table 3.

Interpretation of the results
The significant variables for each severity outcome are discussed next, grouped into five parts.
Effects of driver factors. The parameter of the female-driver indicator was significant and random for injury crashes but fixed for fatal crashes. Crashes involving female drivers were more likely to have a fatal outcome, whereas for injury crashes the gender parameter was normally distributed with a mean of 1.82 and a standard deviation of 1.23. This means that for most (93.1%) expressway crashes, a female driver increases the likelihood of an injury outcome, while for 6.9% of crashes a female driver decreases it. On average, the probability of injury accidents increased by 14.4% and that of fatal accidents by 15.7%. This random effect may be attributed to differences in personality traits and psychological characteristics between female and male drivers. 21,32 Previous studies have reported that female drivers have lower psychological endurance, emergency-handling ability in a crisis, and stress resilience than male drivers, and judge distance and speed less accurately, which results in higher casualty rates. 19,33 The impact of driver age on accident severity showed a polarized pattern. Compared with drivers aged 26–40 years, drivers aged 19–25 years were negatively associated with accident severity, and drivers aged 41+ years were positively associated with it. Crashes involving drivers aged 19–25 years were less likely to result in injury or fatal outcomes, with probability reductions of 1.1% for injury and 0.3% for fatal crashes, whereas drivers aged 41+ years increased these probabilities by 1.0% for injury and 1.3% for fatal outcomes. This might be due to risk compensation effects. 34 Younger drivers' unfamiliarity with the vehicle and road conditions and their lack of driving experience may lead to more cautious driving behavior, which reduces crash severity. Previous studies have not reached a consensus on the impact of driver age on accident severity. 35-37 As drivers age, their physical function and health decline relative to younger drivers, making them more likely to sustain serious physical injuries in an accident. In addition, drivers' reaction time increases with age, and in a high-speed driving environment, traffic accidents are more likely to cause serious casualties. 15 Regarding driving experience, the probabilities of injury and fatal outcomes were both reduced for drivers with ≤2 years and with 10+ years of experience: for the injury level, the reductions were 7.7% and 13.0%, respectively, and for the fatal level, 9.9% and 17.7%. The parameter of the 10+ years indicator in the fatal utility function was random: for 89.5% of accidents involving drivers with 10+ years of experience, the probability of a fatal outcome was lower than for drivers with 10 years of experience or less. The potential reasons are as follows. (a) Drivers with ≤2 years of experience tend to strictly follow the expressway speed limit, so their driving speeds are relatively low; 36 they also concentrate closely on the driving task, which results in less severe accidents. (b) Drivers with much more experience have better driving skills and risk perception than those with less experience, so their accident severity is also relatively low. In short, drivers with medium experience (3–10 years) are generally involved in more severe crashes than other drivers because of overconfidence in their driving skills and risk perception.
Table 2. Coefficients (and standard errors) of the logit regression models for injury severity outcomes.
Effects of vehicle factors. Collisions with a rigid barrier or with other facilities tend to increase the probability of more severe injuries, as indicated by their positive parameters. The probability of injury accidents increased by 5.1% and 3.0% for collisions with a rigid barrier and with other facilities, respectively, and the probability of fatal crashes increased by 6.8% and 5.4%, respectively. Because guardrails and other rigid structures are stiffer than vehicles and absorb less of the collision energy, the vehicles sustain more damage from the impact. When drivers take emergency actions, vehicles may turn sharply, increasing the possibility of secondary collisions, so the probability of casualties tends to be higher. The model results also suggested that the type of responsible vehicle was significantly associated with injury severity. Compared with small vehicles, medium-sized vehicles were negatively associated with injury severity, while large vehicles were positively associated with it. Medium-sized vehicles decreased the probabilities of injury and fatal outcomes by 0.5% and 2.5%, respectively, and large vehicles increased them by 14.8% and 16.8%, respectively. The coefficient of the large-vehicle indicator in the injury utility function was specified as a normally distributed random parameter with a mean of 1.44 and a standard deviation of 1.05. The probability density of this parameter shows that in 91.5% of crashes, a large responsible vehicle has a higher probability of an injury outcome than a small vehicle. This is consistent with previous research. 38,39 A possible reason is that large vehicles have a large self-weight, a high center of gravity, more blind spots, off-tracking (inner wheel difference) when turning, and long braking distances, which lead to a high probability of heavy truck accidents and more serious outcomes. Large vehicles are also more aggressive in collisions and can cause greater harm to the other vehicles involved.
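The percentages quoted for the normally distributed parameters follow from the normal CDF: for β ~ N(μ, σ²), the share of crashes with β > 0 is Φ(μ/σ). The sketch below reproduces the 93.1% (and complementary 6.9%) for the female-driver parameter and the 91.5% for the large-vehicle parameter from the means and standard deviations given in the text. The 89.5% and 82.2% figures for driving experience and night without light depend on estimates shown only in Table 2, so they are not reproduced here.

```python
import math


def share_positive(mean, sd):
    """P(beta > 0) for beta ~ N(mean, sd^2), i.e. the standard normal CDF at mean / sd."""
    return 0.5 * (1 + math.erf(mean / (sd * math.sqrt(2))))


female_injury = share_positive(1.82, 1.23)  # female driver, injury function
large_vehicle = share_positive(1.44, 1.05)  # large responsible vehicle, injury function
print(f"{100 * female_injury:.1f}% {100 * (1 - female_injury):.1f}% {100 * large_vehicle:.1f}%")
# prints 93.1% 6.9% 91.5%
```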
Compared with the "going straight" state, the "not going straight" indicator was negatively associated with crash severity, decreasing the probabilities of injury and fatal outcomes by 4.4% and 3.8%, respectively. A possible underlying reason is that drivers who are not going straight are usually highly alert, which is consistent with the risk compensation effect mentioned above.
Effects of road conditions. Road surface conditions and terrain type were found to affect severity outcomes. Compared with a dry road surface, a wet-skid pavement was more prone to injury accidents, increasing the average probability of injury accidents by 4.9%. This is consistent with some previous studies in which normal road surface conditions were found to provoke more severe accidents. 39,40 A potential reason is that on wet-skid pavement, although vehicle speeds generally decrease, braking distances increase significantly, thereby increasing the probability of injury accidents. Interestingly, compared with flat roads, rolling and/or mountainous sections reduced the probability of fatal outcomes by 5.1%. This result is plausible because drivers are likely to be more cautious on mountainous sections; in the recorded data, accidents on mountainous sections account for only 2.9% of all accidents. For each 1% increase in the radius of the horizontal curve, the probabilities of injury and fatal outcomes were expected to increase by 1.9% and 2.9%, respectively. For each 1% increase in the length of the horizontal curve, they were expected to increase by 1.2% and 5.4%, respectively. The results also showed that for each 1% increase in longitudinal gradient, the probabilities of injury and fatal outcomes were expected to increase by 4.7% and 5.2%, respectively. This is also consistent with previous research results. 16,41 As these works pointed out, steeper slopes shorten the sight distance, leaving drivers less time to take appropriate action in response to an impending accident.
Effects of environmental conditions. When a crash happened at night without light, the probability of injury accidents increased by 4.0% and the probability of fatal accidents increased by 15.1%. The parameter of the "night without light" indicator in the fatal-outcome function was random, and its distribution showed that in 82.2% of cases, night without light made a fatal outcome more likely than in daylight. Thus, the "night without light" indicator significantly increased the probability of fatal outcomes. A potential reason is that at night without light, the driver relies only on vehicle lights and reflective signs, which can hardly meet the requirements of safe driving on expressways. Without timely detection of vehicles ahead, the probability of rear-end or side collisions increases. In dark conditions, fatigue and inattention become significantly more likely, and vehicles tend to travel faster at night, which aggravates the severity of accidents. In the remaining 17.8% of cases, night without light made a fatal outcome less likely than in daylight, which might be due to these drivers' increased vigilance in the absence of lighting and to lower vehicle speeds, both of which reduce accident severity. When visibility was ≤100 m, crashes were less severe: the probability of injury accidents decreased by 8.7%, and the probability of fatal accidents decreased by 4.8%. A possible reason is that when visibility is very low, vehicle speeds decrease significantly, so serious accidents are less likely. Drivers are also more cautious, which may reduce the probability of accidents.
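The 82.2%/17.8% split follows from the normal distribution of the random parameter: if β ~ N(μ, σ²), the share of crashes with β > 0 is Φ(μ/σ). The sketch below computes this with Python's standard-library NormalDist. The mean and standard deviation are hypothetical, chosen only to illustrate that a mean-to-standard-deviation ratio of about 0.92 yields roughly the reported 82% split.

```python
from statistics import NormalDist

def share_above_zero(mean, std):
    """Share of a N(mean, std**2) random parameter above zero, i.e. Phi(mean / std)."""
    return 1.0 - NormalDist(mu=mean, sigma=std).cdf(0.0)

if __name__ == "__main__":
    mu, sigma = 0.923, 1.0  # hypothetical estimates, not the paper's
    above = share_above_zero(mu, sigma)
    print(f"beta > 0 for {above:.1%} of crashes; beta < 0 for {1 - above:.1%}")
```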
When visibility was between 100 and 200 m, the probabilities of injury and fatal accidents decreased by 6.6% and 9.3%, respectively, while the probability of PDO accidents increased by 5.1%. A potential reason is that when visibility on the expressway decreases, drivers generally become more vigilant and slow down, thereby reducing accident severity. However, because the environment obstructs the driver's view, the overall accident risk still increases.
Interaction among influencing factors. The correlation coefficients and variance-covariance matrix of the random parameters are shown in Table 4. Values outside the parentheses are the correlation coefficients between random parameters, and values inside the parentheses are the corresponding variance-covariance entries of the random parameter distribution.
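The two sets of numbers in Table 4 are linked by ρ_ij = Σ_ij / √(Σ_ii Σ_jj), where Σ is the variance-covariance matrix of the random parameters. The sketch below performs this conversion; the matrix in the example is hypothetical and is not the paper's Table 4.

```python
import math

def cov_to_corr(cov):
    """Convert a variance-covariance matrix (list of lists) to a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]  # standard deviations
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    # Hypothetical 2x2 covariance of two random parameters
    cov = [[4.0, 1.2],
           [1.2, 1.0]]
    for row in cov_to_corr(cov):
        print([round(v, 3) for v in row])
```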
There were significant interactions between the indicators of "large vehicle" and driving experience of "10+ years," driving experience of "10+ years" and "night without light," "female" drivers and "night without light," and "night without light" and "large vehicle." That is, in each pair, the two factors jointly affect severity outcomes. The correlation coefficient between the "night without light" and "large vehicle" indicators was 0.821, meaning that the combined impact of a large vehicle at night without light on accidents was greater than that of either factor alone. Night without light conditions lead to a high incidence of accidents, and the characteristics of large vehicles make serious accidents more likely. The correlation between these two indicators was also the strongest of the four pairs. The correlation coefficient between female drivers and night without light reached 0.773, indicating a significant interaction between the two factors: the accident risk of female drivers at night without light was higher than when only one of the two factors was present. One possible explanation is that female drivers may find it more difficult to make timely and accurate decisions under the psychological strain of driving at night. The correlation coefficient between large vehicle and driving experience of 10+ years was 0.318. The elasticity analysis indicates that driving experience of 10+ years reduces accident severity, whereas large vehicles increase it. Judging from the correlation coefficient, when the two factors appear together, the increased risk from the large vehicle offsets the risk reduction from driving experience, which is not conducive to improving traffic safety.
However, the correlation coefficient between night without light and driving experience of 10+ years was −0.172, indicating that the combined benefit completely offsets the risk caused by night without light, so this combination was conducive to traffic safety.
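Correlated random parameters of this kind are usually simulated by drawing β = μ + Lz, where z is a vector of independent standard normal draws and L is the lower-triangular Cholesky factor of the variance-covariance matrix (Σ = LLᵀ); crash-level probabilities are then averaged over the draws. The pure-Python sketch below shows the draw step only, with hypothetical means and covariances. Estimation software often uses Halton or other quasi-random draws rather than the pseudo-random draws used here.

```python
import math
import random

def cholesky(cov):
    """Lower-triangular L with L @ L.T == cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def draw_parameters(mean, cov, n_draws, seed=0):
    """Simulate correlated normal random parameters: beta = mean + L z."""
    rng = random.Random(seed)
    L = cholesky(cov)
    k = len(mean)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # independent N(0, 1)
        draws.append([mean[i] + sum(L[i][j] * z[j] for j in range(i + 1))
                      for i in range(k)])
    return draws

if __name__ == "__main__":
    mean = [0.8, -0.3]                 # hypothetical: "night without light", "10+ years"
    cov = [[1.0, -0.1], [-0.1, 0.5]]   # hypothetical covariance
    for d in draw_parameters(mean, cov, 5):
        print([round(v, 3) for v in d])
```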

Safety countermeasures
The parameter estimation and elasticity analysis results of the RPL model show that several factors have a great impact on injury and fatal accidents, including driving experience of ≤2 years, collision with a rigid barrier, a large vehicle being responsible, and visibility of ≤100 m. Such risk factors should therefore receive greater attention when formulating road traffic safety improvement measures. Drivers with ≤2 years of driving experience should be treated as novice drivers. They lack experience in emergency response during high-speed driving and are prone to panic in dangerous situations, which increases driving risk, although they are more careful than experienced drivers when driving on expressways. 36 Therefore, it is necessary to strengthen the safety education and training of drivers, especially female drivers with 3–10 years of experience. For older drivers, examinations of physical health should be strengthened. Sufficient attention should also be paid to training in psychological quality and emergency response, and to improving driving standardization.
When a vehicle collides with a rigid barrier, the strong rigidity of the guardrail and other structures causes serious damage to the vehicle, and poor lighting at night can further increase the collision frequency. Therefore, where there are rigid obstacles such as piers in the median or within the lateral clearance of the expressway, safety protection or energy-absorbing facilities should be added to reduce accident severity after a collision. When selecting the median guardrail type, semi-rigid guardrails should be preferred wherever they meet the required protection level. On accident-prone sections where concrete guardrails have been installed, passive safety protection facilities and reflective film can be added to improve visibility, and isolation facilities should be reasonably arranged. Lighting conditions should be improved on accident-prone sections without lighting.
Because of their high center of gravity, heavy weight, and less responsive braking, large vehicles involved in accidents have a high probability of fatal outcomes. Therefore, appropriate speed limit values and modes should be set according to the operating times and characteristics of large trucks, and conspicuous speed limit warning signs should be installed. Where conditions allow, auxiliary stability devices should be installed on large vehicles to improve their operational stability, and the inspection of vehicle airbags and the detection of driver fatigue should be strengthened. Drivers' safety awareness should be raised through channels such as roadside signs and in-vehicle broadcasts, and the seatbelt wearing rate of drivers should be increased through various communication channels, since seatbelts can reduce accident severity to some extent when accidents occur.
When visibility on the expressway decreases or the road surface is wet and slippery, drivers may misjudge the road conditions and the spacing of vehicles ahead. Therefore, the reflectivity of road signs under low visibility and of vehicle rear reflective markings should be enhanced, and dynamic variable speed limit signs and warnings should be installed to further limit vehicle speeds, raise drivers' vigilance, and reduce accident severity. With regard to vehicle design, efforts can be made to reduce the collision aggressivity of trucks and other heavy vehicles. Expressway infrastructure should be designed to minimize the use of steep slopes.

Conclusion and limitation
The present study established an RPL model to analyze the factors influencing expressway traffic accident severity in China. The dependent variable was crash severity, categorized into three levels: PDO, injury, and fatal. The results showed that, compared with the traditional MNL model, the RPL model reveals the effects of the various factors on crash severity more reasonably. In addition, because the RPL model can capture unobserved heterogeneity and the interactions among the factors affecting accident severity, it achieves a better fit and has wider application prospects.
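A common way to support the claim that the RPL model fits better than the MNL model is a likelihood ratio test, LR = −2[LL(MNL) − LL(RPL)], compared against a χ² distribution whose degrees of freedom equal the number of additional parameters in the RPL model (for example, the standard deviations and covariances of the random parameters). The sketch below computes the statistic and its p-value using only the Python standard library. The log-likelihood values and parameter count are hypothetical; because variance parameters lie on the boundary of the parameter space under the null hypothesis, the χ² p-value is generally regarded as conservative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), df a positive integer.

    Evaluates Q(df/2, x/2), the regularized upper incomplete gamma function,
    starting from Q(1, a) = exp(-a) or Q(1/2, a) = erfc(sqrt(a)) and applying
    the recurrence Q(s + 1, a) = Q(s, a) + a**s * exp(-a) / Gamma(s + 1).
    """
    if x <= 0.0:
        return 1.0
    a = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-a)
    else:
        s, q = 0.5, math.erfc(math.sqrt(a))
    while s < df / 2.0:
        q += math.exp(s * math.log(a) - a - math.lgamma(s + 1.0))
        s += 1.0
    return q

def likelihood_ratio_test(ll_restricted, ll_full, extra_params):
    """LR statistic and p-value for a restricted model nested in a fuller one."""
    stat = -2.0 * (ll_restricted - ll_full)
    return stat, chi2_sf(stat, extra_params)

if __name__ == "__main__":
    # Hypothetical log-likelihoods at convergence; not the paper's values.
    stat, p = likelihood_ratio_test(ll_restricted=-1010.0, ll_full=-1000.0, extra_params=4)
    print(f"LR = {stat:.1f}, p = {p:.4f}")
```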
The factors affecting accident severity were systematically analyzed in terms of environment and road conditions, drivers, and vehicle conditions. The results showed that: (a) collisions with barriers and piers, female drivers, and drivers aged 40+ years are more likely to result in injuries and deaths; (b) when visibility is ≤200 m, when the driver's driving experience is ≤2 years or 10+ years, when a large vehicle is responsible, and when the vehicle is "not going straight," the probability of PDO accidents increases, while the probabilities of injury and fatal accidents decrease; (c) the probability of injury accidents increases under wet-skid road surface conditions, while concrete guardrails and "night without light" conditions are more likely to lead to fatal outcomes; (d) when large vehicles travel at night on road sections without lighting facilities, any resulting accident is generally more severe.
This study has some limitations. Only the factors that could be collected were analyzed; many factors that were not well recorded, or that significantly affect accidents but are unknown, such as instantaneous speed, real-time weather, road design elements, and seatbelt use, were not included. Different sample sizes may affect the model's parameter estimates. This study does not group the samples in depth, and differences in the sample sizes suitable for different models may also bias the calibration results to some extent. Follow-up studies could explore in depth the influence of sample size on parameter estimation and group the samples to obtain better fitting results. Besides the RPL model constructed in this paper, other models can be used to analyze accident severity, such as the nested logit model and the ordered logit model. A comprehensive comparison of the practicability of each model is also a direction for future research.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.