IA2P: Intrusion-Tolerant Malicious Data Injection Attack Analysis and Processing in Traffic Flow Data Collection Based on VANETs

Several studies investigating data validity and security against malicious data injection attacks in vehicular ad hoc networks (VANETs) have focused on trust establishment based on cryptology. However, current research suffers from two problems: (P1) it is difficult to distinguish an authorized attacker from other participants; (P2) the large scale of the system and high mobility obstruct key distribution in a security-based approach. In this paper, we develop a data-centric trust mechanism based on traffic flow theory, expanding the notion of trust from intrusion-rejecting to intrusion-tolerant. First, we use catastrophe theory to describe traffic flow according to its noncontinuous, catastrophic characteristics. Next, we propose an intrusion-tolerant security algorithm, IA2P, to protect traffic flow data collection in VANETs from malicious data injection attacks without any security codes or authentication. Finally, we simulate two kinds of malicious data injection attack scenarios and evaluate IA2P based on real traffic flow data from Zhongshan Road in Dalian, China, over 24 hours. Evaluation results show that our method can achieve a 94% recognition rate in the majority of cases.


Introduction
VANETs are emerging as an effective new tool to monitor the physical world [1]. They gather traffic flow data (GPS, speed measurements, etc.) from sensor platforms in vehicles and relay these data via vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication. With advances in wireless communication and sensing, VANETs can be used to solve installation and maintenance problems caused by traditional traffic monitoring infrastructure, such as loop detectors, cameras, and radar. Therefore, more and more studies have suggested expanding traditional traffic monitoring infrastructure to gather traffic flow data with VANETs in Intelligent Transportation Systems (ITS).
VANETs are data-based networks [2], in which data quality and security are paramount. In VANET scenarios, each participating vehicle or fixed roadside infrastructure is transformed into a wireless message transmitter or sensing terminal. Some studies indicate that the security of data could seriously influence the performance of VANETs in practice [3,4]. Among these security attacks, the malicious data injection attack can harm intelligent traffic systems. In an injection attack, malicious data are injected into VANETs and disrupt ITS applications. For example, sending false traffic flow data to emulate traffic jams or accidents may disrupt traffic signal control systems. This could increase accidents, compromising safety.
Conventional approaches against injection attacks tend to adopt traditional notions of trust. A variety of research contributions are based on designing cryptographic solutions to offer both Trusted Authority (TA) and Message Confidentiality (MC) for VANET applications. To use a cipher for TA or MC, every participant (vehicle or fixed roadside infrastructure) requires some kind of shared secret, which has led to various methods of secret key distribution [5,6]. However, this research suffers from the following problems: (1) it is difficult to distinguish an authorized attacker from other participants and (2) the large scale of the system and high mobility obstruct key distribution in a security-based approach.
International Journal of Distributed Sensor Networks
Some studies have addressed data-based security mechanisms against malicious data injection attacks in new ways [7,8], which are more efficient in data-based VANET applications such as traffic congestion detection and traffic route guidance. Data-based security mechanisms are more resilient to attacks and come quickly to the correct decision. However, these studies focus mainly on establishing frameworks for data-centric trust rather than linking trust to data characteristics. It is well known that the characteristics of traffic flow data are distinct and regular. Data-based trust mechanisms that disregard data characteristics are insufficient and impractical against injection attacks.
In this paper, we use traffic flow characteristics to develop an intrusion-tolerant security mechanism that protects traffic flow data collection in VANETs against injection attacks. This security mechanism can be applied in most data-based VANET scenarios. Our study is innovative because we (1) develop IA2P, an intrusion-tolerant security mechanism against injection attacks that requires no security codes or authentication; this extends the notion of security from intrusion-rejecting to intrusion-tolerant and makes the approach more useful in practice than traditional trust establishment based on cryptology; (2) expand cusp catastrophe theory to analyze traffic flow data profiling, which is more suitable for traffic flow data characteristics in most traffic scenarios and allows effective analysis of injection attackers' activities; (3) integrate batch estimation filters with coefficient self-adjustment to track the time-varying volatility of traffic flow and thereby generalize injection attack analysis and processing.
The rest of this paper is organized as follows. Section 2 presents the basic principles of malicious data injection attacks and the model of traffic flow data based on catastrophe theory. Section 3 proposes the IA2P mechanism for injection attack analysis and processing and then improves it with the batch estimation filter for generalization. Section 4 demonstrates the performance of IA2P through simulation. Section 5 focuses on related work.

Related Works and Problem Statement
Security studies have produced a rich literature on VANETs. As with other applications in DSRC, mobile ad hoc networks (MANETs), and peer-to-peer (P2P) networks, security in VANETs mainly aims to build trust mechanisms against injection attacks.
Most state-of-the-art studies have focused on designing cryptographic solutions to offer both Trusted Authority and Message Confidentiality and thus protect VANET applications against malicious data injection attacks. These approaches have mainly considered two cases: certification and routing. For example, Lu et al. [6] proposed a Trusted Authority with authenticated recognition of each vehicle in VANETs. Sun et al. [9] proposed an identity-based security system for VANETs based on cryptography. Wasef et al. [10] and Schoch et al. [11] proposed schemes to complement the public key infrastructure to secure VANETs.
Meanwhile, Raya et al. [12] proposed data-centric security mechanisms for data-based trust establishment in ad hoc networks. They concluded that data-based trust mechanisms are simpler and more practical than cryptographic trust mechanisms. Furthermore, Aslam et al. [13] presented two approaches for reliable traffic information propagation: two-directional data verification and time-based data verification. With these two types of verification, traffic messages are sent through two (spatially or temporally spaced) channels. The recipient verifies message integrity by checking whether the data received from both channels match.
In VANETs, Wu et al. [7] proposed a Roadside-Unit Aided Trust Establishment (RATE) scheme to execute data-centric trust establishment. Mazilu et al. [8] designed a data-trust security model for VANETs, based on social network theories, that computes a trust index for each message according to the relevance of the event, such as traffic congestion and safety warnings. Building on these, Sha et al. [14] proposed RD4, a data-detection and filtering mechanism, to detect false data in VANETs. They focused on false data generated by unreliable components and untrustworthy data sources.
In other fields, Liu et al. [15] proposed a theoretical model based on data characteristics to analyze false data injection attacks in electric power state estimation. Roy et al. [16] presented a verification algorithm, used in wireless sensor networks (WSNs), to determine whether an aggregate includes any false data.
In conclusion, our study found that the data-based security mechanism we developed is proficient in identifying false data generated by injection attacks. According to the tests and simulations of [15,16], protection against malicious data injection attacks is more efficient if the characteristics and regularities of the data are considered. However, data-based security mechanisms that consider the characteristics and regularities of traffic flow data in VANETs against the injection attack have not been found in the literature.

Malicious Data Injection Attacks.
VANETs are complex systems connecting vehicle-to-vehicle and vehicle-to-roadside infrastructures through transmission and distribution networks across a local geographical area. As long as they have legal authority, malicious vehicles can send messages or data to other vehicles, even if these are false or illegal. Acting as relay nodes that receive and forward data from their neighbor nodes, they can also modify other legal messages or data [17].
For example, as shown in Figure 1, a vehicle A sends a "Road clear" message to a malicious vehicle B (the attacker); B alters the message to "Traffic jam ahead" and sends it to a legitimate vehicle C, which forwards it to vehicle D. C and D will be affected by this message, since they will change their route and may run into trouble later on.
Unfortunately, most existing trust mechanisms cannot identify these attackers' illegal activities. Since they are authorized, attackers are often able to bypass safeguards. There are two types of data injection attacks: random false data injection attacks and targeted false data injection attacks [19]. The aim of a random false data injection attack is to find any attack vector that can result in a wrong estimation of state variables. The aim of a targeted false data injection attack is to find an attack vector that injects a specific error into certain monitoring variables.
The attacker chooses any nonzero arbitrary vector as the attack target and then constructs malicious measurements. A traditional bad measurement detection approach cannot detect them. For example, an approach based on the 2-norm of the measurement residual is bypassed because the data appear to be valid. Fortunately, these data do not accord with traffic flow features, especially when they are analyzed with a multidimensional model.
The following definitions are based on descriptions of injection attacks [3,4].
Definition 1 (malicious data). Malicious data is invalid data injected by an injection attacker. It can be divided into two categories: multirepeat data and fake data. (a) Multirepeat data (MRD) is copied directly from valid data and injected regularly into VANETs. Although it appears to be valid, the values of this data are fixed and constant. (b) Fake data (FD) is falsified from valid data or randomly generated. It is unfixed and varies outside the laws of traffic flow.
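Definition 1(a) suggests a simple screening test for MRD: readings from one sender that repeat exactly over consecutive reports. The following is a minimal sketch of that idea, not the authors' implementation. It assumes measurements arrive as (speed, volume, occupancy) tuples, takes exact equality as the meaning of "fixed and constant", and introduces a hypothetical window length `repeats` that the paper does not specify.

```python
from collections import defaultdict, deque


def make_mrd_checker(repeats=3):
    """Return a checker that flags multirepeat data (MRD).

    A measurement is flagged when the same sender has reported the identical
    value `repeats` times in a row. `repeats` is an assumed parameter that is
    not taken from the paper.
    """
    history = defaultdict(lambda: deque(maxlen=repeats))

    def is_multirepeat(sender_id, measurement):
        recent = history[sender_id]
        recent.append(tuple(measurement))
        # MRD only once the window is full and every entry is identical.
        return len(recent) == repeats and len(set(recent)) == 1

    return is_multirepeat
```

Keeping a separate window per sender means that two vehicles reporting the same plausible value are not mistaken for a single replaying attacker.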

Definition 2 (VANETs Participant ID (VP ID)). A VANETs Participant ID (VP ID) is a unique identification of each participant in VANETs and does not require special authorization. It can be a MAC or IP address. VANETs Participants include vehicles and Roadside Units, which compose VANETs and exchange messages. In this paper, the MAC address is selected as the VP ID.

Definition 3 (Injection Attacker List ($O_{list}$)). An Injection Attacker List is a record of attackers' VP IDs, which is stored in each VANETs Participant. An entry means that the participant holding this VP ID is an injection attacker and has sent malicious data to the $O_{list}$ owner before. The $O_{list}$ of each VANETs Participant may be different.

Problem Formulation of Malicious Data Injection Attack Based on Catastrophe Theory.
Nonlinearity is an inherent property of traffic flow [20]. Gazis et al. improved nonlinear follow-the-leader models to describe traffic flow in 1961 [21], which has attracted research attention ever since. With the rapid development of information technology, more and more traffic flow data are collected by installing sensors (usually double induction loop detectors) along the road that measure flux and speed at a certain location. The nonlinearity of traffic flow has been proven, and more nonlinear theories and models, such as the fluid-dynamical model [22], have been developed to describe traffic flow.
Catastrophe theory is used to explain natural and social phenomena that involve discontinuous changes and to analyze the noncontinuous characteristics near the critical point. Navin [23] proposed a cusp catastrophe traffic model to explain sudden changes in traffic flow. Hall and others later demonstrated that traffic flow fits the cusp catastrophe surface [24][25][26][27]. According to basic cusp catastrophe theory, the total potential energy function of traffic flow, $F(V)$, is as follows:
$$F(V) = a V^{4} + b q V^{2} + c o V. \tag{1}$$
Here, $V$ is the vehicle speed, representing the state variable of $F$. As the control variables of $F$, $q$ and $o$ are traffic volume and occupancy, respectively. Parameters $a$, $b$, and $c$ are coefficients. In our algorithm, these coefficients are given as described in Section 3.
Based on (1), the manifold function and the bifurcation equation of the cusp catastrophe are
$$4 a V^{3} + 2 b q V + c o = 0, \qquad 8 b^{3} q^{3} + 27 a c^{2} o^{2} = 0. \tag{2}$$
Based on (2), the relationship of $V$, $q$, and $o$ is developed. Let $z$ represent the original measurements collected from VANETs, where $z = (V, q, o)$. To describe these measurements and represent their relationships, we define the Catastrophe Vector.
Definition 4 (Catastrophe Vector). The Catastrophe Vector (CV) is used to describe the traffic flow measurement with the cusp catastrophe model. The CV of measurement $z$ is defined in (3), where $\alpha = b/a$, $\beta = c/a$, and $a$, $b$, and $c$ are the coefficients of (2). Based on CV, an injection detection model of traffic flow data is proposed in (4). Here, $h$ is a coefficient whose suggested value lies between 1 and 1.05, since the error tolerance limit of traffic flow data is ±5% according to [18]. This error tolerance limit is still used in some popular traffic signal control systems, such as SCOOT and SCATS. The traffic flow data validity analysis in these systems detects whether the change between adjacent data from the same source is within the threshold range, which is similar to [18]. For these reasons, we adopt ±5% as the threshold of IA2P.
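To make the cusp model concrete, the sketch below evaluates the potential, its equilibrium surface, and its bifurcation set. It assumes the standard cusp form F(V) = a V^4 + b q V^2 + c o V, with speed V as the state variable and volume q and occupancy o as control variables; the function names and any numeric values are illustrative and do not come from the paper.

```python
def potential(v, q, o, a, b, c):
    """Cusp potential F(v) = a*v**4 + b*q*v**2 + c*o*v (assumed standard form)."""
    return a * v**4 + b * q * v**2 + c * o * v


def manifold(v, q, o, a, b, c):
    """Derivative dF/dv = 4a v^3 + 2b q v + c o; it is zero on the equilibrium surface."""
    return 4 * a * v**3 + 2 * b * q * v + c * o


def bifurcation(q, o, a, b, c):
    """Bifurcation set 8 b^3 q^3 + 27 a c^2 o^2; it is zero where equilibria merge."""
    return 8 * (b * q) ** 3 + 27 * a * (c * o) ** 2
```

Setting both dF/dV and the second derivative 12 a V^2 + 2 b q to zero and eliminating V gives the bifurcation expression. Control points where it vanishes mark the boundary at which the number of equilibrium speeds changes, which is where the model allows the speed to jump between the uncongested and congested branches.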

Evaluation Function of Malicious Data Injection Attacks.
Malicious data $a$, with $a = (V_a, q_a, o_a)$, is injected into $z$. Let $z_a$ be the vector of observed measurements, where $a$ has been injected into $z$.
According to the model of (4), each observed measurement $z_a$ can generate CV($z_a$). Then CV($z_a$) can be projected onto two-axis Cartesian coordinates and regarded as a vector. The evaluation function of the injection attack is therefore defined in (5). Based on this evaluation function, the conclusions on malicious data injection attacks are as follows.

Conclusion 1. The measurement $z_a$ is clean, without injection attack, when the value of evaluation function (5) for $z_a$ does not exceed $\tau$.
Conclusion 2. The measurement $z_a$ is false data added by injection attacks when the value of evaluation function (5) for $z_a$ exceeds $\tau$.
Here, $\tau$ is the threshold of injection detection. The value of $\tau$ is given by (8), based on the fluctuation of the true traffic flow data $z$. In (8), $h$ is the coefficient in (4), and $V$, $q$, and $o$ are effective values according to the history data of the valid measurement $z$. For example, they could be expressed by the mean value of the valid measurements.

Malicious Data Injection Attack Analysis and Processing
In this section, we propose a new malicious data injection attack analysis and processing algorithm, IA2P. First, we introduce how to self-adjust the coefficients of the cusp catastrophe model with a batch estimation filter, which makes IA2P more practical in most traffic scenarios. On this basis, we describe the theory and procedure of the IA2P algorithm.

Coefficients Self-Adaption Based on Batch Estimation Filter.
The traffic flow model based on catastrophe theory can describe the character of traffic flow. However, the traffic flow characteristics of each traffic scenario differ, since roadbed construction, transportation infrastructure, and traffic signal patterns are distinct. These can cause the traffic flow model coefficients to vary. Adjusting the parameters manually for each traffic scenario is impractical, and this uncertainty affects injection attack analysis and detection in VANETs.
To solve this problem and generalize injection attack detection, the batch estimation filter can be adopted to actively learn the coefficients of (4) online. From the valid measurements judged by (4), $\alpha$ and $\beta$ are calculated with each CV, and the update is given by (9). Here, $\alpha_{k-1}$, $\beta_{k-1}$ and $\alpha_{k-2}$, $\beta_{k-2}$ are the parameters of CV($z_{k-1}$) and CV($z_{k-2}$), respectively, and $\alpha_k$ and $\beta_k$ are the parameters satisfying CV($z_k$) = 0 if the measurement $z_k$ provides good data. Based on (5), let $\alpha = \alpha_k$, which can be adapted based on $\alpha_{k-1}$ and $\alpha_{k-2}$, and $\beta = \beta_k$, which can be adapted based on $\beta_{k-1}$ and $\beta_{k-2}$.
Note that CV($z_k$) = 0 may yield more than one solution for $\alpha_k$ and $\beta_k$ based on $z_k$. The solution that deviates least from the current values of $\alpha$ and $\beta$ should be selected and used in (9).
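The selection step described above, choosing among several candidate coefficient pairs the one that deviates least, can be sketched as follows. This is an illustration under stated assumptions: deviation is measured by Euclidean distance between (alpha, beta) pairs, the function name `select_candidate` is hypothetical, and the update rule (9) itself is not reproduced here.

```python
import math


def select_candidate(candidates, reference):
    """Pick the (alpha, beta) candidate closest to `reference`.

    `candidates` is a list of (alpha, beta) pairs that solve CV(z_k) = 0 and
    `reference` is the coefficient pair currently in use. Euclidean distance
    is an assumed measure of deviation.
    """
    if not candidates:
        raise ValueError("no candidate coefficients to choose from")
    return min(candidates, key=lambda pair: math.dist(pair, reference))
```

Choosing the nearest candidate keeps the coefficients from jumping between distant solutions on consecutive measurements, which fits the goal of tracking gradual changes in the local traffic pattern.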

Procedures for Malicious Data Injection Attack Analysis and Processing.
Based on the evaluation function of the malicious data injection attack and the coefficient self-adaption, we propose a generalized injection attack analysis and processing algorithm, IA2P, as shown in Figure 2.
The algorithm comprises four parts: traffic collection, injection attack analysis, coefficient adaption, and injection attack processing.
(a) Each observed measurement $z_a = (V, q, o)$ is collected in traffic collection. When $z_a$ is collected, it is first checked for the first type of malicious data, MRD, through comparison with adjacent data from the same vehicle. If it is fixed and constant, it is discarded as MRD. All other data are transformed into CV($z_a$) and sent to injection attack analysis.
(b) In injection attack analysis, CV($z_a$) is evaluated by the evaluation function (5). Malicious data, which may be either MRD or FD, can thus be detected.
(c) Valid measurements can be sent to coefficient adaption, which adapts the model's coefficients to fit the variation in local traffic flow.
(d) Malicious data are then sent for injection attack processing so that, for example, the traffic collection part can add the attacker's ID to $O_{list}$, preventing further attacks.
Furthermore, we propose a state machine for IA2P, as shown in Figure 3. Considering the characteristics of traffic flow monitored in VANETs, the state machine should run for a long time and process measurements continuously. Therefore, the state machine has no end state. In actual operation, the system must be stopped and restarted manually.
The inductive loop detectors are located on Zhongshan Road in Dalian, China. Table 1 shows the details of these detectors. The archives contain traffic volume, speed, and occupancy measurements from 12:00 a.m. to 12:00 p.m. on May 7, 2010.

Based on these archives, we reproduced the traffic flow using VISSIM, a traffic simulation software package. Figure 4 displays a simulation of the traffic scene.
In the simulation scene, a Roadside Unit (denoted A) is placed at the middle of the road to collect traffic flow data from passing vehicles, which are arranged according to the actual traffic data sets. Participants in VANETs, that is, the vehicles and the Roadside Unit, are linked by Dedicated Short Range Communications (DSRC) and the communication distance is
As the simulation runs, the information of vehicles and loop sensor detectors is recorded in special files of NS3. The Roadside Unit extracts the data from these files and composes the traffic flow data sets. The whole process of obtaining the data sets is as follows.
When a sensor data packet from a vehicle is received, Roadside Unit A identifies the vehicle's position according to the position value in the packet and extracts the speed value ($V$) from the packet. Meanwhile, Roadside Unit A reads the volume ($q$) and occupancy ($o$) from the corresponding loop sensor detector and records them as $(V, q, o)$. In this way, the new traffic flow data sets $(V, q, o)$ are obtained. Each $(V, q, o)$ is converted into CV($z$), and the Catastrophe Vector sets are formed from the data series of CV($z$).
In our experiments, we focus on simulating and analyzing the performance of IA2P, so communication delay and the multihop communication pattern are not considered in this paper, although these factors could influence how the data set is built.

Analysis for Traffic Flow Character Based on the Cusp Catastrophe Model.
This section focuses on the character analysis of traffic flow data based on the cusp catastrophe model. This is the theoretical basis of injection attack analysis and processing in this paper.
According to the aforementioned processing of the data sets, the traffic flow data are collected by the Roadside Unit from vehicles and loop detectors. The mean values for detectors a, b, c, and d are shown in Figure 5. It is evident that the traffic flow character is nonlinear and catastrophic.
Figure 6 displays the speed-volume diagram. It also displays the cusp catastrophe of the traffic flow data. One traffic volume value corresponds to two speed values: one measurement, with speed $V$, is collected when the vehicle is in the uncongested traffic flow state, and the other, with speed $V'$, is collected when the vehicle is in the congested traffic flow state. The transition of traffic flow from the uncongested to the congested state is not a gradual process but an instant jump, or catastrophe. As a result, CV and the evaluation function based on the cusp catastrophe model are fitted to the data sets.

Analysis for Injection Attack Detecting and Processing.
In this subsection, we mainly analyze the performance of IA2P. Because we focus on detecting injected malicious data, we compare IA2P in particular with the method proposed in [18].
According to the definition of malicious data, we manually alter the data set shown in Figure 5. A total of 1438 data values are selected; 30 of them are altered to be MRD, and 70 are altered to be FD. The resulting data set with malicious data is called the collection with malicious data set, or CMD set.
To ensure that the way the MRD and FD values are chosen does not affect the performance of IA2P in simulations, we choose them randomly and repeat this process 100 times. In this way, we build 100 data sets with malicious data.
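The random construction of the 100 data sets can be sketched as below. The counts (1438 values, 30 MRD, 70 FD, 100 repetitions) come from the text; the function name `pick_malicious_indices`, the use of seeds for reproducibility, and the choice to return only the positions to alter (rather than the altered values) are assumptions made for illustration.

```python
import random


def pick_malicious_indices(n_total=1438, n_mrd=30, n_fd=70, seed=None):
    """Randomly choose disjoint index sets to be altered into MRD and FD."""
    rng = random.Random(seed)
    # One draw without replacement guarantees the two sets never overlap.
    chosen = rng.sample(range(n_total), n_mrd + n_fd)
    return sorted(chosen[:n_mrd]), sorted(chosen[n_mrd:])


# One (mrd_indices, fd_indices) pair per CMD set, 100 sets in total.
cmd_sets = [pick_malicious_indices(seed=i) for i in range(100)]
```

Drawing all 100 positions in a single sample keeps the MRD and FD positions disjoint, so each altered value belongs to exactly one category when recognition rates are counted.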
IA2P is proposed mainly to recognize and process malicious data injected by attackers in VANETs. IA2P is run on the CMD set, and the results for one data set with malicious data are shown in Figure 7. Three integer values are predefined to represent the kind of data distinguished by IA2P: output = 1 indicates that the data is valid, output = 2 indicates MRD, and output = 3 indicates FD.
We repeat this process 100 times, once for each of the 100 data sets with malicious data. The performance analysis results for the 100 data sets are shown in Table 2. The mean recognition rate of MRD is 83.33%, and the mean recognition rate of FD is 100%, while 6 valid data values are mistakenly recognized as FD. As a result, the mean recognition rate of the malicious data is 95%.
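The overall figure follows from weighting the two per-category rates by the number of MRD and FD values in each CMD set. The short calculation below is a consistency check on the reported numbers, not part of the authors' evaluation code; the variable names are illustrative.

```python
n_mrd, n_fd = 30, 70              # malicious values in each CMD set
mrd_rate, fd_rate = 0.8333, 1.0   # mean recognition rates reported in the text

# Expected number of malicious values recognized per data set.
detected = n_mrd * mrd_rate + n_fd * fd_rate
overall = detected / (n_mrd + n_fd)
print(f"{overall:.0%}")
```

An MRD rate of 83.33% corresponds to about 25 of the 30 MRD values on average, so roughly 95 of the 100 malicious values are recognized.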
Because of the lack of similar research that detects injection attacks from the viewpoint of traffic flow theory, we compare the performance of IA2P with the method proposed in [18], which was used to analyze the accuracy of measured data. Ki et al. [18] use a filter-based method to process traffic data. Based on traffic flow theory, a data value is recognized as bad data if it deviates by more than 5% from the previously collected value.
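The comparison rule can be illustrated with a minimal sketch. It is a simplified reading of the 5% deviation test described above, not a reproduction of the method in [18]: the function name `flag_by_deviation`, the treatment of the first value as valid, and the handling of a zero predecessor are all assumptions.

```python
def flag_by_deviation(values, tolerance=0.05):
    """Flag values whose relative change from the previous value exceeds `tolerance`.

    The first value has no predecessor and is treated as valid (an assumption).
    """
    if not values:
        return []
    flags = [False]
    for prev, curr in zip(values, values[1:]):
        if prev == 0:
            # Assumed handling: any change away from a zero baseline is flagged.
            flags.append(curr != 0)
        else:
            flags.append(abs(curr - prev) / abs(prev) > tolerance)
    return flags
```

For example, `flag_by_deviation([100, 103, 110, 110])` flags only the third value. A rule of this kind cannot flag a value that exactly repeats its predecessor, which may help explain why such a filter struggles with MRD, and it flags genuine abrupt changes in traffic flow as bad data.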
Similarly, the CMD set is used, with the same data set as in Figure 7, and two integer values are predefined as output to identify whether the data is valid or not. The details of the malicious data identified by the method of [18] are given in Table 4.
Table 4: The details of malicious data identified in [18].
                  Valid data recognized     Malicious data
                  as malicious data         MRD     FD
CMD set           0                         30      70
Result of [18]    131                       6       55
This subsection covers verification of the coefficient self-adjustment of generalized IA2P, based on the batch estimation filter. Here, we focus on $\alpha$ and $\beta$ in (9), since their values shift with traffic flow patterns. In practice, it is impossible to manually set and adjust the coefficients of IA2P for each traffic flow pattern. As the simulations proceeded, we found that this factor could influence IA2P's performance, so it was necessary to carry out this procedure. First, we initialized ($\alpha$, $\beta$) and then recorded their values after each self-adjustment while IA2P was running. Results are shown in Figure 9. It is evident that ($\alpha$, $\beta$) gradually trends towards a steady state.

Conclusion and Future Work
VANETs are a double-edged sword. On one hand, VANETs are considered a more efficient and convenient way to collect traffic flow data for ITS applications than traditional methods. On the other hand, with respect to security, a formidable set of abuses and attacks becomes possible and harmful for VANETs, because their networks are accessed wirelessly and are open to every participant.
In this paper, we first identify a previously unknown vulnerability in the current techniques aimed at security establishment against the malicious data injection attack in VANETs. Then, we investigate the mechanism of this vulnerability, especially for two kinds of malicious data injection attack: the multirepeat data injection attack and the fake data injection attack. Next, we propose IA2P, an intrusion-tolerant security mechanism based on traffic flow theory and the cusp catastrophe model, to protect traffic flow data collection in VANETs. Finally, the simulation results show that the recognition rate of the malicious data is 94%, which is more useful and more practical than the existing methods.
In our future work, we would like to extend our results by considering vehicle privacy, because the MAC address of each participant in VANETs is exposed in this paper, and it is still

Figure 1: The scenario of VANETs with attackers.
S0: the Station of Initialization mainly sets the coefficients of the model ($h_0$, $\alpha_0$, $\beta_0$, $\tau$, and so on) and then goes to S1.
S1: the Station of CV Transformation checks whether a measurement, $z_a = (V, q, o)$, is MRD upon collection. If it is fixed and constant, it is discarded as MRD. Otherwise, it is transformed into CV($z_a$). Then CV($z_a$) is sent to S2 if the ID (MAC) of the sender of $z_a$ is not in $O_{list}$, meaning that the sender is a valid participant.
S2: the Station of Injection Attack Analysis is based on evaluation function (5): if its value for $z_a$ exceeds $\tau$, $z_a$ is malicious data. This indicates that an injection attack is occurring, and $z_a$ is sent to S4. However, if its value for $z_a$ does not exceed $\tau$, $z_a$ is safe and valid data. As $z_a$ is then the output of IA2P, it is sent to S3.
S3: in the Station of System Update, the coefficients of the model are self-adapted based on (9) to follow the variation of the traffic flow pattern. Then the state machine goes on to S1 to continue with the next measurement transformation and injection attack analysis.
S4: in the Station of Injection Attack Processing, the injection attack is recognized. The measurement $z_a$ is isolated, and the sender's ID is sent back to S1. The Injection Attacker List, $O_{list}$, is updated, and then the state machine goes back to S1.
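The transitions among S1 to S4 can be sketched as a simple processing loop. This is a minimal illustration rather than the authors' implementation: `evaluate`, `is_mrd`, and `adapt` are caller-supplied stand-ins for evaluation function (5), the MRD check, and the coefficient update (9), whose exact forms are not reproduced here. Dropping data from senders already on the attacker list, and leaving MRD senders off that list, are assumptions, since the text only states when data is forwarded.

```python
def run_ia2p(measurements, evaluate, tau, is_mrd, adapt=None):
    """Process (sender_id, z) pairs through stations S1 to S4.

    `evaluate`, `is_mrd`, and `adapt` are hypothetical callables supplied by
    the caller; `tau` is the detection threshold set during initialization (S0).
    """
    attackers = set()   # O_list: IDs of known injection attackers
    valid = []          # measurements accepted as the output of IA2P
    for sender_id, z in measurements:
        # S1: skip listed attackers (assumed) and discard MRD.
        if sender_id in attackers or is_mrd(sender_id, z):
            continue
        # S2: injection attack analysis against the threshold tau.
        if evaluate(z) > tau:
            # S4: isolate the measurement and record the sender, back to S1.
            attackers.add(sender_id)
            continue
        # S3: accept the measurement and let the model coefficients adapt.
        valid.append(z)
        if adapt is not None:
            adapt(z)
    return valid, attackers
```

The sketch consumes a finite iterable for ease of testing, whereas the state machine described above has no end state; passing a generator that never terminates would mirror that behavior.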

4.1. Experiment Setup.
In this section, we validate the malicious data injection attack analysis and processing through experiments using actual traffic data sets provided by the Dalian Department of Transportation. These data sets were archived from traffic flow data collected by inductive loop detectors.

Figure 5: A test example of 24-hour traffic data set from the on-road simulating result.

Table 1: Inductive loop detector details.
Vehicles can broadcast their speed and location by DSRC at a specific frequency; 10 Hz is recommended by the Vehicle Safety Communications Project, which is distributed by the U.S. Department of Transportation [28]. The Roadside Unit receives this information by DSRC. Meanwhile, four loop sensor detectors (denoted a, b, c, and d) are placed to collect data on traffic volume and occupancy. This data is sent to Roadside Unit A through a transmission interface, such as RS232 or RJ45. Data on vehicle speed is sent by vehicles to Roadside Unit A by DSRC. Data on attacker activities is then analyzed and processed on Roadside Unit A, where IA2P is installed.

Table 2: The performance of IA2P.