On the vulnerability of data-driven structural health monitoring models to adversarial attack

Many approaches at the forefront of structural health monitoring rely on cutting-edge techniques from the field of machine learning. Recently, much interest has been directed towards the study of so-called adversarial examples: deliberate input perturbations that deceive machine learning models while remaining semantically indistinguishable from the true inputs. This article demonstrates that data-driven approaches to structural health monitoring are vulnerable to attacks of this kind. In the perfect-information or 'white-box' scenario, a transformation is found that maps every example in the Los Alamos National Laboratory three-storey structure dataset to an adversarial example. Also presented is an adversarial threat model specific to structural health monitoring, proposed with a view to motivating discussion of ways in which structural health monitoring approaches might be made more robust to the threat of adversarial attack.


Data-driven modelling in structural health monitoring
Monitoring the health of engineering structures is of critical importance to countless engineering disciplines and applications. Since it is very difficult for most sensors to measure damage, 1 the practising engineer must instead rely on a range of indirect tools and techniques in order to identify, locate and manage damage in structures.
Much interest in the field of structural health monitoring (SHM) has been in data-driven approaches that leverage techniques from machine learning and statistical pattern recognition. 2 Several authors have extolled the virtues of casting SHM problems in this way. 3,4 The modern engineer is fortunate to have at their disposal a litany of tools that are able to perform pattern recognition, and the literature reflects this. To date, authors have employed outlier analysis, 5 neural networks, 6 support vector machines 7 and decision trees 8 among many other approaches. The reader is directed to Farrar and Worden 9 for a comprehensive reference text.
Despite the wealth of progress made in the last two decades, the large-scale deployment of SHM methodologies is still in its infancy. There are many well-documented reasons for this, with the scarce availability of damaged training data and issues surrounding environmental variation emerging as key themes. However, innovative solutions to these problems are steadily being found.
In the face of these challenges, there are several large-scale systems currently in deployment. Perhaps most famous is the OnStar navigation and diagnostics system available in some commercial vehicles. There is also the integrated condition assessment system (ICAS) 10 deployed by the US Navy for the monitoring of hardware, as well as a number of techniques developed for monitoring rotorcraft under the umbrella term of health and usage monitoring systems (HUMS). 11 These systems are all implemented in potential life-safety applications and are therefore potential targets for malicious attack.
It is the opinion of the authors that if SHM frameworks are to be adopted in life-safety or economically critical projects, then it is of utmost importance that the security and robustness of the underlying models are rigorously examined.

Adversarial attacks on pattern recognition models
It is important here to distinguish between the similar but distinct topics of adversarial attack and adversarial machine learning. The latter refers to machine learning methods whereby two learning models are pitted against each other in order to produce generative models 12 that are able to sample from the underlying input distributions.
Adversarial attack, however, refers to the construction of adversarial examples for classification models. These are deliberately perturbed inputs for which the classifier assigns an incorrect label despite small or imperceptible semantic alteration from the true state. Mathematically, for a classification model M that assigns a class label y to an input vector {x},

M({x}) = y    (1)

an adversarial example {x'} results in an adversarial (erroneous) classification,

M({x'}) = y' ≠ y    (2)

The adverse label is assigned despite semantic similarity to a human observer,

{x} ≈ {x'}    (3)

The vulnerability of neural-network models to adversarial attack was first presented by Szegedy et al. 13 It was later shown that this vulnerability was not limited to neural networks; 14 in fact, a wide array of classification algorithms are susceptible to this type of attack. Subsequently, Carlini and Wagner 15 produced a general approach for constructing adversarial examples that are difficult to detect and able to bypass several recently proposed adversarial defence strategies.
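For intuition, the construction of a single adversarial example can be sketched in a few lines. The snippet below perturbs the input of a toy logistic regression classifier in the direction of the sign of the loss gradient (a gradient-sign attack in the spirit of the methods cited above); the model, weights and step size eps are illustrative assumptions, not part of the case study that follows.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gradient_sign_attack(x, w, b, y_true, eps):
    """Perturb x by eps in the direction that increases the loss for y_true."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y_true) * w          # d(cross-entropy)/dx for a logistic model
    return x + eps * np.sign(grad_x)   # adversarial example {x'}

# toy classifier and input (illustrative values only)
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.1, 0.3])         # classified as y = 1 (p > 0.5)
x_adv = gradient_sign_attack(x, w, b, y_true=1.0, eps=0.3)

print(sigmoid(w @ x + b) > 0.5)        # label before the perturbation
print(sigmoid(w @ x_adv + b) > 0.5)    # label after the small perturbation
```

Despite the perturbation being bounded by eps in every component, the predicted label flips, while {x'} remains close to {x}.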
Since this revelation, the literature has grown rich with contributions exploring adversarial attacks and the generation of adversarial examples. A recent review can be found in Yuan et al. 16 By far, the majority of the case studies and the application papers thus far published on adversarial attack have been concerned with image classification and machine vision. It is in these areas that the majority of the taxonomy has been developed. In this article, the authors hope to demonstrate that adversarial attack is also a real threat to SHM models.
Data-driven approaches to SHM rely at their core on machine learning models that are susceptible to adversarial attack. While some of these rely directly on neural networks, 6,[17][18][19] Papernot et al. 14 demonstrated extensively the vulnerability of techniques beyond neural networks, including support vector machines, decision trees, logistic regressors and others in their highly cited paper. With this in mind, it is clear that adversarial attack is a threat to a great deal of data-driven SHM.
While no direct consideration of adversarial vulnerability for SHM has yet been presented in the literature, the susceptibility of data-driven approaches has already been demonstrated for the related field of process monitoring. 20 In their paper, the authors demonstrate the adversarial fragility of a deep neural network trained to detect system failures and offer an adversarial training method similar to Madry et al. 21 for hardening the classifier.
With the rapid pace of adversarial attack research in mind, it is the aim of this article to motivate serious discussion into the vulnerability of the learning models proposed for SHM. Two contributions are presented here. The following section envisages an adversarial attack threat model for SHM, accompanied by a taxonomy specific to threats arising in SHM frameworks. The third and fourth sections demonstrate adversarial attack on a damage detection model trained on an SHM benchmark dataset. It is shown that simple transformations can be constructed that map every true input to an adversarial example, even when the inputs have not been used to train the classification model. The final section outlines directions for further investigation into ways in which SHM might be made more robust to adversarial attack.
This framework is expanded in Figure 1 to produce a physics-to-label representation of the SHM framework. For convenience, the following definition is presented:

• An SHM framework (the system, Figure 1)

The principal threats in this scenario are thus defined as follows:

• the false labelling of the damaged state as undamaged (false negative, y+ → y'−);
• the false labelling of the undamaged state as damaged (false positive, y− → y'+);
• semantic similarity between real and adverse inputs to the model ({x} ≈ {x'}).
Thus, we may define a general objective function for the production of semantically convincing adversarial examples as

J({x'} | {x}, M) = F({x}, {x'}) + λ G(y, y')    (4)

where y = M({x}) and y' = M({x'}) are as defined earlier. The objective function has two terms and a weighting parameter λ. The first term, F, penalises examples that are not semantically similar to the true inputs; common choices are the L1 or L2 norm. The second term, G, is a metric that ensures the adversarial example produces an adversarial misclassification; a common choice is the cross-entropy loss. The nefarious labelling of damaged structures as healthy is of serious concern. If left undetected, attacks of this type have the potential to result in worsening structural health or critical failure. The consequences of a false-positive classification may seem minor compared to those of the false-negative classification, and in the extreme this is certainly the case. However, unnecessary maintenance and inspection bear a financial toll, and repeated misclassifications may erode confidence in the monitoring system. Unaddressed, either of these eventualities is likely to render the system completely useless in the long term. Furthermore, if the adversarial examples have high semantic similarity to measured healthy data, it would be very difficult for a human to recognise their occurrence and troubleshooting the problem may be difficult.
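Equation (4) can be made concrete with a short sketch. Here the semantic similarity term F is taken to be the L2 norm and the misclassification term G the cross-entropy against the adverse label; the toy vectors and class probabilities below are purely illustrative.

```python
import numpy as np

def objective(x, x_adv, probs_adv, y_target, lam=1.0):
    """J({x'} | {x}, M) = F({x}, {x'}) + lam * G(y, y') as in equation (4)."""
    F = np.linalg.norm(x - x_adv)             # semantic similarity penalty (L2)
    G = -np.log(probs_adv[y_target] + 1e-12)  # cross-entropy to the adverse label
    return F + lam * G

x = np.array([1.0, 2.0, 3.0])       # true input {x}
x_adv = np.array([1.1, 2.0, 2.9])   # candidate adversarial example {x'}
probs_adv = np.array([0.1, 0.9])    # classifier output M({x'}) over two classes
val = objective(x, x_adv, probs_adv, y_target=1)
print(val)
```

Minimising this quantity over {x'} trades off closeness to the true input against confidence in the adverse label, with λ controlling the balance.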
The SHM analogue of semantic similarity from image recognition is not immediately intuitive. It is unlikely that a human would be able to identify damage by observing measured signals from a structure alone. However, there are domains that do present semantic information to the trained eye. For intuition, consider the frequency response functions (FRFs) of a vibrating structure. FRFs are frequently used as damage- and condition-sensitive features in SHM, as they encode physical dynamic properties and can be efficiently computed. To a trained engineer, the natural frequencies and damping properties of a structure can be estimated approximately by eye, and it would certainly be possible to identify significant deviation from the expected structure of the signal. FRFs, as well as other features such as coherence and acoustic emission signals, contain semantic structure and are all potential targets for adversarial attack.

Threat model
In order to arrive at a coherent adversarial attack threat model for SHM, it is important to first consider the vectors for attack that are present in the system. Figure 2 presents a breakdown of the principal attack vectors into three domains.
The first of these is the data-acquisition domain and is largely concerned with the physical security of sensing equipment. Given access, the attacker is able to adversely influence or gain access to structural measurement data. This includes both online data collection and training data collection. Vectors for attack include the placement of impostor sensing equipment, 'spoofing' sensors with adversarial inputs and eavesdropping on sensor data.
The second domain is related to data storage security. Threat models for generic data storage security have been previously explored in Hasan et al. 23 In the context of SHM, the principal threat is that training and model data might be maliciously accessed with the intent to construct adversarial examples. A data breach could arise as the result of either a physical or cyber intrusion and so hardening the system to these vectors is especially difficult.
The third domain is concerned with the implementation of the model. The principal threat is that knowledge of the structure and parameterisation of the model might become available to the attacker. Risks in this domain are largely concerned with insider threats, whereby a person with trusted access to the implementation and acquisition components of the system is able to act maliciously. However, other means of access, such as social engineering and espionage, are also vectors by which access to the model implementation might be acquired. With total access to the system, an attacker would trivially be able to bypass safeguards and effect their machinations.

Threat taxonomy
In light of the earlier discussion, the principal distinction between adversarial attack types is the level of access that is afforded to the attacker. This is certainly the case in the machine learning literature, where attacks are characterised as either white- or black-box.
In a white-box attack, it is assumed that the attacker has complete access to the system. This could include model structure, parameters and gradient information, as well as access to the training data, inputs and outputs. This corresponds to access in all three domains of Figure 2. The white-box attack is primarily a simulation of the insider threat. The scenario can also apply to systems whereby the implementation has been made publicly available, for example, if the model had been published in the academic literature.
In a black-box attack, it is assumed that the attacker has query access to the model only, with no knowledge of the model structure, training procedure or access to a training dataset. There is some variation in exactly what is available to the attacker in the literature. Papernot et al. 24 assumed that a very small set (less than 10 per class) of examples from the input domain (but not necessarily the training data) are available. In an SHM context, we include scenarios whereby the attacker is able to query the model and has access to incoming data as black-box attacks.
The black-box attack is indicative of an outsider threat whereby an attacker is able to query the classifier either remotely via a cyber attack or by gaining access to models and data during a physical attack. In order for the black-box attack to be realistic, the number of queries to the classifier must be kept to a minimum. Guo et al. 25 argued that any black-box attack that makes use of many thousands of queries can easily be defeated by query limitation.
Consideration of both the white- and black-box scenarios is important for SHM applications. One question the SHM community will have to address is how to certify SHM systems for monitoring publicly owned critical infrastructure in life-safety applications. The importance of this issue will only increase as SHM research matures and SHM systems begin to be sold commercially. The complexity of SHM systems is high enough that the community itself will need to provide public authorities with guidance to guard against commercial entities selling malicious or sub-standard SHM systems. One way to guard against the use of sub-standard SHM systems is to mandate that the design of these systems be transparent. However, transparency is often not in the interest of commercial entities, and it does make white-box attacks easier to execute. This trade-off between security and ensuring the performance of SHM systems must be considered. Furthermore, as 5G networks become more prevalent, they will increasingly be used to implement SHM systems. A number of security risks have been identified with 5G networks (e.g. supply chain, interdependencies and increased overall attack surface) that make white-box attacks more plausible. 26 As a result, both white- and black-box scenarios should be considered by the SHM community.
Based on the original work on adversarial examples, 13 a great number of approaches have been developed for the construction of adversarial examples. The recent review by Yuan et al. 16 does an excellent job in recording and categorising the approaches that have thus far been proposed. A brief outline of the key definitions is included here.
Attack scope. An example search is a style of attack that performs optimisation in the input space of the SHM model to specify single or multiple adversarial examples ({x'}) that are optimised independently. This type of attack is most likely to be enacted as a one-time attack.
A training attack (also referred to as data poisoning) is conducted during the training phase of the SHM framework, before the system has been fully implemented. During a training attack, the training data are augmented or appended with adversarial examples with a view to maximise erroneous classification within the model itself.
A universal example search is an altogether different type of attack. Instead of optimising individual examples, the attack is a search for a function A({x}) that transforms every input to an adversarial example. This is often a more challenging task (as the mapping must span the input space) and is more dangerous to the operation of an SHM system, as the adversarial examples can be delivered continuously in an online fashion.
Adversarial specificity. In a targeted attack, the adverse class labels are specifically chosen such that, for example, a damaged structure is selectively labelled as undamaged. In an untargeted attack, no specific attention is paid to which label is assigned to the adverse example, as long as it is not the true class. In the binary classification case, this is equivalent to the targeted attack.

Attack frequency. A one-time attack involves the specification and injection of adversarial examples without querying the classifier. Such attacks might be appropriate for attack motives that require only temporary falsification of the SHM system. An iterative attack makes multiple queries to the classifier to assess the effectiveness of the adverse examples, improving their potency iteratively. This attack style tends to result in more semantically convincing examples but comes at the cost of increased computational effort and risk of exposure.

Dataset and classification results
This section details the specification and training of a classification algorithm for the Los Alamos National Laboratory (LANL) three-storey structure dataset (Figure 3). 27 The primary objective of the classifier is to accurately predict damage labels despite the presence of simulated environmental variation.
There are essentially two classes of algorithm available for performing this task. Generative models aim to construct the full joint probability distribution of the labels and the data. The advantage is that the algorithm is able to return predictive probability distributions, and so uncertainty in the predictions is handled gracefully.
By comparison, discriminative models aim to model only the conditional probability of the labels given the data, and so only the labels themselves can be returned. For this work, only discriminative models are considered.

LANL three-storey structure dataset
The three-storey structure was conceived as a test-bed for SHM algorithms, and many published approaches to SHM demonstrate their effectiveness on this dataset. This prominence in the literature makes the three-storey dataset ideal for demonstrating vulnerability to adversarial attack. Another objective of the original report on the dataset was to assess robustness in the face of environmental and operational variation. As such, there are 17 structure configurations (detailed in Table 1), each representing either a damaged or undamaged state.
Damage is simulated in the structure by the inclusion of a bumper that acts between the second and third storeys of the structure. The impacting bumper adds significant nonlinearity to the dynamics of the structure. The gap, measured from the equilibrium point to the bumper, is varied to simulate the progression of damage. The bumper engages once the inter-storey displacement exceeds the gap distance. This means that smaller gap distances simulate increased levels of damage. A full account of the configuration of the structure in each of the 17 states is recorded in Table 1.
The dataset consists of the input force and acceleration responses measured at each storey of the structure. The structure is excited with a band-limited (20-

Classification approach
As state labels are available for the dataset, the classifier is trained in a supervised manner. The extracted features are the accelerance FRFs estimated by the Welch method (resulting in a real-valued spectral density) with no overlap, a Hanning window and five-fold averaging. FRFs are computed for the base of the structure and for each of the three storeys.
The FRF for the ith storey, x_i, is calculated from the measured acceleration a_i and the input force f as

x_i = W(a_i) / W(f)

where W indicates the Welch operation described earlier. The x_i are then concatenated and normalised (zero mean, unit variance) to form training vectors with 1640 points. This reduction in sampling points is due to the five-fold averaging in the Welch operation. The use of FRFs as features is motivated by the observation that variations in the normal condition (masses, stiffnesses) manifest mainly as variance in the natural frequencies of the spectra, whereas the damage progression is most evident in the variance of the higher frequencies. The FRF is therefore a suitable feature, as it is independently expressive in both of these directions.
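As a sketch of this feature extraction step, the snippet below estimates an accelerance FRF magnitude by averaging windowed spectra over five non-overlapping Hanning-windowed segments. The exact estimator (here an H1-style ratio of the averaged cross-spectrum to the force auto-spectrum) is an assumption, as is the toy unit-gain 'structure' used for the demonstration.

```python
import numpy as np

def welch_spectra(a, f, n_seg=5):
    """Average windowed cross- and auto-spectra over n_seg non-overlapping segments."""
    L = len(a) // n_seg
    win = np.hanning(L)
    S_af = np.zeros(L // 2 + 1, dtype=complex)
    S_ff = np.zeros(L // 2 + 1)
    for k in range(n_seg):
        A = np.fft.rfft(win * a[k * L:(k + 1) * L])  # acceleration segment spectrum
        F = np.fft.rfft(win * f[k * L:(k + 1) * L])  # force segment spectrum
        S_af += A * np.conj(F)
        S_ff += np.abs(F) ** 2
    return S_af / n_seg, S_ff / n_seg

def accelerance_frf(a, f, n_seg=5):
    S_af, S_ff = welch_spectra(a, f, n_seg)
    return np.abs(S_af / S_ff)   # real-valued magnitude feature

rng = np.random.default_rng(0)
force = rng.standard_normal(4096)   # band-limited excitation stand-in
accel = 2.0 * force                 # toy 'structure': constant gain of 2
frf = accelerance_frf(accel, force)
print(frf.shape)
```

The averaging over segments is what reduces the number of spectral lines relative to the raw time series, mirroring the reduction to 1640 points described above.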
For further motivation and in order to aid visualisation, Figure 4(a) and (b) depict the first two principal components of the dataset for the time series and the FRFs, respectively. The individual classes are clearly more separable in the FRF basis. In addition, variations in the normal condition appear to be approximately orthogonal to the progression of damage. Figure 5 shows the first two principal components of the FRFs, recoloured to depict damage labels only.
For the task of damage identification, a multi-layer perceptron (MLP) with a single hidden layer is trained. The network is initialised with random weights uniformly distributed on the interval [-1, 1] and then trained for up to 100 epochs using a cross-entropy loss function and the Adam optimiser. The hyperparameters for the optimiser are set to the default values provided in the original study. 28 The training process is repeated for hidden node numbers in the range [1, 100]. Figure 6 depicts the training curves for the training and validation sets. The figure clearly depicts a stable training regime, with excellent validation performance on networks with greater than around 15 hidden nodes. Although many network structures give optimal performance, in order to ensure the demonstration is as realistic as possible, the simplest model with the best validation accuracy is selected. This is a network with 16 hidden nodes, trained over 79 epochs, that achieves a validation accuracy of 99.58%. The structure of the classification model is detailed in Table 2. Figure 7 depicts the classification confusion matrix on the 500 unseen testing examples that were not used during training. The classifier achieves a multi-class prediction accuracy of 98.60% on the testing data, with only two (0.40%) false-positive and zero false-negative classifications. The remaining misclassifications (1.00%) are between the three damage classes that have the smallest differences in bumper gap size.

Classification results
Conventional analysis of the capacity of classification models, such as the Vapnik-Chervonenkis (VC) dimension, 29 places lower bounds on the number of training examples required to ensure generalisation. However, empirical evidence and recent studies 30 show that neural networks and deep learning models routinely outperform the theoretical limits placed on them. Despite this, the classifier is able to achieve high accuracy and good generalisation on unseen data. The authors therefore argue that this strong performance on unseen data is ample justification that the training data are sufficient to learn a general mapping from the data manifold to the labels.

Demonstration of adversarial vulnerability
The objective of this article is to motivate serious discussion into the vulnerability of data-driven SHM models by demonstrating the most threatening attacks in both the white-box and black-box threat scenarios. For brevity, only the more challenging task of performing a universal example search is considered. The goal is to identify an adversarial transformation A that produces an adversarial output for every true input, while maintaining semantic similarity between true and adversarial examples,

A({x}, θ_A) = {x'}    (10)

where θ_A are the parameters of the adversarial transformation. Semantic similarity is judged by eye, in the FRF domain. This is conducted by comparing the adverse example {x'} to examples from both its true class and the adverse class. An adverse example is judged to have semantic similarity if it more closely resembles the true class while still inducing an adversarial classification. For maximum impact, the demonstration is conducted as a targeted attack. The approach here is the more challenging task of aiming for perfect misclassification, with every label representing either a specific false-positive or false-negative result. Adverse training labels are constructed by the following targeted scheme,

y'_i = 0 if y_i > 0;  y'_i = 13 if y_i = 0,  for i = 1, ..., N    (11)

where N is the number of training examples. A class label of zero corresponds to the normal condition, whereas a class label of 13 relates to the maximally damaged state. Constructing the adversarial target labels in this way ensures that A represents a transformation that maximises the rates of false positives and false negatives. The demonstrations here are also iterative attacks, in that the parameters of the transformation are optimised over successive queries to the classifier.
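The targeted labelling scheme can be written in a couple of lines. The mapping below (every damaged class to the normal-condition label 0, the normal condition to the maximally damaged label 13) is a reconstruction from the description above.

```python
import numpy as np

def adversarial_targets(y):
    """Targeted scheme: damaged -> label 0 (false negative),
    undamaged -> label 13 (false positive)."""
    y = np.asarray(y)
    return np.where(y > 0, 0, 13)

y_true = np.array([0, 3, 13, 0, 7])   # illustrative true state labels
print(adversarial_targets(y_true))
```

Every resulting target label is therefore either a specific false positive or a specific false negative with respect to the true state.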

Demonstration of white-box attack
The first demonstration of adversarial attack on an SHM implementation is conducted in a white-box manner. Since gradient information is available in this context, the parameters of the adversarial transformation u A can be optimised via gradient descent. Several authors have presented efficient methods for conducting attacks in this context. 31,32 However, in order to emphasise vulnerability to naive attackers, the method used here is simplistic and consists of two distinct phases.
For the white-box demonstration, the adversarial transformation A is specified as an MLP with randomly initialised weights: the adversarial perturbation network (APN). The first phase in the attack is the listening phase, whereby A({x}, θ_A) is trained to reproduce the inputs from the training data. The purpose of this listening phase is to reduce the total number of queries that must be made to the target classifier. By pre-training the adversarial transformation to reproduce the inputs, an internal model of the input is constructed in the same manner as that of an auto-encoder. This internal model can then be optimised to produce adversarial examples during the second phase.
The second phase is the learning phase, during which the network learns a perturbing transformation that maps true inputs to adversarial examples. The exact procedure is as follows. During the listening phase, an MLP (Table 3) with parameter set θ_A is first trained to reproduce the inputs under a mean-squared error loss function,

J_0(θ_A) = (1/NL) Σ_{i=1}^{N} ||{x_i} - A({x_i}, θ_A)||²    (12)

where J_0(θ_A) is the listening-phase loss. Next, the network is appended to the input of the classifier and trained on the adversarial labels to produce adversarial examples. During the learning phase, training is conducted as a multi-objective problem, with constraints placed on both the cross-entropy loss between the predicted and adversarial labels and the mean-squared error between the adversarial example and the true input,

J(θ_A) = -(1/N) Σ_{i=1}^{N} Σ_{j=1}^{M} y'_{ij} log M(A({x_i}, θ_A))_j + (λ/NL) Σ_{i=1}^{N} ||{x_i} - A({x_i}, θ_A)||²    (13)

where M is the number of classes in the dataset and L is the dimension of the input. The parameter λ is a hyperparameter that controls the weighting of the misclassification and mean-squared error reconstruction terms. Several approaches to this multi-objective optimisation problem are possible, including placing a minimum threshold on either of the terms. For these demonstrations, λ is set to unity in order to give equal weighting to the classification and similarity terms. The parameters θ_A are then optimised via the Adam stochastic gradient descent algorithm using the classification training data.
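The two loss functions of the listening and learning phases can be sketched directly. The toy vectors and softmax-style class probabilities below are illustrative stand-ins for the APN output and the trained classifier, and the equation numbering above is a reconstruction.

```python
import numpy as np

def listening_loss(x, x_hat):
    """Phase 1: mean-squared reconstruction error of the APN."""
    return np.mean((x - x_hat) ** 2)

def learning_loss(x, x_adv, probs_adv, y_target, lam=1.0):
    """Phase 2: cross-entropy to the adversarial target plus
    lam-weighted mean-squared error to the true input."""
    ce = -np.log(probs_adv[y_target] + 1e-12)   # misclassification term
    mse = np.mean((x - x_adv) ** 2)             # semantic similarity term
    return ce + lam * mse

x = np.array([0.5, -0.2, 1.0])          # true input {x}
x_adv = np.array([0.6, -0.2, 0.9])      # perturbed input {x'} = A({x})
probs_adv = np.array([0.05, 0.15, 0.8]) # classifier probabilities on {x'}
print(listening_loss(x, x_adv))
print(learning_loss(x, x_adv, probs_adv, y_target=2))
```

In practice both quantities are averaged over the training set and minimised with Adam, as described above.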
In order to verify that the attack has been successful, the testing set of examples not used during adversarial training is fed through the perturbing network and classified. Figure 8 depicts the confusion matrix of the classifier on the adversarially perturbed validation examples. The confusion matrix represents false-negative and false-positive classification rates of 99.58% and 100%, respectively; in fact, only a single example was not adversarially misclassified. For intuition as to how the semantic structure of the signals remains intact during the perturbation, the first two principal component analysis (PCA) directions of the FRF data are plotted in Figure 10. Overlaid on the figure are 10 of the transformations from true inputs (dots) to adverse examples (crosses), coloured by damage label. In the figure, it can be seen that the perturbation is small in the PCA basis, suggesting that the majority of the structure has been maintained.

Demonstration of black-box attack
For the black-box attack demonstration, it is assumed that the attacker only has access to the classifier on an input-output basis and has no knowledge of the inner workings of the algorithm. We permit the attacker access to a set of training examples and corresponding target labels, but crucially not the gradient information. While many black-box approaches to adversarial attack rely on the construction of a surrogate model in order to generate a synthetic classifier that can be attacked as a white-box, the approach utilised here is deliberately more naive. It is the reasoning of the authors that vulnerability to such naive attacks lowers the bar for adversarial attack and further motivates investigation into the ways in which SHM algorithms might be made more robust.
The aforementioned approach consists of two phases. During the listening phase, a 1024-512-n deep auto-encoder (DAE) 33 is trained to learn a reduced-order representation of the inputs. While other, more sophisticated reduced-order models (such as hierarchical models or modal analysis-based approaches) are clearly appropriate, the objective here is to demonstrate a naive approach that makes little to no assumption about the structure of the input.

Figure 9. Adversarial examples arising from the white-box attack. The adversarial input is semantically similar and follows the overall structure of the true inputs more closely than the adversarial target signal. There is, however, a notable difference in the variance of the signal in the false-negative case: (a) false-negative example and (b) false-positive example.

The DAE consists of two components, the encoding transformation φ and the decoding transformation ψ,

{z} = φ({x}),  {x̂} = ψ({z})    (14)

where {x̂} is the reconstructed signal. The parameters of the DAE are first trained under a mean-squared error loss, using the same objective function as the white-box attack in equation (12). This step is critical for the black-box attack, as it greatly reduces the number of parameters that must be optimised without gradient information. Several authors have studied adversarial attacks on DAEs and other latent space models. 34,35 The approach shown here is most similar to that of Creswell et al. 36 in that the latent space is perturbed directly. However, while those authors specify an additive transformation for A({z}), the approach here is a more general n-dimensional affine transformation, where n is the size of the latent encoding. In the latent encoding, the transformation is

A({z}) = A{z} + {b}    (15)

where A is an n × n matrix of weights and {b} is an n-dimensional offset vector. During the learning phase, the DAE is prepended to the classifier and the perturbing transformation is inserted between the encoder and the decoder. The forward adversarial transformation is now given by

{x'} = ψ(A(φ({x})))    (16)

As before, this is a multi-objective optimisation problem that seeks to maximise both semantic similarity and misclassification. The learning phase objective function is now given by

J(θ_A) = -(1/N) Σ_{i=1}^{N} Σ_{j=1}^{M} y'_{ij} log M({x'_i})_j + (λ/NL) Σ_{i=1}^{N} ||{x_i} - {x'_i}||²    (17)

where {x'} = ψ(A(φ({x}))) and N, M and L are the number of examples, classes and input dimensions, respectively. However, there are now two hyperparameters that must be specified: the size of the DAE latent encoding, n, and the weighting term λ, which is set to unity.
The size of the latent encoding represents a trade-off between accurate reconstruction of the inputs and the complexity (number of parameters) of the affine transformation. A larger latent encoding will yield higher-fidelity representations of the inputs, but will require more parameters to be optimised in order to find the adversarial transformation. The number of parameters in A grows as $n^2 + n$, and so choosing a lower value of n is preferable.
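The quadratic growth can be checked directly; `n_params` is an illustrative helper, not from the paper.

```python
# Free parameters in the n x (n + 1) affine transformation A.
def n_params(n):
    return n * n + n

for n in (5, 10, 20, 50):
    print(n, n_params(n))
# At the value n = 10 selected below, only 110 parameters need optimising.
```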
In order to select a value for n, the explained variance of the first 50 PCA components is plotted alongside the classification accuracy of the reconstructed signal in Figure 11. Although not directly related, the PCA explained-variance plot affords intuition into the number of parameters required to provide an accurate representation of the input. It can be seen from the figure that only a small number of parameters are responsible for the majority of the variance in the inputs (as expected from highly structured data such as FRFs). The classification accuracy does not appear to be affected above values of 5. With this in mind, a conservative value of n = 10 is selected. Figure 12 depicts the input and reconstruction for the trained DAE. It can be seen in the figure that the DAE has done a good job of representing the overall structure of the data but struggles to emulate the variance seen in the higher frequencies of the damaged examples. Nevertheless, this representation is clearly able to capture the damage-sensitive nature of the FRFs, as the classification confusion matrix in Figure 13 is largely unchanged.
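The explained-variance curve used to guide the choice of n can be computed from an SVD of the mean-centred data matrix. The synthetic low-rank-plus-noise data below merely stands in for the FRF dataset, to show why a handful of components dominate in highly structured data.

```python
import numpy as np

def explained_variance_ratio(X, k=50):
    """Fraction of total variance captured by the first k principal
    components, via SVD of the mean-centred data matrix."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var[:k] / var.sum()

# Stand-in for the FRF dataset: rank-5 structure plus a little noise, so
# a small number of components carry almost all the variance.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 1024)) \
    + 0.01 * rng.normal(size=(200, 1024))
ratio = explained_variance_ratio(X)
```

Plotting `ratio` against the component index reproduces the qualitative shape of an explained-variance curve like that in Figure 11.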
The parameters of the adversarial transformation $\theta_A$ are then optimised to produce adversarial examples. As gradient information is not available, a heuristic algorithm must be used. In this study, optimisation is achieved via a self-adapting differential evolution (SADE) scheme, with the loss function defined in equation (17). The hyperparameters of the SADE optimiser are presented in Table 4; these have been selected based on the authors' experience with the method. Otherwise, the implementation of the SADE algorithm is as described in the original paper. 37 As SADE is an adaptive optimiser, it is largely insensitive to the initial values of the learning, mutation and crossover parameters. Presentation of the full iterative scheme is beyond the scope of the current work, and the interested reader is directed to the original paper for details.
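For orientation, a minimal rand/1/bin differential evolution loop is sketched below. This is not the self-adapting scheme of the original paper — SADE additionally adapts the mutation and crossover parameters online — but it shows the gradient-free population mechanics being relied upon.

```python
import numpy as np

def differential_evolution(cost, dim, pop_size=100, iters=100,
                           F=0.5, CR=0.9, seed=0):
    """Minimal rand/1/bin differential evolution. The SADE scheme used in
    the text additionally self-adapts F and CR; here they are fixed."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))  # init on [-1, 1]
    costs = np.array([cost(p) for p in pop])
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, size=3, replace=False)]
            mutant = a + F * (b - c)                    # mutation
            cross = rng.random(dim) < CR                # binomial crossover
            cross[rng.integers(dim)] = True             # force one gene over
            trial = np.where(cross, mutant, pop[i])
            tc = cost(trial)
            if tc < costs[i]:                           # greedy selection
                pop[i], costs[i] = trial, tc
    best = int(np.argmin(costs))
    return pop[best], costs[best]

# Sanity check on a convex cost: the optimum of ||x||^2 is the origin.
x_best, c_best = differential_evolution(lambda x: np.sum(x ** 2), dim=5)
```

In the attack, `cost` would wrap the learning-phase objective evaluated on a candidate flattened transformation A.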
The parameters of 100 trial transformations are optimised over 10 runs from an initial uniform distribution on the interval $[-1, 1]$. The SADE algorithm is allowed to proceed for 100 iterations per run. After each run, the convergence history is plotted to verify that the optimiser has been successful in finding a global minimum. Figure 14 shows the best and the mean cost-function values plotted for each run. The lowest-cost transformation over the 10 runs is then selected and evaluated on the unseen examples as before. Figure 15 depicts the classification confusion matrix on the unseen data. Although not as successful as the white-box attack, the latent-space perturbation has still managed to induce false-positive and false-negative classifications in almost every case. Figure 16 depicts the adversarial examples generated in the black-box attack from the same unperturbed inputs used to generate the examples in Figure 9. However, it is immediately obvious that the black-box attack has failed to produce adversarial examples that maintain semantic similarity to the true inputs. Although the overall structure of the input has been conserved during the attack, and the signals still clearly resemble FRFs, the adversarial examples more closely resemble members of the target class, meaning that the chances of such examples fooling a human observer are low. Figure 17 depicts the PCA directions of the dataset for the first 10 adversarial transformations. Such an effect would make optimisation of the parameters of A extremely difficult, as the resulting objective surface would be highly non-smooth. A future approach to manage this phenomenon could be to use a linear transformation (such as PCA) into a reduced-order space. Despite these results, some semantically convincing examples can still be constructed by a variant of the latent attack method described earlier.
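The 10-run protocol amounts to independent random restarts with the lowest-cost transformation retained. The wrapper below sketches that bookkeeping; `toy_optimise` is a placeholder for one SADE run, not the paper's optimiser.

```python
import numpy as np

def best_of_restarts(optimise, n_runs=10):
    """Run the optimiser from independent random initialisations and keep
    the lowest-cost result, mirroring the 10-run protocol in the text.
    `optimise(seed)` is assumed to return (params, cost)."""
    results = [optimise(seed) for seed in range(n_runs)]
    history = [cost for _, cost in results]   # per-run record for plotting
    best_params, best_cost = min(results, key=lambda r: r[1])
    return best_params, best_cost, history

# Toy placeholder for a single SADE run on a 1-D cost surface.
def toy_optimise(seed):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0)
    return x, (x - 0.3) ** 2

params, cost, history = best_of_restarts(toy_optimise)
```

Plotting `history` run by run corresponds to the convergence check described around Figure 14.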
By optimising the parameters of A to transform a single example at a time (an example search attack) and tuning the $\lambda$ parameter in the objective function, the adversarial examples in Figure 18 are generated. While not the result of a universal example search, these examples maintain better semantic similarity to the inputs than those shown earlier.
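The per-example objective can be read as a λ-weighted trade-off between semantic similarity (distance to the true input) and confidence in the target class. The particular loss form and the toy linear classifier below are illustrative assumptions, not the paper's equation (17).

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def example_attack_cost(x_adv, x_true, target_class, logits_fn, lam):
    """Hypothetical single-example objective: lambda weights closeness to
    the true input against confident target-class classification."""
    similarity_term = np.mean((x_adv - x_true) ** 2)     # stay near {x}
    p = softmax(logits_fn(x_adv))
    misclass_term = -np.log(p[target_class] + 1e-12)     # hit the target
    return lam * similarity_term + misclass_term

# Toy linear "classifier" over 4 classes, purely for illustration.
rng = np.random.default_rng(2)
W = rng.normal(size=(4, 16))
logits_fn = lambda x: W @ x
x_true = rng.normal(size=16)
cost = example_attack_cost(x_true, x_true, target_class=0,
                           logits_fn=logits_fn, lam=80.0)
```

Raising λ (e.g. the λ = 80 used for the false-negative example in Figure 18) penalises departures from the true input more heavily, which is what buys the improved semantic similarity.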
Visually the examples do not show significant change in natural frequency or damping information and it is conceivable that a human observer would be fooled by these examples.
Although these examples were cherry-picked from a large group of potential examples as visually more threatening, the authors would point out that an attacker would also be free to make such a selection. In fact, query limitation would be the only thing preventing an attacker from generating many thousands of trial examples and selecting only the most potent for malicious deployment. In an SHM context, in which even a small number of false positives (and an even smaller number of false negatives) is enough to render a framework useless, an approach of this type represents a serious threat.

Figure 18. Adversarial example results arising from the second black-box attack, selected for presentation based on a visual assessment of semantic similarity. The adversarial input is semantically similar and follows the overall structure of the true inputs more closely than the adversarial target signal: (a) false-negative example with λ = 80 and (b) false-positive example with λ = 40.
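The cherry-picking step itself is trivial to automate. `select_most_potent` below is a hypothetical helper, ranking trial examples by the classifier's confidence in the target class and keeping the strongest.

```python
import numpy as np

def select_most_potent(candidates, target_probs, k=2):
    """Rank trial adversarial examples by classifier confidence in the
    target class and keep the top k, mimicking an attacker's freedom to
    cherry-pick from thousands of trials (query limits permitting)."""
    order = np.argsort(target_probs)[::-1]
    return [candidates[i] for i in order[:k]]

# Four trial examples with assumed target-class confidences.
probs = np.array([0.10, 0.95, 0.40, 0.80])
picked = select_most_potent(list("abcd"), probs, k=2)
```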

Adversarial defence strategies for SHM
Several adversarial defence strategies have already been presented in the machine learning literature, and the development of new approaches is a very active area of research. These techniques have thus far been centred largely on image-classification tasks and have often been presented in an ad hoc manner, whereby any one strategy is effective only against a subset of attack types and vice versa.
Of the defences thus far presented, two prominent themes are distillation 38 and adversarial-training 21,39 type defences. While distillation has been shown to be effective against some common attacks, it has also been shown to be insecure against attacks deliberately designed to circumvent the method. 40 Adversarial-training defences generate adversarial examples during the model training process, in the hope that the trained model will develop robustness to them. While promising progress has been made, the application of this technique in an SHM context is likely to be limited by the availability of damaged training data.
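A minimal sketch of adversarial training follows, assuming a linear hinge-loss classifier and an FGSM-style sign-of-gradient input perturbation (one common instantiation of the idea, not a method from this paper); the data and model are synthetic stand-ins. Note that the loop presumes labelled examples of both classes, which for SHM would mean damaged-state data.

```python
import numpy as np

rng = np.random.default_rng(3)
w = np.zeros(8)                              # linear classifier weights
X = rng.normal(size=(64, 8))
X[:, 0] += np.sign(X[:, 0])                  # ensure a margin in feature 0
y = np.sign(X[:, 0])                         # labels in {-1, +1}

eps, lr = 0.1, 0.1
for _ in range(200):
    # Perturb each input in the hinge-loss-increasing direction (FGSM-style).
    margin = y * (X @ w)
    grad_x = -y[:, None] * w[None, :] * (margin < 1)[:, None]
    X_adv = X + eps * np.sign(grad_x)
    # Train on the perturbed batch rather than the clean one.
    margin_adv = y * (X_adv @ w)
    grad_w = -(y[:, None] * X_adv * (margin_adv < 1)[:, None]).mean(axis=0)
    w -= lr * grad_w

accuracy = ((X @ w) * y > 0).mean()          # clean accuracy after training
```

The hope, as above, is that a model fitted to worst-case perturbed inputs is harder to fool at deployment.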
Recently, Ilyas et al. 41 have presented a new approach related to adversarial training, based on the theory that adversarial examples arise due to 'non-robust features' present in the dataset. Their approach utilises a penalty term in the objective function that punishes models relying on these non-robust features.
Another promising direction is the investigation of generative models. Li et al. 42 have recently suggested that generative models may be more robust to adversarial attacks. They present adversarial defence strategies that are able to make use of the full joint distribution in order to detect adversarial examples. This is a promising result for SHM, as many modern approaches utilise models of this type, for example, Bayesian networks. 43 However, Gilmer et al. 44 have shown that the presence of adversarial examples in high-dimensional datasets is (at least in a simplified spherical case) independent of the model used for classification.
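A much-simplified version of such a generative defence: fit a density model to the training inputs and flag low-likelihood queries as suspect. The diagonal Gaussian and percentile threshold below are assumptions for illustration only; the defences of Li et al. exploit the full joint distribution.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal Gaussian fitted to the training inputs."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def is_adversarial(x, mean, var, threshold):
    """Density-threshold detector: flag inputs the generative model deems
    unlikely under the training distribution."""
    return gaussian_logpdf(x, mean, var) < threshold

rng = np.random.default_rng(4)
X_train = rng.normal(size=(500, 32))         # stand-in training inputs
mean, var = X_train.mean(axis=0), X_train.var(axis=0) + 1e-6

# Set the rejection threshold at a low percentile of training log-densities.
train_ll = np.array([gaussian_logpdf(x, mean, var) for x in X_train])
threshold = np.percentile(train_ll, 1)

x_far = mean + 10.0                          # grossly off-manifold query
flagged = is_adversarial(x_far, mean, var, threshold)
```

A detector of this kind would sit in front of the SHM classifier and refuse to score inputs that fall far from the training distribution.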

Conclusion
In this work, it has been possible to demonstrate the serious vulnerability of an SHM classifier to adversarial attack. A universal example search has been shown to construct semantically convincing adversarial examples that fool an SHM classifier with almost perfect testing accuracy. This was achieved in the white-box threat scenario, which is the more realistic for SHM applications. Although failing to replicate this feat, a naive black-box attack has also been able to construct individual adversarial examples.
Clearly, robustness to adversarial attack is an open problem that presents a real challenge to machine learning applications, and this work demonstrates that data-driven SHM methods are not exempt. In order to facilitate further investigation, an adversarial attack threat model for SHM has been proposed. It is hoped that this will serve as a platform for future discussion surrounding the security implications of data-driven SHM approaches. The authors believe that robustness to adversarial attack will emerge as a key challenge in the widespread deployment of SHM frameworks.