Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights

Conventional damage detection techniques are gradually being replaced by state-of-the-art smart monitoring and decision-making solutions. Near real-time and online damage assessment in structural health monitoring (SHM) systems is a promising transition toward bridging the gaps between the past’s applicative inefficiencies and the emerging technologies of the future. In the age of the smart city, Internet of Things (IoT), and big data analytics, the complex nature of data-driven civil infrastructures monitoring frameworks has not been fully matured. Machine learning (ML) algorithms are thus providing the necessary tools to augment the capabilities of SHM systems and provide intelligent solutions for the challenges of the past. This article aims to clarify and review the ML frontiers involved in modern SHM systems. A detailed analysis of the ML pipelines is provided, and the in-demand methods and algorithms are summarized in augmentative tables and figures. Connecting the ubiquitous sensing and big data processing of critical information in infrastructures through the IoT paradigm is the future of SHM systems. In line with these digital advancements, considering the next-generation SHM and ML combinations, recent breakthroughs in (1) mobile device-assisted, (2) unmanned aerial vehicles, (3) virtual/augmented reality, and (4) digital twins are discussed at length. Finally, the current and future challenges and open research issues in SHM-ML conjunction are examined. The roadmap of utilizing emerging technologies within ML-engaged SHM is still in its infancy; thus, the article offers an outlook on the future of monitoring systems in assessing civil infrastructure integrity.


Introduction
Machinery equipment and structures, particularly lifelines, constitute some of the most critical components of the modern age and have become an indispensable part of daily life. In the case of utility lifelines, such as roadways, bridges, and powerlines, any threat that could cause a failure in any part of the system, no matter the extent, can eventually lead to the disruption of a whole city or country. If it were possible to predict future failures and detect existing ones, direct and indirect economic costs and the loss of human life could be reduced. The key to doing so lies in identifying damage in structures. Damage is typically defined, in simple terms, as any change to the material or geometry, including the boundary conditions, that can alter the dynamic properties or the response of the structure, 1 thus adversely affecting the current or future performance of the system. 2 In the past, damage identification relied solely on periodic inspection, carried out either by non-destructive testing/non-destructive evaluation (NDE) or by visual observation. The latter method, although it performs well for straightforward applications, is susceptible to subjectivity and human error, takes a long time, and raises safety concerns for more complex systems. Such techniques require prior knowledge of the damaged area, which is impossible to obtain for small and unreachable regions without first dismantling part of that area. Such damage detection is also localized, meaning it cannot represent the global behavior or response of the system.
The impracticality of visual inspection for large and complex civil infrastructures and long biennial inspection intervals has opened up the possibility of incorporating condition-based assessment techniques. As such, structural health monitoring (SHM) has emerged to provide the transition from offline damage identification to near real-time and online damage assessment. In layman's terms, SHM is a damage detection strategy that can observe a structure over a long period using a series of continuous measuring devices. Sensitive features extracted from these continuous measurements and the statistical analysis of such measures can provide the ability to assess the current performance of structures. Figure 1 represents the typical components of an SHM system. It starts with a selection of sensors and the placement of them in strategic locations on the structure. The collected data through the data acquisition system are transmitted to the processing unit and stored and managed in a database system. The evaluation of the collected data and the health state of the system is determined through several techniques and algorithms. In the end, based on the location and severity of the identified damage and how it can propagate in the future, inspection and maintenance during the decision-making process will be decided and carried out.

Model-driven SHM versus data-driven SHM
As stated earlier, to identify damage, the undamaged state of the structure must either be assumed or developed. Similarly, the extent of the damage is nearly impossible to quantify or assess if the previous "undamaged state" is unknown. Therefore, the ability to identify a damaged structure from the given measurements ultimately lies in understanding the previously recorded information and the pattern of changes it follows throughout the measuring period. In certain SHM applications, a prior model, typically the finite element model (FEM) of the structure, is useful as a baseline. Model updating is then performed, replacing the initial assumptions with the measured values. The updated model is then considered the original state of the structure. Further updating of the model can, therefore, identify the damage by considering the structural changes. This process of SHM implementation is a model-driven method and, as such, requires an accurate analytical model of the structure. 3 There are numerous works related to model-driven SHM. To name a few, Cao et al. 4 developed a piezoelectric impedance measurement for structural damage identification through an inverse analysis. Similarly, Moore et al. 5 identified cracks in a thin plate by model updating. Generally, developing an accurate model is burdensome. Model discrepancies, especially for complex structures, are inevitable when little to no information about joints and bonds is available. Such an inverse problem is not well-posed 6 and requires regularization and simplification. 7 An alternative to a model-driven SHM system is a data-driven one. Rather than relying on the physical model of the structure, the model construction depends on statistical pattern recognition (PR), which is usually applied through machine learning (ML) algorithms.
In contrast to having an FEM and updating the model later, the sensing devices' data from the structures are used more conveniently in the undamaged state and under few circumstances in the damaged state. In cases where insufficient labeled data exists, the data-driven approach can take an unsupervised form, or a hybrid model can be utilized for generating additional data. Augmentation of data-driven SHM systems with FEM can generate labeled datasets for training validation and testing phases. However, it is crucial to highlight that physical models are computationally intensive and need validation with experimental results. 8 On the other hand, not every ML algorithm is capable of damage prognosis, meaning data-driven approaches are not always predictive models. Therefore, the decision between employing model-driven or data-driven SHM systems or both ultimately boils down to realizing (1) the proposed system's requirements, (2) the complexity of the application where the system is deployed, and (3) if the existing data and models can support and provide valuable inferences about the health state of the structure. For example, suppose one prefers a hybrid combination of the two methods. In that case, the system's predictive accuracy depends on the performance of the physics-based model and if the measured data from the data-driven approach is relevant and usable for training and validation.

Damage definition and identification
A vertical hierarchy is typically considered in order to identify damage. A pioneering damage typology scheme was offered by Rytter. 9 Damage state was categorized into four levels, namely:
1. Existence of damage: Detection
2. Position of damage: Location
3. Severity of damage: Extent
4. Prognosis of damage: Prediction
In such a hierarchy, knowledge of the previous level is generally essential for complete damage identification. Thus, the success at each level is likely to depend on the performance of the lower levels. With the advent of ML and PR algorithms, a new level can be added to the above. Determining the type of damage, that is, classifying it, is the level made possible through the use of ML algorithms. 10 This new level lies between levels 2 and 3 introduced above. Figure 2 depicts the five-level hierarchical damage identification, from detection to prediction.
Given that both damaged and undamaged information is available, a supervised learning algorithm can effectively go through all five damage identification levels. This, as explained before, requires extensive data to be readily available from the sensing systems, the physics-based models, or the experiments. Nevertheless, this is not possible in many cases, and the information available on the damaged state is limited, if not unavailable. For such situations, unsupervised learning can be used. Instead of training a model on labeled data, a relatively simple approach, novelty or outlier detection, is applied. 11 An initial baseline model is created assuming normal operating conditions. Later, upon receiving new data from the sensing systems during operation, the algorithm flags any outlier that exceeds the threshold defined by the system. 12 One example of an unsupervised algorithm was tested on an aircraft fuselage and a multilayered carbon fiber-reinforced plastic (CFRP) plate for damage detection. 13 Compared to supervised learning, the unsupervised method provides a clear advantage as it no longer requires prior information about the damaged state of the structure. However, this learning model can only be used to detect and sometimes, but not always, locate damage. 14 In addition, many of the implemented ML approaches for damage detection do not consider environmental and operational factors (EOFs) and only rely on the severe damage that occurs on the structures. Temperature effects and traffic loading are a few of the neglected variabilities that, in reality, have a significant influence on the response of in-service structures. 15 Some works [16][17][18][19] have extensively studied the effect of such variabilities over extended periods ranging from 1 to 2 years. Thus, an unsupervised approach cannot effectively be used on its own when the dependency on external factors must be considered while identifying damage. 20 Rather, a coupled approach of model-driven and data-driven algorithms can work together to achieve a reasonable damage identification level. 19
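The baseline-and-threshold idea behind novelty detection can be illustrated with a minimal sketch. It assumes a single scalar damage-sensitive feature (here, hypothetical natural-frequency readings), a Gaussian-like baseline, and a threshold of three standard deviations; none of these choices come from the cited works, and the function names are illustrative only.

```python
import statistics


def fit_baseline(features):
    """Summarize a feature recorded under normal operating conditions as (mean, std)."""
    return statistics.fmean(features), statistics.stdev(features)


def is_novel(value, baseline, threshold=3.0):
    """Flag an observation whose standardized distance from the baseline exceeds the threshold."""
    mean, std = baseline
    return abs(value - mean) / std > threshold


# Hypothetical natural-frequency readings (Hz) from the undamaged structure.
baseline_freqs = [2.50, 2.52, 2.49, 2.51, 2.48, 2.50, 2.53, 2.47]
baseline = fit_baseline(baseline_freqs)

# New readings received during operation; the last one drifts far from the baseline.
new_readings = [2.51, 2.49, 2.31]
flags = [is_novel(f, baseline) for f in new_readings]
print(flags)
```

Note that such a detector only signals that something has changed. As discussed above, a shift caused by temperature or traffic would be flagged in exactly the same way as damage, which is why EOFs need separate treatment.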

Objectives of this study
The application of PR is not a new topic and dates back to the early 70s and 80s. In simple terms, PR is a tool to represent and recognize regularities in data. Sometimes, simple mathematical models based on a shared domain about a specific application can be used to infer patterns from a set of data and classify accordingly. During the 1990s, however, instead of relying on models derived by an expert (usually researchers) to classify data, machines were used to learn from the data, generate the most probable outcome, and validate the model based on unseen set data. The most likely outcome is a result of statistical PR algorithms, which are generally referred to as ML techniques.
This review aims to generalize these applications harmoniously using ML and SHM frameworks. Many methods with different results exist in the rich body of literature. Several approaches and techniques for feature extraction, data normalizations, and dimensionality reductions are employed for various civil infrastructures. This review brings a systematic collection of different SHM applications compatible with the statistical PR perspective. The readers, therefore, are introduced to the concept of ML and its utilization in the SHM paradigm. Moreover, model-driven and data-driven approaches in SHM will be discussed, but an emphasis will be placed on data-driven SHM approaches. In addition, tables and figures refine the ML taxonomy behind the vast SHM literature complementing the article. Next-generation SHM potentials such as unmanned aerial vehicle (UAV)-assisted SHM, mobile-SHM, and virtual/augmented reality-supported SHM are also addressed in this study together with the digital twin, smart city, and big data era.
For better readability, the abbreviations used in this survey, along with their definitions, are provided in Table 1. In summary, the review aims to consider:
1. The pipeline of ML in each component that makes up SHM systems.
2. The different tools and algorithms used in ML and DL processes for each level of SHM damage identification.
3. The different learning algorithms proposed for context-dependent applications.
4. Extension into IoT age-related and next-generation emerging technologies and data science prospects for SHM.

Comparison of past relevant reviews
Many works in the past years or so have reviewed ML aspects of SHM. Nevertheless, there is a growing need for  21 and Arcadius Tokognon et al. 22 provided a general review of SHM in the context of wireless sensor networks (WSNs) and IoT, respectively. Data acquisition, processing, and network connectivity were among the topics that were discussed in detail. However, there was no explicit link between SHM and ML in these two articles, although some basic aspects were defined. Gomes et al. 23  Owing to its specificity, major key points that were overlooked by the previous DL reviews were comprehensively covered. Salehi and Burgueño 36 discussed, in great detail, the power of AI in ML, PR, and DL. However, a specific breakdown of ML and PR, especially in SHM, was not explicitly examined. The recent review by Sun et al. 37 is very similar to the previous paper and can be considered a comparable survey to ours. The authors acknowledged the vital role of AI and big data in SHM applications. Analysis of ML and PR procedures is also depicted in a general way. However, alternative applications in this era were not reviewed, and the connection to the ultimate goal of SHM in the context of smart cities and emerging technologies was not recognized. Data-driven SHM damage identification with DL was reviewed in a recent survey by Azimi. 38 The authors discussed at great length the usage of DL and machine vision and the new methods of monitoring damage, that is, mobile sensors and UAVs in SHM applications. Avci et al. 39 reviewed vibration-based damage detection in the literature while considering ML and DL algorithms. In the paper by Tibaduiza Burgos, 40 a brief overview of data-driven SHM applications and a summary of ML procedures were presented. In contrast to the papers above, the authors discussed some of the implementation steps of data-driven SHM. 
However, some important aspects of ML processes, such as feature selection and extraction, were not comprehensively analyzed. Hou and Xia 41 reviewed vibration-based damage identification for civil engineering structures in the last decade. A thorough analysis of learning algorithms across the different steps of an ML-enhanced SHM system was not presented. Sony et al. 42 reviewed the next-generation smart sensing technology in SHM. In their paper, the authors included emerging technologies for collecting data from structures. Alavi and Buttlar 43 gave an overview of the deployment of smartphones in major civil engineering areas. An emphasis was placed on the sensing capabilities of smartphones and their crowdsourcing power in SHM applications.
As summarized in Table 2, the majority of the past surveys did not assess some of the important aspects of ML in SHM systems. Necessary details, a systematic explanation of implementation steps, and the available methods for an ML-engaged SHM system are the notable items that were missing or only partially provided by the previous works. Moreover, the shift to the new era of the Internet of Things (IoT) and the smart city necessitates a connection that links data-driven SHM systems to the future paradigm and emerging technologies. ML, DL, and machine vision can provide the necessary algorithms for condition monitoring of many structures for SHM purposes, with big data and digital twins as the cornerstone of the future. 44 The authors envision that the roadmap of utilizing new technologies such as ML is not limited to cases where damage is to be detected. The method used for data collection, either through WSNs or UAVs, the novelty of the processing type, and lastly, the expected utilization of the result obtained are the factors that this review article hopes to address.
Connecting the paradigm of sensing and processing critical information in infrastructures in a new domain with the features of the current IoT era, such as cloud (edge) computing, or of Industry 4.0, such as digital twin modeling and blockchain, is the next stage of SHM. In light of these developments, this article is organized as follows: the "SHM and machine learning, a detailed overview" section gives a detailed overview of the ML and DL pipeline in each component level of SHM; ML learning algorithms are discussed at length in the "ML-supported pattern recognition techniques" section, and the connection to each level of damage identification in SHM is also provided; in the "IoT-related applications" section, related IoT, big data, and hybrid-approach applications in SHM are reviewed. Similarly, in the "Next-generation SHM applications with ML/DL enhancements" section, three next-generation SHM applications are briefly summarized. Finally, an open research discussion for the future of SHM and ML is provided in the "Open research issues" section, and the "Conclusion" section concludes the review.

SHM and machine learning, a detailed overview
Data is ubiquitous. Given the amount of data gathered from numerous possible sources, it is essential to understand the pattern that underlies it. Day by day, with the increasing complexity of structures and the sheer amount of data collected, discovering patterns without automatic (or sometimes semiautomatic) computer-based processes would be infeasible and impractical. ML is considered a tool to recognize/classify information based on a learned pattern through the use of different algorithms. In general, ML algorithms are based on either (1) statistical, (2) neural, or (3) syntactic approaches. The first two methods are generally considered as the main pattern classifiers for SHM. 8 In detecting damage using ML, initially, a pattern class or category is defined. For SHM, one establishes training data through which all the attributes defining the structure are gathered (sensing). At this stage, depending on the collected data, class labels may or may not be assigned to the data. These data are then pre-processed to remove any noise or outliers and to reduce the dimensions of the damage vectors (pre-processing). The next stage is feature extraction. At this step, damage-sensitive features are selected either based on engineering judgment or mathematical and transformation procedures, or a combination of both. Postprocessing may also be applied after feature extraction to further compress, normalize, or fuse data as needed. After these stages, an algorithm is used to identify the damage state using one or more of the following techniques:
1. Classification: discrete class label (damaged/undamaged)
2. Regression: continuous quantity (location of the damage, size of the fatigue crack, etc.)
3. Novelty/outlier detection
Finally, from the processed output data, one can determine, if necessary, whether a decision has to be made to rectify the identified damage and which subsequent actions to take. Figure 3 illustrates a statistical PR classification model for a typical damage assessment scenario.
The subsections below describe the necessary procedures for any ML application, emphasizing SHM statistical PR. It should be noted that there are several methods available in any of the following procedures; however, only the popular methods are expressed here. Figure 4 summarizes the necessary steps that an ML model for SHM has to go through, along with a few examples of the techniques used in the literature. As shown in the figure, the first seven steps are introduced in this section, whereas step 8 is presented in a more detailed manner in the "ML-supported pattern recognition techniques" section. The material presented here offers the readers the chance to discover some of the common ML methods and techniques utilized in the vast body of SHM literature. Methods and techniques are introduced and summarized such that, in essence, the readers can prioritize and quickly grasp how each part of the ML process is implemented in SHM systems specific to their needs.

Excitation methods
The very first step in detecting the existence of damage is to excite the structure in place. Bridge condition monitoring, seismic performance assessment of bridges, and verification of the numerical models with the measured data are some of the factors, especially in vibration-based applications, that require careful consideration of the method of excitation. In general, there are two ways to achieve dynamic excitation of civil infrastructures, 45 (1) ambient excitation and (2) measured-input tests. Ambient excitation is suitable for real structures, whereas measured-input tests are more often restricted to laboratory experiments. Damage is considered a local phenomenon; however, local excitations tend to have little to no effect on the low-frequency global response. Particularly in large-scale structures, ambient excitation is the only source capable of providing the necessary inputs that lead to identifying damage in terms of global behavior. However, the problem with such excitations is that, unlike local excitation, they are nonstationary, and the variabilities of their inputs need to be taken into account. Thus, for small-scale structures, measuring devices such as piezoelectric materials that can act as both force transducers and actuators can simplify the first two steps of implementing an SHM system. Dynamic excitation of civil structures, such as bridges, can be achieved with (1) the ongoing vehicle or pedestrian traffic on the structure, (2) ambient wind and wave excitation, and lastly, (3) seismic ground motion, that is, earthquake or microtremor excitation. These types of excitation lend themselves to output-only modal analyses, which can be used to estimate modal parameters such as mode shapes or resonance frequencies. 
In terms of local excitation, popular methods are (1) shakers with a variety of input patterns and frequencies, (2) direct impact on the structure at the point of interest, and nowadays, (3) input-output sensors capable of producing known input forces with a high signal-to-noise ratio (SNR). Often, in these input-output methods, given the nature of the structure, disruption of traffic or on-structure activities follows. Therefore, it is not practical to implement such excitation methods on a large structure. However, in exceptional cases and depending on the extent of the damage, it may be necessary to perform both excitation techniques in a combined approach. 46 Further advantages and disadvantages of these excitation methods can be found in Farrar et al. 45 and Maas et al. 47
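To make the notion of output-only analysis concrete, the sketch below estimates a resonance frequency from a measured response alone, without knowledge of the input. It assumes the simplest possible approach, picking the largest peak of a discrete Fourier transform, and uses a synthetic, noise-free signal; the sampling rate, the two frequency components, and the function name are illustrative assumptions, and real ambient data would additionally require averaging and treatment of nonstationarity.

```python
import cmath
import math


def dominant_frequency(signal, fs):
    """Return the frequency (Hz) of the largest DFT peak, ignoring the DC component."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):
        # Direct O(n^2) DFT; adequate for a short illustrative record.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fs / n


# Synthetic acceleration record: a dominant 2.5 Hz mode plus a weaker 9 Hz mode.
fs = 64.0   # assumed sampling rate in Hz
n = 256     # number of samples, giving a resolution of fs / n = 0.25 Hz
response = [math.sin(2 * math.pi * 2.5 * i / fs)
            + 0.3 * math.sin(2 * math.pi * 9.0 * i / fs)
            for i in range(n)]

peak = dominant_frequency(response, fs)
print(peak)
```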

Data acquisition
It is well understood that without sufficient and accurate data, a clear understanding of the damage-sensitive features, excitation methods, types of sensors, and, lastly, sensor configuration, SHM may not reveal the optimal information. For example, the method of global damage detection through standard accelerometers, which often are described as continuous measuring devices, differs from other sources of data collection such as strain sensors, a local measuring instrument, where an average data over a short period is gathered. In other words, monitoring of dynamic parameters may imply different needs compared with the static ones. Thus, it is essential to identify how the data can be collected and utilized. Therefore, the performance indicators of any sensors, as listed below, 48 need to be scrutinized. In addition to the items listed, two essential elements of data acquisition: (1) the number of sensors and (2) their locations must also be optimized. Redundant and unnecessary data would burden the data acquisition system and hinder the subsequent processes of the SHM system. Moreover, these sensors are typically permanently installed, and consequently, any unsystematic approach to designing a data acquisition system would introduce additional challenges.
Traditionally, wired sensing equipment was more widespread than other means. Especially for important structures and long-term monitoring, expensive wired systems are still preferred over wireless ones. 21 However, with the advancement of technology in many aspects of WSNs (size, energy storage/generation, etc.), they are becoming increasingly attractive to researchers as they provide portable, practical, and efficient alternatives. 49,50 Generally, sensors are categorized as passive or active. Usually, a combination of sensing technologies is used for SHM solutions. A detailed review of the currently used SHM sensors and next-generation smart sensing technologies is given in Moreno-Gomez et al. 51 and Sony et al. 42 Additionally, Abdulkarem et al. 52 discussed the state-of-the-art WSNs in SHM from different perspectives, such as academic and commercial wireless platform technologies.
Passive operation of sensors is described as a measurement that imparts no input energy to the structure. Accelerometers, strain gauges, and acoustic emission sensors are examples of this type. They only detect damage, with no interaction with the actual structure. Usually, these nonstationary sensors cannot precisely determine the dynamic response of the structure 53 that could otherwise be due to EOFs. Additionally, in the early stages of SHM applications, passive sensors had difficulties directly identifying the damage. They rely on variable ambient excitation, which may or may not yield the data needed for evaluation.
On the other hand, very similar to the NDE approach, active sensors localize the excitation tailoring the overall damage detection process. The main advantage over passive systems is that, with known excitation force and location, it is much easier to detect damages and minimize the effect of EOFs. Examples of standard active sensors are piezoelectric ultrasonic sensors which can be utilized either as an impedance-based method 54,55 or Lamb wave-propagation method. [56][57][58][59] ML adaptations in data acquisition systems are generally observed in sensor layout optimization. The goal is to use a minimal number of sensors while ensuring maximum damage sensing capabilities or, as referred by Farrar et al., 10 "maximizing damage observability." A range of input sensors at different locations is trained, and based on the most relevant feature an optimal sensor position is located. Recent studies tackled this daunting task. In the paper by Bigoni et al., 60 the authors utilized a sparse Gaussian process and one-class support vector machine (SVM) for a reduced model. The most widely used ML algorithms in sensor optimizations are genetic algorithms (GA) in coordination with neural networks (NN), 61 as shown in Table 3. Different computational methodologies in optimal sensor placement along with the advantages and disadvantages of each algorithm are reviewed in Bigoni et al. 62 and Soman et al. 63 In practice, ML models are typically adapted to reposition the existing sensor locations to increase overall system performability in terms of damage detection. It offers simple, adaptable, and low-cost solutions over traditional approaches.

Data normalization
Some of the collected data for learning is generated by numerical simulations, which often disregard EOFs (e.g., temperature as an environmental factor and traffic load as an operational one). Many researchers investigated these nonstationary sources of variation. Several works 15,19,[72][73][74] have shown that the dynamic performance of bridges varies significantly depending on the conditions that the bridge is subjected to daily. The effect of these trends on damage-sensitive features could be removed by utilizing different linear and nonlinear correction models. Data normalization at the first stage tries to bring all data to a common scale, since the data from various sensors are inconsistent depending on the size and location of the damage. The simplest way is the Z-score normalization shown in equation (1),

$$z = \frac{x - \bar{x}}{\sigma} \tag{1}$$

where x represents the original feature vector, x̄ represents the mean of the feature vector, σ represents the standard deviation, and z is the normalized feature vector. The numerator is also known as DC offset filtering. The next stage is to take into account the EOFs. Depending on the presence of the variabilities, either the damage-sensitive features are parameterized as a function of the EOFs (measured and known variables) and later compared with a new set of extracted data, or they are developed indirectly with the help of ML algorithms. Table 4 shows some examples of dealing with EOFs in linear and nonlinear behavior for data normalization. To achieve a fully data-driven analysis, EOFs have to be analyzed in a data-driven manner as well. However, certain conditions, such as long-term behavior of EOFs, lack of data, and insufficient data normalization, necessitate the usage of model-driven analysis. In hybrid use cases, the FEM of a structure is used to generate damage scenarios resulting from EOFs. These scenarios are then fed to an ML algorithm to better classify the existence of damage. Moreover, to validate the results, damage data from the FEM are used as test data. 
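The two normalization stages described above can be sketched as follows: first the Z-score scaling of equation (1), then the parameterization of a damage-sensitive feature as a function of a measured EOF. The sketch assumes the simplest linear case, a least-squares fit of natural frequency against temperature, with hypothetical readings; the population standard deviation is used for σ, which is an assumption since the text does not specify it. Nonlinear EOF effects would need a different correction model.

```python
import statistics


def zscore(values):
    """Z-score normalization: subtract the mean (DC offset) and divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed convention)
    return [(v - mean) / std for v in values]


def remove_linear_eof_trend(feature, eof):
    """Fit feature = intercept + slope * eof by least squares and return the residuals."""
    mean_f = statistics.fmean(feature)
    mean_e = statistics.fmean(eof)
    slope = (sum((e - mean_e) * (f - mean_f) for e, f in zip(eof, feature))
             / sum((e - mean_e) ** 2 for e in eof))
    intercept = mean_f - slope * mean_e
    return [f - (intercept + slope * e) for e, f in zip(eof, feature)]


# Hypothetical readings: natural frequency dropping linearly as temperature rises.
temperature_c = [0.0, 10.0, 20.0, 30.0]
frequency_hz = [2.60, 2.55, 2.50, 2.45]

normalized = zscore(frequency_hz)
residuals = remove_linear_eof_trend(frequency_hz, temperature_c)
print(normalized)
print(residuals)
```

In this idealized case the residuals are essentially zero, meaning the whole variation in frequency is explained by temperature. A residual that departs from zero in new data would then point to a change not attributable to the measured EOF.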
Minimizing both Type I and Type II misclassification errors depends heavily on the model that is used to normalize the data for EOFs. 75 Type I refers to a false-positive indication of damage, whereas Type II is a false-negative indication of damage. In their article, the authors' nonlinear ML algorithms performed better than their linear counterparts (less than 3% total error for Type I and Type II combined), indicating that they better captured the nonlinearity of EOFs. The influence of EOFs therefore appears to be nonlinear in nature. For example, freezing temperatures can heavily influence the natural frequency, and therefore, a linear correlation between the EOFs and damage cannot be assumed. The best-performing models, according to these works, are those that utilize nonlinearly separable clustering techniques such as KPCA or GKPCA.
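As a small illustration of how the two error types are counted, the sketch below compares hypothetical ground-truth labels with hypothetical detector outputs. Here each error is expressed as a rate over the relevant class (undamaged cases for Type I, damaged cases for Type II); this convention is an assumption, and the cited work may report errors relative to the total number of cases instead.

```python
def error_rates(true_damaged, predicted_damaged):
    """Return (Type I rate, Type II rate) for boolean damage labels and predictions."""
    pairs = list(zip(true_damaged, predicted_damaged))
    false_positives = sum(1 for truth, pred in pairs if not truth and pred)
    false_negatives = sum(1 for truth, pred in pairs if truth and not pred)
    n_undamaged = sum(1 for truth in true_damaged if not truth)
    n_damaged = sum(1 for truth in true_damaged if truth)
    return false_positives / n_undamaged, false_negatives / n_damaged


# Hypothetical test set: six undamaged cases followed by four damaged cases.
truth = [False] * 6 + [True] * 4
predicted = [False, False, True, False, False, False, True, True, False, True]

type_1, type_2 = error_rates(truth, predicted)
print(type_1, type_2)
```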

Data cleaning
One should not expect that data collected from sensors are always satisfactory and up to par. Loosely mounted sensors or external effects can reduce the quality of the data. Data cleaning, in simple terms, refers to imposing hard limits beyond which the data is not usable and must be discarded (noise/outlier treatment), or to imputing missing data. 37,79 In the context of SHM and big data, five distinct quality standard indicators, 80 namely: (1) availability, (2) usability, (3) reliability, (4) relevance, and (5) presentation quality, must be taken into consideration before any SHM implementation. Unification of data ensures the efficiency and accuracy of an ML algorithm. In SHM, typical data cleaning is performed via either software or hardware filtering methods. 81 Embedded in data acquisition devices, noise rejection with low-pass, high-pass, or band-pass filters, resampling, or other techniques can be employed. Unusable data can be decimated, and missing data can be statistically imputed, if necessary. Due to the requirement of compression for big data, in many applications, data reconstruction using ML algorithms is utilized to convert incomplete (irregular) data into the corresponding complete (regular) data. DL applications 82,83 in data imputation work well for categorical and non-numerical features, as is the case in SHM. However, they are rarely used because they are slow when dealing with extensive datasets. Other methods such as k-nearest neighbors (kNN), stochastic regression, extrapolation and interpolation, and many others can be employed to correct or remove irrelevant data. Recently, Tan et al. 84 investigated the effectiveness of multiple supervised learning methods (Ridge, RF, SVR, MLP, and XGBoost) for data augmentation under different missing rates of the input database. All models were able to capture the missing trend when the missing data are uniformly distributed, and SVR and MLP performed best on average with a root mean square error (RMSE) of less than 2. 
Some examples of data cleaning and data reconstruction techniques are presented in Table 5.
Perhaps one of the critical challenges for data cleaning arises when the process is scaled to large and complex structures. 95 In the context of big data analysis, traditional data cleaning working sequentially cannot easily be applied to ever-growing complicated structures. 96 Thus, the parallel execution of any method should be in line with the five big data quality standards discussed previously. The data cleaning process is always performed before initiating the subsequent ML processes. However, it may also be repeated after feature extraction. In an ideal situation, one data cleaning pipeline is enough, assuming the features selected at the beginning lead to identifying damage more efficiently. In reality, however, given the judgment of the engineer and the required outcome from the algorithm, it may be deliberately assumed that certain features, although having passed the initial filtering process, would not be helpful in determining damage. Therefore, a second round of cleaning, referred to as postprocessing, is carried out to ensure that individual decisions about the nature of the ML implementation conform to the model's output. An example of how postprocessing led to better damage identification was presented in the paper by Li et al. 97

Data compression
Structures equipped with SHM carry tens or hundreds of different sensors. Each produces single- or multi-feature data continuously, with sampling rates ranging from around 10 Hz to 10-50 kHz. Over an extended period of monitoring, a multitude of data is generated, although not every generated feature is usable in the analysis. In addition, EOFs play an essential role in increasing the features' dimensions. 8 In this regard, data compression, or simply put, dimensionality reduction of the features, allows only the most statistically significant and damage-sensitive features to be extracted. One way to tackle this is to fuse sensor arrays that extract similar features: for example, the mode shapes collected at each sensor node are later compressed to produce a low-dimensional feature vector containing only the first few mode shapes.
The most significant limitation of ML algorithms arises when they must learn from high-dimensional data vectors with limited exogenous variables. 98 Data compression should not come at the cost of losing the ability to learn a pattern. Without enough features extracted following compression, it is not possible to deduce whether an algorithm can serve as a damage identifier. Conversely, as the dimension of the data increases, the amount of training data needed to achieve a reasonably small estimation error grows exponentially, an issue commonly referred to as Bellman's curse of dimensionality. 99 Therefore, with inadequate training data, one cannot achieve an ML implementation with a high degree of accuracy. This effect, however, can be mitigated by a linear or nonlinear projection of p-dimensional feature vectors onto a q-dimensional space, with q < p. The most classical method is linear principal component analysis (PCA). One might argue that, once EOFs are added to the data vector, the nonlinear behavior of temperature and external loading makes linear PCA an infeasible way to reduce the dimensions. This concern was, however, not borne out in real-life scenarios, as comprehensively analyzed in the study by Van Der Maaten and Postma. 100 The authors found that linear PCA performs comparatively better than its nonlinear counterparts. With enough analysis and performance measures of different variations of nonlinear data compression, future research can create better techniques to identify nonlinear behaviors of structures as well as EOFs, which can lead to better data compression. Some recent examples of such techniques utilizing ML algorithms have been introduced. 77,101,102 Figure 5 depicts how dimensionality reduction models are categorized.
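The linear projection of p-dimensional feature vectors onto q dimensions can be made concrete with a small PCA sketch. It is written in plain Python for self-containment and uses power iteration with deflation on the sample covariance matrix; a production system would more likely call an SVD or eigendecomposition routine from a numerical library. The function names and the toy data are assumptions made for illustration.

```python
import math
import random


def pca_fit(X, q, iters=200, seed=0):
    """Return (mean, components) with the top-q principal directions of X.

    X is a list of n feature vectors of length p. Components are found by
    power iteration on the sample covariance matrix, deflating after each one.
    """
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - mean[j] for j in range(p)] for row in X]
    C = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(p)]
         for a in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(q):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the variance captured along v.
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(p))
                  for a in range(p))
        components.append(v)
        # Deflation: remove the captured direction from the covariance.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)]
             for a in range(p)]
    return mean, components


def pca_project(X, mean, components):
    """Map p-dimensional rows of X onto the q retained components."""
    return [[sum((row[j] - mean[j]) * c[j] for j in range(len(mean)))
             for c in components] for row in X]


# Toy data: three correlated channels that vary along a single direction.
X = [[float(t), 2.0 * t, -1.0 * t] for t in range(10)]
mean, components = pca_fit(X, q=1)
scores = pca_project(X, mean, components)
```

Here p = 3 and q = 1: because the three channels move together, one component reproduces the data without loss, which is the situation in which compression does not cost the ability to learn a pattern.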
Table 5. ML-based data decimation and imputation techniques.

| Reference | Decimate/Impute | Method(s) | Application |
|---|---|---|---|
| Ren et al. 85 | Impute | Bayesian tensor learning | Strain and temperature records of a concrete bridge |
| Chen et al. 86 | Impute | Kernel regression | Strain responses between sensors in SHM |
| Chen et al. 87 | Impute | LQD-RKHS regression | Probability distributions of missing SHM data |
| Martinez-Luengo et al. 88 | Impute | ANN | Offshore wind turbine |
| Oh et al. 89 | Impute | CNN | Strain structural response |
| Fan et al. 90 | Impute | CNN | Recovering lost vibration data |
| Li et al. 91 | Impute | LSTM | Stacked DL-based imputation framework for dams |
| Fan et al. 92 | Decimate | Residual CNN | Denoising SHM vibration data |
| Yang et al. 93 | Decimate/Impute | 1) Chebyshev inequality; 2) SVR | Wind turbine |
| Batista et al. 94 | Decimate/Impute | Ten different methods | Balancing training models |

In particular, to reflect the nonlinearity of data, there are two approaches by which high-dimensional data are transformed onto a low-dimensional space. 103 From a local perspective, the local geometry of data is preserved, and the model attempts to map nearby points onto a set of closely related points in the vicinity. In the global approach, in addition to the mapping of nearby points, faraway points are also mapped to faraway points, essentially keeping the geometry intact at all scales.
Typically, there exist three methods to achieve data compression: (1) linear transformation such as PCA, (2) nonlinear transformation such as NPCA, and (3) autoencoders, which are nonlinear transformations based mostly on ANNs. Table 6 exhibits the most recent utilization of widely used dimensionality reduction models in the literature, emphasizing SHM applications. A review of the literature shows that most SHM applications use PCA, LDA, QDA, and ICA in the linear approach and NPCA, LLE, and AANN in the nonlinear approach for dimensionality reduction.
There is no clear-cut rule for deciding which dimensionality reduction model is suitable. The variability in the nature of the collected data and the size and resolution of the input signals greatly influence the system's overall performance. For example, very few techniques (PCA and its variants, autoencoders, etc.) are parametric; that is, they provide a direct mapping from the high-dimensional to the low-dimensional space. Such a mapping makes it possible to verify, to some extent, how much of the high-dimensional space was preserved during the reduction process. Because the majority of the other techniques fall in the non-parametric domain, they lack this capability and inherit other problems of non-parametric modeling, such as the curse of dimensionality. Moreover, the presence of free parameters in non-parametric techniques (e.g., the learning rate and the number of iterations), which can impact the cost function in the nonlinear convex optimization, introduces another burden: the performance of the dimensionality reduction technique depends on the optimization of these free parameters.
It should be noted, however, that these free parameters are advantageous in various cases, as they offer flexibility in the reduction process. The other issue with non-parametric techniques is computational, since they require more data for better performance. One of the important requirements in dimensionality reduction models is the out-of-sample extension ability. Simply put, an out-of-sample extension learns the reduction on the training set and applies the resulting mapping directly to the test set to lower its dimensions. Such a capability is crucial to SHM systems, as different signals can be embedded during the monitoring process. It eliminates the need to retrain on the whole dataset to learn new mapping functions, and is therefore less computationally expensive. However, not every dimensionality reduction technique provides an out-of-sample extension (e.g., Isomap and LLE do not). A non-parametric out-of-sample extension is therefore required for such nonlinear models. The approximation in the out-of-sample extension leads to an estimation error, 104 so great care must be taken in these cases. To date, the out-of-sample extension has not been one of the deciding factors in selecting dimensionality reduction models in SHM applications. In fact, only two studies, Langone et al. 105 and Liu et al., 106 have directly considered this capability in their proposed systems.
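For a technique without a direct mapping, one simple way to approximate an out-of-sample extension is to place a new sample at the distance-weighted average of the embeddings of its nearest training neighbours. The sketch below illustrates that idea only; it is an assumption made for this example and not the method used in the cited studies, and the function name and the value of k are hypothetical. The approximation error mentioned above shows up here as the interpolation error between neighbours.

```python
import math


def out_of_sample_embed(x_new, X_train, Y_train, k=3, eps=1e-12):
    """Approximate the low-dimensional embedding of x_new.

    X_train holds the original high-dimensional training vectors and Y_train
    their already-computed embeddings (from any reduction technique). The new
    point is mapped to the inverse-distance-weighted mean of the embeddings
    of its k nearest training neighbours, so no retraining is needed.
    """
    nearest = sorted((math.dist(x_new, x), i)
                     for i, x in enumerate(X_train))[:k]
    if nearest[0][0] < eps:  # coincides with a training point
        return list(Y_train[nearest[0][1]])
    weights = [1.0 / d for d, _ in nearest]
    total = sum(weights)
    q = len(Y_train[0])
    return [sum(w * Y_train[i][j] for w, (_, i) in zip(weights, nearest)) / total
            for j in range(q)]


# Toy training set lying on a line in 3-D with a known 1-D embedding.
X_train = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
Y_train = [[0.0], [1.0], [2.0], [3.0]]
y_mid = out_of_sample_embed([1.5, 0.0, 0.0], X_train, Y_train, k=2)
y_same = out_of_sample_embed([1.0, 0.0, 0.0], X_train, Y_train, k=2)
```

Parametric techniques such as PCA avoid this step altogether, since the mean and components learned on the training set can be applied unchanged to new data.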
The majority of the examples of dimensionality reduction techniques used in the SHM domain do not provide a definitive reason as to why one technique was preferred over the other, except those that research and compare different methods for reducing dimensions and filtering unwanted data. Therefore, it becomes increasingly difficult to recommend one technique. Based on the authors' observation, linear and nonlinear variations of PCA can yield, in most cases, an acceptable level of data compression. Nonlinear and clustering approaches are mostly preferred in situations where the data is highly irregular in nature, such as EOFs. Having said that, other approaches can produce better results as this is highly dependent on the dataset and the available computational resources. Interested readers are referred to the recent comparative study of dimensionality reduction techniques by Ayesha et al. 107 for more detail. Finally, the considerations below and the remarks in Table 6 can facilitate selecting an appropriate dimensionality reduction technique:
1. Understanding that some techniques are supervised or unsupervised, entailing their own limitations and considerations.
2. Deciding which dimensions to retain and the realization and comprehension of the reduced model and new dimensions.
3. Recognizing that these dimensionality reduction techniques can sometimes negatively impact the performance of the classifier.

Table 6. Widely used dimensionality reduction models in the literature, emphasizing SHM applications.

| Type | Reference(s) | Method | Remarks |
|---|---|---|---|
| Linear | Figueiredo and Cross, 76 Liu et al., 106 Güemes et al., 108 Roberts et al., 109 Mei et al., 110 Li et al., 111 Akintunde et al., 112 Deraemaeker and Worden, 113 Kullaa, 114 García-Macías and Ubertini 115 | PCA | Highly interpretable; requires data standardization; assumption of orthogonality |
| Linear | Huang et al., 116 Li et al., 117 Zhu et al., 118 Yao et al. 119 | ICA | Assumption of statistically independent and normally distributed variables; order of the independent components is difficult to determine |
| Linear | Zhang et al., 120 Avendaño-Valencia et al., 121 Hu et al. 122 | PCR | Performs well for highly correlated and collinear data; imposes constraints on the coefficients of nonrelated explanatory variables |
| Linear | Jiménez et al., 102 Yanez-Borjas et al., 123 Mboo and Hameyer, 124 Mangalathu et al., 125 Zheng and Qian, 126 Mishra et al. 127 | LDA | Interpretable; small sample size; requires normal distribution assumption |
| Linear | Mangalathu et al., 125 Mishra et al. 127 | FA | Unable to produce a meaningful pattern for unrelated explanatory variables |
| Linear | Guo et al., 130 Zhao and Huang 131 | SFA | Guaranteed optimal solution; mapping is provided as functions directly |
| Nonlinear | Figueiredo and Cross, 76 Dervilis et al., 101 Li et al., 132 Hsu and Loh, 133 Ye et al., 134 Tibaduiza et al., 135 Silva et al. 136 | NLPCA | Does not require the a priori specification of a time series; performs poorly on very large datasets; incorporates nominal and ordinal variables |
| Nonlinear | Borate et al., 137 Yang et al. 138 | PPCA | Extends the scope of conventional PCA; enables uncertainty assessment of the model |
| Nonlinear | Fuentes et al., 139 Jeong et al. 140 | Isomap | Preserves "true" relationship between data points; preserves the global data structure; computationally expensive; sensitive to "noise" examples |
| Nonlinear | Liu et al. 106 | Laplacian Eigenmaps | No local optima; less geometrically intuitive |
| Nonlinear | Zhang et al., 120 Sun et al., 141 Chaabane et al. 142 | Partial least squares regression | Can handle multicollinearity; lack of model test statistics |
| Nonlinear | Flexa et al., 77 Oh et al., 78 Ghoulem et al. 143 | KPCA and its variations | Mapping function does not need to be known a priori; choice of the kernel and multiple refitting are required |
| Nonlinear | Dervilis et al., 144 Xiao et al. 145 | Nonlinear ICA and its variations | Back projection/reconstruction can be implemented; more complex than ICA |
| Nonlinear | Dervilis and colleagues, 146,147 Sun et al. 148 | LLE | Accurate in preserving local structure; less accurate in preserving global structure; difficulty on non-convex manifolds |
| Nonlinear | Flexa et al., 77 García-Macías and Ubertini, 115 Nguyen et al., 149 Zhang et al. 150 | AANN | Mapping function does not need to be known a priori; high computational complexity |
| Autoencoders | Ma et al., 151 Wang et al. 152 | DA | Can find different levels of features |
| Autoencoders | Liu et al., 106 Mboo and Hameyer 124 | SA | Can be inefficient for massive data |

Feature extraction/selection
Identifying damage-sensitive features from the collected data is not a trivial task. Not every quantity is significant in indicating the presence of damage, nor does every quantity correlate with damage in a way that leads to its detection, even with the most advanced ML algorithms. Feature extraction is a process that transforms the collected data into a form that is more identifiable and readily picked up by even a simple ML algorithm. The most critical aspect of this step of any ML-based SHM implementation is finding ways to extract and select sensitive features that positively correlate with damage. The challenge in this regard is that the extracted features may also be sensitive to changes in the system's response that do not necessarily relate to damage. Figure 6 illustrates the three different approaches in feature selection/extraction. Other than data-driven and model-driven techniques, feature extraction can also be done using wave-propagation and impedance-based methods. They are a subset of data-driven techniques, but due to the unique devices and their specific extraction methods, they can differ considerably from data-driven and model-driven methods. Nevertheless, many past studies have enhanced the capabilities of these systems with ML and DL. A review of guided wave-based SHM is provided by Mitra and Gopalakrishnan. 153
Extraction of damage-sensitive features in data-driven approaches can be carried out by using (1) time-domain, (2) frequency-domain, (3) time-frequency domain, or (4) ML algorithms. In classical time-series analysis, the variety of changes in the structures can be fitted to a model that identifies damage. Low-order (low-dimension) time-series modeling techniques, such as the autoregressive moving-average (ARMA) model, 154 the cross-correlation function (CCF), 155 and stochastic subspace identification (SSI), 156,157 can prove useful in extracting highly damage-sensitive feature vectors.
In scenarios where the dimension of the vectors is high, using high-order time-series modeling to capture the variations may result in fitting the noise in the collected data. In these cases, features can instead be extracted in the frequency domain. Power spectral density (PSD), 158 cross spectral density (CSD), 159 impulse response function (IRF), 160 frequency response function (FRF), 161 and frequency domain decomposition (FDD) 162 are a few of the methods that are frequently used. New techniques beyond the two described above are being developed and tested to better identify damage-sensitive features. These methods are based on the time-frequency domain, such as wavelet or phase shift in a linear and nonlinear fashion. 163-166 A detailed walkthrough of the wavelet technique is provided in the paper by Taha et al. 167 The constraints of time-domain and frequency-domain waveform analysis grow in significance as the dimension of the data of these output-only models increases. Moreover, these analyses do not precisely indicate the location of the damage, and they require a large quantity of data for sensitivity analysis, because the reproducibility of the models across different time frames is inconsistent when EOFs are factored in. These shortcomings can be overcome by incorporating ML algorithms with their inherent capabilities for extracting data (dimensionality reduction) or selecting features (filters, wrappers, and embedded methods). 168
Model-driven techniques, on the other side of the spectrum, depend solely on accurate physical models to identify damage. They usually fall into the model updating paradigm (linear and nonlinear), where mathematical models with input-output or output-only measurements are used to identify modal parameters (e.g., mode shape, mode shape curvature, and resonance frequencies) of an unknown system and then calibrate the physical parameters. When a high-fidelity physical model of the system is established, numerical analysis such as FEM can be used to update the initial model on the basis of the parameters identified from the system.
Model-driven techniques, although proven useful in scenarios where sufficient data are unavailable, become challenging when EOFs are incorporated into the FEMs, as described earlier. In an attempt to develop accurate feature extraction methods, a hybrid of data-driven and model-driven techniques is used to overcome the deficiencies of each approach. The two systems can be used jointly to validate the presence of damage, or the model-driven system can be exploited to generate training/testing data for the data-driven system. Through this, a non-modal-based FEM updating process can be developed to provide a better representation of the system from both a global and a local perspective. 169 One of the benefits of non-modal-based FEM updating is that the strict assumption inherent to classical modal FEM updating, namely that structures must exhibit linearity, reciprocity, and time-invariant properties, can be relaxed for systems that show nonlinear and nonstationary responses.
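As a concrete instance of the frequency-domain extraction discussed in this section, the sketch below estimates a PSD with a plain periodogram and returns the dominant frequency as a candidate feature. It is a minimal illustration under stated assumptions: the signal is synthetic, the function names are hypothetical, and the direct O(N^2) DFT stands in for the FFT-based or averaged (Welch-type) estimators that would normally be used on measured data.

```python
import cmath
import math


def periodogram(signal, fs):
    """One-sided periodogram PSD estimate of a real signal sampled at fs Hz."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the static offset
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        X_k = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        # Double interior bins to fold negative frequencies into one side.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        scale = 1.0 if is_edge else 2.0
        psd.append(scale * abs(X_k) ** 2 / (fs * n))
        freqs.append(k * fs / n)
    return freqs, psd


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest PSD peak, a simple damage-sensitive feature."""
    freqs, psd = periodogram(signal, fs)
    return freqs[max(range(len(psd)), key=psd.__getitem__)]


# Synthetic acceleration record: a 5 Hz mode plus a weaker 12 Hz mode.
fs = 100.0
signal = [math.sin(2 * math.pi * 5.0 * t / fs)
          + 0.5 * math.sin(2 * math.pi * 12.0 * t / fs)
          for t in range(200)]
peak_hz = dominant_frequency(signal, fs)
```

A shift of such a peak between monitoring periods is the kind of change a PR stage would examine, bearing in mind that EOFs such as temperature can shift it as well.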
After extracting feature vectors that correlate with damage, one must select the most appropriate feature(s) to be passed to the damage detection process. Although selection procedures were explained implicitly above, a subset of the collected data can be determined for detection purposes from mathematical models or from intuition and engineering judgment. An overview of feature selection techniques is given in the studies by Khalid et al. 168 and Chandrashekar and Sahin. 170 It should be noted that while both feature extraction and feature selection are used to reduce features and eliminate redundant and irrelevant data, the former creates a brand-new set of data, whereas the latter creates a subset of the original data. In practice, the boundary between them is not always sharp, and the choice ultimately depends on the application domain and the system's requirements.
Moreover, feature selection and extraction can work coherently and synergistically to inform one another in the form of change detectors. 171 A few ML algorithms, such as Lasso or random forest (RF), have built-in selection mechanisms that fall under the category of embedded selection methods. Filter and wrapper methods are the other two selection procedures. 172 Filters produce the most promising subset before it is passed on to the damage detection process, as part of the pre-processing step. In a wrapper, as the name suggests, the selection procedure is "wrapped" around a learning algorithm, and candidate subsets are evaluated by training a model. The procedure either begins with no features and, at each iteration, adds the feature that yields the best-performing model (forward selection), or it starts with all features and, at each iteration, eliminates the least significant feature, thus improving the overall model performance (backward elimination). Lastly, a form of greedy optimization can be used to rank the feature subset at each iteration (recursive feature elimination). Put simply, these wrapper methods are search strategies, and the performance of each depends on the quality of the underlying algorithm.
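The forward-selection wrapper described above can be sketched as a greedy loop around any scoring model. In this illustration the wrapped model is a nearest-centroid classifier scored on its training accuracy; that choice, the function names, and the toy data are assumptions made to keep the example short, and a real study would score candidates with cross-validation.

```python
def nearest_centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on the given features."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, label in zip(X, y):
        pred = min(classes,
                   key=lambda c: sum((x[f] - m) ** 2
                                     for f, m in zip(features, centroids[c])))
        correct += pred == label
    return correct / len(y)


def forward_selection(X, y, max_features, score=nearest_centroid_accuracy):
    """Greedily add the feature that most improves the wrapped model's score."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(score(X, y, selected + [f]), f) for f in remaining]
        # Prefer the highest score; break ties by the lowest feature index.
        top_score, top_f = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best_score:  # no candidate improves the model
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best_score = top_score
    return selected, best_score


# Toy data: only feature 1 separates the two structural states.
X = [[0.5, 0.1, 3.0], [0.4, 0.2, 1.0], [0.6, 0.0, 2.0],
     [0.5, 1.1, 2.0], [0.4, 0.9, 3.0], [0.6, 1.0, 1.0]]
y = ["undamaged"] * 3 + ["damaged"] * 3
selected, best_score = forward_selection(X, y, max_features=2)
```

Backward elimination follows the same pattern with the loop reversed: start from all features and drop, at each iteration, the one whose removal costs the least.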
Additionally, these approaches can be combined to form an ensemble learning method built on the output of different algorithms or learners, which is believed to produce a better selection process. 173 Though not many cases of ensemble feature selection are present in SHM, advances in this recent field can enable feature selection for large and complex structures that comprise different sensors and measurement devices with high-dimensional datasets. Moreover, many feature selection techniques rely on supervised methods of searching for the best subset of features. With no prior knowledge about the collected features, the risk of overfitting the selected features can be reduced if unsupervised or semi-supervised methods are used instead. 174 Unsupervised feature selection techniques essentially follow the same routine as the supervised models described earlier. A recent review of unsupervised feature selection methods is provided by Solorio-Fernández et al. 174 In the context of SHM and ML, Bull et al. 175 utilized an unsupervised feature selection procedure in ensemble analysis, specifically bagging and feature bagging. An unsupervised filter method, namely unsupervised minimum redundancy-maximum relevance (UmRmR), was used in the study by Zugasti et al. 176 to minimize redundant and irrelevant data from offshore wind turbines. Based on our analysis, unsupervised feature selection has not received enough attention. As discussed in this section, classification based on unsupervised learning is one of the hot topics in damage identification in civil infrastructures through ML and SHM. Table 7 shows some of the supervised feature selection techniques used in the literature, emphasizing SHM.

Data fusion
As discussed previously, structures carry a plethora of sensors. In order to perform global damage identification, sensors vary in both location and type. Multisensory systems are receiving increasing attention since they provide a spectrum of advantageous features. Higher SNR, higher data resolution, data redundancy, complementarity, and timeliness are some of the potential benefits of such systems. 8,195 Additionally, the spatial distribution of these sensors can enable engineers to increase the SHM system's observability, and the additional information collected from different sensors located on the structure can enhance the identifiability of the SHM system. Data fusion is a technique of combining information in such a way that the aforementioned benchmarks, that is, observability and identifiability, considerably improve the system performance. 196-198 As can be inferred from the nature of the data fusion process, the steps taken to ensure high system performance are related to the data normalization and data cleaning techniques discussed in the prior sections. Similar ML techniques can be used in the same way, and interested readers are referred to the extensive review of the data fusion process by Wu and Jahanshahi 199 for an ML-SHM perspective.
There are, in general, two ways to amalgamate data. In the first, multiple copies of a single data processing chain, for example, steps 1 to 6 in Figure 4, run in parallel, each processing its own data individually and feeding its output forward to a PR unit. In the second, a centralized batch processing system accepts the sensor chains up to the data compression unit, for example, step 5 in Figure 4, then feeds all the collected data to a feature extraction unit and finally passes them to a PR step. Data fusion can also be applied at each level individually or collectively if desired. 8 Raw sensor data can be collected from multiple units and combined to produce a more uniform dataset (initial fusion). Fusion may also be applied at the data normalization or data compression level, meaning the feature vectors can be merged to produce a single vector (feature-level fusion). Furthermore, many feature vectors can be passed to a single PR algorithm (pattern-level fusion). Lastly, damage classifiers resulting in the health state of the structure can be fused to provide a high degree of damage identification confidence (decision-level fusion).
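Two of the fusion levels named above lend themselves to a very short sketch: feature-level fusion as concatenation of per-sensor feature vectors, and decision-level fusion as a majority vote over per-classifier health states. Both rules are the simplest possible choices and are assumptions made for illustration; the function names and the sample values are hypothetical.

```python
from collections import Counter


def feature_level_fusion(feature_vectors):
    """Merge per-sensor feature vectors into a single vector by concatenation."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


def decision_level_fusion(decisions):
    """Fuse classifier outputs by majority vote.

    Returns the winning label and the fraction of classifiers that agreed,
    which can serve as a crude confidence level.
    """
    label, votes = Counter(decisions).most_common(1)[0]
    return label, votes / len(decisions)


# Hypothetical features from an accelerometer node and a strain gauge node.
fused_vector = feature_level_fusion([[2.01, 5.10], [0.13]])

# Hypothetical health states reported by three independent classifiers.
state, confidence = decision_level_fusion(
    ["damaged", "undamaged", "damaged"])
```

Weighted voting or probabilistic combination would replace the plain majority rule when the classifiers differ in reliability.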

ML-supported pattern recognition techniques
The prior section described the seven initial steps that are necessary for implementing a data-driven SHM system. As previously shown in Figure 4, the last step of an ML-augmented SHM system is PR, that is, the identification of the health state of the structure. The 5-step damage identification hierarchy was introduced at the beginning of this article. More specifically, in the context of statistical PR, to accurately classify damage, one needs to ensure data availability from the damaged condition, the undamaged condition, or both. Additionally, in order to assess and predict the damage caused to the system, the selected damage-sensitive features fed into the PR unit have to suit the algorithm chosen to learn from them. In general, there are four different types of learning in the domain of ML, as depicted in Figure 7.
In the framework of SHM and statistical PR, the most common learning algorithms are supervised, unsupervised, and semi-supervised. In the rare scenarios where both damaged and undamaged data are available for an engineering structure, supervised learning is the preferred learning method. In this case, group classification and regression analysis are the primary methods of supervised learning. However, for larger and more complex civil engineering structures, unsupervised learning is required due to the lack of damaged data. 20 This is regarded as the go-to method for most civil infrastructures, such as bridges, where there is limited availability of damaged data from the structure, or where it is not feasible to collect data on a global scale in the first place. In this context, the unsupervised method is commonly referred to as novelty detection or outlier detection. 14,200-202 A more recent learning method, semi-supervised learning, has been introduced for cases where data labels from the damaged and undamaged states of the system are only partially available, a very common occurrence with engineering structures. 203
It is essential to understand the criteria for choosing an algorithm. Depending on the problem and the type of data available, for example, the effect of EOFs on identifying damage, different algorithms, or even a combination of algorithms, are required. The amount of training data available, the expected reasonable training time, and the accuracy required from a learner are some of the considerations that must be weighed carefully before selecting an algorithm. Moreover, in a simple SHM system for small-scale structures, the number of parameters and features is typically low. However, for many of the implemented SHM systems, these two criteria can ultimately make one algorithm superior to another. Various papers have shown the different cases of these ML algorithms in their reviews. 36,37 However, a clear distinction between the uses of different learners is somewhat hidden in the core of their review strategy. We therefore aim to expand on why such learners are chosen as the method of implementing a data-driven SHM system. Also, several new methods specific to deep NNs are introduced. Whenever available, the authors refer the readers to a particular review paper about each type of ML algorithm. In the subsequent sections, some of the widely used ML algorithms are explained. The implementations cited for each method are primarily from mid-2019 onwards. To show how each implementation relates to the different SHM damage identification levels, a note next to each cited reference indicates the SHM level (e.g., Levels 1-3). Furthermore, a separate section is dedicated to SHM level 5 (damage prognosis), where the recent ML-supported applications are summarized. Finally, based on the reviewed papers, a recommended SHM system with the best examples of ML and DL for each SHM component up to the damage prognosis stage is discussed at the end of the section. A summary of the most common learning algorithms is depicted in Figure 8.
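The novelty detection setting described in this section, in which only undamaged data are available for training, can be illustrated with a small sketch. The detector below learns per-feature means and standard deviations from baseline records and flags a new record when its standardized squared distance exceeds the largest distance seen in the baseline. Both the distance (a diagonal-covariance simplification of a Mahalanobis-type measure) and the threshold rule are assumptions made for illustration, as are the function names and the sample frequencies.

```python
import statistics


def fit_novelty_detector(baseline):
    """Learn a scoring function and threshold from undamaged-state records.

    Each record is a feature vector (e.g., identified natural frequencies).
    Assumes every feature varies in the baseline (non-zero standard deviation).
    """
    p = len(baseline[0])
    means = [statistics.fmean([r[j] for r in baseline]) for j in range(p)]
    stds = [statistics.stdev([r[j] for r in baseline]) for j in range(p)]

    def score(x):
        # Sum of squared z-scores: distance from the baseline cloud.
        return sum(((x[j] - means[j]) / stds[j]) ** 2 for j in range(p))

    threshold = max(score(r) for r in baseline)
    return score, threshold


def is_novel(x, score, threshold):
    """True when x lies farther from the baseline than any training record."""
    return score(x) > threshold


# Hypothetical first two natural frequencies (Hz) from the healthy structure.
baseline = [[2.00, 5.10], [2.02, 5.08], [1.98, 5.12],
            [2.01, 5.11], [1.99, 5.09]]
score, threshold = fit_novelty_detector(baseline)
healthy_flag = is_novel([2.00, 5.10], score, threshold)
changed_flag = is_novel([1.80, 4.70], score, threshold)
```

Because EOFs also move the features, a practical threshold would be set from a baseline that spans the full range of environmental conditions, or after the data normalization step.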

Decision tree (supervised)
The decision tree (DT) is a well-established learning method capable of partitioning datasets from a non-parametric point of view. DTs can target categorical variables (classification: damaged/undamaged) and continuous variables (regression: source signal comparison with the healthy state of the system). A tree starts with the root node representing the input feature(s), such as acceleration data. Given a threshold set by the algorithm, the root node is partitioned into child nodes (internal nodes). Each node is split on the attribute that yields the largest information gain, that is, the greatest increase in purity. The process is repeated until the last node (leaf node) is reached, at which point the node is pure or a stopping criterion is met. In the context of SHM, the input features at the node could be wind direction and wind speed, and the testing attribute for splitting each node could be the observed PSD of the measured vertical acceleration. This example was demonstrated by Li et al. 204 (Level 1) as a means of classifying vortex-induced vibrations that may result in long-term fatigue damage of long-span bridges subjected to crosswinds. In their methodology, the root node was assumed to be wind speed.
However, in cases where N damage-sensitive features are present, selecting the root node and the internal nodes is not trivial, and a random selection can lead to poor results. For such cases, many statistical attribute selection measures exist. Entropy, the Gini index, and Chi-square are some of the measures used to select the most suitable root node and internal nodes. Typically, in SHM systems, DT learning is of the classification type due to the large high-dimensional feature vector and the stochastic behavior of structures under ambient excitation. The study by Gordan et al. 205 (Levels 1-2,4) on a slab-on-girder bridge showed that the classification and regression tree (CART) learning method is incapable of competing with AI algorithms such as ANN due to its lack of capacity, flexibility, and complexity. Decision trees can also be used to combine other ML models, such that higher accuracy is achieved sequentially by traversing down the tree and assigning a model as a function of the input variables of the preceding node. 200 Zhang et al. 206 (Level 1) demonstrated that for real-time visible fatigue crack growth detection with computer vision, DT performs the best when compared with RF, kNN, NB, NN, and an ensemble model. They concluded that, other than DT, the ML algorithms used tend to overfit. Mariniello et al. 207 (Levels 1-2) proposed a DT ensemble for damage detection and localization down to single structural elements. The authors achieved an accuracy score of 90% or more while recording limited localization errors. As can be observed, the simplicity and flexibility of DT make damage detection and partial localization achievable. However, there is still no research on the use of DT when EOFs are included.
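The attribute selection measures mentioned above can be made concrete with a short sketch that computes the Gini index and entropy of a set of labels and searches one feature for the threshold with the largest impurity reduction. The function names and the toy PSD values are assumptions made for illustration; a full tree would repeat the search over every feature and recurse on the two child nodes.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(values, labels, impurity=gini):
    """Return (threshold, gain) maximizing the impurity reduction for one feature."""
    n = len(labels)
    parent = impurity(labels)
    best_thr, best_gain = None, 0.0
    ordered = sorted(set(values))
    # Candidate thresholds lie midway between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        thr = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= thr]
        right = [l for v, l in zip(values, labels) if v > thr]
        children = (len(left) * impurity(left)
                    + len(right) * impurity(right)) / n
        gain = parent - children
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


# Hypothetical PSD peak amplitudes with known structural states.
psd_peak = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
state = ["undamaged"] * 3 + ["damaged"] * 3
threshold, gain = best_split(psd_peak, state)
```

With several candidate features, the root node would be the feature whose best split yields the largest gain, which replaces the random selection warned against above.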
The main problem with DTs in a high-dimensional space, as is the case for engineering structures, is overfitting. In this situation, the model memorizes the training data and returns an unrealistic representation of the domain, leading to poor predictions when tested with new data. Although there are methods to overcome overfitting, such as cross-validation of a parameter 204 or hyperparameter optimization, a common way to tackle it is to use individual trees as an ensemble, referred to as a random forest.

Random forest (supervised)
As explained in the previous subsection, RF is an ensemble of DTs capable of solving regression and classification problems. The difficulty that DT has with extensive feature sets can be overcome by using RF, which works quite well with high-dimensional sparse data. The main advantage of RF compared to DT is that each tree is constructed from a random set of training data, and the splitting of nodes also happens over a random subset of attributes. In the end, a simple average of all the predictions (or a majority vote for classification) can be used to find the most probable outcome of the model. RF works best if the trees are not strongly dependent on each other and show a weak correlation between the attributes selected at the splitting node. An example of RF was demonstrated by Laory et al. 208 (Level 1). They showed the effect of EOFs on the natural frequency of a suspension bridge, where they found RF and support vector regression (SVR) to be more suitable than multiple linear regression (MLR), ANN, and DT. However, the input-output model of a slender coastal bridge presented by Lu et al. 209 (Level 1-2,4) for fatigue damage assessment due to EOFs showed that the RF learner exhibited a high degree of variability (high standard deviations) when compared strictly against the regressor forms of SVM and Gaussian process (GP). Relatively speaking, a larger amount of training data and a better choice of kernel would lead to better results for SVR, as it is sensitive to its initial parameters. This could explain why these two papers appear to contradict each other. Chencho et al. 210 (Level 1-2,4) developed a structural damage quantification method based on RF, with PCA for dimensionality reduction. The authors achieved R-scores of 89.2% and 95.3% for single-element and two-element damage cases, respectively. Unlike a single DT, RF is not readily interpretable, and for large datasets it can take a long time to train.
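The two sources of randomness described above (a bootstrap sample of the training data and a random subset of attributes) can be sketched as follows. This is a deliberately simplified illustration, not a full RF: each "tree" is a single-split stump, so the per-split feature subset coincides with a per-tree subset, and all data values are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # degenerate sample: fall back to the majority label
        m = majority(labels)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_sub=2, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(d), n_sub)          # random subset of attributes
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [l if row[f] <= t else r for f, t, l, r in forest]
    return majority(votes)

# Hypothetical features: (frequency shift, damping change, temperature).
rows = [(0.1, 0.2, 10), (0.2, 0.1, 20), (0.15, 0.3, 30), (0.05, 0.25, 40),
        (0.9, 1.1, 10), (1.0, 0.8, 20), (0.8, 1.2, 30), (1.1, 0.9, 40)]
labels = ["healthy"] * 4 + ["damaged"] * 4
forest = fit_forest(rows, labels)
```

Because every stump sees a different resample and feature subset, individual stumps may latch onto the uninformative temperature feature, but the vote across the ensemble is dominated by the informative ones.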

Support vector machine (supervised)
Perhaps one of the most widely used types of ML algorithms is SVM, both in its classifier and regressor form. It can map both linear and nonlinear data to an n-dimensional feature space in which a hyperplane separates the features into classes while maximizing the margin between them. Kernel functions achieve the transformation of the data into a higher-dimensional space. As discussed before, the accuracy of SVM highly depends on the choice of kernel. Generally, in SHM applications where the features are both abundant and high-dimensional, SVM usually outperforms other supervised ML algorithms when provided with a suitable kernel function, the choice of which is not always trivial, as demonstrated in the study by Lu et al. 209 A polynomial kernel function was used in Gordan et al. 205 (Level 1-2,4) for damage identification of a slab-on-girder bridge. The authors concluded that SVM proved to be superior to a CART method due to its capacity to perform high-quality predictions. One of the downsides of SVM is that although a larger training set leads to better predictions, the training time also grows steeply (roughly between quadratically and cubically in the number of samples for standard solvers). In order to overcome this limitation, least-squares SVM (LS-SVM) was proposed. This method finds the solution by solving a set of linear equations rather than the quadratic programming (QP) problem used in SVM. An implementation of LS-SVM as a hybrid model-driven and data-driven SHM system was presented by Deng et al. 211 (Level 1-2,4-5). In their paper, the authors estimated the daily fatigue damage due to traffic for highway suspension bridge hangers using weigh-in-motion (WIM) data and the regressor form of SVM. SVM can also enhance DL damage detection. In identifying damage using DL, one of the drawbacks is the misclassification of input data from an unlearned pattern as belonging to a learned pattern. 
A validation of this method using a shaking table and simulated training data of a steel frame structure using deep NN was established by Kohiyama et al. 212 (Level 1). SVM was used to detect an unlearned damage pattern based on the feature data of a DNN. An efficient and precise nonlinear multiclass SVM (NMSVM) of nonlinear time-varying structures was proposed by Chong et al. 213 (Level 1). Their algorithms were trained using many wavelet-based autoregressive coefficients that were found from applying wavelet transform to signals generated from healthy and damaged structures under random excitation. Other than the above, SVM can also be used in other parts of ML procedures such as sensor placement optimization, data normalization, and feature extraction/selection, as discussed in the previous sections.
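As an illustration of how a kernel function stands in for a transformation into a higher-dimensional space, the sketch below defines three common kernels and checks that a degree-2 polynomial kernel equals a dot product in an explicit six-dimensional feature space. The parameter values (`degree`, `coef0`, `gamma`) and the sample vectors are arbitrary assumptions chosen for demonstration.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    return (linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def explicit_poly2_features(x):
    """Explicit map for 2D inputs whose dot product equals (x.z + 1)^2."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x, z = [1.0, 2.0], [3.0, -1.0]
k_implicit = polynomial_kernel(x, z)  # computed in the original 2D space
k_explicit = linear_kernel(explicit_poly2_features(x), explicit_poly2_features(z))
```

The two values agree, which is why the kernel can be evaluated without ever constructing the higher-dimensional features; for the RBF kernel the corresponding feature space is infinite-dimensional, so only the implicit form is practical.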

k-Nearest neighbor (supervised)
k-Nearest neighbor (kNN) is one of the earliest and simplest supervised learning methods. Similar to the previous algorithms, it works for both regression and classification tasks. It classifies a test sample based on its distance to the training samples in the feature space, on the idea that samples with similar properties lie close together. The number of neighbors to consider depends on the noise in the data. A low-dimensional feature space requires less training data. In SHM cases where the dimension of the features is high, more training data is required, which results in a computationally expensive process. A kNN algorithm with Euclidean distance and six neighbors was demonstrated for damage detection of a scaled-down cable-stayed bridge in Li et al. 214 (Level 1). The authors compared the performance of traditional ML algorithms, mainly DT, RF, SVM, and kNN, against their proposed CNN model. It was observed that the accuracy imbalance of kNN was the most severe compared to the others due to the lack of complexity of the learner. The study by Dogan et al. 215 (Level 1-2,4) developed a model for determining post-earthquake damage to RC columns. The damage, in the form of cracks, was identified using a camera and evaluated against the allowable ranges from the building code. kNN, DT, SVM, and LDA were checked against the ensemble of these algorithms. The ensemble method achieved the highest success rate, followed by kNN by a small margin. One area in which kNN works surprisingly well is data imputation in SHM systems. Inherent in its algorithm, it looks for the closest data to infer the missing value, as demonstrated in different studies. 216,217

Bayesian (supervised)
Bayes analysis is a probabilistic parametric learning method and is considered a statistical learning approach based on measuring the conditional probabilities of certain statements given other statements, as shown in equation (2)

$$\text{Posterior} = \frac{\text{Prior} \times \text{Likelihood}}{\text{Evidence}} \tag{2}$$

The Bayesian approach is widely used in SHM applications. Bayesian methods are either used alone or integrated with different ML algorithms, as in Bayesian clustering. The Bayesian method can bring about numerous benefits, including probabilistic inferences. For example, a pro-active SHM solution using FEM and a Bayesian network was proposed by Sousa et al. 218 (Level 1-4). The authors developed a monitoring solution based on numerical analysis of a real bridge and achieved acceptable performance when compared with actual damage data. Their pro-active tool managed to demonstrate the first four levels of damage identification, which can provide useful information for online bridge management and monitoring.
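As a worked instance of equation (2), the following minimal sketch updates the probability that a structure is damaged after a monitoring alarm. The prior and the two alarm rates are hypothetical numbers chosen only to show the mechanics.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' rule for a binary hypothesis: Posterior = Prior x Likelihood / Evidence."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: damage is rare, and the detector is fairly reliable.
p_damage = 0.01                 # prior probability of damage
p_alarm_given_damage = 0.95     # likelihood of an alarm if damaged
p_alarm_given_healthy = 0.05    # false-alarm rate

# Belief after one alarm, then after a second independent alarm,
# using the first posterior as the new prior.
after_one = posterior(p_damage, p_alarm_given_damage, p_alarm_given_healthy)
after_two = posterior(after_one, p_alarm_given_damage, p_alarm_given_healthy)
```

With these assumed values, a single alarm raises the damage probability from 1% to only about 16%, because false alarms on the far more common healthy state dominate the evidence term; a second independent alarm raises it to roughly 78%.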
Naïve Bayes (NB) classification is one of the methods based on Bayes' theorem. In this method, it is assumed that there is no dependency between the features. In the study by Mangalathu et al. 219 (Level 1), eight different ML algorithms, including NB, kNN, DT, RF, and others, were used to establish a classification model of seismic failure of RC shear walls. It was found that RF had better accuracy, while the NB classifier fell short and was ranked sixth. They stated that the low accuracy of their parametric methods, that is, NB, was due to the existence of a nonlinear decision boundary between the failure modes. In a similar study by Mangalathu et al. 125 (Level 1), a rapid seismic assessment of a two-span box girder bridge was analyzed based on simulations from the Open System for Earthquake Engineering Simulation (OpenSees) platform. In their paper, the authors evaluated their models using NB, kNN, QDA, and RF. Analogous to their previous paper, RF performed the best overall, while NB performed better when classifying the bridge as unsafe. Although NB can compete with far more complicated learners, its main disadvantage is that, since it assumes no interdependency between the features, the estimated probabilities may not be accurate if the assumption does not hold.
An extension of NB, Gaussian NB (GNB), was used in the study by Nazarian et al. 220 (Level 1-2,4) for a turn-of-the-century building structure that was damaged due to settlement of its foundation. The authors employed FEM to generate a stiffness and strain dataset and then trained GNB, in addition to SVM and NN algorithms, to find the location and the severity of the damage in each structural component. Out of the three algorithms, NN yielded better results; however, GNB was not far off. Soyoz et al. 221 incorporated Bayesian updating into reliability estimation of bridges through vibration-based SHM readings before and after damage. While NB imposes conditional independence among all features given the class, a Bayesian network (Bayes net) encodes conditional independence only between selected variables. Although NB makes a simplified assumption, both can perform equally in many scenarios, provided that both are used for inference purposes. Lee and Song 222 (Level 1) demonstrated the Bayesian network approach for system identification in a numerical example. Compared with FEM updating and maximum likelihood estimation, the authors' approach provided a more robust and stable system identification scheme. A Bayesian network for near real-time seismic damage assessment was proposed by Tubaldi et al. 223 (Level 1-2,4). It incorporates multiple heterogeneous data sources, namely, ShakeMap, GPS, and accelerometers placed on a structure, for response, damage, and loss estimation by comparing prior and posterior statistical distributions. When used individually, each of the three sources of information reduces the uncertainty of different engineering demand parameters defined in the article, while fusing them through the proposed Bayesian network resulted in an overall uncertainty reduction. 
Similarly, Bayesian techniques can quantify uncertainties of damage-sensitive features, for example, modal characteristics, 224 and can merge multiple techniques in an ensemble learning form to reduce algorithm-induced false positives and negatives. 225 For more detail on Bayesian methods, especially in natural hazards engineering, interested readers are referred to the very recent review by Zheng et al. 226 As opposed to the "black-box" nature of other ML algorithms, Bayesian methods state the prior knowledge or assumption of a hypothesis explicitly, which enables a more transparent statistical inference. Because they represent a probabilistic distribution for both the data and the model, various data types and parameters can be easily integrated into a robust and flexible classifier. Although the different Bayesian approaches reported earlier can benefit many SHM systems, they still do not offer out-of-the-box solutions for issues such as the subjective selection of prior probability distributions and the computationally expensive integrations over uncertain parameters in the distribution.

Neural network (supervised/unsupervised)
ANN approaches in damage detection are modeled after the working components of the human brain. In general, an ANN consists of at least three layers, namely, (1) input layer, (2) hidden layer(s), and (3) output layer. ANNs with one or more hidden layers are called multilayer perceptrons (MLP). ANN training can be viewed as an optimization process that identifies a set of network weights that minimize the cost function. 227 Such an approach has been widely used in past works as it allows various inputs and outputs to be included. In a feed-forward ANN, the independent variables enter at the input neurons, and the processing and calculation take place at the hidden and output layer(s), respectively. The model is trained with an error propagation algorithm. This method of training is considered a supervised learning approach. In a recent study, Hekmati Athar et al. 228 (Level 1) experimentally collected data from both contact-based and contactless sensors to identify damage on a lab-scaled bridge through an ANN model. In another work, Malekjafarian et al. 229 (Level 1-2,4) proposed a two-stage bridge damage detection method based on the response of a moving vehicle. In the first stage, an ANN is trained with backpropagation to predict the response to the passage of a vehicle over the bridge. In the second stage, a Gaussian process detects the change in the distribution of the prediction errors, which indicates damage. Although MLP and backpropagation learning methods are typically employed in damage detection of civil infrastructure, unsupervised NNs have also been considered. The self-organizing map (SOM) approach is an example of an unsupervised NN. SOMs are grids of neurons that attempt to represent high-dimensional data in a 2D or 3D map while preserving the properties of the original feature(s). 172 The advantage of the SOM learning method in comparison to the more traditional ANN approach is that SOM training depends only on the internal structure of the inputs rather than on input-output samples with error propagation, as there is no target defined. Tibaduiza et al. 230 (Level 1-4) used SOM together with PCA data reduction to classify damage on an aluminum plate in a two-stage validation and diagnosis mode. In another work by Avci et al. 231 (Level 1-2), SOM was applied to a grid structure to identify and quantify damage based on stiffness reduction and boundary condition changes. Even with optimized learning methods, long training times and heavy computation can deter the implementation of ANN in some SHM applications.
Other than ANN and MLP, several other shallow and deep NN algorithms exist in the literature, although very few are popular in the SHM community. Perhaps the most common DL approach is CNN, and research on CNN for SHM systems is growing. This is exhaustively covered in numerous review papers, 25,26,32-35,41 as indicated before. Therefore, other deep NN algorithms are introduced here. It should be noted that many applications can utilize both CNN and different deep NN algorithms together. The benefits that such combined methods provide are sometimes far greater than when CNN is used alone. Therefore, the authors aim to provide examples of those papers that use deep NN as their core damage detection technique. Sequential data or time series is a major part of many SHM systems. In contrast to the classical feed-forward ANN, recurrent neural networks (RNN) can use their internal memory to loop the output back into the prediction continuously. Although the looping can be computationally expensive, it ensures that, unlike in ANN, the data in the sequence are treated as dependent on each other. This can prove to be a useful feature for identifying damage in situations where there is a correlation between multiple sources of signals and a direct relation with external factors, for example, EOFs. Mousavi and Gandomi 232 (Level 1) used RNN to capture and predict temperature variations. The numerical analysis of their nonlinear system showed that damage could be identified when prediction errors of the temperature signal deviate significantly from the expected value of the error. In a similar study by Mousavi 233 (Level 1), damage was identified under the conditions of EOFs when Johansen cointegration of the frequency signals that were used to train an RNN model failed to identify a relationship among the frequency signals. Their implementation performed well under noisy conditions when tested against two experimental samples. 
However, as confirmed by Zhang et al., 234 RNN suffers from exploding and vanishing gradients; that is, as the error is propagated back through many time steps, the gradients used to update the weights either become very small, which effectively stops the training process (vanishing), or become too large, which may lead to an unstable network (exploding).
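The effect can be illustrated with a deliberately simplified case. Assume a scalar linear recurrence h_t = w * h_(t-1) + x_t (an assumption made here for illustration, not a model from the cited work); the sensitivity of the last state to the first is then w raised to the number of time steps.

```python
def gradient_scale(recurrent_factor, n_steps):
    """|d h_T / d h_0| for the scalar linear recurrence h_t = w * h_(t-1) + x_t."""
    return abs(recurrent_factor) ** n_steps

# The same number of steps with a factor slightly below or above 1.
vanishing = gradient_scale(0.9, 100)   # shrinks toward zero
exploding = gradient_scale(1.1, 100)   # grows without bound
```

Over 100 steps, a factor of 0.9 shrinks the gradient by more than four orders of magnitude, while a factor of 1.1 amplifies it by more than four orders of magnitude, which is the instability described above.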
In order to remedy the gradient-vanishing issue, the long short-term memory (LSTM) network was introduced. In short, LSTM has a special architecture that allows it to retain information for long periods of time and thus to learn long-term temporal dependencies. The work by Zhang et al. 235 (Level 1-2,4) is one of the first implementations of an LSTM network with limited data for seismic response modeling of highly nonlinear complex dynamic systems. Due to limited data, K-means clustering was used to partition the dataset for generating training and testing data. Their stacked LSTM network scheme performed well with the prediction error of ±10% with confidence intervals of 91%, 86%, and 84%, respectively. Their approach, however, is computationally costly, with 50,000 epochs for training. A novel deep RNN encoder-decoder approach with LSTM in sequence-to-sequence (seq2seq) modeling was presented by Li et al. 236 (Level 1-2). Their online SHM monitoring under seismic excitation reliably predicted dynamic responses to future earthquakes when compared with seven state-of-the-art methods for sequence learning and prediction, reducing the prediction error and the standard deviation by at least 13% and 15%, respectively. The major drawback of LSTM is the need for huge resources and long training time to perform well.
Another variant of the LSTM architecture, named gated recurrent units (GRU), also exists. Having one fewer gate than LSTM makes GRU's internal structure much simpler; therefore, it is easier to train and requires fewer computations. Generally, GRU performs better than LSTM for small datasets. The superiority of GRU for smaller datasets was shown in a recent paper by Choe et al. 237 (Level 1-2). Compared to LSTM and stacked LSTM, GRU managed to achieve 10-30% in accuracy for structural damage detection of floating offshore wind turbine blades. One limitation of RNNs and their variants is the unidirectionality of the network; that is, the output at a particular time step depends only on the past information in the input sequence. To mitigate this, bidirectional RNN was proposed. To address the variation in structural response due to initial residual stress, coupling effects of structural damage, and external loads, Tian et al. 238 (Level 1) proposed a global and partial bidirectional LSTM model to relate the girder vertical deflection to cable tension. It was found that the partial model performed better, with a relative root mean square error (RRMSE) of 3.24%, in addition to performing consistently across noise levels and traffic volumes under normal operational conditions. Although very popular in natural language processing (NLP), bidirectional deep NN algorithms do not appear to be widely adopted by the SHM community, even though they provide better flexibility in some aspects of SHM systems. This is perhaps because the entire sequence must be available before predictions can be made, and because of the high computing cost of running such complicated models.
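The saving from having one fewer gate can be estimated by counting trainable parameters. The sketch below assumes the textbook formulation with one bias vector per weight block (some libraries use two), where an LSTM layer has four blocks (three gates plus the candidate state) and a GRU layer has three (two gates plus the candidate state). The layer sizes are arbitrary.

```python
def recurrent_params(n_input, n_hidden, n_blocks):
    """Parameters of one recurrent layer: each block has input weights,
    recurrent weights, and one bias vector (textbook formulation)."""
    per_block = n_hidden * (n_input + n_hidden) + n_hidden
    return n_blocks * per_block

n_input, n_hidden = 8, 64   # hypothetical: 8 sensor channels, 64 hidden units
lstm_params = recurrent_params(n_input, n_hidden, n_blocks=4)
gru_params = recurrent_params(n_input, n_hidden, n_blocks=3)
```

Under these assumptions, a GRU layer always has three quarters of the parameters of an LSTM layer of the same size, which is consistent with it being cheaper to train.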
The final deep NN to discuss is the generative adversarial networks (GAN). The idea of GAN is that two sub-models, namely, generator and discriminator, produce and distinguish fake images given a latent vector and the original dataset. With training, the generator improves and produces images that are more real. The idea of using GAN for SHM systems was studied by Tsialiamanis et al. 239 It was demonstrated that with prior knowledge, GAN can reflect damage characteristics via categorical and continuous variables despite the presence of EOFs. Therefore, GAN can be promising in training large datasets. The current SHM system may benefit a lot by considering GAN in the pipeline. For example, the very recent paper by Fan et al. 240 (Level 1) demonstrated the applicability of GAN for structural dynamic response reconstruction under ambient excitations or seismic loadings. Although damage detection was a preliminary investigation and not the goal of the article, the authors confirmed that such a model can be used for identifying damage in SHM. Their dynamic response reconstruction error was 15.7% compared to the traditional CNN model with 69%.

K-means (unsupervised)
Clustering is a technique in which subgroups are assembled based on either features or samples. K-means partitions the data into K non-overlapping clusters. In an iterative process, each element is assigned to the cluster whose centroid is nearest, with the centroids either defined or estimated initially. 241 Once the assignment is over, each centroid is recalculated as the average of all elements in its cluster. Alamdari et al. 242 (Level 1) implemented a spectral-based clustering SHM of the Sydney Harbour Bridge. In their approach, offline datasets were adapted instead of live data streams from the SHM devices due to the challenges in data communication overhead, delay, and the overall system's resiliency. At the same time, multiple nodes indicate the presence of damage. The effect of traffic loading on bridges has been widely studied in the literature. However, only a limited number of studies have incorporated the collected traffic data to enhance SHM applications or fabricated separate sensors dedicated to detecting vehicles. The study by Burrello et al. 243 (Level 1) leveraged the SHM data with an anomaly detection technique to identify traffic load from the acceleration peaks and utilized the K-means algorithm to distinguish amplitude and damping duration associated with heavy traffic and cars, respectively. K-means clustering is heavily dependent on the choice of the K value, the initial values of the centroids, and the distance metric used to determine the distance between each element and the cluster centroids. Moreover, high-dimensional data, including EOFs, can hinder the effectiveness of K-means clustering. Dimensionality reduction techniques such as PCA or spectral clustering methods 244 are recommended before applying K-means clustering.
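The assign-then-recompute loop described above can be sketched in a few lines. The two-feature points (peak amplitude, damping duration) and the user-defined initial centroids below are hypothetical; Euclidean distance is assumed as the metric.

```python
import math

def kmeans(points, init_centroids, n_iter=100):
    """Plain K-means with Euclidean distance and user-defined initial centroids."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        updated = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:   # converged: assignments no longer change
            break
        centroids = updated
    return centroids, clusters

# Hypothetical (peak amplitude, damping duration) pairs for two vehicle types.
points = [(0.2, 1.0), (0.3, 1.2), (0.25, 0.9),   # lighter vehicles
          (1.0, 3.0), (1.2, 3.5), (1.1, 3.2)]    # heavier vehicles
centroids, clusters = kmeans(points, init_centroids=[(0.0, 0.0), (2.0, 4.0)])
```

Changing `init_centroids` or the number of centroids changes the outcome, which reflects the sensitivity to K and to initialization noted in the text.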

Gaussian mixture (unsupervised)
Gaussian mixture model (GMM) is a parametric probability density function similar to kernel density estimates but with a small number of components. In bridge monitoring applications, the model tries to capture the primary component that corresponds to the healthy state of the bridge's condition, even under varying EOFs. In an experimental study on the Z-24 bridge, Figueiredo and Cross 76 (Level 1) applied a mixture of supervised and unsupervised learning approaches for nonlinear long-term monitoring of bridges. The study utilized the supervised NLPCA for characterizing the interdependency of identified features, as well as the consideration of EOFs, and the unsupervised GMM for damage identification based on outlier detection. Similarly, Figueiredo et al. 19 (Level 1) explored the integration of model-driven and data-driven systems in a hybrid approach for damage detection of the Z-24 bridge. The data from FEM was fed to the GMM to improve the damage classification and the validation of the model. GMM of acoustic emission for monitoring cracks in an RC shear wall was tested in the study by Farhidzadeh et al. 245 (Level 1-3). The GMM was successful in clustering two hidden classes of crack mode, that is, shear and tensile.
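A minimal sketch of GMM-based outlier detection is given below. It assumes a one-dimensional feature (a natural frequency) and a mixture that has already been fitted to healthy-state data; the component parameters and the threshold are hypothetical, and the fitting step (for example, expectation-maximization) is omitted.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_log_density(x, components):
    """Log-density under a mixture; components are (weight, mean, std) tuples."""
    density = sum(w * normal_pdf(x, m, s) for w, m, s in components)
    return math.log(max(density, 1e-300))   # floor avoids log(0) for extreme outliers

def is_outlier(x, components, threshold):
    """Flag observations that the healthy-state model finds very unlikely."""
    return gmm_log_density(x, components) < threshold

# Hypothetical healthy-state model of a natural frequency (Hz): two components,
# for example, representing warm and cold operating regimes.
healthy_model = [(0.6, 4.00, 0.05), (0.4, 4.20, 0.05)]
threshold = -5.0   # hypothetical log-density cut-off
```

For instance, `is_outlier(4.02, healthy_model, threshold)` is false because 4.02 Hz sits inside one of the healthy regimes, while `is_outlier(3.70, healthy_model, threshold)` is true. Using several components lets the healthy state span more than one EOF regime without those regimes being flagged as damage.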

Association analysis (unsupervised)
After data clustering, the association analysis method can be used to find relationships and dependencies, that is, association rules. 172 Especially for large datasets, it becomes essential to leverage the scalability of association rules to extract useful information for decision making. One of the most widely used applications of association analysis is market basket data analysis. It is used to find associations between customers' purchasing patterns that can provide interesting information to market owners to maximize profit and help design enticing advertisements. Jin et al. 246 (Level 1-2,4) employed the Pearson correlation-based association analysis method for performance assessment of an in-service bridge by mapping structural dependencies such as stress and displacement. Pearson correlation is a simple measure that is typically used to find linear correlations within large amounts of data in association analysis methods. Pearson correlation only measures the strength of the association rather than its significance. Spearman, Kendall, and Chi-square tests are other measures of correlation, the last of which also indicates the significance of the association. Guéguen and Tiganescu 247 (Level 1) considered dynamic changes in a building with respect to the temperature effect. The authors analyzed the correlation of resonance frequency using the association rule learning method to detect damage. Finding appropriate parameters, discovering too many rules, discovering insignificant rules, and computational inefficiency on numerical data are some of the drawbacks of association analysis.
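The difference between Pearson (linear) and Spearman (rank-based) correlation can be seen in the short sketch below. The paired values are hypothetical and follow a monotonic but nonlinear relationship; the rank helper assumes no tied values for brevity.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient: strength of the linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks starting at 1 (assumes no tied values)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical paired measurements with a monotonic, nonlinear relationship.
load = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [v ** 3 for v in load]

r_pearson = pearson(load, response)
r_spearman = spearman(load, response)
```

Here the Spearman coefficient is exactly 1 because the relationship is perfectly monotonic, while the Pearson coefficient is about 0.94 because the relationship is not linear. Neither coefficient by itself says whether the association is statistically significant.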

Semi-supervised
As mentioned in previous sections, it is rare and often infeasible to have fully labeled datasets in SHM applications; hence, purely supervised learning is often impractical. On the other hand, unsupervised approaches could take a great deal of effort to implement. The possibility of having a small subset of labeled data is not far-fetched; 203 therefore, semi-supervised learning methods can be used to take advantage of both labeled and unlabeled data. Bull et al. 248 (Level 1) employed a semi-supervised parametric GMM for improving the performance of damage classification in risk-based applications. Rogers et al. 249 (Level 1) proposed a Bayesian non-parametric clustering technique to apply labels online to the clusters in a semi-supervised manner with little to no training data. Chen et al. 250 (Level 1) presented adaptive graph filtering for semi-supervised damage classification in an indirect bridge SHM application, with improved classification accuracy. In another indirect bridge monitoring study, Liu et al. 106 (Level 1-2,4) analyzed stacked autoencoders as a nonlinear dimensionality reduction technique together with a semi-supervised damage severity estimation model on a laboratory bridge model. The authors concluded that their approach is feasible and applicable to in-service bridge structures.

Blind source separation in SHM
Blind source separation (BSS) aims to separate individual source signals from a set of mixed signals with little to no prior information. It is assumed that the source signals are mutually independent, with no correlation between the components. There are cases in which a detected damage is accompanied by other damages. 251 Therefore, the signals acquired from the structures, along with the noise, make it difficult to pinpoint and identify the faulty component. The BSS algorithm, in this sense, can be used for output-only system identification and damage detection. 252 Generally speaking, BSS is treated as an unsupervised ML problem dealing with the time or frequency domain. 253 The difficulty in the BSS approach is the uncertainty in the number of sensors relative to the number of sources, especially when there are fewer sensors than sources. ML algorithms are developed explicitly for BSS applications to tackle different BSS issues, such as model complexity and heterogeneous environments, and to address real-world applications such as SHM for civil infrastructures. To overcome the high amounts of transmitted data in SHM systems, Sadhu et al. 254 (Level 1) presented a decentralized, high compression sensing tool for data reduction within a BSS framework. Musafere et al. 252 (Level 1) developed time-varying autoregressive modeling to obtain the mono-harmonic responses from the vibration data. They validated their method with numerical and experimental studies, as well as on a full-scale earthquake-excited building. Liu et al. 255 (Level 1) developed a BSS ML-based approach for modal identification and validated their results with the SHM data of a cable-stayed bridge under ambient vibration. Ye et al. 256 (Level 1) proposed an integrated ML-based single-channel BSS algorithm for separating deflection components from live load effects, temperature effects, and structural deflection for prestressed concrete bridges. 
Their approach takes advantage of ensemble empirical mode decomposition (EEMD), PCA, and ICA algorithms. Applications of AI and DL for BSS approaches have also been studied in the past; however, there seems to be very little work integrating them with SHM systems. BSS can be seen as the middleware framework between signal processing and ML. The importance of BSS has led to novel developments in the field.

An insight into damage prognosis with ML
Many of the reviewed works in ML-augmented SHM realize the first two levels of damage identification. Some also consider the extent of the damage to some degree, and very few classify the damage. The reason for not including the last stage of damage identification in many SHM systems is simply that the development of SHM in the different papers has not matured. Essentially, understanding damage propagation well enough to determine the remaining useful life (RUL) of the structure is not yet feasible. Moreover, in RUL estimation, the prediction is probabilistic in nature and comes with a certain degree of uncertainty. In other applications, such as rotating machinery, one can determine the exact future operational and loading factors. This has proved to be very difficult in civil infrastructure, where slight deviations or external factors can influence the properties of the whole system. Furthermore, damage prognosis depends on an accurate global and local model representation of the structure. Few research studies consider these phenomena, and RUL prediction is still considered an emerging technology. In this part, we introduce a few examples of successful implementation of RUL estimation based on ML algorithms in the context of SHM.
Atamturktur et al. 257 used SVR on a historic masonry fort considering different support settlements. The prognostic evaluation of the structural condition is based on an adaptive weighting of the regressor for settlement-induced strains up to 100 mm. The study achieved as much as a 50% reduction in the prediction error compared to the vanilla SVR method. NNs are also used for RUL estimation. In a typical static NN, such as MLP, prediction is performed at each time step independently. A phase-space representation, such as time-windowing, could be used to generate a fixed sequence of instances; however, such an approach increases the input dimension and, subsequently, gives rise to the curse of dimensionality. In the context of prediction, it is reasonable to expect that storing a history of past data can greatly improve the prediction. Therefore, deep NN can help to enhance the prediction accuracy. This was demonstrated in the study by Wu et al., 258 where the LSTM approach for RUL of engineered systems was proposed. The authors used aircraft turbofan engine datasets from NASA with four damage conditions. The performance of their approach was tested against two other deep NN methods, namely, RNN and GRU. The methodology described in that article can be extended to civil structures as well.
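The time-windowing idea mentioned above can be sketched as follows: a series is cut into fixed-length input windows, each paired with a later value as the prediction target. The function name, the window length, and the toy series are assumptions for illustration only.

```python
def make_windows(series, window, horizon=1):
    """Turn a 1D series into (inputs, targets) pairs for a static model.

    Each input holds `window` consecutive values; the target is the value
    `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    for start in range(len(series) - window - horizon + 1):
        inputs.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return inputs, targets

# Toy degradation indicator sampled at ten time steps.
series = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
inputs, targets = make_windows(series, window=3)
```

Each extra step of history adds one more input dimension per monitored channel, which is how longer windows lead to the dimensionality problem noted in the text; recurrent models avoid this by carrying history in their internal state instead.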
Other than deep NN, there are also opportunities to leverage physics-informed data-driven SHM systems for damage prognosis. For example, such a hybrid approach was realized in the study by Das et al. 259 The authors used dynamic mode decomposition as well as computer vision for the prediction of cracking in a mortar cube specimen. It was observed that with more training frames, the L2 norm is substantially reduced, indicating a more accurate prognosis. Crack prediction is perhaps one of the areas where the RUL concept is most apparent. In such scenarios, given the spatiotemporal phenomena that exist among degrading infrastructures, especially bridges, RUL estimation could provide valuable information about resource allocation, planning, and retrofitting when a target risk level is reached. The uncertainties introduced by the simplifications and assumptions made during the RUL estimation process can be addressed by integrating probabilistic measures such as Bayesian statistics. For instance, a two-phase gamma process with a Bayesian approach was used to predict the remaining useful life of corroded reinforced concrete beams. 260 For more information about RUL estimation, interested readers are referred to the bottom-top review paper by Lei et al., 261 and the deep NN methods for RUL estimation by Zhang et al. 262 However, the biggest challenge in achieving complete 5-stage damage identification is creating a pipeline that can run each stage with an acceptable level of performance, both quantitatively and qualitatively. Even with advances in technology, the realization of damage prognosis in civil structure monitoring systems is currently not seen as necessitating the effort. However, with new challenges being introduced to already complex structural systems, damage prognosis cannot be left behind.
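To make the probabilistic nature of RUL estimation concrete, the following is a minimal sketch of a Monte Carlo RUL estimate under a stationary gamma degradation process. It is a deliberately simplified stand-in: it is single-phase and non-Bayesian, unlike the two-phase Bayesian model cited above, and all names and parameter values (`shape_rate`, `scale`, the threshold, the time step) are illustrative assumptions rather than values from any reviewed study.

```python
import random
from typing import List


def simulate_rul(current: float, threshold: float, shape_rate: float,
                 scale: float, dt: float, rng: random.Random,
                 max_steps: int = 100_000) -> float:
    """Time until one simulated degradation path first reaches `threshold`.

    Increments over a step of length `dt` are drawn from
    Gamma(shape=shape_rate * dt, scale=scale), so paths never decrease.
    """
    level, elapsed = current, 0.0
    for _ in range(max_steps):
        if level >= threshold:
            return elapsed
        level += rng.gammavariate(shape_rate * dt, scale)
        elapsed += dt
    return elapsed  # threshold not reached within max_steps


def rul_samples(n: int, seed: int = 0, **kwargs) -> List[float]:
    """Sorted RUL samples from `n` independent simulated paths."""
    rng = random.Random(seed)
    return sorted(simulate_rul(rng=rng, **kwargs) for _ in range(n))


if __name__ == "__main__":
    # Hypothetical normalized damage index: 0.2 today, failure at 1.0.
    samples = rul_samples(2000, current=0.2, threshold=1.0,
                          shape_rate=2.0, scale=0.02, dt=0.5)
    n = len(samples)
    print("median RUL:", samples[n // 2])
    print("5th-95th percentile:", samples[int(0.05 * n)], samples[int(0.95 * n)])
```

The output is a distribution rather than a single number, which is the point made in the text: the spread between the percentiles is the uncertainty that a decision on planning or retrofitting would have to account for. In a Bayesian treatment, the process parameters themselves would be updated from inspection data instead of being fixed as they are here.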

Remarks on SHM system with ML pipeline
In this section, several of the widely used algorithms for damage assessment in civil infrastructures are demonstrated. Other models also exist in the literature, where hybrid implementations or entirely different algorithms are used (some examples are provided in the "Digital twin and physics-guided ML-based SHM" subsection). Therefore, only the most common models were chosen and compared. In the future, a more in-depth assessment of damage classification will be studied. A summary of the advantages and disadvantages of the algorithms mentioned in this article is provided in Table 8. It is quite challenging to justify the training times indicated in the table, as the algorithms reported in the literature may include underlying optimizations, and the size and type of the collected data can vary by a significant margin from work to work. Therefore, the training time was rated based on the general observations of the authors while reviewing different papers, as well as the authors' past experience. The results of the reviewed ML algorithms for different stages of SHM are depicted in Figure 9; note that the figure does not necessarily mean that certain algorithms are incapable of addressing other levels. As can be observed, Level 1 can be achieved by virtually every ML technique. Levels 2 and 3 are also considered in most of the applications. For Levels 3 and 5, however, many works either did not take these levels into account or considered the algorithms unsuitable for damage classification and prognosis.
Based on the 11 ML algorithms reviewed in this section, only neural network (NN) algorithms can accomplish complete 5-stage damage identification, with DL approaches considered as the first choice in several studies. Therefore, based on the reviewed works, the authors recommend an SHM system with an ML/DL pipeline, as shown in Figure 10.
Table 8 note: * represents a rating system for training time; * means quick, ** means average, and *** means long.
It is suggested to utilize multisensory systems with nonlinear data normalization techniques to account for EOFs. Data augmentation can also be employed with DL methods to impute missing data and increase the quality of data that may have been affected in earlier stages by loss of signals, delays caused by communication overhead, or dense data compression before transmission. Data compression by means of dimensionality reduction is one of the important stages. It is noticed that, when EOFs are considered, nonlinear dimensionality reduction methods perform better than other approaches. In the phase of feature extraction/selection, several methods exist. Although the selected path is data-driven, physics-guided data-driven models can also be adopted, as will be discussed later in the article. However, this technique has gained popularity only in the last few years, and it is still mostly under development for large-scale SHM systems. Next, with data fusion techniques, a multitude of collected data can be combined to increase the possibility of detecting patterns and identifying outliers in the last stage of the SHM system. Finally, based on the analysis of 11 ML and DL algorithms, the authors consider that Bayesian and NN techniques can be used for PR in Level 1-4 SHM, whereas for complete 5-stage SHM damage identification, given current sensor technology and algorithms and their limitations, only deep NN is considered a suitable solution.

IoT-related applications
Together with IoT and smart city monitoring systems, our living conditions in terms of comfort and safety are continually improving. With increasing numbers of ubiquitous connected devices, new research challenges are expected to arise even as innovative solutions are being developed. Computational resources, energy management, optimal sensor placement, interoperability, security and privacy, open-standard protocols, and many others are today's challenges to overcome. For example, for seamless integration of different services and technologies, the ever-present IoT needs to be open and exchange data with other platforms. 263 In today's IoT ecosystem, efforts have to be spent embracing and managing the fast-paced IoT advancement and integrating it with ML to expand the boundaries.

ML for IoT in SHM systems
It can be said that WSNs combined with any intelligent software for data collection, analysis, enhanced connectivity, and accessibility are IoT applications in themselves. However, the authors believe that such IoT implementations in SHM only scratch the surface of the capabilities that IoT, at large, can provide. IoT goes beyond a simple data acquisition system and software solutions for feature extraction and damage prediction. Instead, it should aim for a symbiotic connection between multisensory and city-scale multi-infrastructural network monitoring systems. Such a paradigm cannot simply be achieved with traditional SHM, as it fails to provide ubiquitous services and powerful processing of sensing data streams. This is a prime example of how ML and cloud-processing infrastructure can replace the old systems and achieve a higher level of efficiency. The power of ML for IoT can be realized in many implementations. The consideration of EOFs was discussed at length, and it proved to be a major challenge and limitation in many SHM systems. Without ML, it would be challenging to isolate external variations that can influence the data received from the sensors. Faulty sensors and missing data are other aspects where ML can be beneficial. IoT is about interconnectivity. Therefore, there may be cases where multiple sensors are used to collect different data in different sizes and resolutions. Interoperating and correlating such a massive amount of data can be computationally expensive. Here, ML can be used to make inferences and establish relations between multiple signal sources for identifying damage. When discussing ML for IoT, the real challenge is finding a way to seamlessly connect the two in a unified platform without sacrificing information and data.
Nonetheless, throughout this article, it has been observed that many of the proposed ML-conjunct SHM methods do not always meet the first and foremost aspect of IoT, namely continuous monitoring. Many examples simply introduce new damage identification techniques but fail to provide any objective justification that their system can sustain continuous monitoring. With these obstacles, civil structure asset management becomes really important, and this warrants further research and understanding as systems become more intricate. According to the timeline envisioned by Xu et al. 264 and Figure 11, IoT is currently placed in the middle stage of different SHM platforms. Damage identification started with fully centralized traditional methods, and it is now rapidly evolving toward fully decentralized blockchain SHM systems. ML utilization started in IoT-based SHM systems and will be even more prominent in later stages. Insight into how this evolution occurs can be widely beneficial for designing solutions that can meet the future of big data analysis.

Big data and SHM, a symbiotic strategy
There is no doubt that with the sheer amount of data from SHM, many link the big data paradigm to such systems. The misconception with the term "big data" is that only the volume of data plays the most prominent role. With the significant improvement in the current SHM systems, the volume of data is no longer the critical factor. Instead, the following additional 4 Vs combined with the volume of the data make up the big data model. 265,266
1. Variety: type and nature of the data, for example, EOFs
2. Velocity: how fast the data are generated and analyzed, for example, excitation/extraction and sensor optimization
3. Variability: discrepancy of the data, for example, outliers
4. Veracity: the useability of the data, for example, data cleaning and feature extraction methods
Big data and SHM share common grounds. Both are considered to convey data-driven findings despite the unprecedented computational expense and features non-trivial to capture. They are interchangeable to some degree, and big data solutions can come in handy for various SHM systems, such as decentralized learning with GPU parallel computing. The pipeline for big data and that of SHM, as was previously shown in Figure 4 and discussed in detail in previous sections, are very similar. Table 9 shows how the steps involved in big data relate to the different stages of the ML-based SHM system. 79 The challenges in any SHM system are inherently some of the significant data issues in the processing domain. Thus, with big data improvement, a similar strategy can be developed for SHM, reinforcing future applications.
Figure 10. A recommended SHM system with ML and DL enhancements at each component level.
In terms of variety, incoming data are rarely structured in any meaningful way. They can be semi-structured, like the temperature, wind, and traffic loading (EOFs), or unstructured, like computer vision data streams that capture changes in the pixels. Therefore, feature extraction becomes a complicated task. In terms of volume, assume a structure with multiple sensors at different locations, in addition to external monitoring of weather and traffic with sensors and computer vision. Together they can include hundreds of sensors, and with 24 h continuous monitoring of structure populations, 267-270 they can generate an enormous amount of data, on the order of petabytes per week. Processing this much data, let alone providing the vast storage required, is a major challenge for continuous SHM monitoring solutions. In terms of velocity, modern SHM data acquisition systems with high-resolution data introduce a data transfer bottleneck and may cause data loss during transmission. Delayed or missing data can heavily impact damage identification pipelines. In terms of variability and veracity, the complexity of and evolving relationships between the collected data introduce uncertainty and outliers, especially for long-term SHM systems. Low data quality, in terms of missing and noisy data, degrades the structural integrity decisions inferred from the features. Generally, the challenges in big data and SHM relate to each other, one way or another, as indicated in Table 9. When appropriate data-processing techniques are developed, it is expected that they would bridge the gap between these two different fields, essentially providing value to both systems.
Liang et al. 271 proposed a big-data SHM platform for serviceability assessment of a bridge. In their application, the authors based their big-data system on sensor technology for data mining. The approach of Wang et al. 272 to big data in SHM was mainly focused on the data fusion (data aggregation in big data) and learning stages (modeling in big data). Their experimental testbed of a 12-story test structure showed promising results in data reduction, energy efficiency, cost, and quality. Ni et al. 191 addressed the variability challenge in big data, for example, outliers and data fusion, based on a DL method. The reconstruction of anomalous data after data compression is a significant task, as it may pose severe challenges for the accuracy of the model after reconstruction. Although the root of abnormal data is a complex process, identifying such data before compression is vital. The veracity of big data is usually linked to the feature extraction stage of SHM. High-dimensional features, as present in SHM due to EOFs, can make any system time-consuming and complicated. Entezami et al. 273 proposed an ARMA model to tackle this specific issue related to SHM and big data. For a general overview, interested readers are referred to the recent review by Sun et al., 37 which investigated some aspects of big data and AI in bridge SHM.

Smart city and SHM, the bigger picture
One of the reasons for marching toward a smart city ecosystem is to use the potential of existing technologies and infrastructures in providing the best utility to users and improving their future. Some aspects of the involvement of SHM in the current smart city era are reviewed by Du et al. 274 In their work, the authors tackled WSN issues in the monitoring system and posed many open-ended questions regarding the future of smart city monitoring. The components of SHM encourage applications of data-driven smart solutions in the context of the smart city. Together, a smart city is expected to provide a seamless connection between services and citizens, and monitoring is an essential component of this connection. Combined with the power of ML and DL, the adaptation and integration of smart monitoring applications are of increasing interest in civil engineering. 275 The novel approaches in SHM applications involving ML and AI are becoming the pinnacle of research today. As explored in different sections of this review, what is being seen today is the result of decades of research, and recent works have at least some elements of ML at their core. In line with the discussion above, the current utilization of SHM requires a change in definition and architecture. With Industry 4.0 and cloud-based monitoring solutions already implemented or on the horizon, a transition toward cyber-physical system (CPS)-based SHM design is envisaged. A decentralized, self-sustaining CPS is said to be the next stage of smart monitoring systems. However, some complications in the design and deployment stage have to be considered and studied beforehand. 276 A cloud-based concept of bridge monitoring was presented by Furinghetti et al. 277 The cloud-computing interface developed for their proof of concept, together with an analysis of the software and hardware requirements, was shown to be a practical and applicable approach for the future of smart monitoring. Ozer and Feng 278 proposed a mobile CPS-based SHM system for structural reliability estimation of bridges. As the authors stated, the ultimate goal is to integrate such a design with cloud-computing power to increase efficiency and ease integration with the smart city.
The dense sensor node structure of a smart city brings about some challenges. With the deployment of different kinds of sensors on structures all over the city, it becomes necessary to apply smart asset management such that the critical structures be prioritized in terms of hardware and software allocation. Moreover, in the backbone of a perfect smart city paradigm, it is expected that the services are interoperable with open data standards, making the data interpretation, analysis and sharing seamless. The smart monitoring solutions proposed by different researchers as shown throughout the article, demonstrate, in broad picture, the capabilities of such systems for pursuing the goals of smart infrastructure. The utilization of ML and DL has granted the opportunity to take one step closer to having an autonomous, self-learning, and self-sustaining smart city.

Intelligent transportation system and SHM, a complementary addition
With the help of IoT applications, mobility and transportation are considered to be the key influencing factors in sustaining our surrounding environments, especially those that utilize intelligent transportation systems (ITS). 279 There are two possible ways to merge these two systems into one. The data collected through the ITS can be fed to the SHM system and, in turn, improve the system's reliability: this is commonly known as the ITS-informed-SHM system. An example of this approach was discussed by Lan et al., 280 which showed the impact of traffic load for fatigue damage evaluation on bridges. On the other hand, when providing the data collected from the SHM system to the ITS, information for real-time traffic management can be utilized, especially under critical events such as an earthquake. This form of integration is referred to as SHM-informed-ITS. In this approach, further enhancement can be made when the system is integrated as part of smart cities, 281 where the information could be used for other services provided in this context. This, in turn, enables interoperability, leading to an enhancement in the quality of service (QoS). A smart pavement monitoring system based on a supervised ML algorithm was demonstrated by Praticò et al. 282 Approaches and methodologies taken in this work were based on integration with current or future smart cities with ITS as a backbone for data collection. In the study by Huang,283 different data-driven methods to assess the transportation system's health, efficiency, and safety were used. Using big data and ITS, the author provided decision support for practitioners. Interested readers are referred to the review paper by Khan et al. 284 for bridge conditions assessment integrating SHM and ITS.

Next-generation SHM applications with ML/DL enhancements
The SHM discipline has come a long way in the past century, from conventional to mobile and smart systems. With the continuous improvements in sensor technology fields, unprecedented new techniques have been introduced. From the first-ever SHM inspiration in the late 19th century 8,285 for detecting cracks in railroad wheels to the implementation of the first NN machine by Marvin Minsky, 286 SHM relied on two main tracks of advances: sensing devices and methods/algorithms. SHM systems began with the condition monitoring of rotating machinery using the simple shock pulse method in the 1960s and later vibration-based measurements, with further extension into offshore oil platforms. Beginning with aerospace and civil engineering SHM applications in 1980, major innovations in ML, AI, DL, and computer vision were initiated. The earliest use of PR and ML in civil engineering structures was developed by Adeli and Yeh. 287 Similarly, the first-ever computer vision model for civil structures was developed by Stephen et al. 288 and Olaszek 289 in the 1990s for bridges.
With the introduction of cloud computing in 2002 and Industry 4.0 in 2010, a giant leap toward a new generation of computing and monitoring solutions was taken. The transition to CPS-based SHM design, machine-to-machine communication, cognitive computing, and virtual and augmented reality has enabled a paradigm shift in the last 10 years. The authors believe the future of sustainable SHM systems with damage prognosis capabilities co-exists and co-integrates with smart cities, big data, and services and technologies such as interoperability, 290 blockchain, 291 and digital twin. 292 Figure 12 depicts a summary of the timeline of advancement in SHM and computing algorithms. In the next subsections, state-of-the-art and emerging technologies and services that have the potential to augment traditional SHM systems with next-generation sensing and computation advances are introduced. The list is not exhaustive; according to the authors, these are the technologies receiving the most attention from the SHM community.

UAV-assisted SHM
Drone technology, also known as UAV, has seen a vast increase in usage in recent years due to the advantages it can offer, especially its deployment flexibility. 293 Given their versatility, low cost, and ease of deployment, UAVs are becoming increasingly attractive. 294,295 UAVs can easily be integrated into the design phase workflow of civil structures for simple imaging or scene reconnaissance, they can monitor the work in progress and document the phases of construction, and lastly, they can be used for monitoring and inspection. UAVs enable investigators to visit hard-to-access areas of many structures, such as tall buildings or bridges with a river flowing underneath. The use cases of UAVs go beyond imaging and video capabilities. They can also be equipped with other sensors for vibration-based approaches. One of the best applications of UAVs is in disaster damage and loss estimation. 31 Disaster mapping becomes essential in an area where no monitoring of critical infrastructure was in place. UAVs can combine complicated components of a stationary monitoring system and essentially create a mobile and portable mini-SHM. The following list shows different applications of UAVs typically used for damage detection/localization in civil structures:
• Concrete crack detection
• Pavement crack detection
• Rust detection
• 3D reconstruction via laser or light detection and ranging (LiDAR)
• Displacement measurement using camera-lens configurations
• Displacement monitoring via lasers
• GPS for Level 2 SHM, geo-tagging
• Ultrasonic beacon for Level 2 SHM, geo-tagging
Some of the above applications can take advantage of ML and DL. For example, crack and rust detection via computer vision can be carried out in two different ways, pixel processing or patch processing. In the former, the full image is processed, and edge detection or pixel separation based on a single threshold is applied.
In the latter, using ML or DL, the original image is segmented into patches, and crack patterns are identified. Vibration-based approaches can also be achieved using displacement monitoring via digital image correlation (DIC). 296 Noncontact reference-free displacement estimation is particularly important for the railroad industry. 297,298 Many researchers have directed their efforts toward developing new low-cost ways of capturing vital information from railroad bridges with emerging technologies. Because of the height of railroad bridges and their location in remote, irregular, and sometimes inaccessible areas, many traditional wired data collection methods such as linear variable differential transducers (LVDTs) are becoming obsolete and are being replaced by next-generation sensing devices such as laser Doppler vibrometers (LDVs). These devices can also be mounted on UAVs. Garg et al. 299 installed LDVs on a UAV to collect displacement measurements of a railway bridge during a train-crossing event. Displacement measurements can be followed by modal analysis and identification of frequencies and mode shapes. With LDVs, amplitude and frequency are extracted from the Doppler shift and enhanced via ML and DL for modal analysis. For example, CNN-LSTM was used to extract natural frequencies from a variety of beam samples using a shaker and LDVs. 300 Displacement-based measurements using computer vision only consider the plane perpendicular to the camera. The superiority of the proposed technique was shown based on the mean value of the mean absolute error (MAE), which ranged from 0.45 to 1.5. Hoskere et al. 301 proposed a novel vision-based data extraction pipeline for measuring modal properties of structures from a UAV. Compared to fixed accelerometers, the UAVs showed good results, with 1.6% error in natural frequency and modal assurance criterion (MAC) values above 0.925.
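For readers unfamiliar with the modal assurance criterion used in such comparisons, the sketch below computes it for two real-valued mode shapes using the standard definition MAC = (φ_a · φ_b)² / ((φ_a · φ_a)(φ_b · φ_b)). The function name and the mode-shape values are illustrative and are not taken from the cited study.

```python
from typing import Sequence


def mac(phi_a: Sequence[float], phi_b: Sequence[float]) -> float:
    """Modal assurance criterion between two real-valued mode shapes.

    Returns a value in [0, 1]: 1 means the shapes are identical up to a
    scale factor (including sign), 0 means they are orthogonal.
    """
    if len(phi_a) != len(phi_b):
        raise ValueError("mode shapes must have the same number of points")
    cross = sum(a * b for a, b in zip(phi_a, phi_b))
    norm_a = sum(a * a for a in phi_a)
    norm_b = sum(b * b for b in phi_b)
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("mode shapes must be non-zero")
    return (cross * cross) / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical first bending mode at five measurement points:
    # a reference-sensor estimate and a slightly noisy vision-based one.
    reference = [0.00, 0.59, 0.95, 0.59, 0.00]
    vision_based = [0.02, 0.55, 1.00, 0.62, -0.01]
    print("MAC:", mac(reference, vision_based))
```

Because the criterion is insensitive to scaling, it compares only the shape of the two vectors, which is why it is commonly reported alongside the natural-frequency error when a new sensing approach is validated against reference accelerometers.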
With the addition of a depth sensor such as an infrared (IR) camera or LiDAR, the out-of-plane direction (the distance from the object to the camera) can also be measured. This provided superior dynamic displacement measurement on UAVs, as demonstrated by Perry and Guo. 302 Having a 3D structure model makes it easier to detect local changes and identify damage. Furthermore, the 3D reconstruction model can produce a 3D FEM, which can effectively be integrated with different damage identification methods for a hybrid approach. Conventional 3D reconstruction models require high-quality point clouds that are difficult to obtain. Therefore, they may include defects that reduce the structural information required to obtain satisfactory results for damage detection. ML and DL can also be applied in these scenarios to overcome the deficiencies above, as demonstrated by Hu et al. 303 for a structure-aware semantic 3D model of a cable-stayed bridge using CNN.
To utilize UAVs and the power of ML together, Perry et al. 304 demonstrated a new approach to bridge inspection. By collecting pictures of a bridge and mapping a 3D point cloud and photo-realistic model, with the help of computer vision and ML algorithms, fault detection becomes more versatile and efficient, with little to no human interaction. A new crack detection technique based on images taken from a UAV was proposed by Lei et al. 305 In their approach, environmental noise is considerably reduced compared with traditional edge detection methods, and an error rate of 5.43% was achieved. Augmentation of the new approach with ML and real-time bridge inspection is the next step of the authors' work. Other than point-cloud-based methods of detecting faults, thermal-based imaging is also used to capture information from bridges. Due to the mechanism of horizontal cracking around the rebar level, the change in the thermal properties of a bridge deck can indicate a pattern of delamination. This idea was used by Cheng et al., 306 who applied a UAV with a supervised deep learning approach to capture the changes in the bridge deck. Due to the lack of thermal images, experimental data augmentation was used in this study to enrich the training dataset. Some of the applications above may provide a methodology for damage localization; however, dedicated units can be installed for geotagging identified damage. GPS and real-time kinematic global positioning system (RTK GPS) can be used to approximate the position of the damaged location; however, they are limited to outdoor environments. In GPS-denied areas, such as beneath a bridge, other methods such as ultrasonic beacon systems can be used for locating damage. 307 From the studies above, it becomes clear that UAV-based damage detection techniques are promising for implementation on many civil infrastructures, especially bridges.
The incorporation of UAVs into civil engineering demonstrates exceptional practical feasibility in terms of scalability and automation. Being able to carry different kinds of sensors and devices makes UAVs an attractive low-cost option for rapid monitoring of structures. It is envisioned that, as UAVs improve in the coming years, are integrated into the smart city, and become autonomous, self-sustaining monitoring solutions, they will be able to provide streamlined as-built critical SHM systems. A summary of the UAV-related sensor technology and its application in SHM is shown in Table 10.

Mobile-assisted SHM
With the advancement of the IoT era, many citizens own a smart device that can easily be integrated into a monitoring solution. Smartphones with built-in cameras and measuring components, such as accelerometers, show great potential in SHM applications. Smartphones can capture video and images for detecting faults and deformation on bridges 308,309 ; the embedded accelerometers can identify dynamic characteristics of even very low-frequency structures. 310 Just as with UAVs, SHM can benefit from smartphone capabilities in various ways. Smartphones contain storage, advanced microprocessors, GPS, and a wide range of communication networks, from cellular to Wi-Fi and Bluetooth, in a small form factor. Smartphones are also capable of higher spatial coverage compared to stationary sensors. Since the major advantage of smartphones is scalability, data fusion would make it possible to design portable and massive networks of monitoring solutions. Feng et al. 158 performed small-scale, large-scale, and field tests to evaluate smartphone acceleration fidelity. For the small-scale shaking test, they reported errors in identified frequency and signal amplitude for different device generations: old-generation smartphones showed up to 44% amplitude error and 5% frequency error, whereas these errors reduced to 17% for amplitude and only 1% for frequency in new-generation devices. The latter tests only considered frequency evaluation and provided error percentages around 1%. Later on, Ozer et al. 311 expanded the smartphone scheme into crowdsourcing and provided modal frequency estimation with less than 1.3% error, whereas old-generation smartphones were incapable of capturing ambient vibrations. For SHM applications, accuracy validation was extended to mode shape identification, with modal assurance criterion values near 0.90 with smartphone data 312 and higher values with multisensory data as well. 313 The following lists the potential applications of mobile-assisted SHM:
• Crack detection with computer vision
• Vibration measurement with embedded accelerometers
• Displacement measurement with computer vision
• Drive-by sensing for indirect identification
• Crowdsourcing for citizen-engaged operation
• Load detection from pedestrians using human activity recognition (HAR)
• Supplementary information through multisensory heterogeneous data feed
Due to their size and portability, smartphones can be attached to moving vehicles and can provide two use-cases.
(1) Using their camera and computer vision algorithms, enhanced with ML and DL, they can identify different types of road damages and measure road roughness. 314 (2) They can also be used for indirect monitoring through vibration measurements and can be integrated with the first use case to include the effect of road roughness on the collected vibration data. 315 In addition to aforementioned mobile sensing paradigms, monitoring of pedestrians can be associated with the dynamic bridge loading and cause different levels of excitation. 316 Therefore, smartphones can greatly help to extract damage-sensitive features by analyzing the pedestrians' body movement and transfer loading mechanism to the bridge with the help of built-in accelerometers for vibrations and gyroscope and magnetometer for direction correction. Applications of ML and DL in mobile SHM systems can alleviate some of the shortcomings with smartphones. In particular, for stationary vibration monitoring with smartphones, sliding motion of accelerometers due to smartphone not being fixed to the ground is an issue that was tackled by Na et al. 317 The authors used SVM, NN, and RNN to detect the sliding motion on a shaking table. With 93% accuracy, RNN was able to classify the sliding motion correctly. Transfer learning is a popular method in DL where the pre-trained model developed for a specific problem is reutilized in other related problems. Therefore, the initial model is trained with a considerable amount of data which can take a long time and later transferred to smaller devices, such as smartphones for damage detection, for example, cracks or concrete spalling in a fraction of a second, as demonstrated by Perez and Tah. 318 Other than the indirect and mobile approaches, smartphones with computer vision can be used for measuring displacement on different structural elements. Validation studies for evaluating mobile-SHM were carried out by Yu et al. 
319,320 The results show the suitability of smartphones for mini-SHM systems. A feasibility study of utilizing smartphone cameras for seismic structural damage detection was presented by Alzughaibi et al. 321 With their experimental vision-based solution for in-building damage detection, the authors showed sub-millimeter accuracy, demonstrating the feasibility of smartphones for SHM. A vision-based approach with smartphones for obtaining the dynamic characteristics of a cable-supported structure from its dynamic displacement responses in the frequency domain was investigated by Zhao et al. 322 A 3D displacement monitoring system using the DIC technique was proposed by Wang et al. 323 A real-time damage detection solution for masonry buildings using mobile DL was demonstrated by Wang et al. 324 Leveraging a state-of-the-art DL technique on historic buildings, the high-precision trained model was ported onto a smartphone and was successful in detecting damage. This shows that, with continual improvement in object detection models and size reduction of the trained models, mobile device-based damage detection with DL can become an attractive new approach in SHM. Some studies also use the multisensory capabilities of smartphones for monitoring solutions. Ozer et al. 313 proposed a hybrid vibration response measurement and modal analysis system combining embedded accelerometers and cameras. The features and computational power of smartphones can promote their use for long-term monitoring of bridges. With extremely low initial and running costs, and the ability to develop custom analysis software, smartphones can essentially provide a complete monitoring solution. Shrestha et al. 325 investigated the feasibility of long-term bridge health monitoring on Japan's Takamatsu bridge. 
Over an operation period of more than 1 year, seismic and traffic-induced vibrations captured by the smartphones were validated against reference seismometers to verify their viability and accuracy.
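The kind of validation described above, in which a smartphone record is compared with a reference instrument, can be illustrated with a minimal sketch. It assumes that agreement is judged by comparing the dominant vibration frequency of each record; the cited studies may use other metrics. The signals, the sampling rate `fs`, and the helper names `dominant_frequency` and `relative_error` are hypothetical and introduced only for illustration.

```python
import cmath
import math


def dominant_frequency(samples, fs):
    """Return the frequency (Hz) of the largest DFT magnitude, skipping the DC bin."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove constant sensor offset
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        # Direct DFT coefficient for bin k (O(n^2) overall; fine for short records).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * fs / n


def relative_error(estimate, reference):
    """Relative difference between an estimate and a nonzero reference value."""
    return abs(estimate - reference) / abs(reference)


# Synthetic stand-ins for real records (assumed values, not measured data).
fs = 100.0        # sampling rate in Hz
n = 512           # number of samples
true_freq = 2.5   # vibration frequency in Hz

# "Reference" record: clean sinusoid.
reference = [math.sin(2 * math.pi * true_freq * i / fs) for i in range(n)]
# "Smartphone" record: lower gain, a small spurious component, and a bias.
phone = [
    0.8 * math.sin(2 * math.pi * true_freq * i / fs)
    + 0.1 * math.sin(2 * math.pi * 17.0 * i / fs)
    + 0.05
    for i in range(n)
]

f_ref = dominant_frequency(reference, fs)
f_phone = dominant_frequency(phone, fs)
print("reference:", f_ref, "Hz; smartphone:", f_phone, "Hz")
print("relative error:", relative_error(f_phone, f_ref))
```

The frequency resolution of this estimate is `fs / n`, so longer records are needed to resolve closely spaced modes. In practice a windowed FFT from a numerical library would replace the direct DFT loop used here for self-containment.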
On the other hand, engaging consumer-grade devices brings additional uncertainty due to the uncontrolled device operator. 312,326,327 The challenges of mobile sensing paradigms are comparable to those of big data. 328 Compiling and analyzing tens of thousands of smartphone-generated records, specifically in the context of crowdsourcing, can challenge the computational strategies used in the field. Despite these obstacles, mobile sensing presents many opportunities. Having a variety of data can increase the observation sources and yield more accurate results, supporting the decision-making tasks with abundant information. New innovative ways of generating data from structures, such as drive-by sensing, 329 are compatible with the everyday smart devices placed in vehicles. 330 If successful crowdsourcing mechanisms are embedded into vehicular SHM, smartphones are likely to occupy more space in the next decade of SHM research. 311,315,331 Smartphone-based SHM can digitally incorporate up-to-date advances in system identification, ML, and data mining within a fully connected smart city platform. 332 A summary of the smartphone-related sensor technology and its application in SHM is shown in Table 10.

Digital twin and physics-guided ML-based SHM
The physical system (such as bridge elements, sensors, etc.) and the cyber aspect (such as data management, processing, and communication) are tightly coupled in modern SHM formulations. It was shown that isolating the cyber and physical aspects of a WSN-based SHM solution is suboptimal. 276 With the upcoming Industry 4.0, the Industrial Internet of Things (IIoT) is going to be the next major step for real-time performance monitoring and better predictive maintenance with new solutions in ML algorithms. The concept of the digital twin, as part of IIoT, has granted the ability to achieve a greater level of automation and transparency for infrastructure asset management. A digital copy of the structure is created in the digital twin domain to aggregate, process, and analyze the information and generate new data. 333 In some texts, a digital twin is defined as a digital representation of a real-world object in a CPS context. 334 It is expected that current WSN-based SHM applications will be integrated with high-fidelity, ubiquitous digital twins in the future to eliminate the hurdles in current designs.
Significant applications of the digital twin are in manufacturing, smart cities, and healthcare. 335 Only a handful of digital twin models of civil structures exist. The capabilities needed to design a digital twin of a bridge for SHM purposes were discussed by Ye et al. 336 The concept of a digital twin for cable-supported bridges, along with a pilot study, was introduced by Shim et al. 337 Using a 3D model and UAV with image processing, Shim et al. 338 developed a digital twin model of a long-span bridge utilizing reversed 3D-surface modeling to identify the damage on the bridge. Data-driven analysis is of great importance in the digital twin domain. In a similar approach, the concept of the deep digital twin was proposed by Booyse et al. 292 to circumvent the practical limitations of a model-driven digital twin. The authors used a generative adversarial network (GAN) as their deep learning framework for the detection, diagnostics, and prognostics of damage in a gearbox.
There are cases where purely data-driven approaches might be insufficient to meet today's SHM requirements. One of the major obstacles in achieving purely data-driven monitoring solutions is the lack of training data. In theory, one can instead develop high-fidelity and interpretable physics-based SHM systems, for which the lack of data is not a primary issue (in contrast to data-driven methods); however, such a model-driven approach carries uncertainty and modeling error arising from the simplifications and omissions made to contain the high computational cost. Therefore, in recent years, there have been strides toward synergistically integrating these two approaches so that they preserve their merits while reasonably lessening their shortcomings. Consequently, physics-guided ML-based SHM systems have started to combine both damage detection techniques. For example, in a recent study by Zhang and Sun, 339 FEM updating was used interactively with an NN such that the physics-based loss function measures the difference between the output of the NN and the results of FEM updating. Through this interaction, the NN can learn well and detect damage when tested against unseen data. The authors showed that their implementation could both improve the generality of the NN (17% increase) and enhance the performance of FEM updating through uncertainty reduction. Connecting this new wave to the digital twin paradigm, in the framework proposed by Ritto and Rochinha, 340 measurements are taken from a physical twin (bar structure) to calibrate a stochastic computational model that simulates the system's response for different damage intensities and locations. The virtual domain was assumed to be an ML classifier that can detect damage with different classifiers such as SVM and DT.
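One generic way to write a combined objective of this kind is a weighted sum of a data-misfit term and a physics-consistency term. The expression below is an illustrative sketch only and is not claimed to be the formulation used in the cited study. All symbols are assumptions introduced here: $f_\theta$ is the NN with parameters $\theta$, $(x_i, y_i)$ are $N$ labeled measurements, $\tilde{x}_j$ are $M$ inputs for which only the updated FE model output $g_{\mathrm{FEM}}(\tilde{x}_j)$ is available, and $\lambda \ge 0$ is a weighting factor.

```latex
\mathcal{L}(\theta)
  = \underbrace{\frac{1}{N}\sum_{i=1}^{N}
      \bigl\lVert f_\theta(x_i) - y_i \bigr\rVert^{2}}_{\text{data misfit}}
  \;+\; \lambda\,
    \underbrace{\frac{1}{M}\sum_{j=1}^{M}
      \bigl\lVert f_\theta(\tilde{x}_j) - g_{\mathrm{FEM}}(\tilde{x}_j) \bigr\rVert^{2}}_{\text{physics-based term}}
```

With $\lambda = 0$ the objective reduces to ordinary supervised training; larger $\lambda$ pulls the network toward the physics-based model in regions where measured labels are scarce.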
These two studies have shown that it is paramount for a physics-based physical twin to exist together with an ML classifier in a virtual domain for interpretability, flexibility, and reduced complexity. For further information, readers are referred to the recent comprehensive survey on integrating physics-based modeling with ML presented by Willard et al. 341 In the study by Zhang et al., 234 the authors proposed a physics-guided DL surrogate model for seismic structural response prediction. The authors employed the laws of dynamics as the physics model for training a CNN on a reduced dataset. To alleviate the limited training data, K-means clustering was used to partition the available data into training, testing, and prediction categories. The computational efficiency and the high prediction scores can enable the development of fragility functions for building serviceability assessment. Another use of a DL algorithm in a physics-guided SHM system can be found in a previous study by the same authors. Zhang et al. 342 proposed a deep LSTM network with a physics model approach similar to the one above. As in their follow-up study, the same testing procedures were applied, and satisfactory performance was achieved. In modeling time series of complex nonlinear dynamical systems, shallow NNs such as ANNs have distinct limitations. Physics-guided DL models such as CNN, RNN, and GAN may provide a better approach, especially when data are limited.
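To make the idea of using the laws of dynamics as a training constraint concrete, the following minimal sketch penalizes a predicted displacement history by how strongly it violates the equation of motion of a linear single-degree-of-freedom system, m·ü + c·u̇ + k·u = -m·a_g. This is an assumption chosen for illustration, not the model of the cited studies, which involve deep networks and more complex structures. The function names, the finite-difference scheme, and the weighting factor `lam` are all hypothetical.

```python
import math


def physics_residuals(u, ground_acc, dt, m, c, k):
    """Equation-of-motion residual at interior time steps (central differences)."""
    residuals = []
    for i in range(1, len(u) - 1):
        vel = (u[i + 1] - u[i - 1]) / (2.0 * dt)
        acc = (u[i + 1] - 2.0 * u[i] + u[i - 1]) / dt ** 2
        # Residual of m*acc + c*vel + k*u = -m*a_g; zero if physics is satisfied.
        residuals.append(m * acc + c * vel + k * u[i] + m * ground_acc[i])
    return residuals


def physics_loss(u, ground_acc, dt, m, c, k):
    """Mean squared equation-of-motion residual."""
    res = physics_residuals(u, ground_acc, dt, m, c, k)
    return sum(r * r for r in res) / len(res)


def data_loss(predicted, measured):
    """Mean squared error between predicted and measured displacements."""
    return sum((p - q) ** 2 for p, q in zip(predicted, measured)) / len(measured)


def total_loss(predicted, measured, ground_acc, dt, m, c, k, lam):
    """Data misfit plus lam-weighted physics penalty (lam is an assumed hyperparameter)."""
    return data_loss(predicted, measured) + lam * physics_loss(
        predicted, ground_acc, dt, m, c, k
    )


# Assumed toy system: undamped oscillator with 1 Hz natural frequency, no ground motion.
dt = 0.001
steps = 2001
omega = 2.0 * math.pi
m, c, k = 1.0, 0.0, omega ** 2
ground_acc = [0.0] * steps

# A physically consistent response and an inconsistent one (wrong frequency).
consistent = [math.cos(omega * i * dt) for i in range(steps)]
inconsistent = [math.cos(1.5 * omega * i * dt) for i in range(steps)]

loss_consistent = physics_loss(consistent, ground_acc, dt, m, c, k)
loss_inconsistent = physics_loss(inconsistent, ground_acc, dt, m, c, k)
print("physics loss, consistent prediction:", loss_consistent)
print("physics loss, inconsistent prediction:", loss_inconsistent)
```

The physics term needs no damage labels or measured responses, which is why such a penalty can partly compensate for scarce training data. In a real training loop the same residual would be computed with automatic differentiation inside the DL framework rather than with plain Python lists.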
In the authors' view, there is much potential in physics-guided SHM with the integration of state-of-the-art ML and DL algorithms. Based on the observations and the provided examples, the progressive improvement of data-driven approaches and digital twins over the years can persuade researchers to design symbiotic systems in which actual characteristics based on physics-informed mathematical models enhance the digital copy of structures.

Virtual reality and augmented reality for SHM
Virtual reality (VR) refers to computer technologies and interfaces that simulate a 3D, interactive environment, whereas augmented reality (AR) implies layering virtual information on real-world objects. 343 Initially inspired by and developed for the gaming community, VR and AR are now finding their way into other fields, such as nuclear facilities 344 or medical fields. 345 In SHM systems, such novel approaches are becoming a trend. For example, a combined VR and information model (IM) was used to visualize and access SHM data and metadata in 3D. 346 Such a system allows users to intuitively view where SHM data is generated and how it is used to assess the damage. A conceptual seismic impact simulator utilizing SHM, ITS, GIS, and VR was proposed by Büyüköztürk and Yu. 347 In a recent study, Bacco et al. 348 proposed an architecture for IoT-based remote monitoring with UAV and VR for locating various sensors attached to a structure and displaying instantaneous and historical records.
Despite being a mainstream condition assessment technique, visual inspections have shortcomings: they are labor-intensive, error-prone, and tedious, to name a few. AR has brought the opportunity to deliver new ways of portraying information that were deemed far-fetched a few years ago. This new content delivery paradigm has enabled engineers to immerse themselves simultaneously in a fully connected physical and digital world. An AR-enabled infrastructure inspection interface was developed by Maharjan et al. 349 The system was coupled with low-cost smart sensors and QR code scanners to assist the workflow of the inspectors. It is intuitive and much easier to apply ML and DL algorithms and display their results on AR devices. A proof of concept of the application of AR in bridge monitoring was presented by Yuan et al. 350 An AR framework was presented by Athanasiou et al. 351 It was shown that with the holographic reinforcement visualization, overall inspection time could be reduced, improving the efficiency in collecting and managing data. The enhancement to the visual inspection brought by AR and its integration with ML and DL is a promising technology for infrastructure inspection based on the review by Mascareñas et al. 352 The enhancements brought by VR and AR can offer effective additions to other next-generation SHM applications, in particular, UAVs and smartphone-based systems, for both visualization and deployment measures. Human-computer interaction can benefit from multisensory and interconnected media, which can deliver SHM data to operators as well as users. With the power of ML and DL, this interaction becomes real-time and more intuitive, and it opens VR/AR opportunities in the context of smart infrastructure and smart monitoring in the future smart city.

Open research issues
With the increased variety of SHM instrumentation and analytics approaches, traditional monitoring solutions are becoming obsolete. The heterogeneous nature of the collected data indicates the need for multivariate and asynchronous processing strategies. One major problem related to data-driven SHM approaches is that it is highly impractical to collect sufficient data to train ML algorithms in real life. To compensate for this, data fusion or sensor fusion has drawn the attention of researchers. However, multisensory applications still need to mature before widespread use. 353
• Concerning the data volume, big data research needs further integration into SHM to meet smart city demands, including the demand for aggregated knowledge. The integration of data-driven and model-driven approaches still needs to find an optimal level of contribution from each. In that sense, the extent of physics-informed foundations is yet to be determined.
• In addition to the above aspects of the modern SHM paradigm, real-time or online learning, identification, and monitoring have, in general, only been partially achieved so far and await further advancements. In parallel, the IoT framework and cloud computing are believed to play a vital role in minimizing delays in the digital twin's performance. Other than these, the use-cases of VR and AR are apparent only to a limited extent, and their purposes beyond sole visualization need to be investigated.
• Given that each civil infrastructure has a unique presence and complex nature, the uncertainties in structural and material behavior need systematic quantification and reduction techniques. From a present SHM perspective, modern and innovative solutions have to tackle scalability and adaptiveness concerns.
• Apart from the classical uncertainty problems associated with different SHM levels, the novel SHM paradigm proposing citizen engagement is, according to the authors, still insufficiently addressed. The majority of crowdsourcing research in the SHM arena is still at the conceptual level, and there is a growing need to establish real-life applications with real citizens.
• Despite the comprehensiveness of today's state-of-the-art system identification and damage detection techniques, fully automated approaches are still uncommon. Even the data-centric methodologies rely on human decisions and interactions in numerous phases of implementation.
• Looking at the energy harvesting developments supporting self-powered WSNs, compromises can be seen in terms of data communication and remote processing. More advancement in energy-aware algorithms or routing protocols could effectively reduce nodal and global energy consumption. In addition, power-generating elements can attract more attention, such as solar cells, piezoelectric, or thermoelectric elements utilizing different energy sources such as light, vibration, and heat, respectively. Prediction and optimization of the performance of the energy harvesting system is the trend for WSNs, especially in SHM applications. 354
• Despite tremendous efforts in the literature, the combined characteristics of damage-sensitive features and EOFs are still only partially uncovered. ML approaches combined with long-term monitoring are believed to serve this line of research. As expressed in the emerging technologies discussion, the ubiquitous and remote sensing alternatives diminish practical and financial problems related to the maintenance of permanently installed systems.
• The major shortcoming of data-based SHM systems, apart from the influence of EOFs, is the data scarcity for supervised PR algorithms. Potential remedies may vary, but often include inefficient or infeasible workarounds. The idea of transferring knowledge between similar and dissimilar structures in the context of population-based SHM offers a way to alleviate the lack of data. 267-270 By generating relatively complete damage-labeled data from a set of structures in a population and developing an abstract framework of a metric space of structures for mapping, knowledge transfer becomes possible, facilitating the creation of a general ML/DL method for the entire population. This idea is still in its infancy, and real-life applications should prove its suitability as an interoperable SHM solution in the smart city.
• With the increase in the number of connected devices, the devices in the IoT ecosystem must communicate and exchange information with one another. SHM studies so far are tailored as structure-specific; however, dense networks can enable deducing identification findings in region-scale frameworks. What is more, systemic features of the civil infrastructure population can be grasped thanks to the mobility and abundance of modern measurement devices. Finally, developments of multifaceted technologies and services can open the paths to interoperability and an open standard for data while ensuring total security and privacy measures.
• Other than these primary and general directions requiring future attention, partner disciplines can better embed recent advances in SHM. The extension of ML into earthquake engineering applications brings the promise of incorporating physical knowledge into data-driven models in seismic studies. The next-generation SHM can be coupled with ML and revolutionize earthquake engineering to solve some of the significant challenges in the field. 355
• Indirect bridge monitoring and drive-by sensing have gained popularity among researchers. Previous attempts only considered Level 1 SHM. However, it is envisioned that, with the utilization of ML and DL, higher levels of SHM can be achieved. Concerning drive-by SHM research, one can note that most effort is spent on individual vehicular data, which does not fully reflect the smart city theme. Therefore, more research on vehicle-bridge interaction (VBI) encompassing vehicle fleets is suggested. 356
• For the transition toward the future of sustainable SHM systems, integration of SHM with digital twin plus blockchain is of utmost importance. Considering the many aspects of the ecosystem, the coherent and synergetic connection of multiple emerging technologies is instrumental in providing better QoS to the user and increasing overall system efficiency and integrity. As with other IoT devices and services, creating a middleware system that enables integration and interoperability of SHM with other parts of the smart city ecosystem is not far-fetched.
• The fifth and coming sixth generation (5G and 6G) mobile networks are expected to be the center of the emerging IoT devices in the near future. With the ever-increasing applications in cloud computing and smart devices, 5G promises to address the current issues of telecommunications. 5G integration with SHM is becoming widespread, although the security aspects of such integration need to be further evaluated. 357 Mobile edge computing and fog networks, as part of the capabilities brought by 5G and 6G, enable on-site device deployment of models and algorithms used for rapid assessment of civil infrastructures. 358
• Versatile SHM systems require a robust data management scheme. It is one of the topics that receive very little attention from the community. The continuous increase in the volume and types of data received from many multisensory SHM applications overwhelms the storage capacity and computational power of current data acquisition systems. Cloud computing with a NoSQL database has granted the ability to manage massive structured and unstructured SHM data, 359 and to provide the necessary graphics processing unit (GPU) power with parallel and multithreaded computation. The system architecture of a cloud-based bridge monitoring system is depicted in Figure 13.
• Multisensory SHM systems are closely related to the 5 Vs of big data. In this case, time synchronization of different sensors, despite clock imperfections, becomes essential. Although this topic has been studied for stationary WSNs in SHM systems, with the introduction of next-generation SHM modules such as smartphones and UAVs, time synchronization becomes critical due to the lack of centralized data acquisition. Transmission delays due to communication overhead and the differences between the target and achieved sample rates in the SHM system are exacerbated in multisensory systems.
Figure 13. The system architecture of an ML-enhanced cloud-based SHM-GIS decision-making system for bridge monitoring applications, modified after Malekloo et al. 294
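The time-synchronization issue raised in the last item can be illustrated with a minimal sketch. It assumes that two devices record the same vibration event at the same nominal sampling rate and that their clocks differ only by a constant offset, which is estimated as the lag maximizing the cross-correlation. Clock drift, unequal sample rates, and sub-sample offsets are deliberately not handled. The signal, the 7-sample delay, and the function name `estimate_lag` are hypothetical.

```python
import math


def estimate_lag(reference, delayed, max_lag):
    """Return the integer lag (in samples) that maximizes the cross-correlation.

    A positive result means `delayed` lags behind `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += r * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Assumed toy data: a decaying oscillation, similar to an impact response.
fs = 100.0          # nominal sampling rate in Hz (same for both devices)
n = 300
true_delay = 7      # samples by which the second device's clock is late
sensor_a = [math.exp(-0.02 * i) * math.sin(0.3 * i) for i in range(n)]
# Second device sees the same event, shifted by the clock offset.
sensor_b = [sensor_a[i - true_delay] if i >= true_delay else 0.0 for i in range(n)]

lag = estimate_lag(sensor_a, sensor_b, max_lag=20)
offset_seconds = lag / fs

# Align the second record by dropping its first `lag` samples (valid for lag >= 0).
aligned_b = sensor_b[lag:]
print("estimated lag:", lag, "samples; clock offset:", offset_seconds, "s")
```

Cross-correlation alignment only works when both devices observe a common, sufficiently broadband excitation. For smartphones and UAVs with drifting clocks or irregular sampling, resampling to a common time base or network time protocols would be needed in addition.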

Conclusion
This article provided an extensive overview of ML-engaged SHM systems with connections to the new technologies that have grown rapidly over the last decade. A detailed breakdown of techniques, methods, and algorithms from the literature is presented and examined, emphasizing ML and the data-centric advancements occupying the current research trends. The survey included a systematic discussion of the steps taken to implement an ML model for SHM with pathways, taxonomies, and breakdowns. Moreover, the most common algorithms proposed for context-dependent applications were overviewed. The survey revealed that the extension of ML in SHM dramatically increased the system's capabilities, providing innovative solutions for different research challenges. The ML pipeline and corresponding algorithms have the potential to uncover the influence of EOFs due to their multivariate encapsulation capabilities. The EOF problem, long-standing in the SHM community, is one step closer to a solution with ubiquitous data and their digital extensions. Moreover, ML solutions also draw a pathway to addressing nonstationary and nonlinear sources of variation, and compression/dimensionality reduction brings very large inverse problems within solvable reach.
Forthcoming mobile and noncontact technologies are arriving with their digital counterparts. They not only offer new sources of observed physical parameters but also carry their own embedded intelligence, from consumer-grade smart devices to UAVs. Likewise, IoT is no longer a futuristic theme; it became a reality with the rapid distribution of low-cost headless computers all over the world. However, the community still has an unclear understanding of how these breakthroughs can serve the smart city agenda as well as sustainability on the monitoring side. The next decade is expected to bring forward aspects that have so far attracted little attention, such as visualization and interfaces.
Despite the intrinsic progress in ML, DL, and AI, there is an apparent gap in unsupervised SHM frameworks. Unseen conditions of real damage obstruct training possibilities, which can barely be fulfilled by synthetic datasets or physics-based realizations. Nevertheless, further advancements with label-free approaches, such as population-based SHM, can find remedies to the ongoing learning problem in SHM systems. It is obvious that a fully automated SHM relies on this direction, yet it has a long way to go before globally accepted frameworks are proposed.
In conclusion, to understand where the next-stage SHM is placed, this survey looks at the parallel developments in the multidisciplinary world of SHM from the microelectronics advancements to communication and from citizen science to cloud and edge computing. Needless to say, uncertainty reduction is boosted by revolutionary advances in regressors, classifiers, and detectors. It is the authors' opinion that the new norms in SHM unite all aspects of the digital revolution and Industry 4.0 together with the traditional lines of system identification, advanced modeling, and damage assessment.