Validating a Physics-Based Automatic Classification Scheme for Impact Echo Signals on Data Using a Concrete Slab with Known Defects

Impact echo (IE) is capable of locating subsurface defects in concrete slabs from the vibrational response of the slab to a mechanical impact. For an intact slab (“good” condition), the frequency spectrum of the IE is dominated by a single peak corresponding to the slab’s “thickness resonance frequency,” whereas the presence of subsurface defects (“fair” or “poor” conditions) could manifest in various ways such as multiple distinct peaks at frequencies higher, or lower, than the thickness resonance. In previous research, the authors have proposed a frequency partitioning of the spectrum for IE signal classification. Firstly, the thickness resonance frequency band is identified using a data-driven approach and then the IE signals are represented by their energy distribution in three bands—frequencies less than, within, and greater than the thickness resonance. Following this feature extraction, an unsupervised clustering approach is used to identify the centroids for each signal class—good, fair, and poor—which are further used to classify any test signal into one of the three aforementioned classes. The classification is developed by training on unlabeled IE signals from real bridge deck data (the Federal Highway Administration’s [FHWA’s] InfoBridge dataset) without making use of any labeled data. This study aims to validate the proposed methodology on a labeled dataset of eight reinforced concrete specimens constructed at the FHWA Advanced Sensing Technology Nondestructive Evaluation laboratory having known artificial defects. Our findings indicate that the physics-based feature definition and the method developed on real bridge data are robust and can classify IE signals in the labeled data with moderate accuracy.

Condition monitoring and evaluation of bridge decks are imperative to maintaining structural reliability during their service life. Concrete bridge decks are traditionally evaluated using visual inspection methods. Nondestructive evaluation (NDE) techniques have become increasingly popular for the condition evaluation of bridge decks because they indicate volumetric damage and deterioration that might not be visible from the surface (1-7). NDE techniques rely on various physical phenomena (acoustic, electric, thermal, etc.) to identify specific indicators of the deterioration processes. For instance, concrete durability assessment with respect to corrosion can be done using electrical resistivity (ER) and half-cell potential (HCP) tests (8,9), while the electromagnetic properties of the material, the location of reinforcement, and the presence of voids can be learned using ground penetrating radar (GPR) (10). Of particular interest are ultrasonic or acoustic methods, such as impact echo (IE), that can detect discontinuities such as debonding or delamination within a material, for example a concrete bridge deck (11-14).
Under a mechanical impact, IE excites and records the modal frequencies corresponding to the thickness resonance of the slab, geometric boundaries, and crack-like defects running parallel to the surface (delamination). IE responses (or signals) are typically analyzed in the frequency domain (using the fast Fourier transform algorithm), and the spectral responses are interpreted to assess the condition of the test slab and identify potential defects. See Figure 1 for an illustration of the IE signal-spectrum pair corresponding to each condition: good, fair, and poor. The IE spectrum for a slab in “good” condition, that is, free from damage, is unimodal, with a single frequency peak corresponding to its thickness resonance frequency ($f_h$), since most of the impact energy is reflected from the slab backwall (15). Severely deteriorated slabs (“poor” condition) are instead identified by a dominant low-frequency peak caused by the flexural vibrations of the debonded concrete; the frequency associated with these vibrations is considerably lower than the thickness resonance frequency ($f_h$). In contrast, moderate damage (“fair” condition) or deep delamination typically introduces additional higher-frequency components ($f_d$ greater than $f_h$) in the spectrum, caused by reflections from the deeper delaminations. The distribution of energy across frequencies therefore varies with the type and extent of damage, which makes spectral analysis appropriate for IE signal interpretation. However, Fourier transform-based approaches have known limitations for analyzing non-stationary IE signals, so a two-dimensional joint time-frequency analysis (for example, using the wavelet transform) may be used as an alternative for feature extraction.
Although effective in detecting damage and delamination in concrete slabs, manual IE data interpretation has limited utility in bridge condition evaluation practices. Moreover, there exists limited research on data-driven approaches to automatically classify IE signals. Prior research on automated IE signal analysis relies on machine learning (ML) algorithms such as the support vector machine (16) and deep learning (DL) (17-22). However, DL models require large data volumes for their training, which are often not available for labeled IE signals. Limited availability of labeled IE data could lead to model “overfitting,” a phenomenon where the model fails to generalize its performance to “unseen” data. As demonstrated by Dorafshan and Azari (21), transfer learning is an efficient way to tackle this problem. Nevertheless, the limitations of DL and the scarcity of labeled IE data motivate the development of an unsupervised modeling approach, as proposed in previous research (23). In that study, based on the distinct spectral characteristics of the three conditions (good, fair, and poor), the authors partitioned the frequency spectrum into three non-overlapping bands having frequencies less than, within, and greater than the thickness resonance frequency. The energy in each of the bands is used as the feature definition. Subsequently, IE signals are classified into one of the three categories (good, fair, or poor) based on their spectral signatures. This classification scheme was developed using unlabeled IE signals from the Federal Highway Administration's (FHWA's) Long Term Bridge Performance (LTBP) InfoBridge dataset. The objective of this study is to validate the classification scheme on a set of experimental slabs with “known” subsurface damages (and therefore labeled IE data) from the FHWA Advanced Sensing Technology (FAST) NDE laboratory. Subsurface defect classification for bridge decks using IE from the FAST dataset has also been demonstrated using DL models (21,22).
The remainder of the paper is organized as follows. Firstly, the IE data used are described, followed by the thickness resonance estimation and classification methodology as proposed by Sengupta et al. (23). Next, we present the validation results on the experimental concrete slabs, followed by a discussion on the robustness of the model.

Data Description
In this study, the results of IE tests performed on eight reinforced concrete slabs at the FAST NDE laboratory are used for validation purposes. Each concrete slab, 3.0 m long, 1.0 m wide, and 0.2 m thick, is constructed with “known” artificial defects (shallow delamination, deep delamination, honeycombing, voids, and a transverse vertical crack), as shown in Figure 2. These defects extend 0.30 m along the length of the specimens and 0.20 m along the width of the specimens. The concrete slabs are constructed from a normal-weight concrete mix with a water-to-cement ratio of 0.37 and have a 28-day compressive strength of 27.6 MPa, with two mats of uncoated steel reinforcements of 15.8 mm diameter at 203 mm spacing in both the lateral and transverse directions. Several NDE techniques including IE are used to understand their efficacy and feasibility in detecting subsurface defects. IE tests are performed at a regular grid on the concrete slabs with a grid spacing of 100 mm, both longitudinally and transversely. The test setup consists of an impact source (11 mm diameter steel spheres on spring rods) and a receiving transducer or sensor (e.g., an accelerometer).

Methodology
This section provides a brief summary of the IE signal classification scheme as proposed in previous research (23).

Estimating the Thickness Resonance Bandwidth
To characterize IE signals, it is necessary to establish the spectral features for “good” signals (recorded at slab locations free from damage), where most of the impact energy is reflected back from the bottom of the deck and the frequency spectrum is dominated by one frequency peak corresponding to its thickness resonance. This allows us to automatically identify the thickness resonance frequency ($f_h$) of the slab. Although there exists an analytical expression (Equation 1) to calculate the thickness resonance frequency for a given slab thickness ($h$) and P-wave velocity ($V_p$), these parameters might not be readily available, as is the case with publicly available databases such as the LTBP InfoBridge:

$$f_h = \frac{C \, V_p}{2h} \qquad (1)$$

where $C$ is 0.94-0.96 for concrete slabs. Therefore, a data-driven approach was developed to estimate the thickness resonance frequency. The IE spectra for all points in the test slab are analyzed to identify the peak frequency corresponding to the maximum energy content in each spectrum. The peak frequency values for all test points on the slab are aggregated to identify the most prevalent frequency peak, which is an indicator of the thickness resonance frequency for each bridge deck.
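The peak-picking and aggregation steps described above can be sketched as follows. This is a minimal illustration only: the function names, the 500 Hz bin width, and the toy spectra are assumptions introduced here, not values or code from the study.

```python
from collections import Counter


def peak_frequency(freqs_hz, power):
    """Return the frequency (Hz) carrying the maximum spectral energy."""
    idx = max(range(len(power)), key=lambda i: power[i])
    return freqs_hz[idx]


def most_prevalent_peak(peak_freqs_hz, bin_width_hz=500.0):
    """Return the centre (Hz) of the most populated histogram bin.

    bin_width_hz is an illustrative choice, not a value from the study.
    """
    counts = Counter(int(f // bin_width_hz) for f in peak_freqs_hz)
    best_bin, _ = counts.most_common(1)[0]
    return (best_bin + 0.5) * bin_width_hz


# Toy spectra on a shared frequency axis (made-up numbers).
freqs = [2500.0, 5000.0, 7500.0, 10000.0, 12500.0]
spectra = [
    [0.1, 0.2, 0.3, 1.0, 0.2],  # peak at 10 kHz
    [0.1, 0.1, 0.2, 0.9, 0.4],  # peak at 10 kHz
    [1.0, 0.3, 0.1, 0.2, 0.1],  # peak at 2.5 kHz (flexural mode)
]
peaks = [peak_frequency(freqs, s) for s in spectra]
print(peaks, most_prevalent_peak(peaks))
```

A plain histogram mode is used here only to show the aggregation idea; it is a simplification of the mixture-model parameterization used in the paper.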
The inherent assumption in this approach is that the thickness resonance is excited at most test points, except at extremely deteriorated points, where the flexural mode of vibration dominates. To estimate the thickness resonance bandwidth, the histogram of peak frequencies is parameterized with a Gaussian mixture model (GMM), assuming that all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. The parameters of the GMM are estimated using an expectation-maximization technique, while the number of Gaussian components in the mixture model is decided based on the Bayesian information criterion (BIC) score. The BIC score weighs the change in the log-likelihood function against the number of model parameters when different numbers of GMM components are considered. The model with the lowest BIC score is the one closest to the maximum log-likelihood value after this penalty, and is therefore chosen as the final model. Once the GMM is fitted with an optimum number of mixture components, the Gaussian distribution with the maximum mixture proportion is chosen as a representation of the thickness resonance. The mode ($f_m$) of this distribution and $f_m \pm 3\sigma$ are used as the point estimate and interval estimate of the thickness resonance frequency of the slab, respectively.
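The model-selection and band-estimation logic can be illustrated with a short sketch. It assumes the mixture has already been fitted by some expectation-maximization routine (not shown); the log-likelihoods and component parameters below are hypothetical placeholders, and the parameter count $3K - 1$ is the usual count for a one-dimensional mixture with $K$ components.

```python
import math


def bic(log_likelihood, n_components, n_samples):
    """BIC = k * ln(n) - 2 * ln(L) for a 1-D Gaussian mixture."""
    # K means + K variances + (K - 1) free mixture weights
    n_params = 3 * n_components - 1
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def resonance_band(weights, means, stds, n_sigma=3.0):
    """Return (f_m, lower, upper) from the component with the largest weight."""
    j = max(range(len(weights)), key=lambda i: weights[i])
    return means[j], means[j] - n_sigma * stds[j], means[j] + n_sigma * stds[j]


# Hypothetical maximized log-likelihoods for 1, 2, and 3 components.
log_liks = {1: -2300.0, 2: -2210.0, 3: -2205.0}
n_points = 252  # IE signals per laboratory slab
scores = {k: bic(ll, k, n_points) for k, ll in log_liks.items()}
best_k = min(scores, key=scores.get)  # lowest BIC wins

# Hypothetical fitted parameters for the selected two-component model (Hz).
band = resonance_band(weights=[0.7, 0.3], means=[9800.0, 3000.0], stds=[400.0, 500.0])
print(best_k, band)
```

In this toy case the third component improves the log-likelihood only slightly, so the penalty term favors two components.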

Partitioning the IE Power Spectrum
IE signals corresponding to different conditions (good, fair, and poor) have distinct spectral patterns, that is, the distribution of energy among frequencies differs based on the type and extent of damage. Therefore, the energy content in each frequency band might be an indicator of the damage state represented by the IE signal. Once the thickness resonance band of the deck is estimated, the IE spectrum for each signal is analyzed to calculate the energy in the three non-overlapping bands: frequencies less than (Band 1), within (Band 2), and greater than (Band 3) the thickness resonance frequency. Each IE signal is then represented by a three-dimensional vector of the proportion of energy in each band in order, that is, $[p_1, p_2, p_3]$, where $p_i$ refers to the proportion of energy in the $i$th band. For example, for a “good” signal, since most of the impact energy is concentrated within the thickness resonance band (i.e., Band 2), it is expected that the energy proportion is shifted toward coordinate $[0, 1, 0]$ in the feature space, while the feature representation of a “poor” signal is skewed toward $[1, 0, 0]$.
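As a minimal sketch of this feature definition, the function below sums the power-spectrum values falling below, inside, and above a given resonance band and normalizes them to proportions. The function name, the treatment of the band edges as inclusive, and the toy spectrum are assumptions made for illustration.

```python
def band_energy_proportions(freqs_hz, power, f_low, f_high):
    """Return [p1, p2, p3]: energy share below, within, and above [f_low, f_high]."""
    energy = [0.0, 0.0, 0.0]
    for f, p in zip(freqs_hz, power):
        if f < f_low:
            energy[0] += p   # Band 1: below the thickness resonance band
        elif f <= f_high:
            energy[1] += p   # Band 2: within the band (edges inclusive here)
        else:
            energy[2] += p   # Band 3: above the band
    total = sum(energy)
    if total == 0.0:
        raise ValueError("spectrum carries no energy")
    return [e / total for e in energy]


# Toy power spectrum (made-up numbers) and a hypothetical band of 8.6-11 kHz.
freqs = [2000.0, 6000.0, 9000.0, 10000.0, 11000.0, 15000.0]
power = [1.0, 1.0, 2.0, 4.0, 1.0, 1.0]
features = band_energy_proportions(freqs, power, 8600.0, 11000.0)
print(features)
```

Because the three proportions sum to one, every signal lies on a triangle in the feature space whose corners are the three idealized signatures.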

Classification of IE Signals
Each IE signal is represented by a three-dimensional feature vector of energy proportions in the three bands: Bands 1-3. The feature vectors are input to an unsupervised clustering algorithm, that is, k-means (25) with k = 3. The k-means clustering is performed on the feature vectors from all slabs to partition the feature space into three distinct regions and identify the centroids corresponding to “good,” “fair,” and “poor” signal classes. As a result, each signal in the training dataset is assigned to a particular condition class. The classification of a test (unseen) signal then becomes a supervised learning problem. During the inference phase, a test signal is first transformed into the feature vector and then classified into one of the three classes (good, fair, or poor) using a k-nearest neighbor (k-NN) algorithm with a Euclidean distance metric. Specifically, we use the centroids of the clusters obtained from the k-means algorithm to perform the nearest neighbor search in the classification. Notably, the performance of the clustering approach depends on the observed representation of points in the feature space. Limited data points in the feature space might adversely affect the clustering results. Here, we aggregated IE signals belonging to different conditions (intact and all defect types) for all eight slabs to have sufficient dispersion in the feature space.
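The two stages, clustering followed by nearest-centroid assignment, can be sketched in a few lines. This is a plain Lloyd-style k-means written for illustration; the initial centroids, the toy feature vectors, and the order used to name the clusters are all assumptions, and the study's actual initialization is not specified here.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, init_centroids, n_iter=50):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # assignments no longer change
            break
        centroids = updated
    return centroids


# Toy [p1, p2, p3] feature vectors (made-up numbers).
train = [
    [0.1, 0.8, 0.1], [0.0, 0.9, 0.1],   # energy mostly in Band 2
    [0.1, 0.5, 0.4], [0.1, 0.4, 0.5],   # notable energy in Band 3
    [0.8, 0.1, 0.1], [0.9, 0.1, 0.0],   # energy mostly in Band 1
]
# Assumed initialization near the idealized class signatures.
init = [[0.0, 1.0, 0.0], [0.0, 0.5, 0.5], [1.0, 0.0, 0.0]]
labels = ["good", "fair", "poor"]  # naming follows the initialization order

centroids = kmeans(train, init)
test_signal = [0.7, 0.2, 0.1]
print(labels[nearest(test_signal, centroids)])
```

With a single centroid per class, the k-NN step reduces to a one-nearest-neighbor search over three points, which is why inference is cheap once the centroids are fixed.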

Results and Discussion
This section details the data analysis and results on the eight concrete slabs. As stated in the Data Description section, the slabs include known built-in artificial defects: shallow and deep delamination, honeycombing, and voids. For each slab, the peak frequency is identified for all IE spectra. Following previous studies on this dataset (21,22), the portion of the IE spectra corresponding to frequencies greater than 20 kHz was discarded as noise.
The histograms of the peak frequencies in the IE spectra for each of the eight slabs are shown in Figure 3. We observe that each histogram is characterized by a distinct peak, indicating that the majority of the test points on each slab have peak frequencies near 10 kHz. To quantify the thickness resonance bandwidth, Gaussian mixtures are fitted to parameterize the histograms, using the BIC score to determine the best number of distributions. Notice that while three Gaussian distributions characterized the histograms well for most slabs, only two distributions were needed for slabs 1, 5, and 8. The Gaussian distribution with the maximum mixture proportion (highlighted in red in the figure) is used for estimating the thickness resonance bandwidths, which are given in Table 1. Using the analytical approach (Equation 1), the thickness resonance frequency is calculated to lie in the range of 8400-10,800 Hz. These values correspond to P-wave velocities ($V_p$) between 3500 and 4500 m/s, using the known slab thickness of 0.2 m. The bandwidths estimated using our proposed data-driven approach are therefore found to be in good agreement with those obtained analytically.
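The analytical range quoted above can be checked with a few lines of arithmetic. The sketch assumes the standard plate form of the thickness resonance relation, $f_h = C V_p / (2h)$, and takes $C = 0.96$, the upper end of the stated range, because that value appears to reproduce the reported 8400-10,800 Hz; the function name is introduced here for illustration.

```python
def thickness_resonance_hz(v_p, h, c=0.96):
    """Thickness resonance frequency f_h = C * V_p / (2 * h), in Hz.

    v_p: P-wave velocity in m/s; h: slab thickness in m.
    c = 0.96 is an assumed value within the 0.94-0.96 range for concrete slabs.
    """
    return c * v_p / (2.0 * h)


h = 0.2  # known slab thickness (m)
for v_p in (3500.0, 4500.0):
    print(v_p, round(thickness_resonance_hz(v_p, h)))
```

With $C = 0.94$ instead, the same velocities would give a slightly lower range, so the choice of $C$ shifts the analytical band by roughly 2%.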
Having the thickness resonance bandwidth for each slab, we can now partition the IE spectra into three bands corresponding to lower than, within, and above the thickness resonance band. Next, the proportion of energy in each band is calculated and is used as a $3 \times 1$ feature vector. Note that each value in this feature vector corresponds to the percentage of energy in Bands 1-3. Figure 4 presents the feature space representation of the IE signals belonging to each slab for illustration purposes. Each data point is labeled with the corresponding condition of no defect, shallow delamination, honeycombing, void, and deep delamination. The data points corresponding to shallow delamination and void defects are observed to fall in the lower left-hand side of these figures. This corresponds to IE signals that have the largest energy content in the frequency band less than the thickness resonance, that is, Band 1. Similarly, moderately deteriorated cases (deep delamination and honeycombing) are generally found toward the upper middle part of these figures. This corresponds to having a considerable amount of energy in Band 3 as well, which arises from additional high-frequency components. As expected, spectra corresponding to intact test locations with “no defect” have most energy in the thickness resonance band, that is, Band 2.
However, a considerable number of test points labeled as “no defect” also have energy concentrations in Band 3, which could be because some damage occurred during the construction process. Although alternative NDE techniques have not been utilized in this study to identify potential damages in the slabs, research (22,24) on damage identification using IE on the same dataset has highlighted the presence of additional damages in the slabs. As we observe from Figure 4, except for shallow delamination, IE signals obtained from different damaged portions of the slab are not localized; rather, they show significant scatter in the feature space. This could reflect a limitation of the proposed spectral approach, which was designed for delamination characterization, in distinguishing between different damage types. Next, k-means clustering is performed to identify centroids corresponding to different conditions of the slab. To have a sufficiently large number of points in the feature space, a total of 2016 IE signals from all eight slabs are combined and the k-means clustering is performed on the combined dataset to generate three clusters, assumed to correspond to good, fair, and poor conditions, and the centroid of each cluster is identified (see Table 2). To better understand the variance in data across different datasets, the cluster centroids obtained from the experimental data are compared with those obtained from the LTBP dataset (used by Sengupta et al. [23]), as shown in Table 2. Interestingly, the centroids obtained from the experimental data and LTBP data are in good agreement. Some differences are observed, especially for the signals labeled as “good.” It can be seen that centroids corresponding to “good” signals have higher energy in Band 3 as compared to those obtained from the real bridge data.
Using the centroids of each cluster, each signal in the dataset can be classified into one of three classes (“good,” “fair,” or “poor”) using a k-NN classification (see Figure 5). Further, IE signals corresponding to different defect types in the laboratory slabs may be assigned a label of “good,” “fair,” or “poor,” as shown in Table 3. Notice that the features of the IE signals recorded on portions of the slab with shallow delamination and void defects make up the majority of points labeled as “poor” in Figure 5. This is confirmed in Table 3, which suggests that more than 80% of the IE signals corresponding to shallow delamination or voids fall in the “poor” category. In the cluster labeled “fair,” a mix of IE signal features from portions of the slab with no defects and deep delamination or honeycombing are found. This is again confirmed by Table 3, which shows that the IE signals corresponding to the no defect regions or deep delamination are categorized as “fair” at least 50% of the time. In the “good” cluster, there are mostly IE signal features from the no defect portions of the slab, which is again corroborated in Table 3.
The IE spectra corresponding to voids have significant energy in the frequencies lower than the thickness resonance, which is similar to that of shallow delamination. On the other hand, spectra corresponding to deep delamination and honeycombing have a peak frequency centered around the thickness resonance with additional higher frequency components. Therefore, based on expert opinion (and the clustering results in Table 3), the accuracy of the IE signal classification is evaluated assuming that shallow delamination and voids represent the “poor” condition, deep delamination and honeycombing are the “fair” condition, and no defects are the “good” condition. The classification accuracy based on the centroids identified from the laboratory data is shown in Table 4. Note that in the confusion matrix, the term “True” refers to labels that have been assigned based on expert interpretation. From Table 4 it can be seen that the accuracy of classification for the eight slabs is around 60% on average, ranging from 46% for Slab 4 to 79% for Slab 7.
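The evaluation protocol described above can be sketched as follows. The defect-to-condition mapping is the one stated in the text; the function names and the six example predictions are made up for illustration and do not reproduce any entry of Table 4.

```python
CLASSES = ["good", "fair", "poor"]

# Expert mapping from built-in defect type to condition class (from the text).
EXPERT_LABEL = {
    "no defect": "good",
    "deep delamination": "fair",
    "honeycombing": "fair",
    "shallow delamination": "poor",
    "void": "poor",
}


def confusion_matrix(defect_types, predicted):
    """Rows are expert ('True') classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(CLASSES)}
    matrix = [[0] * len(CLASSES) for _ in CLASSES]
    for defect, pred in zip(defect_types, predicted):
        matrix[index[EXPERT_LABEL[defect]]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of signals on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up example: six test points and their predicted classes.
defects = ["no defect", "no defect", "void",
           "deep delamination", "honeycombing", "shallow delamination"]
preds = ["good", "fair", "poor", "fair", "poor", "poor"]

cm = confusion_matrix(defects, preds)
print(cm, round(accuracy(cm), 3))
```

Note that a “no defect” point predicted as “fair” counts as an error under this mapping, which is the dominant error mode reported for the laboratory slabs.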
Interestingly, the model performs with a lower accuracy on Slabs 1-4 than on Slabs 5-8. Regardless, most of the misclassification is caused by a significant number of test points with “no defect” having energy in Band 3 and therefore being classified as in “fair” condition. Figure 6 presents IE signals and their corresponding spectra for two selected locations from Slab 1 that presumably have no defects and two selected locations from Slab 1 that are known to have deep delamination. As can be seen from the figure, spectra corresponding to “no defect” have significant energy in the frequency range greater than 10 kHz, in addition to the peak frequency centered around 10 kHz (thickness resonance). Very similar IE spectra are observed for deep delamination, with significant energy in the frequency range greater than 10 kHz, as can be seen in Figure 6. Visual inspection of Figure 6 confirms that some locations with no defects show the signatures of deep delamination in their IE spectra, and the expert classification of these points is “fair.” Therefore, although these locations have no built-in defects, the spectra indicate the possibility of a certain degree of subsurface damage, possibly generated during the construction process. Similarly, Dorafshan and Azari (22) and Lin et al. (24) have also emphasized the discrepancy in accuracy between the two sets of slabs (Slabs 1-4 versus 5-8), suggesting the presence of subsurface damages in areas without built-in defects.
Moreover, the classification performance relies on how signals are grouped into the three classes. As we observe from Figure 5, honeycombing and deep delaminations span both the “poor” and “fair” classes. In fact, these defect types appear to have a fuzzy degree of association with both classes. To compare the transferability of results across different datasets, the centroids obtained from the LTBP dataset are used to classify the experimental data in a similar fashion (see Table 5). The results suggest that the classification accuracy remains similar regardless of the dataset used to identify the centroids. Using the signal classification scheme (and experimental data centroids), the condition maps for the eight slabs are generated as shown in Figure 7, where the different in-built defects can be clearly identified. However, as before, deep delamination is not accurately identified in all cases. Moreover, the condition maps generated using LTBP data centroids (Figure 8) are similar to the ones generated using experimental data centroids.
The proposed method uses unsupervised clustering on energy proportions in different frequency bands as features to identify class centroids. In addition, the frequency bands are defined relative to the thickness resonance of the deck slab. Therefore, this approach does not depend on the specifics of the bridge deck itself, that is, signals with the same condition would still be identified correctly even if the thickness resonances of the decks are different. Consequently, the proposed methodology is robust and suitable for transfer learning.

Conclusions
IE is an acoustic nondestructive testing method that is used to identify defects in concrete structures such as bridge decks. Researchers have proposed various data-driven models for autonomous interpretation of IE signals to infer subsurface defects that might not be visible from visual inspection methods. IE signals belonging to different defect locations show distinct spectral characteristics that hint at the degree of subsurface damage. In Sengupta et al. (23), the authors proposed a frequency partitioning of the IE spectrum for autonomous classification into three classes: “good,” “fair,” and “poor.” This classification scheme was developed using real bridge data from the FHWA LTBP dataset that were unlabeled (i.e., without ground truth). In this study, we present a validation of the physics-based classification scheme on an experimental dataset with known defects. Results indicate that the classification method is robust and can classify signals with moderate accuracy. As expected, shallow delamination and voids are most accurately identified, while honeycombing and deep delamination are often misclassified. These defects are often difficult to isolate from “no defect” points because of the limitations in the spectral approach. Often, these defects have different degrees of association with the three classes and therefore assigning a particular class is not well justified. In such cases, a fuzzy classification approach could be used as an appropriate alternative, which will be explored in future studies.

Figure 1. Impact echo (IE) signals and spectra corresponding to three different delamination conditions: good, fair, and poor.
IE signals are recorded at a sampling frequency of 200 kHz for a duration of 10 ms at every point of the grid. There are 252 recorded IE signals per laboratory concrete slab. Further details on the construction of concrete slabs and the testing can be found in Lin et al. (24).

Figure 2. Schematic diagram of the concrete slabs with artificial defects.

Figure 3. Histograms of peak frequencies with the fitted Gaussian mixture models for test slabs 1-8 (color online only).

Figure 4. Feature space representations of impact echo signals in test slabs 1-8.

Figure 5. k-means clustering on the combined dataset.

Table 1. Estimates of Thickness Resonance Bandwidth for Each Test Slab

Table 2. Comparison of Centroids Obtained using Laboratory Data and Long Term Bridge Performance (LTBP) Data

Table 3. Clustering of Impact Echo Signals from Different Defects

Table 4. Confusion Matrices of Impact Echo Signal Classification using Laboratory Data Centroids

Figure 6. Impact echo signals and spectra for regions with “no defect” and “deep delamination” from Slab 1.

Table 5. Confusion Matrices of Impact Echo Signal Classification using Long Term Bridge Performance Data Centroids