Statistical Validation of the Grand Rapids Arch Collapse Classification

Background: The Grand Rapids Arch Collapse Classification system was devised in 2011 to help physicians and patients understand the mechanisms underlying arch collapse. Five types of arch collapse are described, based on which part of the foot or ankle is affected. The purpose of this study was to determine the inter- and intrarater reliability of this classification system when used by physicians with various levels of training. Methods: A senior author identified a stratified selection of 50 patients (10 per classification type) who presented with foot/ankle pain and suitable radiographs. A survey was designed using prediagnosis radiographs and clinical synopses of each patient’s chart. The survey consisted of a description of the classification scheme and the 50 cases in a randomized order. Eight weeks later, respondents repeated the survey so that intrarater agreement could be assessed. Results: Of the 33 physicians who received the survey, 26 completed the first round (16 attendings, 4 foot and ankle fellows, and 6 residents). Overall, there was substantial agreement among raters for all five types. Kappa scores for types I through V were 0.72, 0.65, 0.72, 0.70, and 0.63, respectively. The combined kappa score was 0.68. After 8 weeks, 13 of the 26 participants repeated the study. A kappa analysis of these 13 respondents showed substantial intrarater agreement, with a value of 0.74. Conclusion: The Grand Rapids Arch Collapse Classification system was designed to provide an accessible mechanism for physicians to consistently describe arch collapse, its effects, and the conditions associated with it. The utility of this system depends entirely on its repeatability among clinicians. This study has demonstrated that the classification system has substantial reliability among physicians of different levels of experience and training. Level of evidence: Level IV.


Introduction
In 1989, Johnson and Strom proposed that posterior tibial tendon dysfunction led to degenerative arch collapse. 5,18 Their model had 3 stages: stage 1 has no fixed deformity of the foot and ankle; stage 2 has a dynamic deformity of the hindfoot; and stage 3 has a fixed hindfoot deformity. Myerson later added a fourth stage to account for valgus tilt of the ankle. 23 Since this model was developed, foot and ankle surgeons have gained a better understanding not only of posterior tibial tendon dysfunction but also of the pathophysiology underlying many common conditions associated with arch collapse. With this deeper understanding comes a need for an updated model to explain arch collapse and the conditions associated with it. The senior authors have published many studies on the conditions associated with arch collapse. 2,6,15,16,21,24 The Grand Rapids Arch Collapse Classification System accounts for many common degenerative conditions associated with arch collapse.
The Grand Rapids Arch Collapse Classification (GRACC) was devised to describe the gradual progression of arch collapse due to tensile failure of the plantar apparatus of the foot. The classification system was designed around the premise that one of the major forces behind degenerative arch collapse is gastrocnemius contracture, defined as less than 10 degrees of ankle dorsiflexion with the knee in full extension. 2,8,14,20 Type I deformities result from gastrocnemius equinus; they are characterized by normal radiographs, including no arthritic changes, a normal talonavicular coverage angle, a normal midfoot linear relationship, and a normal sesamoid position. [10][11][12][13]28 A type II deformity results from progressive medial column incompetence with weight bearing, leading to elevation of the first ray with an overload of the lesser metatarsal heads. These conditions can lead to the processes (Figure 1). Type II deformities are characterized radiographically by normal navicular-cuneiform coverage with evidence of forefoot deformity and an elevated first ray. 3,4,8,17,30 A type III deformity results from further dorsal compression leading to midfoot arthritis. On radiographs, type III deformities demonstrate second and third tarsometatarsal arthritis and medial dorsal navicular-cuneiform arthritis; however, talonavicular coverage remains normal. 21 In type IV deformities, there is failure of the posterior tibial tendon, the spring ligament, or both. Patients with type IV deformities typically present with hindfoot valgus, talar head uncovering, and subtalar joint subluxation or arthrosis. Ultimately, deltoid ligament attenuation can lead to a type V deformity involving valgus tilting of the ankle and tibiotalar joint arthropathy. [25][26][27]30 The authors hoped to develop an instrument that could be reliably used by physicians with varying experience. Its simplicity and potential for reliability make it a promising resource for patient education as well as clinician decision making.
For the GRACC to be considered a viable, useful classification system, physicians must be able to apply it consistently and reproducibly. The primary goal of this study was to examine the inter- and intrarater reliability of the Grand Rapids Arch Collapse Classification.

Materials and Methods
The study was determined to be exempt by the Spectrum Health IRB. An a priori power analysis determined that 50 clinical cases were required to achieve significant reliability among our cohort. 1,7,20 Cases were randomly selected, using CPT codes to filter patients, by a medical student who had no prior understanding of the classification system. Most notably, CPT code 27687 (gastrocnemius recession) was used to include patients who had undergone a gastrocnemius recession, as this is involved with all 5 types of the GRACC. Charts from 4 foot and ankle orthopedic surgeons in a single practice were reviewed to identify 15 patients for each deformity type, as documented in each patient's encounter. The senior author then randomly selected 10 patients for each deformity type who had all necessary radiographs and physical exam findings documented. A survey was constructed using deidentified pretreatment radiographs and clinical synopses of the corresponding patients (Figure 2). All subject radiographs included weightbearing anteroposterior and lateral foot views as well as anteroposterior and lateral ankle views (Figure 3). Each clinical synopsis included pain location along with a list of relevant objective findings in the evaluation of foot and ankle pain: the result of the Silfverskiöld test, the presence of a hypermobile first ray as determined by assessing motion at the first tarsometatarsal joint, the presence of any lesser toe deformities, and the presence of posterior tibial tendon (PTT) weakness. First ray hypermobility had been assessed by grasping and squeezing at the tarsometatarsal articulation and assessing the degree of movement with dorsal pressure to the first ray. PTT dysfunction was tested using a single-leg heel raise test. All tests had been performed by foot and ankle fellowship-trained orthopedic surgeons. Deidentified radiographs and chart summaries were presented in a randomized order.
For each patient, respondents were asked to designate type I, II, III, IV, or V, or Not Applicable if they thought the patient did not fit any of the designated types of deformity.
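The case selection and ordering described above can be illustrated with a minimal Python sketch. It assumes a purely random draw of 10 cases from 15 candidates per deformity type followed by a shuffled presentation order; the function name `build_survey`, the case identifiers, and the seed are hypothetical and are not taken from the study, in which the senior author also required complete radiographs and exam findings.

```python
import random

def build_survey(candidates_by_type, per_type=10, seed=0):
    """Draw `per_type` cases from each deformity type, then shuffle the order.

    `candidates_by_type` maps a deformity type to a list of case identifiers.
    The seed is a hypothetical choice that makes the sketch reproducible.
    """
    rng = random.Random(seed)
    selected = []
    for deformity_type in sorted(candidates_by_type):
        case_ids = candidates_by_type[deformity_type]
        # Sample without replacement within each stratum (deformity type).
        for case_id in rng.sample(case_ids, per_type):
            selected.append((deformity_type, case_id))
    # Randomize presentation order so cases of one type are not grouped.
    rng.shuffle(selected)
    return selected

# Hypothetical candidate pool: 15 placeholder case IDs for each of the 5 types.
candidates = {
    t: [f"{t}-{i:02d}" for i in range(1, 16)]
    for t in ["I", "II", "III", "IV", "V"]
}
survey = build_survey(candidates)
print(len(survey))  # 50 cases in total
```

Sampling within each type before shuffling keeps the 10-per-type balance fixed, while the shuffle prevents raters from inferring the type from a case's position in the survey.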
This test was sent out to 33 physicians of varying experience. Of the 33 physicians who received the test, 26 completed the first round (16 attending surgeons from various subspecialties, 4 foot and ankle fellows, and 6 senior orthopedic surgery residents) to evaluate interrater reliability. Of these 26 physicians, 13 completed the second round 8 weeks later in order to evaluate intrarater reliability.
An independent statistics group determined the reliability rating of each type of deformity for the GRACC using Cohen's kappa coefficient analysis. The level of agreement for each kappa (k) value was determined based on the recommendations by Cohen and Landis (Table 1). 9,19,22 Reliability was determined for both inter- and intrarater observations for all 5 deformity types.
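As an illustration of the statistic named above, the following Python sketch computes Cohen's kappa for two raters, k = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. It then maps a kappa value to the commonly cited Landis and Koch agreement bands, which are assumed here to correspond to Table 1. The study does not state how the 26 raters were combined into a single value, so this two-rater form is only a sketch; the ratings below are invented for the example.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labeled the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same label.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

def landis_koch_label(kappa):
    """Agreement band for a kappa value (Landis and Koch thresholds)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

# Invented ratings for 10 cases; the raters disagree on 2 of them.
rater_a = ["I", "I", "II", "II", "III", "III", "IV", "IV", "V", "V"]
rater_b = ["I", "I", "II", "III", "III", "III", "IV", "V", "V", "V"]
example_kappa = cohen_kappa(rater_a, rater_b)
print(round(example_kappa, 2), landis_koch_label(example_kappa))
```

In this invented example the raters agree on 8 of 10 cases (p_o = 0.8) and chance agreement is p_e = 0.2, giving k = 0.75, which falls in the substantial band.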

Results
The interobserver reliability average for all five deformity types showed substantial agreement with a k value of 0.6839. The k value for each deformity type was 0.7164, 0.6510, 0.7219, 0.7013, and 0.6291, respectively (Table 2). The intraobserver reliability demonstrated substantial agreement with a k value of 0.744 (Table 3).
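The reported figures can be checked with a few lines of Python. The sketch below assumes the overall value is the unweighted mean of the five per-type k values; the text does not state how the average was computed, but the numbers are consistent with that assumption. The 0.61 to 0.80 range used for "substantial" is the commonly cited Landis and Koch band.

```python
from statistics import mean

# Per-type interobserver k values as reported for types I through V.
per_type_kappa = {
    "I": 0.7164, "II": 0.6510, "III": 0.7219, "IV": 0.7013, "V": 0.6291,
}

# Assumption: the overall value is the unweighted mean of the five types.
overall_kappa = round(mean(per_type_kappa.values()), 4)

# Landis and Koch "substantial" band, applied to every per-type value.
all_substantial = all(0.61 <= k <= 0.80 for k in per_type_kappa.values())

print(overall_kappa, all_substantial)
```

The five values sum to 3.4197, so the mean is 0.68394, which rounds to the reported 0.6839, and each per-type value lies within the substantial band.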

Discussion
This study showed substantial interrater reliability among physicians of varying experience. This reliability supports the authors' hypothesis that the GRACC would be easy to adopt and implement. The level of agreement in this study demonstrates the consistent application that is crucial to the utility of any classification system. Furthermore, there was substantial intrarater reliability, indicating the ability of this classification to monitor a patient's deformity progression over time. Future work can focus on measuring the validity and responsiveness of this classification scheme. In this case, validity would require comparing responses to gold standard responses. Responsiveness would be measured by observing how well this classification system works in patients whose deformity type changes over time. 29
The surgical and nonsurgical treatment options depend on the arch collapse type. Patients with a type I deformity can be [...] (Figure 4). Treatment with the above options for each deformity type has produced good outcomes. 2
Figure 4. Examples of surgical templates used by a tertiary foot and ankle specialty clinic for over 18 years. These templates are used for patients classified with type II, type III, and type IV deformities, respectively.
This study was limited by its relatively small sample size of physicians. Because of the low sample size, a greater number of patients than initially anticipated were required to achieve sufficient power. The large number of clinical vignettes that each physician had to work through could have led to fatigue by the end of the survey; randomization of the order of cases for each physician was used in an attempt to account for this bias. The study is further limited by its retrospective nature and any bias introduced by the creation of the clinical vignettes and image selection. These were minimized by having an inexperienced medical student extract the cases and using the same radiograph views for each patient. Recall bias was minimized for intrarater reliability by using a standardized 8-week delay in repeating the test. Finally, this study is limited by physical exam findings being given to the clinicians rather than obtained by the clinicians themselves in the clinic. Ideally, clinicians would be able to independently evaluate each patient to obtain objective information; however, this was not possible in a study of this nature. Factors such as first ray mobility and, to a much lesser degree, gastrocnemius equinus can be somewhat subjective on clinical exam and likely represent a spectrum as opposed to a binary variable. However, for the purposes of this assessment, they were treated as binary variables to facilitate classification.
In summary, this study has demonstrated that the GRACC has substantial reliability among physicians with proper training on the classification system. This classification offers potential as a beneficial tool in guiding treatment and decision making for management of patients with adult acquired arch collapse.