Clinically Relevant Molecular Biomarkers for Use in Human Knee Osteoarthritis: A Systematic Review

Objective: Biomarkers in osteoarthritis (OA) could serve as objective clinical indicators for various disease parameters, and act as surrogate endpoints in clinical trials for disease-modifying drugs. The aim of this systematic review was to produce a comprehensive list of candidate molecular biomarkers for knee OA after the 2013 ESCEO review and discern whether any have been studied in sufficient detail for use in clinical settings.
Design: MEDLINE and Embase databases were searched between August 2013 and May 2018 using the keywords “knee osteoarthritis,” “osteoarthritis,” and “biomarker.” Studies were screened by title, abstract, and full text. Human studies on knee OA that were published in the English language were included. Excluded were studies on genetic/imaging/cellular markers, studies on participants with secondary OA, and publications that were review/abstract-only. Study quality and bias were assessed. Statistically significant data regarding the relationship between a biomarker and a disease parameter were extracted.
Results: A total of 80 studies were included in the final review and 89 statistically significant individual molecular biomarkers were identified. C-telopeptide of type II collagen (CTXII) was shown to predict progression of knee OA in urine and serum in multiple studies. Synovial fluid vascular endothelial growth factor concentration was reported by 2 studies to be predictive of knee OA progression.
Conclusion: Despite the clear need for biomarkers of OA, the lack of coordination in current research has led to incompatible results. As such, there is yet to be a suitable biomarker to be used in a clinical setting.


Introduction
Osteoarthritis (OA) is one of the world's most prevalent diseases. In the United Kingdom, 10.9% and 18.2% of the population are estimated to be affected by hip and knee OA, respectively. 1 Despite the early onset of symptoms, there has been no widely accepted intervention for altering disease progression, and the nonsurgical treatments for OA are largely based on symptom relief. From a pathophysiological perspective, both microscopically and macroscopically, OA is a highly heterogeneous disease. This heterogeneity makes it difficult to formulate diagnostic and classification criteria for OA. Diagnostic criteria must be sufficiently broad to incorporate all phenotypes, but specific enough to identify only people with the disease. Clinically, OA is diagnosed through a clinical history and physical examination. When subjects are being enrolled in a research study, OA is diagnosed and classified radiographically, mostly through the use of the Kellgren-Lawrence (K-L) framework. The K-L framework is based on subjective analyses and thus predisposes the results to observer bias. Nor does it correspond to the pain and function of the patient. For this reason, it has been postulated that measurable molecular biomarkers could provide a novel and more objective method for diagnosing OA and monitoring treatment effects in patients with OA.
A biomarker is defined as "a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention." 2 In 2006, Bauer et al. 3 proposed the BIPED system for classifying molecular and genetic biomarkers in OA. The acronym stands for: B, burden of disease; I, investigative; P, prognostic; E, efficacy of intervention; D, diagnostic. In 2011, a further category, safety (S), was added. This classification system was designed to help direct research into biomarkers for use in clinical trials.
Having an objective method of staging, predicting disease progression and identifying OA patients would be an invaluable asset in a clinical environment.
A comprehensive review of biomarker research was published in 2013 following a meeting of the European Society for Clinical and Economic Aspects of Osteoporosis and Osteoarthritis (ESCEO). It was concluded that no biomarker investigated had shown sufficient evidence to guide clinical trials or be used in a clinical environment. The review included a description of areas requiring further research and development to facilitate the use of biomarkers in OA. 4 This systematic review aims to provide an up-to-date list and analysis of molecular biomarkers in knee OA.

Search Strategy
A literature search was performed on 2 electronic databases: MEDLINE (1946 to April 2018) and Embase (1974 to April 2018). These databases were selected so as to remain consistent with the ESCEO review. The terms "knee osteoarthritis," "hip osteoarthritis," and "osteoarthritis" were combined using the "OR" function. These terms were then combined with "biomarker" using the "AND" function. All subheadings were included for each of the search terms.
Results were then limited to human studies in the past 5 years. These results were reviewed by title and abstract using the inclusion and exclusion criteria (Table 1).
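The Boolean combination described above can be sketched as follows. This is only an illustration: the function name `build_query` and the plain-text query format are assumptions and do not reproduce the exact syntax of either database interface.

```python
def build_query(disease_terms, marker_term):
    """Combine the disease terms with OR, then AND the result with the marker term.

    Hypothetical helper: the output is a generic Boolean string, not the
    native search syntax of MEDLINE or Embase.
    """
    disease_clause = " OR ".join(f'"{term}"' for term in disease_terms)
    return f'({disease_clause}) AND "{marker_term}"'


# Search terms as listed in the text.
DISEASE_TERMS = ["knee osteoarthritis", "hip osteoarthritis", "osteoarthritis"]

query = build_query(DISEASE_TERMS, "biomarker")
print(query)
```

The human-studies and 5-year limits would then be applied as filters on the result set rather than as part of the Boolean string.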

Assessment of Study Quality
Quality of the studies was assessed by one reviewer using the National Institutes of Health (NIH) Study Quality Assessment Tool. This tool has subsections that are applicable for assessing meta-analyses and case-control, cohort, and cross-sectional studies. It uses a series of questions to help the user assess the internal validity of a study and the extent to which the results of the study can be considered valid. 5 The total number of "yes" answers is then interpreted to give an overall quality rating for the study. For cohort and cross-sectional studies, there are 14 questions: 0 to 4, poor study; 5 to 9, fair study; and 10 to 14, good study. For meta-analyses, there are 8 questions: 0 to 2, poor study; 3 to 5, fair study; and 6 to 8, good study. For case-control studies, there are 12 questions: 0 to 4, poor study; 5 to 8, fair study; and 9 to 12, good study.
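The rating bands above can be expressed as a small lookup. The sketch below is illustrative only: the names `BANDS` and `quality_rating` are assumptions, and the cut-offs are those stated in the text rather than anything taken from the NIH tool itself.

```python
# Rating bands per study design, as stated in the text:
# (highest score rated "poor", highest score rated "fair", number of questions)
BANDS = {
    "cohort": (4, 9, 14),
    "cross-sectional": (4, 9, 14),
    "meta-analysis": (2, 5, 8),
    "case-control": (4, 8, 12),
}


def quality_rating(design, yes_count):
    """Map the number of "yes" answers to poor / fair / good for a study design."""
    poor_max, fair_max, total = BANDS[design]
    if not 0 <= yes_count <= total:
        raise ValueError(f"yes_count must be between 0 and {total} for {design}")
    if yes_count <= poor_max:
        return "poor"
    if yes_count <= fair_max:
        return "fair"
    return "good"


print(quality_rating("case-control", 9))  # 9 of 12 falls in the top band
```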

Data Extraction
Data extracted from the studies included the markers studied, the BIPEDS class investigated for the biomarker, source of biomarker, biomarker analysis method, and statistical data, including P-values, odds ratios (OR), and correlation coefficients, as appropriate. The threshold for statistical significance was set at P = 0.05.
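One way to picture the extraction step is as a list of records filtered by the significance threshold. The sketch below is a hypothetical layout, not the authors' extraction form: the class `ExtractedResult`, its field names, and the strict `<` comparison are assumptions. The first two P-values are those reported in this review; the third record is an invented placeholder.

```python
from dataclasses import dataclass

ALPHA = 0.05  # significance threshold stated in the text


@dataclass
class ExtractedResult:
    marker: str      # biomarker name
    parameter: str   # disease parameter the marker was tested against
    p_value: float   # reported P-value


def significant(results, alpha=ALPHA):
    """Keep results whose P-value is below alpha (strict inequality is an assumption)."""
    return [r for r in results if r.p_value < alpha]


results = [
    ExtractedResult("C-Col10", "K-L grade 0 vs K-L grade 2", 0.04),
    ExtractedResult("HA", "progression of joint space narrowing", 0.021),
    ExtractedResult("marker X", "placeholder parameter", 0.20),  # hypothetical record
]

for r in significant(results):
    print(r.marker, r.p_value)
```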

Data Presentation
Initially, the molecular biomarkers identified in the studies were categorized into 4 broad subgroups: matrix degrading enzymes, matrix molecules, regulatory molecules, and other molecules. The 4 groups are presented in individual tables in this review. Relevant statistical information to support or oppose the BIPEDS classification is included in the tables. A further table listing the algorithms identified in the studies is also presented in this review. Studies that found no significant connection between the marker and OA are included in Supplementary Appendix 1.

Literature Search
A total of 80 studies met the inclusion and exclusion criteria and were included in the review (Figure 1). The NIH score for each study, along with the biomarker source and method of laboratory analysis, is listed in Supplementary Appendix 2.

Matrix Degrading Enzymes
Eight molecules were identified that were appropriate for this category (Table 2). With regard to the BIPEDS method, 9 were investigated as burden of disease markers and 6 as diagnostic markers. Li et al. 6 reported that serum concentrations of a disintegrin and metalloproteinase with thrombospondin motifs (ADAMTS)-4 and ADAMTS-5 differed significantly between early osteoarthritis (eOA) and later stages of OA. Matrix metalloproteinase (MMP)-1 and MMP-3, the latter being the most studied marker in this category, were shown to be significantly elevated in OA patients compared with healthy subjects and eOA patients. MMP-3 also had an area under the curve (AUC) value of 0.690 when a receiver operating characteristic (ROC) analysis was carried out for its diagnostic ability. In this study, eOA patients were defined as K-L grade 1/2. 6
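For readers less familiar with ROC analysis, the AUC can be read as the probability that a randomly chosen patient has a higher marker concentration than a randomly chosen control, with ties counted as one half. The sketch below illustrates that definition; the function name `auc` and the concentrations are hypothetical and are not data from any included study.

```python
def auc(case_values, control_values):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0      # case ranked above control
            elif case == control:
                wins += 0.5      # ties count as half
    return wins / (len(case_values) * len(control_values))


# Hypothetical marker concentrations (arbitrary units).
cases = [3.0, 5.0, 6.0]
controls = [1.0, 2.0, 4.0]
print(round(auc(cases, controls), 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.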

Matrix Molecules
A total of 21 markers were grouped as matrix molecules (Table 3); 20 were investigated as burden of disease markers, 17 as diagnostic markers and 15 as prognostic markers. The Foundation for the National Institutes of Health (FNIH) OA biomarkers consortium evaluated the ability of 14 biomarkers in serum, urine, or both to predict case status at 48 months and to differentiate between 3 progressor types: pain progression, joint space loss progression, and combined pain and joint space loss progression over 48 months. Twelve- and 24-month time integrated concentrations (TICs) of urinary Col2-3/4 C-terminal cleavage product of human type II collagen (C2C) predicted progression in all 3 progressor types. 7 C-telopeptide of type II collagen (CTXII) was shown to have the best predictive ability of case status and progression. CTXII was the most studied biomarker from all 4 groups. Both serum and urine CTXII concentrations were shown to predict knee OA progression. 8,9 Although CTXII was investigated by 11 studies, this was the only parameter that was reported as being statistically significant in 2 sources. Kraus et al. 7 showed the ability of urinary and serum NTX-1 concentrations at 12 and 24 months to predict 48-month case status. Using K-L grade to define OA, He et al. 10 reported a significant difference in C-Col10 between K-L grade 0 and K-L grade 2 (P = 0.04). Serum concentrations of hyaluronic acid (HA) were correlated with progression of joint space narrowing in patients classified as K-L grade 0/1 (β = 0.15, P = 0.021). 11

Regulatory Molecules
A total of 35 regulatory markers were identified in the studies (Table 4). With regard to the BIPEDS method, 33 were investigated as burden of disease markers, 21 as diagnostic markers and 6 as prognostic markers. β-catenin was significantly reduced in eOA compared with late/intermediate stage OA (P < 0.05). The same study also demonstrated that serum concentrations of transcription factor 4 were significantly higher in eOA patients when compared with healthy controls (P < 0.002). Classification of stage of OA was carried out for 32 patients using the Mankin scoring system following a total knee replacement (TKR). 12 Indian hedgehog (IHh) protein was elevated in synovial fluid (SF) in eOA patients, classified as patients with Outerbridge scale 1/2 cartilage breakdown, compared with healthy controls (P < 0.001). 13 Using K-L grades 1/2 as a definition of eOA, serum concentrations of angiopoietin-2, IL-8, follistatin, granulocyte-colony stimulating factor (G-CSF), vascular endothelial growth factor (VEGF), and hepatocyte growth factor were shown to be significantly different in eOA than in healthy controls. 14 Synovial fluid and serum concentrations of VEGF have also been reported as being correlated with K-L grade in 2 separate studies. 14,15

Other Molecules
A total of 25 markers did not fit into the other 3 categories (Table 5); 18 were investigated as burden of disease markers, 12 as diagnostic markers and 6 as prognostic markers as per the BIPEDS method. None of the markers in this category have been verified as potential biomarker candidates by more than 1 study. Two studies investigated amino acids. The study by Chen et al. 16 that investigated alanine and taurine reported an AUC = 0.928 and AUC = 0.920, respectively, when used to diagnose OA in a study sample of 67. Arginine, investigated by Zhang et al., 17 had an AUC = 0.984.

Biomarker Panels
A total of 11 biomarker panels were identified in the literature included in this study (Table 6). The source of all biomarkers for use in algorithms was either serum or urine and their use was demonstrated for predicting disease presence, severity, and progression. Saberi et al. 18 presented an algorithm that consisted of patient demographics, biomarkers, and radiological input. The algorithm was developed using patient data from the Rotterdam study cohort, which consisted of 1335 patients. In this cohort, the algorithm had an excellent ability to predict disease progression over 2.5 years (AUC = 0.872). Of the algorithms identified, 2 specifically targeted the early diagnosis of OA. 19,20 Both of the studies used the same methods of patient recruitment and sampling. To be deemed as having eOA, patients had to have new onset knee pain, normal radiographs, and Outerbridge grade I/II. The algorithm consisting of citrullinated proteins (CPs), hydroxyproline, anti-CCP antibody, age, and gender had the following statistics when distinguishing eOA from healthy controls and inflammatory arthritic diseases: AUC = 0.86, positive predictive value (PPV) = 0.733, and negative predictive value (NPV) = 0.885. 19 The second algorithm for diagnosing eOA was intended for use after an individual had been excluded from the healthy control group. It combined anti-CCP antibody with biomarkers of protein oxidation, nitration, and glycation to give an AUC of 0.98. 20
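The PPV and NPV quoted above follow the standard definitions: PPV is the proportion of positive test results that are true positives, and NPV is the proportion of negative test results that are true negatives. The sketch below shows the arithmetic; the function name `predictive_values` and the counts are hypothetical and are not the data behind the reported values.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    tp/fp: true/false positives; tn/fn: true/false negatives.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("PPV or NPV is undefined without positive and negative results")
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    return ppv, npv


# Hypothetical counts chosen only to keep the arithmetic easy to follow.
ppv, npv = predictive_values(tp=40, fp=10, tn=45, fn=5)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Unlike the AUC, both values depend on how common the disease is in the tested sample, so they do not transfer directly between cohorts with different prevalence.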

Discussion
Using serum and urine to detect markers is advantageous because sampling is relatively non-invasive and samples are readily obtained. However, these fluids effectively sample the whole body, which makes disease localization difficult. Some biomarkers, for example regulatory and matrix molecules, are unusable because they are neither organ nor disease specific. Particularly in early disease, the dilutional effects of blood and extracellular fluid may push the sensitivity required for detection beyond what is practicable. Examination of synovial fluid has the advantage of being much more specific, and with higher biomarker concentration, but in early disease synovial fluid can be difficult to obtain. To this end, it would be pertinent for future studies that analyze a marker in synovial fluid/bone/cartilage to also consider its relationship with the same marker in serum/urine. A strong correlation between the two regarding the same parameter would be invaluable for the marker's clinical applicability going forward, as it would allow the reliable use of a more easily accessible source. Many markers, such as VEGF and CTXII, have demonstrated this correlation, which suggests that they warrant further investigation.
The 4 groups used to stratify the biomarkers were chosen because they represent different therapeutic pathways for research. There is evidence that supports the use of biomarkers as therapeutic targets in the development of disease-modifying osteoarthritis drugs (DMOADs). Clinical trials have used bone morphogenetic protein-7, fibroblast growth factor, and β-nerve growth factor (β-NGF) as targets in an attempt to develop new OA drugs. 21 Tanezumab, a monoclonal antibody against β-NGF, reduced knee pain while walking by 45% to 62%, compared with 22% with placebo. 22
Ideally, OA would be detected before it became symptomatic so that necessary measures could be taken. However, without symptomatic osteoarthritis it is very unlikely that one would contact a clinician. Bearing in mind the relative frequency and morbidity of OA, an argument could be made for a screening program of "at risk" groups along the lines of those used to detect colorectal and breast cancer. Therefore, markers that can identify eOA patients are important for a number of reasons. Having a robust and quantitative method for classifying eOA patients would provide an adjunctive outcome measure for clinical trials to measure the efficacy of DMOADs or adjunctive physical therapy. This would have significant clinical relevance in everyday practice.
IHh was studied in the SF of patients classified by the Outerbridge classification. Interestingly, the study provided evidence that IHh was elevated in eOA patients (Outerbridge 1/2) and not in the control group (Outerbridge 0) or late stage OA patients (Outerbridge 3/4). 13 If this relationship were further investigated and shown to be significant in other independent studies, it would have positive implications for diagnosing eOA. Other biomarkers may follow the same pattern as IHh and be dysregulated only during early stages of OA.
Multiple biomarker and algorithmic approaches to investigating OA have shown promise. The algorithm consisting of CPs, hydroxyproline (Hyp), anti-CCP antibody, age, and gender had high specificity for diagnosing eOA. 19 Using patient demographics within the algorithm is an efficient method of increasing the algorithm's predictive ability. It would therefore be interesting to evaluate the predictive ability of a combination of the single eOA biomarkers identified in the review. IHh protein and IL-8 both performed well as single biomarkers, so perhaps their combination along with patient demographics would create a highly sensitive and specific algorithm. Due to the heterogeneity and complexity of the disease, it is likely that an algorithm will be a more effective method for making a diagnosis.
The issues surrounding the definition of eOA will continue to prove difficult unless addressed. A convincing argument put forward by Kraus 23 suggests that for an eOA marker to be truly effective it must represent a state of preclinical OA. Preclinical OA is the stage before OA is detectable by MRI or other sensitive imaging modalities. This is the optimum time for identification from both a clinical and research perspective as it would allow early lifestyle changes and a better understanding of the efficacy of potential DMOADs, respectively. Discovery of such a marker would require a time-consuming and likely expensive follow-up of a large cohort of people if primary OA were to be the indicator. However, using patients who have experienced a knee injury and who are likely to develop secondary OA over the next 10 to 15 years may provide a solution.
In the future, development of a universal criterion for diagnosing OA to standardize recruitment in clinical trials would be extremely helpful. A universal consensus of nomenclature will help to add strength to studies and allow results to be more easily validated. This will inevitably speed up the process of validating and qualifying biomarkers for use in clinical trials and in a clinical environment. While single marker studies using enzyme-linked immunosorbent assay (ELISA) and immunohistochemistry (IHC) are important, the novel, more sensitive discovery-type techniques, such as sequential window acquisition of all theoretical fragment ion spectra-mass spectrometry (SWATH-MS), would be well employed in hunting for significant biomarker panels.
An interesting observation from the results is the number of biomarkers investigated for each BIPEDS category. Most were investigated as burden of disease markers, followed by diagnostic and prognostic markers in that order. In the "Hypothetical development of biomarkers" laid out by Bauer et al., 3 the B, D, and P categories are included in the stage before E. The conclusion of the 2013 ESCEO review therefore still appears to hold: no biomarker has yet been sufficiently qualified for researchers to use in clinical trials. This study has presented biomarkers that have shown statistically significant results in over 10 studies and biomarkers with AUCs of over 0.9. However, a wide variety of parameters is being used to test these biomarkers in a variety of patients. A universal agreement on the most important parameters to be investigated for each of the BIPEDS categories would surely propel biomarker research forward considerably. After nearly 2 decades of molecular biomarker research, it seems that the bottleneck is a lack of coordination.
The main limitation of this study is that resources were collated from two databases only. It is possible that potentially applicable studies have not been identified from the search.

Conclusion
In the past 5 years, research into biomarkers in osteoarthritis has continued to gain momentum. However, there is a lack of consensus on definition and methods of diagnosis and classification, which is creating obstacles to research. A clear definition of eOA and a decision on important disease parameters will facilitate more appropriate research and allow collaboration between laboratories with different strengths. While many of the aims set out by the ESCEO have stipulated a clear research direction, there are currently no single biomarkers that have been sufficiently validated for clinical use. Biomarker panels may provide a promising avenue for further evaluation.