The reliability of pressure pain threshold in individuals with low back or neck pain: a systematic review

Background and Objective Low-back and neck pain affect a great number of individuals worldwide. The pressure pain threshold has the potential to be a useful quantitative measure of mechanical pain in a clinical setting, if it proves to be reliable in this population. The objectives of this systematic review are to: (1) analyze the literature evaluating the reliability of pressure pain threshold (PPT) measurements in the assessment of neck and low-back pain, (2) summarize the evidence from these studies, and (3) characterize the limitations of PPT measurement. Databases and Data Treatment Relevant literature from PubMed and the Web of Science electronic databases were screened in a 3-step process according to inclusion/exclusion criteria. Relevant studies were assessed for risk of bias using the Quality Appraisal of Reliability Studies (QAREL) tool, and results of all studies were summarized and tabulated. Results Of 922 citations identified, 11 studies were deemed relevant for critical appraisal, and 8 studies were deemed to have low risk-of bias. Intra-rater reliability, reported in all studies (n = 637) and inter-rater reliability, reported in 2 studies (n = 200) were consistently reported to be good to excellent (ICC 0.75–0.99 and ICC 0.81–0.90, respectively). Studies were also found to have significant variation in PPT measurement procedures. Conclusions Though intra- and inter-rater reliability was found to be high in all studies, the variation in PPT measurement protocols could affect validity and absolute reliability. As such, it is recommended that standard guidelines be developed for clinical use.


Introduction
Low-back and neck pain account for 75% of the total years lived with disability caused by musculoskeletal disease, 1 and are challenging to treat even with combined therapies. 2,3Enhancing diagnosis and treatment of neck and back pain is crucial to improving quality of life for countless individuals, ultimately reducing the financial burden these conditions pose to society. 4 Pain assessment tools enable clinicians to monitor the progression of these conditions.Since no valid and reliable biomarker for pain exists, most currently available tools rely on qualitative measures. 5Selfassessments, for example, are useful for their simplicity, but their subjective nature can introduce error. 6he need to shift pain assessment to a quantitative measure is crucial.
Pressure pain threshold (PPT), the point at which a pressure stimulus becomes painful, 7 serves as a quantitative measure of pain. 8PPTs are measured using an algometer, which records the pressure applied to a given area as the subject notes the stimulus as painful.The desirable cost and short duration of time required to administer PPT are ideal; however, factors inherent to its measurement introduce the potential for bias error.PPT is a psychophysical measurement relying on perceptual input from the patient and proper technique by the observer. 8The variability in rate and angle of application between care providers also affects PPT values. 9,102][13] Recent studies have provided further evidence of reliability in occupational groups such as vine-workers and officeworkers, who are at high-risk of low-back and neck pain, respectively. 14,15However, there has yet to be any risk of bias assessments conducted to establish study quality.
This novel systematic review focuses on low-back and neck pain due to their disproportionate burden on the healthcare system as compared to other areas of non-specific pain. 16This focus increases the number of stakeholders, as these conditions are managed by many different healthcare professionals. 2,3he purpose of this study is to (1) analyze the literature evaluating reliability of PPT measurements in the assessment of low-back and neck pain, (2) to summarize the evidence from these studies, and (3) to identify limitations of PPT reliability.The results of this systematic review will help to better define the clinical treatment and management landscape for these non-specific and persistent syndromes and inform future studies investigating applications of PPT for clinical use.

Methods
Studies were included with participants over 18 years of age presenting with pain affecting the low-back and neck.The anatomical back region is defined as the posterior aspect of the trunk inferior to the neck and superior to the gluteal region. 17The anatomical neck region connects the base of the skull to the torso and consists of the cervical portion of the spine. 18Since the aim of this study is to address relative reliability and not absolute reliability, confounding factors due to the type of pain are being considered negligible.The use of pain as a general term also allows us to explore PPT reliability as broadly as possible and highlight any niche areas in which PPT is less reliable.

Eligibility criteria
Studies were included or excluded according to the following criteria.To be included in this systematic review, studies must have fulfilled the following criteria: (1) written in English; (2) published from January 1 st , 2000 to January 1 st , 2021; (3) published in a peerreviewed journal; (4) manuscript available in full; (5)  participants must be 18 years or older with a pain syndrome affecting the anatomical back or neck.For studies that also evaluated regions outside of the anatomical neck or back, results had to be stratified by anatomical location to be included; (6) PPT was measured using a standard algometer (manual or electronic) equipped with a rubber tip.
Studies that met any of the following criteria were excluded: (1) publication types including: books, commentaries, conference proceedings, consensus development statements, dissertations, editorials, government reports, guidelines, lectures and addresses, letters, and meeting abstracts; (2) studies containing less than 20 human participants; (3) studies with only healthy participants; (4) studies that did not contain appropriate measures of statistical agreement and did not publish data adequate enough to calculate these measures; (5) studies in which the measures of statistical agreement were only noted in the healthy control group and not those with pain symptoms.

Data sources and searches
The search strategy was developed with the aid of a librarian that specializes in information literacy and research consultation.The electronic databases PubMed and Web of Science were searched for terms relevant to PPT of the back or neck and reliability, most recently on August 23, 2022 (Table 1).The Web of Science all database search function was used, thus incorporating nine additional databases, including but not limited to BIOSIS and SciELO (Table 2).Findings were extracted to Microsoft excel, and papers were included in the study if they met the inclusion-exclusion criteria.

Study selection
Study selection was conducted in three stages (Figure 1).Stage 1 consisted of exporting search results to excel to identify duplicates.Duplicates were removed and articles were sorted into a table consisting of (1) title; (2) abstract; (3) rater evaluation.In stage 2, abstracts were screened by two raters (AB and LH) based on the inclusion/exclusion criteria, identifying papers as either relevant (Y), or irrelevant (N).Upon identification of relevant articles, we proceeded to stage 3, in which the raters revaluated the remaining articles in full.This evaluation followed the same relevant or irrelevant identification scheme.Comments were recorded for all irrelevant articles to track reasons for exclusion.At both stages, disagreements were discussed by evaluators (AB and LH) to reach consensus.Where consensus could not be met, a third independent evaluator was consulted (PN).

Assessment of risk of bias
Two reviewers (AB and LH) rated relevant studies (Table 3) using the Quality Appraisal of Reliability Studies (QAREL) checklist. 19The purpose of this tool is to standardize the evaluation of diagnostic reliability studies.Studies were then classified as low-risk or highrisk based on findings from the QAREL checklist, and high-risk studies excluded from the primary analysis.
The relevance of each question in the QAREL evaluation was assessed by AB, JS, and PN prior to conducting the evaluation.

Summary of evidence
Both raters (AB and LH) collaborated to create a summary of evidence table, which reports the data items of interest extracted from each study (Table 3).The summary includes details about (1) study design (test-retest/intra-rater, inter-rater or both); (2) sample

Analyses
The primary analysis of this review excludes those studies which were identified to have high-risk of bias.A sensitivity analysis was performed that included the findings from studies found to have high risk-of-bias, to ensure the conclusions of the study are robust to this accommodation.

Reporting compliance, protocol, and registration
This systematic review complies with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).A separate review protocol was not prepared but is well described in the preceding section of the manuscript.This review was not registered, though no systematic reviews of PPT reliability in our target population exist to our knowledge.

Study selection
A total of 922 citations were identified through PubMed and Web of Science (full archive).Five hundred and twenty-six duplicates were removed leaving 396 citations for screening.In the primary screening stage, 368 citations were identified as ineligible, leaving 28 citations for full article screening.Seventeen (17) citations were deemed ineligible, leaving 11 citations for summary of evidence and critical appraisal.Findings from a total of n = 350 lowrisk of bias subjects, and n = 119 high risk of bias subjects are summarized in Table 4.
The results of the study selection process are thoroughly summarized in Figure 1.

Critical appraisal
Eleven (11) studies were critically appraised for risk of bias using the QAREL evaluation tool.Three studies [20][21][22] were deemed to have high-risk of bias because the characteristics and/or qualifications of the raters were unclear.Information regarding representative raters was also missing from the article by Balaguier et al., 14 however, when contacted for more information, authors clarified rater's qualifications, and raters were deemed to be representative.The remaining studies were considered to have low risk of bias.

Study characteristics
4][25][26] Two studies evaluated both testretest and inter-rater reliability. 27,28One study was conducted in each of France, Portugal, Denmark, Korea, Canada, and Finland.Two studies were conducted in Brazil.All studies were published between 2007 and 2020, and no studies were excluded due to their year of publication.One study recorded PPT from patients with low-back pain, 14 five studies assessed patients with neck pain, 23,[25][26][27][28] one study assessed patients with either low-back or neck pain, 24 and one study assessed patients with myofascial pain syndrome in the cervical spine and back regions. 15ll studies recorded PPT measures using a digital algometer; however, procedures were not consistent.The size of the rubber tip varied, with five studies using a 1 cm 2 tip (n = 5), 14,[26][27][28][29] one using a 0.5 cm 2 tip (n = 1), 25 while two groups did not state the size of the tip (n = 2). 15,23As variability in tip size remains consistent within each study, bias is prevented, and tip size did not disqualify a study.Additionally, all studies, regardless of tip size, were found to have good to excellent reliability, further supporting the conclusion that tip size did not have an impact on reliability.
Algometry was performed at a constant rate within each study (n = 8), 14,15,[23][24][25][26][27][28] but the rate at which it was applied between studies varied.Two studies administered pressure at a rate of 30 kPa/s (n = 2) (Ferreira et al. originally reported as 3 N/s), 14,25 another at 49 kPa/s (reported as 0.5 kg/cm 2 /s), 27 another at 50 kPa/s (n = 1), 28 while two studies administered pressure at a rate of 1 kg/cm 2 /s (98 kPa/s) (n = 2), 15,24 and the last at 100 kPa/s (originally reported as 10 N/s). 26One study did not report the rate of pressure application (n = 1) 23 but we did not exclude it as relative reliability would not be affected since the rate was consistent within the study.All studies used qualified raters of varying levels of experience, with seven studies using healthcare providers as raters (n = 7), 14,15,[23][24][25][26][27] and one using a combination of a clinician and physiotherapy students, all of which were novice raters (n = 1). 28Every rater was representative of those who typically utilize PPT in a clinical setting.
The time between measurements also varied.In studies evaluating test-retest reliability, time between measurements ranged from 30 s, 25,26 to 1 week. 27One study did not clearly report the time between measurements, but it was inferred based on the description of the test procedure that measures were repeated immediately (n = 1). 23For inter-rater reliability, the time between measurements was 3-5 min for one study, 28 and one week for another. 27The differences in rest period were not considered to be of concern if the time between measurements was consistent for every patient within a study.

Assessment of risk of bias
Studies that were considered to have low risk of bias adhered to the following criteria: (1) study objective clearly defined; (2) representative sample; (3) representative raters; (4) blinded to the findings of other raters; (5) test applied correctly; (6) appropriate statistical measurements.][25][26][27] Studies that did not clearly answer a particular question (NC) were counted as "no" for the purposes of limitation quantification.A summary of the risk of bias assessment can be found in Table 3.

Test-retest reliability
Eight studies evaluating test-retest reliability were found to have high degrees of reliability.ICCs ranged from 0.75 27 to 0.99 14 demonstrating good to excellent reliability across the back and neck. 30Minimal differences in ICCs were observed between individuals with low-back (0.86-0.99) 14,24 and neck pain (0.75-0.96) 27,28 suggesting that PPT reliability did not change based on the location of pain or measurement.While studies that used patients with back pain had a slightly larger range of variability, at the extremes their ICCs were still rated as good or excellent. 14One study utilized Cronbach's α as a statistical measure of reliability, with values ranging from 0.934 to 0.980. 15This range indicates a high degree of reliability 31 and is consistent with the observations from other studies.

Inter-rater reliability
Inter-rater reliability evaluations were only conducted in two studies addressing neck pain (n = 2). 27,28ICCs in symptomatic individuals were reported to be between 0.81 28 and 0.86. 27This indicates good to excellent reliability in the measurement of neck pain. 30gh-risk of bias findings Of the high risk of bias studies, ICCs corresponding to test-retest reliability ranged from 0.71 21 to 0.98 21 indicating a moderate to high degree of reliability. 30or two of the studies, 20,21 it was unclear whether they used the same rater for both measurements.ICCs corresponding to inter-rater reliability, clearly reported in one study, 22 were also excellent.All three studies reported good to excellent reliability in all but four measurements, [20][21][22] and their findings are summarized in Table 4.These findings support the results of the low risk of bias studies.

Discussion
Patients with low-back or neck pain contribute significantly to the total YLD. 16The burden on these patients is high and clinical outcomes remain poor. 2,3mprovements in pain assessment tools enable healthcare professionals to diagnose and treat these individuals more effectively, as their progress can be more accurately monitored.The PPT is a cost effective, clinically feasible assessment outcome with features that make it easy to adopt within a clinical setting for the management of musculoskeletal pain.This is the first systematic review to our knowledge to assess the reliability of PPT in neck and back pain subjects.The findings of this review suggest that PPT is a useful and reliable tool in clinical practice for pain assessment to monitor patient progress.This measurement can also contribute to standardization of pain assessment across providers using a psycho-physical outcome over the commonly used subjective pain scale. 8These findings also support the continued use of PPT in quantitative sensory testing, 32 a reliable tool used in the measurement of two clinically correlated syndromes: myofascial pain syndrome, 33,34 and central sensitization. 35,36This puts PPT in an adventitious position to play a larger role in the treatment and management of low-back and neck pain.

Strengths and limitations
Our systematic review had several strengths.First, the PubMed search was conducted with the aid of a specialist in information literacy to ensure our search was as broad as possible.This ensured papers that had some relevant data could still be included in the review if the results were stratified properly.Additionally, the study quality assessment was conducted using a validated guideline (QAREL) and the risk of bias assessment was guided by multiple experts in the field.Lastly, the screening steps, as well as the QAREL assessment, were conducted by two raters to minimize potential bias.
The findings of this review should be interpreted in the context of several limitations.Only two search engines were used, which leaves the possibility that unindexed low-risk papers may have been missed.Given the low degree of variability across papers which met the quality criteria, it is unlikely this has a large impact on the findings.Additionally, risk from this has been mitigated using the Web of Science extended database search which included nine additional databases.The search strategy did not specify specific neck or back syndromes, in order to identify all relevant studies.Secondly, only papers in the English language were considered.PPT reliability studies in other languages may exist and would have been excluded.Key PPT papers that were found were published in English, and the low degree of variability in the findings of listed studies suggest it is unlikely studies in another language would largely affect the findings of this study.Two articles that were found in our initial search but were excluded due to the language requirement would have been excluded as irrelevant to the research question upon further investigation.Thirdly, our search excluded studies published prior to January 1 st , 2000.This was done for practicality reasons, however, numerous studies from before 2000 also found a high degree of reliability for PPT, [37][38][39] thus any relevant articles from before 2000 are unlikely to skew our findings.Additionally, standardization of algometers technique, and the availability of standardized algometers (of which many are still available for sale) has expanded drastically, thus studies from 2000 onwards offers an added level of consistency in instrumentation.Although using the pre-determined methodology and QAREL analysis indicated a high degree of reliability, the continued exploration of inter-rater reliability given the initial findings from this study is a recommendation of the authors.Another assumption of our study was that the pain response to rate of pressure applied is linear.The rate response curve has not been established to the best of our knowledge, however, only studies that used a constant rate of pressure application were included.Many of the studies included also have small sample sizes which can introduce bias.Given the consistency of findings between studies with highly variable sample sizes, it is unlikely to have a major impact on our study, although a meta-analysis and larger scale studies are recommended in future investigations.Finally, the QAREL assessment guidelines were determined by expert opinion in addition to consulting current literature, however, there is little to no literature regarding the limitations of the risk of bias assessment that was described above.

Conclusion
The findings of this study demonstrate that PPT has a high degree of intra-and inter-rater reliability in individuals experiencing low-back or neck pain in a variety of clinical settings.Further studies describing the reliability and validity of PPT in different pain syndromes will help to further identify the optimal role of PPT in clinical settings.Additionally, given the variability across studies included in this systematic review, it is recommended that standard guidelines be developed to address (1) size and material of tip; (2) rate of pressure application; (3) angle of application, and (4) anatomical identification of measurement points.Although our systematic review has demonstrated that intra-and inter-rater reliability remains high regardless of these variations, these may affect absolute reliability and validity.We also recommend further exploration of inter-rater reliability of PPT given the high degree of reliability initially demonstrated.Future studies should investigate the validity of PPT.Standardization of PPT application will contribute to the timely and urgent priority of developing accurate and reliable biomarkers and reference standards in pain management.

Figure 1 .
Figure 1.PRISMA 2020 flow diagram for new systematic reviews which included searches of databases and registers only.

Table 1 .
List of search parameters for PubMed database search and number of results.

Table 2 .
List of search terms for web of science full archive database search and number of results.

Table 3 .
Risk of bias for scientifically admissible reliability studies based on the QAREL criteria.Y

Table 4 .
Evidence table for studies that assess the reliability of PPT measurements of the back and neck.

Table 4 .
at a rate of 40 kPa/s.A limit of 600 kPa was used.The order of testing position is randomized, however, the muscle groups are always tested in the same order.Fourmeasurements were taken at each site, with at least 5 s between measurements, and the