Implementation of the Norwegian school meal guideline: Development and reliability of two questionnaires to measure adherence

Aims: This article reports on the development and reliability of two questionnaires that measure adherence to the Norwegian National Guideline on Food and Meals in School among primary schools and after-school services. Methods: Questionnaires for school principals and after-school leaders were developed systematically, using the following steps: (a) selection of scope, questions and adherence values; (b) face validity testing through expert review of initial drafts; (c) content validity testing through 19 cognitive interviews; (d) assessment of test–retest reliability in samples of principals (n = 54) and after-school leaders (n = 47); and (e) development of adherence indices. Results: The cognitive interviews led to substantial revisions of the draft questionnaires, increasing content validity through improved relevance and clarity. Test–retest assessment showed that Cohen’s κ ranged from −0.03 to 1.0 for principals and from −0.05 to 0.98 for after-school leaders, with 64 and 53% of values rated as ‘substantial’ or better. Percentage agreement averaged 85% among principals and 82% among after-school leaders. Intraclass correlation for the adherence index scores was 0.84 for principals and 0.91 for after-school leaders. Guideline adherence had a wide range in our samples, with an average of 71% for schools and 76% for after-school services. Conclusion: The questionnaires for measuring adherence to the national school meal guideline among primary schools and after-school services are sufficiently reliable for future use in public health evaluation and research.


Background
School food environments influence children's diets [1]. One means of improving these environments is school food guidelines, which have frequently been shown to be effective in improving food availability and children's dietary intake [2,3]. However, low implementation of school food guidelines and policies is commonly reported [4,5]. Moreover, the lack of valid and reliable assessment tools for evaluating food environments is well documented [6-8], with more work needed, particularly in schools [7]. Recent reviews have identified no validated tools measuring school-level adherence to a comprehensive national school food guideline [6-8].
In Norway, a revised advisory guideline for food and meals in schools was launched in autumn 2015 [9]. The guideline aims to ensure favourable eating conditions and high nutritional quality of the food and drinks on offer. Norwegian primary schools are obliged to offer after-school care services for schoolchildren in grades 1-4, and the food and meal guideline for primary schools applies equally to after-school services. Its 21 recommendations cover organizational aspects of mealtimes (time to eat, supervision, physical and social environments), the nutritional quality of food and drinks on offer, food safety and environmental considerations. Most primary schools offer no food and drinks beyond subscription schemes for fruit and milk, but most after-school services serve one daily meal [10].
School meal practices have been monitored regularly in Norway since the early 1990s through comprehensive mapping surveys issued by the Norwegian Directorate of Health [5]. These surveys were not, however, designed to measure guideline adherence, and their psychometric properties were not investigated. Furthermore, the response rates among primary school principals and after-school leaders dropped to 32% in the last surveys in 2013 [5,10], which calls into question the value of similar future surveys. Shorter, validated questionnaires measuring guideline adherence could potentially increase response rates and would generate valuable data for school nutrition policy making at national and municipal levels. Furthermore, psychometrically sound questionnaires could allow empirical testing of the relationship between school food environments and nutrition outcomes [7].
Guidance on comprehensive approaches to developing questionnaires, including various qualitative and quantitative methods, is available [11]. Cognitive interviewing is a method for improving the content validity of questionnaires by identifying and revising challenging questions through interviews: interviewers explore whether the information collected reflects the intended content and revise accordingly, so that the wording, content and design of the questionnaires improve in an iterative manner [12]. Test-retest studies assess the reproducibility of answers to questionnaires. Cohen's κ and intraclass correlation (ICC) are common reliability parameters for categorical and continuous variables, respectively [11], and both take variability in the sample into account. The agreement parameter, in contrast, measures only the degree to which scores are identical [13]. By assessing both agreement and reliability, the questionnaires' potential use in both evaluation and research may be explored [13].
This study aimed to develop two valid and reliable, self-administered, web-based questionnaires to measure adherence to the National Guideline on Food and Meals in School among primary schools and after-school services in Norway.

Methods
The process for developing the questionnaires was guided by De Vet et al. [11] and involved both qualitative and quantitative methods (Figure 1). The various study samples are described below. Permission for the study was granted by the Norwegian Centre for Research Data (NSD) (ref: 52003). All participants received written information about the study, including the right to withdraw at any point. Signed consent forms were obtained from all interview participants. Test-retest participants were informed that answering the questionnaire meant consenting to take part.
Step 1: Determining scope, questions and adherence values
To limit the response burden, the questionnaire for the principals contained only questions applicable to all Norwegian primary schools, irrespective of food provision. These questions thus covered organizational aspects of mealtimes, access to drinking water, subscription schemes, food safety and hygiene, and availability of unhealthy food and drinks. Similarly, the questionnaire for after-school leaders focused on nutritional quality, food availability, food safety and sustainability. Existing tools were reviewed to guide the selection and formulation of questions. Some relevant examples from other countries were identified [14,15], but none aimed at measuring adherence to a food and meal guideline. Some questions from previous Norwegian questionnaires were used in revised versions. For each guideline recommendation, one to seven questions were developed, all with specified cut-off values for adherence. Both questionnaires were in Norwegian.
Step 2: Expert review of initial drafts
To improve face validity, initial drafts were presented at a 1.5-hour workshop at the Department of Child and Adolescent Health in the Norwegian Directorate of Health and, after revision, were assessed by two experts in food and nutrition in schools.
Step 3: Qualitative pre-testing with cognitive interviews
To improve content validity, individual cognitive interviews were conducted with principals and after-school leaders in two consecutive rounds of pre-testing. The 'probing' technique was used, in which interviewers ask follow-up questions during an interview conducted shortly after the participant has completed the questionnaire [12]. Strategic sampling was used to ensure diversity with respect to school size, structure and urban/rural profile, and that schools were recruited from various municipalities in two selected counties, all within a 2-hour drive of Oslo. Schools were invited via a telephone call to the principal, who, upon agreeing to participate, received an information letter and was asked to invite the after-school leader. In the first round of pre-testing (pilot 1), nine of the 29 schools contacted in Buskerud county agreed to participate. Two sites participated with just the principal, and two principals and one after-school leader were excluded because they had not reviewed the questionnaire before the interview. The final sample in pilot 1 thus consisted of seven principals and six after-school leaders. In round two of pre-testing (pilot 2), the three schools approached in Akershus county all participated with their principals and after-school leaders, yielding 19 complete interviews in total in the pre-testing.
Semi-structured interview guides were developed based on the literature [12,16]. We asked participants to note challenging parts when completing the questionnaires before the interview. During the interviews, we asked them how the questions were interpreted, to elaborate on survey responses and to provide feedback on the challenging parts. Several questions started with: 'How do you understand. . .?' and 'How do you interpret. . .?' Instead of asking about TV viewing, screen-time or reading out loud when exploring activities during the meal, we asked 'What activities, if any, take place during the meal?' The interview guides were revised after pilot 1, to focus on new and adapted questions and response options in pilot 2.
To make commenting easier, paper-based questionnaires were used in pilot 1, while web-based versions were used in pilot 2 in order to also test functionality. Two researchers participated in all of the 19 complete interviews, one as the moderator and the other taking notes. Participants were informed about the purpose and procedures of the study and the opportunity to withdraw at any time. All participants signed the consent form and agreed to audio recording. We emphasized that our aim with the cognitive interviews was to receive honest feedback on the drafts in order to improve the questionnaires, and not to assess their schools' adherence to the guideline. After the validity testing, the principal questionnaire had 47 items and the after-school leader questionnaire had 54.

Step 4: Assessing test-retest reliability
The final pre-tested questionnaires were assessed for test-retest reliability in a nationally representative sample of schools, drawn from an official list of 2392 primary schools. Schools with fewer than 10 children (n = 78) and schools that had participated in the qualitative pre-testing (n = 12) were excluded. Based on the general advice of having about 50 respondents [11], knowledge of typical response rates in test-retest studies and consultation with the Oslo Centre for Biostatistics and Epidemiology, 21% of the 2302 remaining schools were randomly selected from each of Norway's 19 counties, totalling 483 schools. Email invitations were sent to principals, who, if agreeing to participate, were asked to forward the invitation to the after-school leaders. The invitations explained the purpose of the study, including why we needed answers to the same questionnaire twice, 8-10 days apart. We explained that participation was voluntary and confidential, and that by answering the questionnaires they consented to participate. Two days after the one-week deadline to respond to the test, a reminder was sent to the principals of schools where neither the principal nor the after-school leader had responded, allowing 3-4 more days to respond. To ensure voluntary participation, schools where only one participant had responded by the deadline did not receive reminders. In the retest, both principals and after-school leaders were emailed directly and reminded once if a reply was not received within a few days after the deadline.
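The county-stratified random draw described above can be sketched as follows; the function name and register structure are illustrative, not from the study:

```python
import random

def stratified_sample(schools_by_county, fraction=0.21, seed=1):
    """Randomly draw the same fraction of schools from each county's list."""
    rng = random.Random(seed)
    sample = []
    for county in sorted(schools_by_county):
        schools = schools_by_county[county]
        k = round(fraction * len(schools))  # e.g. 21% of each county's schools
        sample.extend(rng.sample(schools, k))
    return sample

# A hypothetical register with two counties:
register = {"CountyA": [f"A{i}" for i in range(100)],
            "CountyB": [f"B{i}" for i in range(48)]}
drawn = stratified_sample(register)  # 21 schools from CountyA, 10 from CountyB
```

Sampling within each county separately, rather than from the national list as a whole, guarantees every county is represented in proportion to its size.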
Step 5: Index reliability and adherence levels
To assess the reliability of the questionnaires as a composite score and to determine guideline adherence, a two-step scoring system was developed. First, each respondent obtained a score between 0 and 1 for each relevant recommendation, based on one to seven questions with equal weighting. Next, these scores were summed to give a guideline adherence index score. Schools could reach a maximum score of 12 and after-school services of 15, based on the number of recommendations covered by each questionnaire. By dividing the index score by the number of relevant recommendations, the degree of guideline adherence was determined.
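A minimal sketch of this two-step scoring, assuming each question has already been reduced to a binary adherence value via its cut-off (the data and function names are hypothetical, not the study's code):

```python
def recommendation_score(question_adherence):
    """Equally weighted mean of 0/1 question-level adherence values
    for a single recommendation, giving a score between 0 and 1."""
    return sum(question_adherence) / len(question_adherence)

def adherence_index(recommendations):
    """Sum the per-recommendation scores into the index, then divide
    by the number of recommendations to get the degree of adherence."""
    index = sum(recommendation_score(q) for q in recommendations)
    degree = index / len(recommendations)
    return index, degree

# A hypothetical school assessed on three recommendations, covered by
# two, three and one question(s), respectively:
index, degree = adherence_index([[1, 1], [1, 0, 1], [0]])
# index = 1 + 2/3 + 0; degree = index / 3
```

With 12 recommendations for schools and 15 for after-school services, the same division by the number of relevant recommendations yields the percentage adherence figures reported in the article.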

Analysis
Processing and analysis of the cognitive interviews followed a six-stage model [17]. Relevant sequences of interview data were transcribed from each participant, organized by question and then compiled for all participants. This procedure was followed in both pilots for both questionnaires. Both researchers took field notes during the interviews and filled in a structured logbook after each interview, providing contextual information.
Statistical analysis of test-retest reliability was conducted using IBM SPSS Statistics 24. Reliability was assessed by calculating Cohen's κ for nominal variables and quadratic weighted Cohen's κ for ordinal variables [11]. The κ values can be considered almost perfect at 0.81-0.99, substantial at 0.61-0.80, moderate at 0.41-0.60, fair at 0.21-0.40, slight at 0.00-0.20 and poor if < 0 [18]. Percentage agreement was also calculated for each question and is considered acceptable at ⩾ 70% [13]. Finally, the ICC for absolute agreement (ICC(A)) was calculated to assess the reliability of the adherence index scores. The ICC is considered acceptable at ⩾ 0.70 [11].
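The study ran these analyses in SPSS; purely as an illustration of the statistics involved, unweighted Cohen's κ and percentage agreement for a single nominal item could be computed as follows (the example answers are hypothetical):

```python
from collections import Counter

def cohens_kappa(test, retest):
    """Unweighted Cohen's kappa: observed agreement corrected for the
    agreement expected by chance given each rating's marginal counts."""
    n = len(test)
    p_obs = sum(a == b for a, b in zip(test, retest)) / n
    t, r = Counter(test), Counter(retest)
    p_exp = sum(t[c] * r[c] for c in set(test) | set(retest)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

def percent_agreement(test, retest):
    """Share of respondents giving identical answers at test and retest."""
    return 100 * sum(a == b for a, b in zip(test, retest)) / len(test)

# Six hypothetical respondents answering the same yes/no item twice:
test_answers   = ["yes", "yes", "no", "yes", "no", "yes"]
retest_answers = ["yes", "no",  "no", "yes", "no", "yes"]
```

Here agreement is 5/6 ≈ 83% while κ ≈ 0.67 ('substantial'), because κ discounts the agreement that the marginal distributions would produce by chance.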

Results
The expert review workshop generated two main pieces of advice to improve validity: to reduce the overall scope, and to expect less detailed knowledge from the principals about classroom practices. The two nutrition experts found the revised questions relevant and adequate to cover the guideline's recommendations but suggested a revised order to improve the flow.
The cognitive interviews lasted around 45 minutes. As shown in Table I, they resulted in many changes to the questionnaires. More revisions were made after pilot 1 than after pilot 2, with the exception being the high number of questions that required both reformulation and new response options in the principal questionnaire after pilot 2. Most of these were minor changes, however, such as changes linked to the splitting up or merging of questions (six cases) or reordering of words in a phrase (three cases). Some questions were deleted after pilot 1 because they were perceived to be unclear or irrelevant. Two examples of questions with poor clarity were: principals' interpretation of the teachers' roles during supervision, and whether the rooms used for eating were 'physically suitable'. In the after-school services, a question on serving milk with hot meals was perceived as irrelevant because nobody did it. Some unclear phrases were also identified in pilot 2, for example the notion of 'unwritten rules' on food brought from home in the principal questionnaire, which was rephrased to 'oral communication'. Some questions were revised because they presupposed too detailed a knowledge, for example a question to principals on classroom screen use during meals. This was revised twice before a promising solution was identified in pilot 2. In one instance, in after-school services, two rounds of rephrasing could not resolve an interpretation problem, namely that of using 'lean meat and meat products'.
The two rounds of pre-testing resulted in questionnaires with 47 questions for principals and 54 for after-school leaders. Of these, 27 and 33 questions, respectively, were used to calculate the adherence indices. The remaining questions covered school background, the respondent's job position and introductory inquiries leading up to the adherence questions, some of which addressed reporting needs among respondents.
In the test-retest study, response rates in the test were 19.3% (n = 93) for schools and 18.8% (n = 91) for after-school services. Of these, 58 and 52%, respectively, responded to the retest, yielding a final test-retest sample of 54 principals and 47 after-school leaders. Both questionnaires had respondents from 18 of Norway's 19 counties. The average school size was 175 children (range 13-670), which is somewhat lower than the national average of 220. The average size of the after-school services was 94 children (range 8-400). Loss to retest was equally distributed geographically across the counties. Only at 14 sites did both the principal and the after-school leader respond. Average administration times in the test and retest were 12 and 13 minutes for principals and 15 and 11.5 minutes for after-school leaders. Most respondents (80%) in each sample were the principals and after-school leaders themselves.
As shown in Table II, κ ranged from −0.03 to 1.0 for the school questionnaire. The reliability rating of the κ values was distributed as follows: 34% perfect or almost perfect, 30% substantial, 24% moderate and 6% fair. No values were rated as slight, two were slightly negative and one could not be calculated. Percentage agreement was ⩾ 70% for 80% of the items, with an average of 85% (range 54-100%). For the after-school questionnaire (Table III), κ values ranged from −0.05 to 0.98 and were rated as follows: 18% perfect or almost perfect, 35% substantial, 25% moderate and 9% fair. Two were rated as slight, two were slightly negative and one could not be calculated. Percentage agreement was ⩾ 70% for 84% of the items, with an average of 82% (range 44-100%). Table IV shows the average adherence scores per guideline recommendation, based on the answers from the two samples at test and retest. It illustrates which recommendations are covered by each questionnaire and identifies the most and least adhered to recommendations.
Among principals, the average obtained adherence index score was 8.

Discussion
This article reports a comprehensive approach to developing, validating and testing the reliability of two questionnaires for measuring adherence to the national school meal guideline in Norway. Cognitive interviews with the target groups increased content validity through improved relevance, wording and user friendliness. The test-retest study demonstrated acceptable reliability for both questionnaires: most items obtained substantial or better κ values, > 80% of items obtained a percentage agreement of ⩾ 70%, and both adherence indices obtained an ICC(A) > 0.80.
Although some question the extent to which cognitive interviewing may improve validity [12], others suggest that, by identifying faults and improving user friendliness of questionnaires, the method leads to fewer measurement errors and lower response burden [19]. We believe that the types of revisions resulting from our cognitive interviews, coupled with an administration time in the range of 12-15 minutes, provide evidence of increased content validity.
In reliability assessment, reporting both κ values and percentage agreement is recommended [20]. However, De Vet et al. [13] also describe different implications for the use of the two parameters: evaluative questionnaires need good agreement, whereas discriminative questionnaires need good reliability. This is because only κ considers variability, which is important for questionnaires designed to differentiate between units in the sample; in evaluations, measurement error, but not variability, is what matters [13]. The κ values demonstrated a large range, as reported in similar studies [14,15,21]. As κ values are heavily affected by skewed prevalence [11], they may be very low despite high percentage agreement. Our results illustrated this: across the two questionnaires, only three κ values affected by skewed distributions had an agreement of < 90%. As our questionnaires measure adherence to a guideline that schools should already be implementing, a high number of compliant answers, and thus skewed distributions, are expected. In addition, many questions had few response options, which may reduce the κ values [22]. The few items obtaining both low κ values and low agreement should be revised before future use. The higher ICC(A) for the after-school services may reflect the larger number of items in that index, because κ values were slightly better for the school questionnaire. Overall, the reliability assessment supports future use of the questionnaires in both research and evaluation.
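The skew effect noted above is easy to demonstrate with a small worked example (hypothetical data and the standard textbook κ formula, not the study's results):

```python
from collections import Counter

# Hypothetical skewed item: 9 of 10 schools answer 'yes' at both time
# points, and one school switches from 'no' to 'yes'.
test   = ["yes"] * 9 + ["no"]
retest = ["yes"] * 10

n = len(test)
# Observed proportion of identical answers:
p_obs = sum(a == b for a, b in zip(test, retest)) / n               # 0.9
# Chance agreement expected from the marginal counts:
t, r = Counter(test), Counter(retest)
p_exp = sum(t[c] * r[c] for c in set(test) | set(retest)) / n ** 2  # 0.9
kappa = (p_obs - p_exp) / (1 - p_exp)
# Agreement is 90%, yet kappa is 0: with this skewed distribution,
# 90% agreement is exactly what chance alone would produce.
```

This is why items with near-universal compliance can score well on agreement while their κ values look poor.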
The main strengths of the present study include the substantial involvement of the target groups in improving content validity, and the relatively large and nationally representative sample of respondents in the test-retest study. A recent review [23] confirms our contention that no prior study has tested the reliability of questionnaires to assess the degree of adherence to a comprehensive national school food guideline. Furthermore, the wide range of adherence levels demonstrates the questionnaires' potential use in research with discriminative purposes.
The study also has several limitations. First, although cognitive interviewing improved content validity of the questionnaires, additional methods, such as criterion validation through observation, would have assessed validity more robustly. This could have uncovered possible social desirability bias and investigated the extent to which school leaders have sufficient knowledge about classroom practices. Second, although test-retest reliability is essential for new questionnaires [6], an interrater assessment would have been particularly valuable in the absence of a criterion validation. Third, the response rate in the test-retest study was low. However, the repeatability of answers is more important than a representative sample in test-retest studies, as long as the respondents are similar to the intended target group. Finally, the questionnaires for the schools and the after-school services covered only 12 and 15 of the 21 recommendations and therefore did not measure adherence to the entire guideline. Future studies should look at associations between the two adherence indices at each site and between adherence and the school's socioeconomic profile.

Conclusion
The results show that the new questionnaires for measuring adherence to the Norwegian National School Meal Guideline are concise, relevant and user friendly, and sufficiently reliable for use in both research and evaluation. Although cognitive interviewing increased the content validity of the questionnaires, firm conclusions about the overall validity could not be drawn.

Table IV. Average score for each guideline recommendation in test and retest of the school and after-school questionnaires to assess adherence to the Norwegian National Guideline on Food and Meals in School [9].
