A Social Work Education Outcome Measure: The Evaluation Self-Efficacy Scale–II

The Evaluation Self-Efficacy (ESE) scale was designed as an outcome measure for evaluation research courses in social work. A revised version of the Social Cognitive Theory–based ESE (ESE-II), including both new and revised items, was developed and evaluated in the current study. The ESE-II was evaluated in a final sample of 168 master's-level students using a pretest–posttest design. Exploratory factor analysis revealed a single-factor structure underlying the 14 self-efficacy items at both assessment points. Cronbach’s alphas for the ESE-II were high at pre- and posttest. An argument underpinning content validity was developed, and convergent validity was demonstrated. The ESE-II was also sensitive to change over time at both the item and scale levels. The current study provides evidence supporting select psychometric properties of the ESE-II and the flexibility of self-efficacy as an outcome measure for social work education.

This body of research suggests the versatility of self-efficacy as a construct (cf. Collins, 2015). It also supports our contention that the construct of self-efficacy is a strong basis for a wide-ranging, flexible system within which to assess outcomes in social work education (and likely other fields).

Replication
The importance of replication has been emphasized for many years (e.g., Ahlgren, 1969; Bornstein, 1990; Rosenthal, 1966, 1990; Smith, 1970). Yet, the prevalence of replication studies in social science journals in general and social work journals in particular remains low. The logical result is that calls for greater emphasis on, and more acceptance of, replication continue to be advanced across fields (e.g., Lindsay, 2015; Makel & Plucker, 2014; Warne, 2014).
Where does the current study fall in the range of potential replications? Fabrigar and Wegener (2015) noted,
"[p]erhaps the categorization that has been most prevalent is the distinction between exact (often referred to as direct) and conceptual replication. The term exact replication typically refers to a study aimed at repeating as closely as possible the procedures used in a prior study. Conceptual replication is generally aimed at reproducing an effect from a prior study using different operationalizations of the independent (predictor) and/or dependent (outcome) variable(s) than in the prior study." (p. 68)
The current study (compared with the original ESE studies) assessed the same construct, in the same setting, with the same course instructor, over the same pre-post time frame, using the same research design and same scale format but with revised scale contents and a different sample.

Study Purposes
The current study serves two purposes (cf. Kaplan, Brownstein, & Graham-Day, 2017). First, it extends the development of an outcome measure designed for accreditation purposes. Second, it seeks to provide faculty with a tool for curricular development activities beyond accreditation. To our knowledge, no other theoretically derived outcome measures are available that assess the outcomes of a course in the evaluation of social work practice. These ESE scales were designed to assess social work students' self-efficacy regarding their ability to carry out evaluations of social work practice (e.g., quantitative, qualitative or mixed methods evaluations of social work practice). At the time the original ESE was created, the American Evaluation Association had not yet endorsed a group of competencies that could be used for item creation (Galport & Azzam, 2017). Because the objectives of any professional field or faculty within that field are subject to change, educational outcome measures must be able to efficiently adapt to those changes and not remain arrested over time. Given that the objectives of evaluation courses have evolved, the need to revise the original ESE to more effectively meet outcome assessment criteria became apparent.
This report details the results of a study of this revised ESE scale (ESE-II). Similar to earlier research on the ESE, the current study used a single-group pretest-posttest design, but recruited a different sample of students and tested the self-efficacy scale with revised and new items. Although the ESE was revised to accommodate course changes, we expected that the ESE-II would perform very similarly to the ESE upon which it is based. More specifically, given our prior experience with self-efficacy scales in general and social work self-efficacy scales designed to assess social work educational outcomes in particular, we sought to address the following research questions and predictions.
1. What is the underlying factor structure of the 14 self-efficacy items? As the factor structure of the ESE was not previously examined, we explored the underlying structure.
2. What is the reliability of the ESE-II? Similar to the original ESE studies (range α = .94-.96), we predicted we would find high Cronbach's alphas at both pretest and posttest for the ESE-II.
3. Validity of the ESE-II: (a) Do the ESE-II items cover the domain of evaluation research in social work, thus supporting content validity? (b) Is there evidence of convergent validity regarding the ESE-II? We sought to answer this question in two ways. First, the conceptual connection between empowerment and self-efficacy (including the use of self-efficacy as an indicator of empowerment) has a long tradition of discussion in social work (e.g., Evans, 1992; Gutiérrez, 1991; Gutiérrez & Lewis, 1999; Gutiérrez, Parsons, & Cox, 1998; Morton & Montgomery, 2012) as well as other fields (e.g., Crondahl & Karlsson, 2016). Based on our prior work with the ESE and the Social Worker Empowerment (SWE) scale (Frans, 1993), we predicted that the ESE-II would have a small to moderate positive association with the SWE (comparable with the correlations found between the ESE and the SWE in the two prior studies). Second, we included three test self-efficacy items derived from the Council on Social Work Education's (CSWE; 2008) Educational Policy and Accreditation Standards (EPAS) for social work programs in the United States. Although we did not have prior data on the associations between these three items and the ESE, we assumed the correlation between the mean of these three items and the ESE-II total scale score would be positive and substantial.
4. In that the ESE was sensitive to change in the two prior studies, we predicted that the ESE-II would perform as well for both individual items and the total scale score.

Study Design and Participants
This study used a single-group (across multiple cohorts), pretest-posttest design using a convenience sample acquired from 15 sections of a compulsory graduate course on evaluation. The sections were taught between the fall 2011 and summer 2015 semesters by the lead author in a large, urban, social work program in the Northeastern United States. From a group of 210 participants whose anonymous scores were obtained at the pretest or posttest, a total of 168 complete usable sets of both pre- and posttest responses were available for psychometric analyses (i.e., no data were imputed; each respondent completed both pretest and posttest). At both pretest and posttest, students were presented with the option of participating and the procedure was explained. To maintain anonymity, the lead author then left the room while students decided whether to complete the survey. No financial or other inducements for participation were offered. This study was approved as exempt by the University Institutional Review Board.
As these were anonymous surveys, no demographic data are available for the students in this study (see below). In terms of the larger population in this school from fall 2011 to summer 2015 (on the particular campus from which this convenience sample was obtained), the following demographic data are available. Eighty-five percent of the students were female, and the mean age was 28.1 years (SD = 8.0). International students comprised 8.5% of this student body. Major racial/ethnic categories were Asian/Pacific Islander (11.5%), Black (12.3%), Hispanic (12.9%), and White (56.9%).

ESE-II.
The ESE and the ESE-II are both designed to assess social work students' self-efficacy regarding their ability to conduct evaluations of social work practice. They were developed following self-efficacy scale development guidelines (Bandura, 2006b). ESE instructions to participants were slightly revised for the ESE-II to enhance orientation of participants and their understanding of the scale instructions. In addition, items were modified to better reflect the complexity of the target behaviors. Next, three new items (Items 2, 4, and 9 in Table 1) were added. As can be seen in Table 1, the resulting 14-item scale has an 11-point response format (0 = cannot do at all; 50 = moderately certain can do; 100 = certain can do). The ESE-II total scale score is the mean of these 14 items.
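The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the validation rules, and the example ratings are hypothetical, and the sketch assumes that the 11 response options run from 0 to 100 in steps of 10 and that a respondent has rated all 14 items (the handling of skipped items is not described here).

```python
from statistics import mean

N_ITEMS = 14                            # the ESE-II has 14 items
VALID_RATINGS = set(range(0, 101, 10))  # assumed options: 0, 10, ..., 100


def ese2_total_score(ratings):
    """Return the total scale score: the mean of the 14 item ratings."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("each rating must be one of 0, 10, ..., 100")
    return mean(ratings)


# Hypothetical respondent (illustrative values, not study data)
example = [50, 40, 60, 30, 70, 50, 40, 60, 50, 30, 40, 50, 60, 70]
score = ese2_total_score(example)  # ratings sum to 700, so the mean is 50
```

Because the total is a mean rather than a sum, it stays on the same 0 to 100 metric as the individual items, which keeps item-level and scale-level results directly comparable.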
Self-efficacy test items. Three self-efficacy test items were derived from the EPAS of CSWE (2008) and used in our Self-Efficacy Regarding Social Work Competencies Scale (Holden et al., in press). The items pertained to (a) using practice experience to inform scientific inquiry, (b) using research evidence to inform practice, and (c) social workers critically analyzing, monitoring, and evaluating interventions.

Procedure
Participants were assessed in the second and 13th session of a 14-session course. Administration was conducted by the instructor who explained that the instrument was voluntary and assured participants that he would not be in the room to monitor participation. He provided the directions for completing the instrument, answered any questions and then left the room. No demographic variables were assessed as this was an anonymous administration intended to further reduce socially desirable responding (Paulhus, 1991). We developed our approach for subject generated identification codes (i.e., Hogben numbers) before we discovered that similar ideas already existed in the literature (Honig, 1995; Yurek, Vasey, & Havens, 2008). This method allows matching of pre- and post-scores for individuals. This approach and additional details regarding the overall methodology used in these studies and justification for it are available from the first author. After each administration, data entry was carried out by a research assistant with a 10% check for errors by another research assistant.
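The matching step that such identification codes make possible can be illustrated as follows. This is a minimal sketch, not the procedure used in the study: the code format, the function name, and the scores are all hypothetical. The only idea shown is that a respondent contributes a complete pre-post pair when the same self-generated code appears at both administrations.

```python
def match_pre_post(pre, post):
    """Pair scores by anonymous code.

    pre, post: dicts mapping a self-generated code to a total scale score.
    Returns (code, pre_score, post_score) tuples for codes present in both.
    """
    shared = sorted(pre.keys() & post.keys())
    return [(code, pre[code], post[code]) for code in shared]


# Hypothetical codes and scores (not study data)
pretest = {"A17": 40.0, "B22": 55.0, "C03": 30.0}
posttest = {"A17": 75.0, "C03": 60.0, "D41": 80.0}

pairs = match_pre_post(pretest, posttest)

# Share of respondents seen at either point who lack a complete pair
n_any = len(pretest.keys() | posttest.keys())
attrition = 1 - len(pairs) / n_any
```

Applied to the counts reported for this study (210 respondents with a score at either point, 168 complete pairs), the same ratio gives 1 - 168/210 = 20%.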

Data-Analysis Strategy
The 14 items of the ESE-II were subjected to exploratory factor analysis (EFA) with principal axis factoring separately for pretest (n = 207) and posttest (n = 169) data to explore the factor structure of the scale. As an index of reliability, Cronbach's alpha was computed with the lowest acceptable value set at .70 (Nunnally & Bernstein, 1994). Pearson correlations were used to examine convergent validity with criteria for a small, medium, and large correlation set at .10, .30, and .50, respectively (Cohen, 1988). Pre-post change scores and 95% confidence intervals (CIs) were computed at the ESE-II item and scale levels in addition to standardized mean differences (d) for dependent groups. Cohen's d values of .30, .50, and .80 were considered to reflect a small, medium, and large difference, respectively (Cohen, 1988). Analyses were conducted using SPSS 24.0.
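For readers who want to see the reliability index spelled out, the following is a minimal Python sketch of the standard formula for Cronbach's alpha, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The analyses in this study were run in SPSS. The function name and the small data set below are hypothetical and serve only to show the computation.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of k item scores per respondent.
    """
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 items and equal-length rows")
    # Sample variance of each item across respondents
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the respondents' summed scores
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical three-item data in which the items rise and fall together
# perfectly, so alpha reaches its maximum of 1.0
rows = [[10, 20, 30], [20, 30, 40], [30, 40, 50], [40, 50, 60]]
alpha = cronbach_alpha(rows)
meets_criterion = alpha >= 0.70  # lowest acceptable value used in this study
```

Real item data are never perfectly consistent, so observed alphas fall below 1.0. The .70 criterion is the cutoff stated above.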

Descriptive Statistics
All available data were used to compute descriptive statistics (Ns ranged from 169 to 210). Following Cronbach's (1963) suggested focus on individual items as well as total scale scores, the specific item scores for students' evaluation self-efficacy on the ESE-II are presented in Table 1 (note that the sample sizes reflect all available respondents with usable data for each item). The pretest item means ranged from 30.3 (i.e., create and carry out an inferential data analysis plan for your evaluation of social work practice) to 67.8 (i.e., create an effective search strategy and conduct a thorough search of electronic library based databases, and other Internet resources, to obtain the scholarly literature necessary to design your evaluation of social work practice).
The posttest item means ranged from 65.4 (i.e., create and carry out an inferential data analysis plan for your evaluation of social work practice) to 84.9 (i.e., create and carry out an evaluation of social work practice that incorporates social work values and ethics [e.g., protects the participants in the evaluation]). The pre-post change scores on the ESE-II and their 95% confidence intervals are also displayed in Table 1. The three self-efficacy test items ranged from 56.2 (SD = 25.5) to 65.1 (SD = 21.1) on the pretest and from 81.2 (SD = 21.1) to 85.9 (SD = 13.4) on the posttest.

Factor Structure
Prior to performing EFA with principal axis factoring, the suitability of the data was examined. The Kaiser-Meyer-Olkin (KMO) value of .93 (>.60) and a significant (p < .001) Bartlett's Test of Sphericity supported the factorability of the correlation matrix at both assessment points. Analyses pointed toward a single-factor solution at both assessment points as indicated by a clear break after the first factor in the scree plots. The single-factor solution explained 59.8% and 58.8% of the variance in the pre- and posttest, respectively. All variables loaded substantially on the single factor with loadings ranging from .46 to .88 for the pretest and from .60 to .87 for the posttest. In both cases, Item 1 (create an effective search strategy and conduct a thorough search of electronic library based databases, and other Internet resources, to obtain the scholarly literature necessary to design your evaluation of social work practice) had the weakest loading on the factor, while Item 12 (create and carry out a group research design to evaluate the outcomes of social work practice) had the strongest.

Validity
Content validity. Our argument in support of the content validity of the ESE-II is straightforward. Both the original ESE and the ESE-II use items reflecting the knowledge and skills one aims to improve in a course on evaluation in social work. For both scales, the items were directly matched to the objectives of the course in which data were obtained.

Construct validity.
In line with our predictions, we found a positive correlation between the ESE-II and the SWE that was medium in magnitude, r = .33, according to Cohen's (1988) guidelines for interpreting effect sizes. To compare the ESE-II correlation with the SWE in the current study to the correlations between the ESE and SWE obtained in two prior studies (r = .18 and r = .25), we applied the estimation approach using Cumming's (2012) ESCI software (see also Association for Psychological Science, 2017; Cumming, 2015; Cumming & Calin-Jageman, 2017). The differences between the correlations were .15, 95% CI = [−0.09, 0.4], and .08, 95% CI = [−0.14, 0.30], with 95% confidence intervals including 0, indicating that the differences were not meaningful.
The positive correlation between the ESE-II and the mean of the three test self-efficacy items derived from the CSWE's (2008) EPAS was r = .75, 95% CI = [0.68, 0.81], clearly exceeding Cohen's (1988) threshold of .50 reflecting a large effect size.
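The interval reported for r = .75 can be checked with the standard Fisher z transformation. The sketch below is an illustration only. The intervals in this study were obtained with ESCI, whose exact routine is not described here, and the sample size of 168 is an assumption taken from the number of complete pre-post sets. The function name is hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_r_ci(r, n, level=0.95):
    """Approximate CI for a Pearson correlation via Fisher's z."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    z = atanh(r)            # Fisher z transform of r
    se = 1 / sqrt(n - 3)    # standard error on the z scale
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# Assumption: n = 168 (the complete pre-post sample)
lower, upper = pearson_r_ci(0.75, 168)
print(round(lower, 2), round(upper, 2))  # prints 0.68 0.81
```

Under that assumption the sketch returns the same two-decimal limits as those reported above. Note that the interval is not symmetric around r, because the transformation stretches values near 1.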

Sensitivity to Change
Pre-post change scores for individual items ranged from 12.9 to 36.6 (see Table 1). None of the 95% CIs for the individual pre-post change scores contained the value zero, suggesting that all of the particular aspects of students' self-efficacy were strengthened over the course of the semester. At the item level, the pretest-posttest differences ranged from d = .64 to d = 1.59.
As presented in Table 2, the ESE-II total scale mean scores at pretest ranged from 5.7 to 99.3, in contrast to a minimum and maximum of 35.7 and 99.3 at posttest. Across students the pretest total scale mean was 45.9, 95% CI = [42.9, 48.9], and the posttest mean was 75.9, 95% CI = [74.0, 77.8]. The difference between these total scale means was 30.0, 95% CI = [27.4, 32.6], suggesting strengthening of students' self-efficacy over the course of the semester as the value zero was not included in the CI. In terms of Cohen's (1988) d, the pretest-posttest difference was very large in magnitude (d = 1.91).
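The logic of these change analyses (mean change, its 95% CI, and a standardized difference for dependent groups) can be sketched as follows. Several things here are assumptions rather than a description of the study's analysis: the CI uses a normal approximation instead of the t distribution, the standardized difference is computed as the mean change divided by the standard deviation of the change scores (one of several variants for dependent groups; the variant used in this study is not specified), and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_change(pre, post, level=0.95):
    """Mean pre-post change, approximate CI, and standardized change."""
    if len(pre) != len(post):
        raise ValueError("pre and post must contain matched scores")
    diffs = [b - a for a, b in zip(pre, post)]
    m = mean(diffs)
    sd = stdev(diffs)                    # SD of the change scores
    se = sd / sqrt(len(diffs))           # standard error of the mean change
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    return {
        "mean_change": m,
        "ci": (m - z * se, m + z * se),
        "d": m / sd,                     # assumed variant: change / SD of change
    }


# Hypothetical matched total scale scores (not study data)
pre = [40, 50, 30, 60]
post = [70, 75, 65, 80]
result = paired_change(pre, post)
change_is_reliable = result["ci"][0] > 0  # CI excludes zero
```

With samples as small as this example, a t-based interval would be noticeably wider than the normal approximation. With a sample of 168 the two are close.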

Discussion
The current study aimed at evaluating the psychometric properties of a revised tool to assess students' self-efficacy outcomes. Revision of the theoretically derived and empirically supported original ESE included rewording of directions and existing items. In addition, three new items were developed to better reflect the objectives of a revised course. Overall, findings supported the psychometric properties of the revised ESE-II evaluated in this study. EFA revealed a single-factor structure underlying the 14 self-efficacy items at the pre- and posttest assessment. The ESE-II scale also demonstrated excellent reliability at both assessment points. The argument made in the two prior studies (that the original ESE could be seen as having preliminary evidence of content validity because the items were drawn directly from the objectives of a typical evaluation course) continues to apply in the current study. Preliminary evidence supporting the construct validity of the ESE-II was also observed. Finally, the ESE-II was sensitive to change in students' self-efficacy over the course of the semester, both at the item and scale level.
We frame the implications of these findings in the context noted by the Open Science Collaboration: "[i]nnovation points out paths that are possible; replication points out paths that are likely; progress relies on both. Replication can increase certainty when findings are reproduced and promote innovation when they are not" (2015, p. 943). Encouraging results across three studies increase certainty regarding the utility of the ESE and ESE-II. In addition, it continues to appear that Cronbach's (1963) suggested focus on individual items as well as total scale scores is wise. The substantial difference between the lowest and highest item means at pretest leads to a number of questions related to curriculum that would be more obscure if the focus were on total scale scores alone. Modifying the ESE to produce the ESE-II, in order to accommodate course changes, emphasizes the flexibility of the construct and supports our assertion that self-efficacy may form the basis of a larger outcome assessment system.
Although these results are consistent with prior research, some limitations should be noted. The sample of students was nonrandom and drawn from a single school of social work during a limited time period in the context of a single course. Although the current study replicates and extends our prior work by revising an existing scale to better assess the outcomes of a modified evaluation course, the psychometric properties of the ESE-II should be evaluated in additional situations. For instance, Deck, Platt, and Conner's (2015) recent assessment of a service learning research course demonstrated that the ESE performed similarly to our initial ESE studies.
The representativeness of the findings may have been reduced by attrition (20% reduction of useable scores). A variety of reasons for attrition are possible here (e.g., refusals to participate, late for class or absent at the point of administration, leave of absence, shift into a different program track or to a different campus, dropped out, confused by the identification system, lack of incentives for participation, no threat of being identified as a dropout, etc.). In comparison, our last five self-efficacy scale development studies (covering different pre-post period lengths) had attrition rates ranging from 21% to 45%. Thus, while the 20% attrition might have had an impact in this study, it falls at the lower end of attrition rates of our prior studies.
The use of self-efficacy in social work educational outcomes studies has been questioned on occasion (e.g., Drisko, 2014). As no one has challenged our earlier responses to these criticisms (Holden et al., 2017; Holden et al., 2007), we can only reiterate two key points. First, it appears some of those concerns flow from a basic misunderstanding of the construct. "Self-efficacy refers to beliefs in one's capabilities to organize and execute the courses of action required to produce given attainments" (Bandura, 1997, p. 3). This definition is not equivalent to "a belief in one's ability to learn" (Drisko, 2014, p. 421). Neither is self-efficacy equivalent to self-esteem, self-worth, locus of control, or competence. Second, although Bandura has provided a guide for measuring self-efficacy (Bandura, 2006b), a number of recent studies in social work do not cite and/or optimally utilize this instruction (e.g., Jacobson, Osteen, Sharpe, & Pastoor, 2012; Parrish, Oxhandler, Duron, Swank, & Bordnick, 2016; Tompsett, Henderson, Byrne, Mew, & Tompsett, 2017). Adherence to conceptual and operational definitions is key when researchers utilize the construct. Moreover, failing to find a relationship between self-efficacy and outcomes with no supporting psychometric evidence does not, in our view, comprise a convincing argument against the use of self-efficacy.
In conclusion, it is logical to use theoretically derived, empirically supported constructs that have performed well across investigations. The current study provides additional evidence supporting the use of self-efficacy measures in the conduct of social work educational outcome assessments.

Authors' Note
The scale developed in this study is available at no charge from the first author.

Ethical Approval
New York University: Institutional Review Board approved as exempt.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Note (Tables 1 and 2). Higher scores indicate higher levels of self-efficacy. ESE-II = Evaluation Self-Efficacy Scale-II; CI = confidence interval.