Defining Grammatical Difficulty to Make Better Choices about Corrective Feedback: A Meta-Analysis of Persian EFL Learners

Past experimental studies of corrective feedback (CF) have isolated factors like grammatical complexity, learner proficiency, and L1 one by one, carefully designing experiments that eliminate the influence of “extraneous” factors. Because each factor is actually codependent, more holistic study is needed. Eleven studies, all of which had English as a Foreign Language (EFL) learners with a Persian L1 and productive measures of speech or writing, were selected for meta-analytic examination. Results suggest that type of grammatical feature, as well as associated learner variables such as L1 similarity or proficiency, collectively influence the efficacy of different CF types. As variables jointly add to the difficulty of a grammatical feature, CF providing a kind of scaffold, in the form of a written or oral reformulation from the teacher, appears to be the most effective. As grammatical difficulty decreases, learners appear to benefit from CF in which the learner is compelled to self-repair.


Introduction
Concerning written Corrective Feedback (CF), Lee (2019a) describes an English teacher in Hong Kong who is fed up with never-ending corrections that don't improve students' editing skills, "as students continue to make mistakes, even the very same mistakes" (p. 524). Nearly every EFL instructor can sympathize with this dilemma, slaving over corrections, only to realize that their work has little to no effect. This phenomenon has led some researchers to suggest that correcting written errors is not effective at all (Truscott, 1996, 1999). While a great deal of research has emerged to suggest that written forms of CF are indeed effective (Bitchener et al., 2005; Chen et al., 2016; Ferris, 2006; Shintani & Ellis, 2013), controversy over how to use this pedagogical technique persists. Lee (2019a) contends that problems with providing written feedback lie in a tendency to correct all errors, which does not allow for focus on specific grammatical features. Although heightened emphasis of just one target feature appears to improve student performance in writing, how a feature should be emphasized is still an issue of contention (Mao & Lee, 2020). Some studies indicate that correcting an error in writing, referred to as direct corrective feedback (DCF), has the most significant impact on student accuracy when used exclusively (Shintani et al., 2014), whereas other studies suggest that providing explicit information about a grammar error, referred to as metalinguistic corrective feedback (MCF), leads to higher gains when used along with DCF (Bitchener et al., 2005; Sheen, 2007). Lee (2019b) accurately points out that such discrepancies are the result of heavy reliance on experimental studies, which do not result in "research that can guide teachers' actual practice" (p. 1). Experimental research conducted so far does limit the adaptation of theory to practice, explaining why more qualitative research has been recommended to help bridge this gap.
Although it is true that more qualitative research is needed, reliance on experimental study does not embody the crux of the problem. Instead, the issue lies with a tendency to examine singular variables. According to a standard reductionist approach, past experimental studies have isolated just one factor by eliminating all "extraneous" variables. In one study of the English article, for example, two types of CF were examined, along with language aptitude, by separating treatments into three groups (DCF only group, DCF and MCF group, and a control group). Although results correctly isolate treatment variables, showing that DCF was most effective when provided with MCF (Sheen, 2007), the effect is very context specific. Since differences in variables like target feature (Van De Guchte et al., 2015;Varnosfadrani & Basturkmen, 2009) and learner characteristics (Ellis, 2010) are known to influence corrective feedback, the same results can only be reliably obtained when the experimental environment is replicated. Participants from this study came from various different ethnic and linguistic backgrounds. Not only is it difficult for both researchers and teachers to recreate this classroom environment to repeat the effects, combined evaluation of learners from different backgrounds sends an implicit message that "one-size-fits-all" solutions are needed. Not surprisingly, research conducted with a new target feature (the hypothetical conditional) and more linguistically uniform learners, all of whom had the Japanese L1, yielded contradictory results (Shintani et al., 2014).
Past experimental studies of CF are expertly designed, yet they attempt to isolate variables in a scientific tradition that does not accurately characterize the real world. These experimental studies cannot be adapted to the classroom, primarily because factors that impact learning are codependent. Due to a need for more holistic understanding of multiple variables, researchers have called for additional qualitative research (Lee, 2019b). Ultimately, such calls reflect a greater need for comprehensive study of learner differences, an area of research that has been underexplored (Li & Vuono, 2019). If a more holistic understanding of CF is to be obtained, discrepancies in past research must be clarified, allowing for better prediction of learner outcomes. Like qualitative research, meta-analysis may be useful in providing a more comprehensive perspective. This technique, which collates data from several experimental studies, provides new insights about how multiple variables are related. By using meta-analytic investigation to gain a new perspective of CF, educators may finally obtain the knowledge needed to adapt theory to practice.

Literature Review
Before the effect of any pedagogical technique can be fully understood, we must first comprehend the causes. Analysis of past research suggests that type of grammatical feature is one "possible contributor" to the impact of CF (Li & Vuono, 2019, p. 103). In some ways, this "possible" contribution seems to be an understatement. It equates to the idea that "what you teach is a possible contributor to what students learn." Although choices of grammatical feature would seem to have a significant impact, this idea is not clearly established through past research, which often yields variable results. Failure to concretely understand the role of grammar has compelled researchers to turn toward background variables, which focus on the mediating effects of student and teacher beliefs (Bao, 2019;Couper, 2019;Han, 2017). As in the case of earlier studies, this research does not provide the holistic perspective needed to consistently use different types of CF.
Earlier research does acknowledge that grammar type influences the effectiveness of CF. Unfortunately, this research has used overly simplistic categorization to analyze grammatical differences, thereby limiting understanding. Ferris (2006) first separated grammar into just two groups: systematic features (e.g., past regular tense or English article), which have rules that can be taught, and lexical features (e.g., the past irregular tense or collocations), which lack "teachable" rules. Although the untreatable distinction of lexical features has now been debunked (Van Beuningen et al., 2012;Wang & Jiang, 2015), oversimplification of grammatical features continues to be a problem within new studies, which consistently use a simple binary dichotomy such as simple versus complex or early versus late (Spada & Tomita, 2010;Van De Guchte et al., 2015;Varnosfadrani & Basturkmen, 2009). Modern research generally recognizes the importance of grammatical differences, yet experimental designs continue to reflect the traditional bias. A recent study of CF with metacognition, for example, examined only two features: the third person singular (-s) and possessive determiner (his/her) (Sato & Loewen, 2018). Both of these features reflect past distinctions, arbitrarily dividing grammar into systematic and lexical categories. As language teachers often realize, grammar is not so simplistic. It is semantically, morphosyntactically, and phonologically variable. By cultivating a "one-size-fits-all" paradigm for grammar, the diversity of these features and their influences on corrective feedback remains enigmatic.

Factors of Grammatical Difficulty: Morphosyntactic, Semantic, and Phonological Complexity
In reality, characteristics of grammatical features differ significantly, affecting how they are acquired. Features that are often characterized as "early," such as the irregular past, plural -s, and definite article (Varnosfadrani & Basturkmen, 2009), vary in phonological salience, regularity of form, and semantic complexity. The past irregular tense, for example, has sonorant vowel sounds, making it easier to hear, whereas the plural -s is less phonologically noticeable, often lacking a vowel (e.g., parents). Concerning regularity, the plural -s is highly systematic in form, while the irregular past has many different lexical variants, which require a more sophisticated understanding of form-meaning mappings. The definite article is phonologically salient and regular in form, yet has greater semantic complexity than the other "early" features. It may be used to signify general cultural use (e.g., the sun), immediate situational use (e.g., Don't go in there. The dog will bite you!), perceptual situational use (e.g., Pass me the salt.), and local use (e.g., the car/ the pub) (Celce-Murcia et al., 1999).
Grammatical features that are often characterized as "late," such as the indefinite articles a and an, regular past, relative clause, passive voice, and third person -s (Varnosfadrani & Basturkmen, 2009), are also more diverse than a simple binary dichotomy would suggest. Like the plural -s feature, third person -s is systematic and morphologically simple, yet challenging to hear or perceive. Unlike the plural -s, third person singular morphology links a subject with the predicate verb (e.g., He eats). This cognitive link between agent and action appears to explain why this feature is so difficult to acquire (Pienemann, 2005). In the case of the passive voice, constructions like, He was killed, include not only regular past tense features in the verb (e.g., killed), but a lexically contrived, past irregular component (e.g., was). Relative clauses require a combination of words in a syntactic order, as well as correct conjugation of its constituents. Such characteristics make the feature easy to perceive, yet morphosyntactically and semantically complex. Overall, review of grammatical features reveals that simple binary dichotomies are insufficient as a means to explain variability.
Although grammar differs in a number of ways, impact of this variability on the effectiveness of CF has never been fully investigated. Because MCF provides explicit information about grammar or meaning needed for correct use of a target feature, it may be most effective when used with grammatical features like articles, which are semantically complex. Oral forms of CF like recasts, which provide a "reformulation of all or part of a student's utterance, minus the error" (Lyster & Ranta, 1997, p. 46), may be more effective than MCF or DCF when used with phonologically complex grammatical features. As the past regular tense has only one orthographic form and meaning (ed is used to mean "past"), yet three phonological variants ("t", "d", or "id"), it may benefit more from recasts, which provide valuable phonological input. There may be times when a specific kind of feedback is needed for a target feature. Unfortunately, little research has been conducted to examine how different types of CF are affected by unique semantic, morphosyntactic, and phonological characteristics of each grammatical feature. This failure may have limited our understanding of how and when to use CF.

Additional Factors of Grammatical Difficulty: Learner Proficiency and L1
Although morphosyntactic, semantic, and phonological complexity are all important, learner characteristics like proficiency and L1 impact the acquisition of a grammatical feature. Both of these factors help to determine whether or not a target feature will be more difficult to acquire. Studies of the Processability Theory suggest that grammatical features are more easily acquired at distinct stages of proficiency, which move from morphosyntactic operations within a phrase (a noun, verb, or adjective phrase) toward more sophisticated operations that link different phrases or clauses (Pienemann, 2005). English learners first acquire SVO sentences and basic aspects of morphology like the plural -s. They then acquire aspects of do support, question inversion, and phrasal verb use, all of which reflect cognitive links between multiple noun and verb phrases. Finally, learners begin to use features such as the hypothetical conditional and embedded questions (e.g., Could I ask what the trouble is?), which require complex dependent clauses (Mohamed Salleh et al., 2020;Pienemann, 2005). Because learners become able to acquire grammatical features at a specific level of proficiency, timely introduction of CF may be a crucial determinant of efficacy. This notion is supported by past research of explicit grammar instruction, which is more effective when the target feature is just above a learner's ability level (Dyson, 2018;Dyson & Håkansson, 2017;Pienemann, 1989).
In addition to proficiency, a learner's L1 influences how difficult a grammatical feature is to acquire. This influence is revealed by a recent study, which shows that explicit focus on L1 processing routines can be used to improve accuracy and fluency in the L2 (McManus & Marsden, 2019). Research suggests that differences in L1 morphology (free vs. bound morphemes) or syntax (head-final vs. head-initial languages) have a negative impact on the process of acquisition (Luk & Shirai, 2009;Maleki, 2006;Shin, 2015). Conversely, recent meta-analysis of Korean and Chinese learners suggests that similarities of L1 linguistic structures make an L2 target feature easier to learn (Schenck, 2020;Yang, Cooc et al., 2017). While such research is insightful, a primary focus on Asian languages has limited understanding of L1 influence. Because Indo-European languages like Persian include grammatical features that are not present in languages like Korean or Chinese (e.g., an article system), further investigation may allow for comparison, leading to a more holistic perspective on the efficacy of CF.
Because grammatical difficulty cannot be isolated into a single variable, multiple factors must be simultaneously studied. The challenge posed by a specific grammatical feature can only be ascertained through examining the complexity of a grammatical feature (morphosyntactic, semantic, and phonological) and learner characteristics (learner proficiency and L1). Through comprehensive analysis of both grammatical and learner variables, difficulty posed by specific target features may finally be understood, allowing educators to predict the impact of CF techniques. Further study of grammatical difficulty may allow educators to better understand how CF will impact the learner, leading to better choices that promote acquisition.

Types of Corrective Feedback
A variety of CF types such as recasts, clarification requests, elicitation, metalinguistic feedback, indirect feedback, and direct feedback may be used to increase grammatical accuracy and foster acquisition (Lyster & Ranta, 1997;Tedick & De Gortari, 1998). While these techniques all compel a learner to correct an error, they differ according to how much information is provided. Both elicitation and clarification requests ask the learner to fix an error without providing any explicit information about the mistake (e.g., Excuse me, could you repeat that?). Metalinguistic and indirect feedback also prompt the learner to correct an error, yet they provide explicit information for support. Concerning indirect feedback, errors are underlined or highlighted. Metalinguistic feedback provides explicit feedback in two main forms, through coding errors (within writing) or providing grammatical hints without a correction (e.g., Should you use the past tense or present tense in that sentence?) (Sanavi & Nemati, 2014).
While clarification requests and elicitation differ from metalinguistic and indirect feedback in the degree to which explicit information is provided, all of the techniques compel a learner to fix a grammatical error without providing the correct form (see Table 1). It is the learner that must reformulate their speech or writing. In contrast to these types of CF, which are collectively called prompts, techniques such as recasts and direct feedback rely on the instructor to reformulate an error. Recasts, which generally provide a correction orally, do not explicitly suggest that a learner has made an error, whereas direct feedback, which is usually delivered in writing, provides information that can be explicitly processed and reviewed. Although recasts are implicit and direct feedback is explicit, both techniques provide a revised and correct form to students. Thus, they are termed reformulations (see Table 1).
There are four different characteristics of corrective feedback: prompt, reformulation, implicit, and explicit (see Table 1). Each one of these characteristics may influence efficacy of a CF technique. In the case of explicit CF, increased conscious awareness can help the learner, providing information that promotes error correction. At the same time, increased conscious awareness also increases cognitive load, which may decrease accuracy. If a learner is attempting to use a difficult grammatical feature (as determined by proficiency level and L1), this learner's ability to process will decline. Without providing CF that is manageable, learners may become "unable to process feedback effectively and may experience something akin to the learning breakdown predicted by cognitive load theory" (Hartshorn et al., 2010, p. 88). In the case of implicit prompts, no real assistance to correct an error is given, leaving a learner with the highest burden. Although different forms of CF may vary in the cognitive load they place on the learner, effects of these differences have not yet been studied. Concerning written corrective feedback, for example, the links between working memory (cognitive load) and CF type are currently unknown (Li & Roshan, 2019).
In addition to traits outlined in Table 1, medium of CF delivery, either written or oral, must also be considered. As pointed out by Li and Vuono (2019), all written forms of feedback are explicit since a learner is able to view grammatical emphasis over and over. Oral prompts and reformulations, however, do not provide the same explicit scaffold. They force a learner to process an error in real time. In some way, oral input provided by recasts can serve as a scaffold for phonologically challenging grammatical features. This view is supported through research of structured input, which has been shown to be useful for the past regular -ed, morphology that is phonologically complex, yet semantically and syntactically simple (Benati & Angelovska, 2015;Benati & Batziou, 2019).
Grammatical complexity, proficiency, L1, and CF type all exert an influence on the process of grammatical acquisition. Past studies have tried to isolate these variables one by one, carefully designing experiments that eliminate the influence of "extraneous" factors. Because each factor is actually codependent, discrete isolation is impossible. To better clarify relationships between multiple influences, more comprehensive study is needed. Such study will finally provide the holistic perspective needed to adapt theory to practice. Essentially, understanding interconnected influences of grammar acquisition will make outcomes of CF predictable, thereby giving educators the knowledge needed to make important choices about how and when to use such techniques.

Research Questions
To provide a more holistic understanding of the multiple influences that impact CF, the following questions were posed:
1. Which kind of CF is most effective for promoting accuracy in the production of speech or writing?
2. Does the efficacy of CF differ based upon the target feature? What types of CF are most effective with each grammatical feature?

Method
The present study was designed to examine the impact of both grammatical (semantic, morphosyntactic, and phonological) and learner (proficiency and L1) characteristics on the efficacy of CF. In order to obtain data for analysis, a search was conducted for experimental studies which included information about the target feature, learner proficiency, and L1. To avoid confusion over linguistic variability of participants, studies had to include learners with only one L1, Persian (Farsi). After studies were located and statistically analyzed, results were then compared to a previous meta-analysis of CF which included only Korean learners (Schenck, 2020). Google Scholar was used to locate all research for initial review. The database includes not only journals from Scopus and Web of Science, but sources outside these two databases, making it a comprehensive resource for existing research. The database was systematically searched by first using the general search terms "Iranian" and "feedback." Following this, these search terms were used with specific grammatical features (plural, past tense, past regular, past irregular, passive, third person, questions, article, definite article, indefinite article, phrasal verb, verb particle, conditional) and feedback types (feedback, recasts, indirect feedback, direct feedback, metalinguistic feedback, prompts).
Explicit (conscious) knowledge of a grammatical feature was not examined in this study, primarily because it does not always equate to increased accuracy in natural speech or writing. To help ensure that implicit (unconscious) knowledge was being examined, studies had to include pretest and posttest measures with the following qualities for the measurement of implicit knowledge: the communication of ideas, not rules; a focus on meaning, not form; and the avoidance of using metalanguage (Ellis, 2009). Studies that used oral imitation were also included in the meta-analysis. Because oral imitation requires an immediate spoken response, it places pressure on the learner that prevents conscious correction. All of the tests required either spoken or written production. Studies which used multiple choice or fill-in-the-blank instruments were excluded.
In order to be considered for the present meta-analysis, studies needed to include the following:
1. A type of CF (including time for treatment and methods of delivery)
2. Pretest and posttest measures of production (either oral or written)
3. Information about the type of grammatical feature targeted
4. Information about proficiency
5. Participants that had a Persian (Farsi) L1
In addition to these criteria, all of the studies were reviewed for quality, ensuring the selection of studies with sound experimental methods which included adequate description for replication. All of the studies were composed in English and most had from one to four treatments, with the exception of Davaribina and Hossein Karimi (2014), Ghoorchaei and Gharanjik (2020), and Zarei et al. (2018), which received from 6 to 15 sessions of treatment (see Appendix A for more information). The length of treatment session was qualitatively considered as effect size was analyzed. Potential influences of treatment length were considered in the limitations section. Of the 374 files discovered, only 11 met all the criteria necessary for analysis (see Appendix A for more information on these studies).
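The five inclusion criteria above can be read as a simple screening predicate. The following is only an illustrative sketch: the record fields and the example studies are hypothetical, and the quality review described above is a manual judgment that is not modeled here.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass
class CandidateStudy:
    """Hypothetical screening record; field names are illustrative only."""
    reports_cf_type: bool              # criterion 1
    has_pre_post_production: bool      # criterion 2 (oral or written production)
    reports_target_feature: bool       # criterion 3
    reports_proficiency: bool          # criterion 4
    participant_l1s: FrozenSet[str]    # criterion 5


def meets_criteria(study: CandidateStudy) -> bool:
    """Return True only if all five inclusion criteria are satisfied."""
    return (
        study.reports_cf_type
        and study.has_pre_post_production
        and study.reports_target_feature
        and study.reports_proficiency
        # Participants must share a single L1, Persian (Farsi)
        and study.participant_l1s == frozenset({"Persian"})
    )


# Hypothetical examples: one eligible study, one with mixed L1 backgrounds
eligible = CandidateStudy(True, True, True, True, frozenset({"Persian"}))
mixed_l1 = CandidateStudy(True, True, True, True, frozenset({"Persian", "Korean"}))
```

Under these assumptions, `meets_criteria(eligible)` is true and `meets_criteria(mixed_l1)` is false, since the second record fails the single-L1 requirement.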

Corrective Feedback
Types of CF were analyzed according to four categories suggested in a study by Lyster and Saito (2010): implicit prompts (clarification requests and elicitation), implicit reformulations (recasts), explicit prompts (metalinguistic and indirect feedback), and explicit reformulations (direct feedback). Specific types of feedback were charted based upon whether or not they were written or orally delivered. As suggested by Li and Vuono (2019), all written forms of feedback carry an explicit quality, making this separation necessary.
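The four-way categorization described above can be sketched as a lookup table. This is a minimal illustration, not the coding scheme actually used: the names `CF_TAXONOMY` and `categorize` are hypothetical, and the rule that written delivery overrides implicitness is an assumption drawn from the statement that all written feedback carries an explicit quality.

```python
from typing import NamedTuple, Tuple


class CFCategory(NamedTuple):
    explicitness: str  # "implicit" or "explicit"
    repair: str        # "prompt" (learner repairs) or "reformulation" (teacher repairs)


# Categories as listed in the text, following Lyster and Saito (2010)
CF_TAXONOMY = {
    "clarification request": CFCategory("implicit", "prompt"),
    "elicitation": CFCategory("implicit", "prompt"),
    "recast": CFCategory("implicit", "reformulation"),
    "metalinguistic feedback": CFCategory("explicit", "prompt"),
    "indirect feedback": CFCategory("explicit", "prompt"),
    "direct feedback": CFCategory("explicit", "reformulation"),
}


def categorize(technique: str, medium: str) -> Tuple[str, str, str]:
    """Return (medium, explicitness, repair) for a CF technique.

    medium is "oral" or "written". Written feedback is treated as explicit
    regardless of technique (assumption based on Li & Vuono, 2019).
    """
    category = CF_TAXONOMY[technique.lower()]
    explicitness = "explicit" if medium == "written" else category.explicitness
    return medium, explicitness, category.repair
```

For example, `categorize("recast", "oral")` yields an oral, implicit reformulation, whereas `categorize("direct feedback", "written")` yields a written, explicit reformulation.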

Grammatical Features
After studies were collected and examined, there was a total of 37 treatment groups from 11 studies (see Appendix A). The following grammatical features were available for analysis: indefinite and definite articles (four studies with fourteen groups), regular past (one study, four groups), irregular past (one study, four groups), passive voice (one study, four groups), wh questions (one study, two groups), future tenses (one study, two groups), and conditional (one study, one group). A study by Zarei et al. (2018) combined past (past simple and past progressive) and future tenses ("be going to" and present progressive for future) together for treatment of four groups, while the study by Davaribina and Hossein Karimi (2014) combined articles with the past simple tense for treatment of two groups. These studies were included to show how heightened complexity of multiple features impacts CF.
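As a quick arithmetic check, the group counts reported above can be tallied to confirm that they sum to the stated total of 37 treatment groups. The dictionary keys are shorthand labels introduced here for illustration.

```python
# Treatment-group counts as reported in the text (labels are shorthand)
GROUPS_BY_FEATURE = {
    "indefinite and definite articles": 14,
    "regular past": 4,
    "irregular past": 4,
    "passive voice": 4,
    "wh questions": 2,
    "future tenses": 2,
    "conditional": 1,
    "past and future tenses combined (Zarei et al., 2018)": 4,
    "articles with past simple (Davaribina & Hossein Karimi, 2014)": 2,
}

total_groups = sum(GROUPS_BY_FEATURE.values())
print(total_groups)  # 37
```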

Proficiency Level
Studies selected for meta-analysis included the following proficiency levels:  Rassaei (2013) included only information about prior English instruction without a proficiency designation, whereas a study by Khezrlou (2019) assessed learners at the CEFR B2 level, which may be considered to be upper intermediate. Because proficiency levels were determined via several different measures, results of any comparison between studies must be interpreted with caution. Liu and Brown (2015) have noted that assessments of proficiency used in past studies are often unsystematic or neglected. Despite such methodological issues, consideration of proficiency level from past studies may help in identifying some key relationships with other variables (CF type, grammar type, and L1), which will heighten our understanding of how and when to use CF.

L1 Similarity
Effect sizes were analyzed based on an English target feature's similarity or dissimilarity to the Persian L1. Persian and English have many grammatical similarities, yet notable differences also exist, which may complicate the effectiveness of CF instruction. The two languages were systematically compared by examining the following traits: presence/ absence of a target feature; free, bound, or lexical morphological attributes; and word order (head initial vs. head final). Differences in these characteristics have been shown to influence the process of acquisition (Luk & Shirai, 2009;Maleki, 2006;Shin, 2015). Unlike English, which is an SVO language, Persian is an SOV language. Questions use the same SOV word order (subject, interrogative pronoun, and verb) as in the following example:

Is what this
The above question, which is read from right to left ("This what is?"), uses declarative sentence structure, without any inversion. There is also no do support in Persian. Instead of changing word order, intonation is used to suggest a question (North Carolina State University, 2013, p. 5).
Concerning verb tense, Persian uses a past tense verb that is completely different from the present form (similar to lexical past in English). To correctly use the past tense, a morphological ending that reflects the subject (I, You, they, etc.) must also be added, with the exception of third person, which does not require an ending (North Carolina State University, 2013, p. 9). Like the lexical past tense, "the present stem has very little or no resemblance to the infinitive form of the verb and needs to therefore be learned through repeated exposure to the print" (North Carolina State University, p. 19). Concerning the future tense, it "is used almost exactly like the English future tense; the only difference being that it is also very common in Persian to use the present tense for expressing future actions" (Mazdeh, 2013, para. 1). The future tense is constructed by using the present simple tense form of "to want" (خواستن) with a past tense stem of the main verb. This structure somewhat resembles English structures that use an auxiliary "going to" with a consistent form of main verb ("to eat"), while verb forms are lexically different (Mazdeh, 2013).
Concerning conditionals, the if structure is mainly initial in imperative and declarative statements (Abdollahi-Guilani et al., 2012). The Persian word for if (اگر) is generally used at the beginning of the conditional clause, followed by a main clause which uses the future tense. Despite Persian being a head-final language, the if marker appears at the beginning of the conditional clause, as in English. Grammar structure is also very similar to English. Both English and Persian have an article system. Unlike English, however, nouns do not carry a definite morpheme as in the English "the." Instead, definiteness is implied through the lack of an indefinite article. Within colloquial speech, a bound morpheme (-e) may be attached at the end of a noun (in accordance with the head-final nature of the language) to signal definiteness (Momenzade & Youhanaee, 2014).
The final feature analyzed within this study, the passive, has some notable dissimilarities with English. The feature also exists in Persian, yet the auxiliary verb appears at the end of the sentence as in "apples (سیب) washed (شسته) are (شوند)" (Li & Roshan, 2019, p. 5).

Effect Size
To calculate effect size, pretest scores (M2), posttest scores (M1), and standard deviations (SD2 and SD1) were inserted into Cohen's d formula for effect size (Spada & Tomita, 2010, p. 307).
Initially, calculating Cohen's d by comparing posttests of treatment groups with those of control groups was considered. Four of the studies, however, did not use a control group (they studied different types of treatment). Using the pretest/posttest calculation allows for inclusion of more grammatical features and provides an accurate measure of improvement for each treatment group. While pretest/posttest scores were used for this study, calculation of Cohen's d using posttests from treatment and control groups was performed to check the results. For the studies with control groups, Cohen's d calculated from treatment and control posttests resembled Cohen's d calculated from pretest and posttest scores. Effect sizes for grammatical features and CF types differed in nearly the same proportions for both calculations.
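The pretest/posttest calculation described above can be sketched as follows. The exact formula is not reproduced in this text, so the sketch assumes the common form of Cohen's d in which the mean difference is divided by a pooled standard deviation, the square root of the average of the two variances. The function name and the sample scores are hypothetical.

```python
import math


def cohens_d_pre_post(m1: float, m2: float, sd1: float, sd2: float) -> float:
    """Cohen's d for pretest/posttest gains.

    m1, sd1: posttest mean and standard deviation
    m2, sd2: pretest mean and standard deviation
    Assumes the pooled SD is sqrt((sd1^2 + sd2^2) / 2); this form is an
    assumption, not a formula confirmed by the source.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


# Hypothetical scores: posttest M = 15 (SD = 4), pretest M = 10 (SD = 3)
example_d = cohens_d_pre_post(m1=15, m2=10, sd1=4, sd2=3)
print(round(example_d, 2))  # 1.41
```

The same function could be reused for the control-group check by passing the treatment group's posttest values as `m1`/`sd1` and the control group's posttest values as `m2`/`sd2`.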
After effect sizes were calculated, they were graphically charted according to grammatical feature and type of CF. Since the meta-analysis was limited to only 11 studies, there was not enough data to collate effect sizes based on L1 differences or proficiency. Instead, these factors were qualitatively considered for each grammatical feature by looking at characteristics of individual research studies responsible for the effect size. Potential differences like treatment length were also considered. Such qualitative inquiry helped to ensure that complex interaction between factors was identified, thereby providing a more holistic perspective of CF. Such analysis also helped to address issues of heterogeneity between studies by acknowledging and interpreting different influences from individual studies. Following analysis, potential problems with meta-analytic comparison of studies were also considered, along with a need for methodological changes within future research.
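Collating effect sizes by grammatical feature and CF type, as described above, amounts to a simple group-and-average step. The sketch below assumes an unweighted mean per group, which the text does not specify; the record layout and values are hypothetical and do not come from the included studies.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical treatment-group records (values are invented for illustration)
sample_groups = [
    {"feature": "articles", "cf_type": "recast", "d": 1.0},
    {"feature": "articles", "cf_type": "recast", "d": 2.0},
    {"feature": "passive voice", "cf_type": "direct feedback", "d": 0.5},
]


def collate_effect_sizes(groups):
    """Average Cohen's d for each (feature, CF type) pair.

    Uses an unweighted mean (an assumption; weighting by sample size
    is another common choice in meta-analysis).
    """
    grouped = defaultdict(list)
    for group in groups:
        grouped[(group["feature"], group["cf_type"])].append(group["d"])
    return {key: mean(values) for key, values in grouped.items()}


collated = collate_effect_sizes(sample_groups)
for (feature, cf_type), d in sorted(collated.items()):
    print(f"{feature} / {cf_type}: d = {d:.2f}")
```

The resulting dictionary can be passed to any plotting tool to chart effect sizes by feature and CF type.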

Results
Contrary to prior meta-analysis suggesting higher gains for explicit emphasis of grammar (Spada & Tomita, 2010), implicit prompts (clarification requests) had the highest overall effect size (d = 3.40). Explicit forms of written DCF (d = 2.98), oral MCF (d = 2.26), written indirect feedback (d = 1.80), and written MCF (d = 1.47) had the next highest effect sizes. Finally, implicit recasts (d = 1.50) had the lowest effect sizes (see Appendix B for more information). Overall, written forms of CF (d = 2.21) had a slightly larger effect size than oral forms of CF (d = 2.09). This finding may be explained by the characteristics of written CF, which is explicit and more highly scaffolded (learners have time to process and correct an error). While all of the collated results are interesting, separation of analysis by target feature and learner characteristics (proficiency and L1) suggests that general claims about the efficacy of CF are misleading (see Figure 1). Context-specific variables concerning grammar and learner background must be considered before a CF technique can be labelled either effective or ineffective.
Effect sizes for CF varied considerably based on the target feature (see Appendix C). This variance appears to be explained by grammatical difficulty, as defined by characteristics of the target feature (morphosyntactic, semantic, and phonological) and learners (proficiency and L1). To illustrate this point, the overall effect size for clarification requests may be examined. This statistic is based upon just one study by Khezrlou (2019). On the surface, it suggests that Persian learners benefit more from implicit prompts, which parallels findings in an experimental study of Chinese learners (Yang & Lyster, 2010). However, the broad claim that implicit prompts are more effective cannot be asserted, primarily because the Persian and Chinese learners in both studies received CF for the same grammatical feature (past simple tense). Additionally, participants within both studies appear to have had a higher level of English proficiency than the average learner. Whereas the Chinese participants were sophomores majoring in English language and literature, Persian participants were learners from a private institute who scored at the B2 level of the Common European Framework of Reference (CEFR). Using implicit prompts at a higher proficiency level for the past simple tense may explain the results. For Korean learners, results were reversed, with recasts revealing a larger effect size (Schenck, 2020). Learners in these studies, however, were deemed to be pre-intermediate or low-level (Cho, 2012; Kim & Cho, 2017). Perhaps the cognitive burden placed on low-level learners by implicit prompts explains the lower performance for Korean learners.
For studies which collectively examined past or future tenses, recasts had the highest effect sizes (see Figure 1). Like prior research conducted in Korean contexts, Persian studies of past verb tenses used low intermediate and pre-intermediate learners (Hamtaei et al., 2015; Zarei et al., 2018). Replication of results among low-level learners, who are more heavily influenced by recasts, suggests that proficiency influences what type of CF is best for a target feature. Being at lower proficiency levels, learners may have more difficulty self-repairing, explaining why a scaffolded reformulation is needed. Oral MCF had a sizeable effect size, yet less so than recasts. Oral MCF was also least effective for the past tense, which has a systematic morphological element (-ed) that does not exist in Persian. L1/L2 dissimilarity may have limited the efficacy of oral MCF, while simultaneously promoting the impact of recasts, which provide valuable input concerning three phonological variants. Perhaps oral MCF would be more effective at a slightly higher proficiency level when challenges posed by tenses are less substantial. Overall, results of CF used with verb tenses suggest that both proficiency and L1 similarity (or dissimilarity) define grammatical difficulty, thereby influencing the efficacy of different types of feedback.
Regarding the conditional feature, the effect size for metalinguistic feedback was lower than that found for other grammatical features (d = 1.27). Students in the Persian study examined (Ghoorchaei & Gharanjik, 2020) were deemed to be beginner learners, which may explain why this more morphosyntactically and semantically complex feature had a lower effect size. For Korean learners, the effect size of treatment for the conditional feature was over double (d = 2.67). Participants in this study, however, were at a higher, intermediate proficiency level, which may explain the findings (Kim, 2019). Using metalinguistic feedback for the conditional feature may be more appropriate at an intermediate level. The head-initial characteristic of the conditional clause in Persian is also similar to that in English, whereas Korean has a head-final marker. Metalinguistic emphasis of this L1/L2 difference may be more necessary in Korean, yet not in Persian, explaining the findings.
Effect size for the article (a and the) reveals that metalinguistic feedback is superior to recasts for Persian learners, who were at intermediate levels of proficiency (see Appendix A). This finding agrees with other studies conducted in Korea, which suggest that metalinguistic feedback may provide information that helps the learner understand more complex semantic relationships associated with this feature (Schenck, 2020). At the same time, the difference between feedback types is small (difference of 0.17) compared to that of Korean learners (difference of 0.60), which suggests that recasts are relatively more effective with Persian learners than with Korean learners. Whereas the Persian language has an article system, Korean does not. A relatively large effect for recasts may reflect the presence of an article system in the L1, which promotes acquisition for the Persian learner through a transfer effect. For Korean learners, no article system exists in the L1, suggesting that implicit techniques may not be enough to help learners recognize an error.
Metalinguistic feedback was highly effective for wh questions, revealing a much larger effect size than recasts for the intermediate learners that were studied (Figure 1). The larger difference in effect between these two feedback types (difference of 1.28) may reflect both grammar type and proficiency level. According to Processability Theory, wh questions are acquired at intermediate stages, before more complex clauses (e.g., conditionals). Learners in the study of wh questions (Rassaei, 2014), being at an intermediate level, may have been cognitively ready to benefit from explicit emphasis of the target feature. The use of an explicit prompt (MCF) may help learners at the right proficiency level to recognize differences between the L2 and Persian.
When both articles and simple past were emphasized within one study (Davaribina & Hossein Karimi, 2014), more highly scaffolded (yet explicit) DCF was more useful than indirect corrective feedback. In another study (Li & Roshan, 2019, p. 5), DCF was also more useful than MCF. DCF had an overall higher impact than both MCF and indirect corrective feedback, both of which are explicit prompts. The scope and complexity of target features may explain why scaffolded reformulations, in the form of DCF, may have been more effective. Emphasizing both articles and the simple past increases the scope, thereby making the target of emphasis more complex. The passive structure is also highly complex, including not only L1/L2 differences in word order, but morphological and lexical elements that increase scope and complexity. DCF provides a written reformulation that lessens pressure, allowing learners time to examine challenging target features.
Results suggest that type of grammatical feature, as well as associated learner variables such as L1 similarity or proficiency, influence the efficacy of different CF types. Low levels of proficiency, L1/L2 disparities, or heightened scope of grammatical emphasis all seem to increase the difficulty of acquiring a grammatical feature. As complexity increases, learners appear to benefit from CF that serves as a scaffold (both implicit and explicit reformulations). Recasts and DCF are more effective for intermediate learners when used with target features that are larger in scope (e.g., combined verb tenses/combined article and past tense) or more complex (e.g., the passive structure). In contrast, when a grammatical feature is easier for the learner (as determined by proficiency and L1 similarity), prompts appear to be more effective. MCF yielded higher effect sizes when used with both English articles and wh questions. These features are easier for intermediate learners to acquire than the conditional structure, which requires creation and manipulation of larger clauses (according to Processability Theory). With this in mind, it is not surprising that MCF for the conditional feature yields lower effect sizes with elementary learners. Implicit prompts (clarification requests) were most effective when targeting the past simple tense. Past regular and irregular tenses are semantically, morphosyntactically, and phonologically less complex than other features like the English article, wh questions, or conditionals. The past regular tense is more systematic in form (only -ed) than the more semantically complex article with three variants (a, an, or the). The past irregular tense is similar to Persian (uses a lexical variant of the present tense for each verb), explaining why implicit prompts were even more effective for this feature. Overall, grammatical features that are easier for a learner to acquire appear to benefit more from prompts.
Results of this study are intriguing. They suggest that grammatical difficulty determines the effectiveness of CF type. Although further research is needed, a relationship between grammar and CF appears to have emerged (see Table 2).
As complexity increases (determined by grammatical characteristics and learner background), more scaffolded forms of CF (explicit feedback or reformulation) may be needed. Conversely, when the scope or complexity of a grammatical feature decreases, less scaffolded forms of CF (implicit feedback or prompts) may be required. Although assertions about CF are supported by evidence obtained from the present meta-analysis, along with cross-linguistic comparison of both Chinese and Korean learners, more research is needed.

Conclusion
Results of the present study reveal that different types of CF cannot be understood without grasping the difficulty of a target feature, which is defined by qualities of the feature itself, along with characteristics of the learner (proficiency level and L1). Because traditional experimental methods isolate these variables, relationships are effectively hidden, limiting our understanding of how to use CF in realistic contexts.
Through meta-analysis, relationships between multiple influences of CF become clearer. As grammatical difficulty increases, recasts and direct feedback appear more useful as a kind of scaffold. Learners are freed from the burden of reformulating an error on their own. In cases where grammatical features of the L2 differ from those in the L1, explicit techniques appear to be effective. Explicit focus on differences appears to have a large impact. Finally, learners at a proficiency level that is high enough to focus on a specific target feature (meaning they have adequate cognitive resources) may benefit from prompts, which place a burden on the learner to reformulate an utterance.

Limitations and the Need for Future Research
While information obtained from this meta-analysis was insightful, experimental research which can be used for comprehensive analysis of CF is limited. Very few studies include all of the variables needed to assess grammatical difficulty (characteristics and scope of a grammatical feature; L1 similarity or dissimilarity; and learner proficiency). In addition to limitations associated with assessment of grammatical difficulty, studies do not often examine a range of different CF types and grammatical features. Concerning implicit prompts (clarification requests), for example, only one study was available for meta-analysis of Persian learners. This study, like others used to examine both Chinese and Korean learners, chose the same grammatical feature (the simple past tense). The tendency to use the same grammatical features with the same types of CF has limited our understanding. In reality, all four types of CF (implicit prompts, explicit prompts, implicit reformulations, and explicit reformulations) need to be tested with a range of grammatical features, and with learners of differing L1s and proficiency levels. More comprehensive study will give us concrete information concerning how to adjust CF techniques as grammatical difficulty either increases or decreases.
Variable treatments and assessment of implicit knowledge are also problematic in past studies of CF. Regarding treatments, for example, studies that combined different grammatical features (Davaribina & Hossein Karimi, 2014; Zarei et al., 2018) also had the highest number of treatments, from 6 to 10. In this case, determining the role of treatment length may be problematic, since other variables like grammatical [...] (Ghoorchaei & Gharanjik, 2020), lower effect size would suggest that grammatical feature type and proficiency level played a larger role than treatment length. No matter how many small-scale individual experimental studies are conducted, using different methods (e.g., treatment lengths, tasks, or assessments) threatens the validity of any comparison. While every effort was made to eliminate these threats to validity in the present meta-analysis, results require further experimental verification. More large-scale studies that systematically utilize and assess the impact of CF are clearly needed.
To better understand multiple variables involved in CF, Lee's (2019b) argument for more qualitative research seems to be a valid one. It would allow for consideration of multiple variables in relation to learner background. Unfortunately, this type of research suffers from the same weaknesses as past experimental studies. Results obtained from different learners cannot be collated or compared if treatments and measures are not systematically utilized. It is important to conduct studies that carefully record relevant variables for CF delivery in systematic ways. For example, an individual's proficiency level, L1, and cognitive capacity (working memory) can be carefully assessed according to the same instruments. Conducting several studies in this way will create a corpus of experimental studies, which may be systematically searched.
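As one possible illustration of such systematic recording, the sketch below stores each treatment condition in a shared record format and filters a small corpus by any combination of variables. The field names, the record layout, and every value in the example corpus are hypothetical; they are not drawn from the studies reviewed here, and a real corpus would need agreed instruments behind each field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudyRecord:
    """One CF treatment condition, described with shared (hypothetical) fields."""
    l1: str
    proficiency: str                 # e.g., a CEFR band such as "B2"
    feature: str                     # target grammatical feature
    cf_type: str                     # e.g., "recast" or "clarification request"
    modality: str                    # "oral" or "written"
    treatment_sessions: int
    effect_size: float               # standardized mean difference (d)
    working_memory: Optional[float] = None  # score on a common instrument


def search(corpus, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record
        for record in corpus
        if all(getattr(record, name) == value for name, value in criteria.items())
    ]


# Hypothetical placeholder records, NOT data from the studies analyzed here.
corpus = [
    StudyRecord("Persian", "B1", "feature A", "recast", "oral", 3, 1.10),
    StudyRecord("Persian", "B2", "feature A", "clarification request", "oral", 2, 1.60),
    StudyRecord("Korean", "A2", "feature B", "metalinguistic", "written", 4, 0.90),
]

persian_oral = search(corpus, l1="Persian", modality="oral")
print(len(persian_oral), "matching records")
```

Because every record shares the same fields, results from separate small studies could be filtered and compared on L1, proficiency, or feature without recoding each study by hand.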
While problems with past experimental studies are clear (Liu & Brown, 2015), the best way to develop future research is more uncertain. Essentially, teachers need a guide to using CF techniques that considers both grammar type and learner background. With this in mind, more research is needed to identify when each form of CF should be introduced. Through understanding when more scaffolded techniques are needed (or not needed), teachers may be able to tailor instruction to learner needs, thereby maximizing its effectiveness. Far from being finished with the study of CF, we have only scratched the surface. Either large-scale studies of multiple variables or small, collatable mini studies must be conducted to provide a holistic perspective that teachers can really use.