Effects of Family Literacy Programs on the Emergent Literacy Skills of Children From Low-SES Families: A Meta-Analysis

The aim of this meta-analysis was to investigate effects of family literacy programs on the emergent literacy skills of children (aged 0–6 years) from low socioeconomic status (SES) families and to establish which program, sample, study, and measurement characteristics moderate program effects. Outcomes of 48 (quasi-)experimental studies covering 42 different programs revealed a medium average effect of Cohen’s d = 0.50 on immediate posttests and a marginal average effect of Cohen’s d = 0.16 on follow-up measures. Together, effects of different moderator variables indicate that children benefit from targeted programs that focus on a limited set of activities and skills and that are restricted to one (training) context. Additionally, we found larger effects in experimental studies and when researcher-developed tests were used. Our outcomes not only provide guidelines for program developers but also call for more longitudinal research that examines how positive short-term changes as a consequence of program participation can be sustained over time.

low-SES families-provide children with little literacy support due to limited resources and parental reading abilities (Niklas & Schneider, 2013;Payne et al., 1994) and that this results in low child literacy levels (Hoff, 2006). FLPs might then encourage home support in the form of activities such as shared reading, thereby stimulating children's literacy development. At the same time, several studies have shown substantial variability in how children's home literacy environments (HLEs) are shaped, not only between, but also within SES groups (Auerbach, 2001;Lynch, 2009; van Steensel, 2006). Using refined conceptualizations of the HLE, such studies suggest that low-SES families do provide opportunities for literacy development, but in different ways (González et al., 2005). Consequently, it is recommended that FLPs adhere to the resources available in these families. There are thus various possible approaches to designing FLPs (van Steensel et al., 2012), which raises the question what types of programs best accommodate low-SES families and, consequently, which approaches best serve children's literacy development.
Effects of many FLPs have been tested in (quasi-)experiments, the outcomes of which have been summarized in a series of meta-analyses (Lonigan et al., 2008;Manz et al., 2010;Mol et al., 2008;Sénéchal & Young, 2008; van Steensel et al., 2011). These meta-analyses provide information on overall intervention effects and give insight into program characteristics that influence effects. However, meta-analyses have not yet answered the question of which types of programs are beneficial for children from low-SES families, because all previous analyses of program characteristics were based on databases comprising both lower and higher SES samples. In the current meta-analysis, we focus on samples that mainly consist of children from low-SES families, attempting to identify what works for this target group. We assessed effects on children's emergent literacy skills and analyzed which program, sample, study, and measurement characteristics explain possible differences in effects.

Effects of Family Literacy Programs: Previous Meta-Analyses
A vast amount of research has shown that children's HLEs, which can be globally defined as "the environment the family provides the child to gain specific precursors of reading and spelling, and linguistic competencies" (Niklas & Schneider, 2013, p. 40), contribute to the development of important literacy skills before the start of formal schooling. The frequency of literacy-related parent-child activities such as shared reading and the quality of parent-child interactions during those activities have been found to predict both "comprehension-related" emergent literacy skills, such as vocabulary and story comprehension, and "code-related" skills, such as letter knowledge and phonological awareness (Burgess et al., 2002; Bus et al., 1995; Hammett et al., 2003; Hood et al., 2008; Leseman & De Jong, 1998; Mol & Neuman, 2014; Sénéchal & LeFevre, 2014; Sonnenschein & Munsterman, 2002). Studies have also shown that children's home literacy experiences vary with demographic variables such as SES (Bus et al., 2000; Gonzalez et al., 2017; Leseman & De Jong, 1998; van Steensel, 2006) and that the HLE partly mediates the SES effect on emergent literacy (Aikens & Barbarin, 2008; Niklas & Schneider, 2013). Together, these observations have given rise to the development of a broad array of FLPs (Wasik, 2012). The assumption is that such programs encourage parents to engage their children in stimulating activities around print and consequently contribute to various components of children's emergent literacy development.
Over the years, many FLPs have been the subject of effect studies, which have been summarized in a number of meta-analyses (Lonigan et al., 2008; Manz et al., 2010; Mol et al., 2008; Sénéchal & Young, 2008; van Steensel et al., 2011). In a synthesis of these meta-analyses, van Steensel et al. (2012) concluded that average program effects are generally positive. Meta-analyses differ, however, in their conclusions for low-SES families and other marginalized groups. On the one hand, Manz et al. (2010) and Mol et al. (2008) found marginal average effects for children from low-income families (Cohen's d = 0.16), ethnic minority families (Cohen's d = 0.14), and families that were labelled "at-risk" (Cohen's d = 0.13). These effects significantly differed from effects for children from high-income, majority, and "non-at-risk" families (Cohen's d = 0.39, 0.64, and 0.53, respectively). Lonigan et al. (2008), Sénéchal and Young (2008), and van Steensel et al. (2011), on the other hand, found no such differences, with mean effects for low-SES and "at-risk" children ranging up to Cohen's d = 0.48.
One explanation for this difference is that the two sets of meta-analyses focused on different activities. Manz et al. (2010) and Mol et al. (2008) only included studies of shared reading programs, most of which were based on "dialogic reading" (DR), an approach that encourages children to assume the role of storyteller. DR requires parents to engage in "higher order" interactions (e.g., make predictions or inferences), expand or elaborate children's initiations, scaffold their understanding in case of misconceptions, and gradually let children assume more responsibility. Both Manz et al. (2010) and Mol et al. (2008) question whether such programs are appropriate for the above-mentioned groups. However, whereas Mol et al. (2008) argue that DR places excessive demands on parents they label at-risk, implying that these parents are missing particular skills, Manz et al. (2010) propose that such interventions are not sufficiently attuned to parents' beliefs and practices. Manz et al. therefore suggest replacing a "one-size-fits-all approach" with interventions "that allow for understanding, appreciation and incorporation of stakeholders' [i.e., parents'] values" (p. 424). They propose to make programs more culturally valid, that is, consistent with parents' values and behaviors. Such suggestions reflect a wider movement away from a deficit view toward a perspective that builds on the knowledge and resources available in families (Irvine & Larson, 2007; Street, 1995), as elaborated in concepts such as "funds of knowledge" (González et al., 2005; Moll et al., 1992) and "community cultural wealth" (Yosso, 2005). The assumption is that using family knowledge and resources can play an important role in children's academic achievements, as was corroborated by Rios-Aguilar (2010). Starting from the conclusion that "one size does not fit all" (Manz et al., 2010, p. 424), an important question then is how FLPs can be effectively adapted to the families they target.
Contrary to Manz et al. (2010) and Mol et al. (2008), the other meta-analyses (Lonigan et al., 2008; Sénéchal & Young, 2008; van Steensel et al., 2011) also included interventions other than those targeting shared reading. Possibly, this broader range of interventions included activities that were more appropriate and, consequently, more effective for children from marginalized groups. However, none of these meta-analyses were able to draw conclusions on what types of interventions might specifically work for children from these groups. What is lacking is a systematic analysis of program variables that influence the variability in intervention effects for children from marginalized groups: In all previous reviews, moderator analyses were conducted on databases that included heterogeneous samples.

Possible Moderators of Program Effects
Starting from previous meta-analyses of FLPs, we discuss which program, sample, study, and measurement characteristics might affect intervention effects for our target group. For program characteristics, we make a further distinction between characteristics that pertain to the content and nature of programs and those that pertain to their organization and delivery.

Program Characteristics: Content and Nature
Previous meta-analyses have either focused on examining effects of one specific parent-child activity, such as shared reading (Manz et al., 2010; Mol et al., 2008), or compared effects of different activity types (Sénéchal & Young, 2008; van Steensel et al., 2011). By and large, these activity types can be categorized using Sénéchal's (2006) distinction between "formal" activities, in which parents explicitly teach their children about written language (particularly, letter names and sounds), and "informal" activities, in which children learn about written language more implicitly (often, shared reading). A related distinction is based on the skills programs target: van Steensel et al. (2011) distinguished programs that focus on comprehension-related skills from those that focus on code-related skills or both. Parental preferences for activities appear to be related to SES. Various studies have suggested that low-SES (i.e., low income or low-educated) parents are less inclined to engage in informal activities such as shared reading (Curenton & Justice, 2008; DeBaryshe, 1995) and prefer formal, code-focused activities (Lynch et al., 2006; Stipek et al., 1992). This might imply that formal, code-focused programs are more effective for low-SES children, because they lead to more parent engagement. This assumption seems partly confirmed by the aforementioned marginal effects of shared reading interventions (Manz et al., 2010; Mol et al., 2008). Other studies have, however, reported substantial variability in parental preferences within SES groups based on income or educational level (Evans et al., 2004; van Steensel, 2006). Whether formal, code-focused programs are more effective for children from low-SES families thus remains to be seen.
The distinction between activity types and between targeted skills relates to another issue: differentiation in program content. As described earlier, Manz et al. (2010) suggest that in FLPs, one size does not fit all. They argue that programs should be more culturally valid, that is, more consistent with parents' values and behaviors. This could involve employing activities other than print-based ones, such as talk, oral storytelling, and play (Boyce et al., 2010; Roggman et al., 2008; Van der Pluijm et al., 2019). In line with the aforementioned concepts of "funds of knowledge" and "community cultural wealth," it also implies that programs make use of the knowledge and resources available in low-SES families. The question of whether a program is sufficiently attuned to participating families further refers to their home language situation. Offering program activities in a second language (L2) in which parents are not proficient likely hampers program implementation (De la Rie et al., 2017). Encouraging parents to promote children's first language development not only acknowledges parents' "linguistic capital" (Yosso, 2005, p. 78) but might also be beneficial for children's L2 development. Previous studies have, for instance, demonstrated positive transfer from first to second language literacy skills in Spanish- and Hmong-speaking children (Goodrich & Lonigan, 2017; T. A. Roberts, 2008).
Programs might also focus on skills other than literacy alone. Apart from targeting other cognitive skills (e.g., emergent numeracy), several FLPs additionally aim to contribute to children's socioemotional development by promoting parenting skills (O'Farrelly et al., 2018; Scott et al., 2012). Such programs are based on the assumption that parental sensitivity and responsiveness are conditional for effective literacy support (Landry & Smith, 2007). An association between parenting skills and home literacy has indeed been established in previous descriptive studies: In a sample of primarily African American families with varying educational levels, Bingham et al. (2017), for instance, found that a sensitive, responsive parenting style positively predicted children's literacy experiences, which, in turn, contributed to emergent literacy development. Additionally, Bingham et al. found a small but significant negative correlation between parental education and the presence of a nonresponsive, authoritarian parenting style, and a medium positive correlation between parental education and (informal) home literacy activities.
Programs can additionally differ in whether parent-child activities at home are combined with teacher-child activities in centers/schools. Combined programs are assumed to be advantageous, because children receive a higher dosage of positive experiences and because activities in both contexts can work synergistically: Continuity in input at home and centers/schools is expected to strengthen program effects (Christenson & Sheridan, 2001; Mendez, 2010; Ramey & Ramey, 1998). In a meta-analysis of (general) early childhood interventions, Blok et al. (2005) indeed found significantly larger effects on cognitive skills of combined home- and center-based programs than of home-based-only programs. In the context of FLPs, this moderator has only been examined in the meta-analysis by van Steensel et al. (2011). Different from Blok et al. (2005), they did not find a significant difference in program effects between home/center programs and home-only programs, but their conclusion does not necessarily extend to low-SES children.
Finally, FLPs might differ in whether they include information and communication technology. Anderson et al. (2010) argue that family literacy tends to be conceptualized quite conservatively by program developers as paper-based, whereas digital devices such as touchscreen tablets and smartphones have become an integral part of young children's lives (Neumann & Neumann, 2014). Surveys have shown that in Western societies, the availability of such devices is high among families with both higher and lower income and educational levels (Kabali et al., 2015; Rideout, 2017). Research has also shown that the presence of digital media in young children's homes can have a positive impact on their emergent literacy development (Neumann & Neumann, 2014). Particularly, the use of high-quality digital storybooks seems beneficial. A number of experiments have shown that in individual, lab-like settings, such storybooks can contribute to both children's comprehension- and code-related skills and that these effects are present for children from both low- and high-income neighborhoods (Korat & Shamir, 2007). So far, meta-analyses have not given insight into whether digital activities are included in FLPs and whether these activities add to program effects.
In summary, we aimed to test moderator effects of the following features of program content and nature: activity type (particularly, the distinction between formal and informal activities), program focus (comprehension- and/or code-related skills), possibilities for differentiation in program content and language, inclusion of other skills, combination with center-/school-based activities, and inclusion of digital materials.

Program Characteristics: Organization and Delivery
Two previous meta-analyses have examined whether the way parent training is organized and delivered has an impact on program effects, focusing on two variables: setting (at home and/or in centers or schools) and type of trainer (professional, paraprofessional, or both; Manz et al., 2010; van Steensel et al., 2011). Whereas van Steensel et al. (2011) found no setting effects, Manz et al. (2010) did: Programs in which training was provided in children's homes had significantly larger effects than those in which training was also delivered in centers/schools (Cohen's d = 0.47 vs. 0.13). Although Manz et al. did not focus specifically on low-SES families, they suggest training in educational settings might have unfavorable effects for parents from these families, because of their negative previous experiences with school, possible mistrust of professionals, and practical limitations. Additionally, when trainers visit children's homes, they can give more targeted support, because they can adapt more easily to parents' personal situations and needs, which could, in turn, enhance parent engagement in program activities (Roggman et al., 2001).
Although van Steensel et al. (2011) found no moderator effect of trainer type, this variable might still be relevant for our target group. Employing paraprofessional trainers might be advantageous, because they are regularly recruited from parents' home communities. This could make parent training more effective, as such trainers likely have better insight into family backgrounds and, in the case of L2 learners, often speak the same home language as parents do (Keller & McDade, 2000). Employing paraprofessionals from home communities could also contribute to program acceptance, because the aforementioned mistrust is avoided.
Additionally, we examine whether the use of specific training techniques contributes to program effects. Previous research has suggested that low-income parents find it difficult to transfer trained program strategies to subsequent parent-child activities (Zevenbergen et al., 2018). We therefore focus on moderator effects of two training techniques that might be beneficial: modeling and guided practice. Model behavior, either provided by trainers or in video clips of parents and children engaging in high-quality interactions, has been shown to be effective in training general parenting skills and might thus help in promoting skills targeted by FLPs (Breitenstein et al., 2014). Guided practice, in the form of role-play or by providing opportunities to conduct parent-child activities with trainers present, could contribute to program effects, because it allows trainers to identify specific needs and provide parents with immediate and targeted feedback (Brown & Lee, 2017).
Finally, programs vary in dosage. Four meta-analyses tested the effects of dosage, either expressed by duration (Mol et al., 2008;Sénéchal & Young, 2008;van Steensel et al., 2011) or intensity of parent training (Manz et al., 2010;Sénéchal & Young, 2008), but in only one case a moderator effect was established: Sénéchal and Young (2008) found larger mean effects for shorter than for longer programs. For low-SES parents, effects of dosage might go two ways. On the one hand, longer, more intense programs could strengthen relationships between parents and trainers and allow trainers to fine-tune support, thus adding to parental engagement and more effective participation (Allen et al., 2007). On the other hand, high dosage programs might be too great a burden for low-SES families that are relatively often subject to stress factors such as financial problems or the absence of a partner (McElvany & van Steensel, 2009).
Summarizing, we aimed to test moderator effects of these features of program organization and delivery: setting (home and/or center or school), trainer type (professional and/or paraprofessional), use of modeling and guided practice, and dosage.

Sample Characteristics
Low SES often coincides with other characteristics. We therefore analyzed moderator effects of two additional variables. In the meta-analysis by Manz et al. (2010), smaller program effects were found for children from ethnic minority (i.e., African American, Hispanic, Asian, and "other minority") families than for children from majority (i.e., European American) families. We therefore analyzed whether immigrant status or membership of a sociocultural minority group (e.g., African Americans in the United States) was related to program effects. In addition, we examined moderator effects of whether participating families were L2 speakers. The effect of L2 status is likely related to the question of whether programs accommodate languages other than the majority language (see above): It is probable that effects for L2 speakers are larger if program activities and/or training are provided in their home language. A final sample characteristic is age. In their meta-analysis of DR, Mol et al. (2008) found larger program effects for preschoolers (aged 2-3 years) than for older children (aged 4-5 years). They suggest that the type of interactive reading promoted in DR is more beneficial for preschoolers, because they are more dependent on adults for story comprehension, whereas older children might find such interactions interruptive. However, other meta-analyses found no effects of age (Sénéchal & Young, 2008; van Steensel et al., 2011).
In summary, we aimed to test moderator effects of these sample characteristics: immigrant status/sociocultural minority membership, L2 status, and age.

Study Characteristics
Two study characteristics are obvious moderators in any meta-analysis: publication source and study design. To test the occurrence of publication bias (Rothstein et al., 2006), we included both published and unpublished studies and used publication source as a moderator. Of the previous meta-analyses, only Mol et al. (2008) included unpublished research, but they were unable to test the effect of publication source because of the small number of unpublished studies (k = 2). With respect to study design, we distinguished between experiments and quasi-experiments (Shadish et al., 2002). In experiments, units (here: individual children/families) are randomly assigned to experimental and control conditions, minimizing the chance that intervention effects are determined by variables other than the experimental manipulation (here: the FLP). Although nonrandomization often leads to inflated effect sizes, particularly through volunteer effects (Lipsey, 2003), none of the previous meta-analyses revealed moderator effects of this variable. Because of the recent interest in implementation quality as a factor in the effectiveness of FLPs (De la Rie et al., 2017; McElvany & van Steensel, 2009; Powell & Carey, 2012), especially for low-SES families (Manz et al., 2010; van Steensel et al., 2011), we examined whether studies provided indicators of implementation quality.
Summarizing, we aimed to test moderator effects of these study characteristics: publication source, study design, and implementation quality.

Measurement Characteristics
Similar to previous meta-analyses, we registered whether measures were administered immediately after program participation or in a delayed, follow-up test, and we included timing of posttest as a moderator (Sénéchal & Young, 2008; van Steensel et al., 2011). Additionally, we tested if instruments developed within the context of the study (i.e., study-specific measures) yield effects other than study-independent measures (e.g., standardized tests). For FLPs as well as for other types of literacy interventions, the former generally result in larger effects, likely because they are more attuned to the contents of the intervention (Okkinga et al., 2018; Sénéchal & Young, 2008; Swanson, 1999). Finally, we analyzed moderator effects of instrument type (test vs. observation) and language in which effect measures were administered: Because L2 learners form a substantial part of the target group of low-SES families and some programs are offered in children's home languages, it is relevant to test whether intervention effects are (additionally) realized in these languages (Anderson et al., 2017).
In summary, we aimed to test moderator effects of these measurement characteristics: timing (immediate posttest or follow-up), instrument type, and language of assessment.

Research Questions
On the basis of the considerations above, this meta-analysis attempts to answer two general research questions:
Research Question 1: Do FLPs positively affect low-SES children's (comprehension- and code-related) emergent literacy skills?
Research Question 2: Which program, sample, study, and measurement characteristics moderate possible program effects?

Search Strategy and Study Selection Criteria
We conducted a search in four electronic databases: Web of Science, PsycINFO (via Ovid), ERIC (via ProQuest), and Google Scholar. We combined four groups of keywords reflecting (a) the dependent variable (e.g., emergent literacy, early literacy); (b) the independent variable (e.g., program, intervention); (c) the context of the intervention (e.g., family, parents); and (d) the target group, split into age (e.g., preschoolers, kindergarteners) and demographic indicator (e.g., low SES, ethnic minority). The search syntax is in Appendix A (in the online version of the journal). To optimize the search, we compared our results to relevant studies from previous meta-analyses. Because some potentially relevant articles from those meta-analyses did not appear in our search results, we used information from the title, abstract, and keywords of those studies to add relevant thesaurus terms to our original search. The search was conducted again, and after these changes, we were able to retrieve the missing studies.
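The way the keyword groups combine can be illustrated with a minimal sketch. It assumes that terms within a group are joined with OR and that groups are joined with AND, which is a common convention for database searches; the actual syntax is given in Appendix A and may differ. The term lists contain only the examples named above, the split of the target group into two separate clauses is an assumption, and the function name `build_query` is hypothetical.

```python
def build_query(groups):
    """Join terms within a group with OR and the groups with AND (assumed convention)."""
    clauses = []
    for terms in groups.values():
        quoted = [f'"{term}"' for term in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Example terms taken from the text; the full term lists are in Appendix A.
keyword_groups = {
    "dependent_variable": ["emergent literacy", "early literacy"],
    "independent_variable": ["program", "intervention"],
    "context": ["family", "parents"],
    "target_age": ["preschoolers", "kindergarteners"],
    "target_demographic": ["low SES", "ethnic minority"],
}

query = build_query(keyword_groups)
print(query)
```

Adding thesaurus terms, as described above, would then amount to extending the relevant term list and rebuilding the query.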
In addition to peer-reviewed articles, we included other publications (dissertations, book chapters) to create a complete overview of available information and examine publication bias. Similar to previous meta-analyses, we chose 1990 as a starting point (Manz et al., 2010; Mol et al., 2008; van Steensel et al., 2011). The search for studies published between January 1990 and April 2018 yielded 10,303 hits. A study was included if it was published in English and met these criteria: (a) it measured effects of an FLP on children's literacy skills, (b) it compared an experimental group that participated in the intervention with a control group that did not, (c) participants were young children (0-6 years), (d) most participants in the sample were characterized by low SES (see below), and (e) the study provided effect sizes or information (e.g., means, standard deviations, ns, statistical tests) allowing effect sizes to be calculated. Criterion (c) was included because our focus was on emergent literacy. Most 0- to 6-year-olds have not yet started formal reading and writing instruction (although they often do attend early education in preschools, nursery schools, educational daycare, or kindergarten).
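To illustrate criterion (e), the following minimal sketch computes Cohen's d from group means, standard deviations, and ns using the standard pooled-standard-deviation formula. It is not necessarily the exact procedure used in this meta-analysis, which is not specified in this section (e.g., any small-sample correction or pretest adjustment). The input numbers are hypothetical.

```python
import math


def cohens_d(mean_exp, sd_exp, n_exp, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = (
        (n_exp - 1) * sd_exp ** 2 + (n_ctrl - 1) * sd_ctrl ** 2
    ) / (n_exp + n_ctrl - 2)
    return (mean_exp - mean_ctrl) / math.sqrt(pooled_var)


# Hypothetical posttest data: experimental group vs. control group.
d = cohens_d(mean_exp=12.0, sd_exp=4.0, n_exp=30,
             mean_ctrl=10.0, sd_ctrl=4.0, n_ctrl=30)
print(round(d, 2))
```

With equal standard deviations of 4.0, the pooled standard deviation is also 4.0, so a 2-point difference in means corresponds to d = 0.50.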
Studies were excluded if the outcome measures involved skills other than literacy skills (for instance, Van den Heuvel-Panhuizen et al., 2016, tested the effects of a shared reading intervention but assessed math skills only); if no control group was present (for instance, Leung et al., 2010, evaluated the Hands-On Parent Empowerment [HOPE] program but did not include a control group); if the participants were older children (for instance, Overett and Donald, 1998, tested the effects of a family-based reading intervention on Grade 4 students); and if the majority of the sample was not characterized by low SES (for instance, Chow et al., 2010, examined the effects of DR on children in high-income families). Additionally, in the case of combined family and center-/school-based interventions, we only included studies in which a combined condition was compared with a condition in which children had participated only in the center/school intervention to be able to isolate the effects of the family component. Consequently, studies in which a combined condition was compared with a no-treatment control group were not included (e.g., Brotman et al., 2013). Furthermore, studies that evaluated interventions focusing on children diagnosed with learning disabilities, speech or language impairments, and emotional or behavioral disorders (e.g., M. Y. Roberts & Kaiser, 2012) were excluded.
Similar to previous meta-analyses, we classified a sample as mainly low-SES when at least half consisted of children from families with a low socioeconomic status (Manz et al., 2010;Marulis & Neuman, 2010). In most studies (18), low SES was based on a combination of low parental educational level (defined as high school graduation or below, or as classified by the authors of the primary studies) and low family income (as classified by the authors). In other studies, low SES was based on low parental educational level only (8 studies), a combination of low educational level and eligibility for state-funded services such as Head Start (6 studies), eligibility for services only (6 studies), low occupational status (unskilled labor/unemployment; 3 studies), a combination of low educational level and low occupational status (2 studies), low income only (2 studies), living in a poor area (1 study), a combination of low education and living in a poor area (1 study), and a combination of living in a poor area and low literacy level (1 study). In most studies, low-SES children made up the major part of the sample. Thirty-two studies provided percentages; in these studies, on average 84.4% of the participants could be classified as low SES. In the other 16 cases, percentages were not given, but the authors indicated that a significant majority of the participants were low SES.
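As a quick arithmetic check, the study counts per low-SES definition reported above can be tallied; they should add up to the 48 included studies, as should the studies with and without reported percentages. The dictionary labels below are shorthand introduced here for illustration only.

```python
# Number of studies per low-SES definition, as reported in the text.
ses_definitions = {
    "low education + low income": 18,
    "low education only": 8,
    "low education + service eligibility": 6,
    "service eligibility only": 6,
    "low occupational status": 3,
    "low education + low occupational status": 2,
    "low income only": 2,
    "poor area": 1,
    "low education + poor area": 1,
    "poor area + low literacy level": 1,
}

total_by_definition = sum(ses_definitions.values())

# Studies that did and did not report the percentage of low-SES participants.
with_percentage, without_percentage = 32, 16
total_by_reporting = with_percentage + without_percentage

print(total_by_definition, total_by_reporting)
```

Both totals equal 48, which matches the final sample size.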
All titles and abstracts, and, if necessary, full texts, were screened by the first and second authors. This resulted in 98 potentially relevant studies, of which full texts were read. When an article was unavailable, we tried to contact the authors. For three potentially relevant articles, we were not able to retrieve the full text. We also tried to contact the authors when the statistical information provided in a paper was insufficient. This resulted in additional information for two studies. Full text reading resulted in the exclusion of another 50 studies, because on closer inspection inclusion criteria were not met: For instance, demographic information showed the low-SES criterion was not satisfied, although the authors had designated the sample as low SES (e.g., Huebner, 2000;Huebner & Meltzoff, 2005). The final sample thus consisted of 48 studies. A flowchart of the selection process is presented in Appendix B (in the online version of the journal). During literature selection, the first two authors agreed on 98.8% of their decisions. For the remaining articles, the authors discussed discrepancies and decided together whether the studies could be included.
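The agreement figure reported above is a simple percentage of matching decisions. A minimal sketch of that calculation is given below, assuming each screening decision is coded as include or exclude; the decision lists are hypothetical toy data, not the authors' records, and the function name `percent_agreement` is introduced here for illustration.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters made the same decision."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must judge the same items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical screening decisions for ten records.
first_author = ["exclude"] * 7 + ["include"] * 3
second_author = ["exclude"] * 7 + ["include", "include", "exclude"]

agreement = percent_agreement(first_author, second_author)
print(agreement)
```

Note that percent agreement does not correct for agreement expected by chance; a chance-corrected index such as Cohen's kappa would require the full cross-tabulation of decisions, which is not reported in this section.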

Coding Scheme
All studies were coded according to a standardized coding scheme with the following sections: article information, program characteristics, sample characteristics, study characteristics, measurement characteristics, and program effects.

Article Information
This section included title, author name(s), and source name.

Program Characteristics
After recording the program name, we reported characteristics with respect to content and nature. For "activity type," we used the distinction between formal and informal activities. Because of its high prevalence, we divided the second category into shared reading and "other" informal activities. "Program focus" reflects whether emphasis was on comprehension-related skills, code-related skills, or both. "Differentiation" indicates whether the intervention provided opportunities to adapt activities to the situation of families and make use of their knowledge and resources. Additionally, we coded whether the program offered materials and/or training in the home language of L2 speakers. Also, we registered whether skills other than emergent literacy were addressed, distinguishing between cognitive skills (e.g., numeracy) and socioemotional skills, with the latter category mostly referring to programs that provide parenting support. For "combined center-based," we registered whether the FLP was combined with a parallel center- or school-based program conducted by teachers. Finally, we reported whether the program included digital materials (e.g., digital storybooks).
Subsequently, we registered characteristics reflecting program organization and delivery. First of all, we registered the location of training sessions: at home, at a center/school, or in both contexts. "Trainer type" refers to the question of whether parents were trained by professionals: trainers schooled and experienced in the field of literacy training, education, and/or parent training (including teachers and researchers); paraprofessionals: trainers not (yet) fully schooled and experienced in these fields (e.g., students and trained parents); or both. Additionally, we reported whether trainers used modeling and/or guided practice to support parents. Finally, we coded program dosage by registering the number of weeks and training sessions in a program.

Sample Characteristics
We first registered demographic variables other than SES. We focused on two variables: immigrant status or membership of a sociocultural minority and L2 status. The first variable indicates whether children or parents were born in another country, or whether they were members of a specific group, such as African Americans in the United States and Xhosa in South Africa. Similar to our low-SES criterion, we classified a sample as mainly consisting of immigrant/minority families when at least half of the sample met this criterion. The second variable indicates whether children and parents spoke another language instead of or in addition to the majority language. When families used a majority language, but this language was not the country's official language (and thus the language in education), children were also classified as L2 speakers (e.g., in Senegal, most people speak Wolof as a first language, but the official language is French). For this variable as well, a threshold of 50% was applied. In addition, we categorized age-groups, distinguishing between interventions for children aged 0 to 3 years and interventions for children aged 3 to 6 years. Other potentially relevant sample characteristics (e.g., literacy levels) were not included in the moderator analyses, because most studies did not describe the study sample in such detail.

Study Characteristics
The variable "peer-reviewed" distinguished between articles published in peer-reviewed journals and other publications (mostly dissertations and book chapters). In the case of "study design," we distinguished between experimental and quasi-experimental studies. We coded studies as having an experimental design only if individual children or families were randomly assigned to program and control conditions. Studies in which randomization took place at group level (e.g., classes) and studies in which participants were matched prior to randomization were classified as quasi-experimental. Additionally, we registered whether researchers assessed implementation quality. However, because the great variety in indicators and norms hampered the construction of an implementation quality variable, we had to exclude this information from further analyses (see also De la Rie et al., 2017). Finally, the sample sizes of both the experimental and the control group were registered.

Measurement Characteristics
We first coded whether the effect measures reflected comprehension-related skills or code-related skills. "Timing" indicates whether the measurement concerned an immediate posttest or a follow-up test. Additionally, we registered whether the measures used to evaluate program effects had been developed within the context of the study or were preexisting, study-independent measures. The variable "instrument type" distinguished between tests and observations. The latter category also includes parent and teacher reports. Finally, we registered whether literacy skills were assessed in the home language of L2 speakers.

Program Effects
We coded available statistical information to calculate effect sizes (means, standard deviations, n, t, F, etc.) and, if possible, the effect sizes (Cohen's d, η², R², etc.) as reported by the researchers.

Coding Procedure
A first version of the coding scheme was tried out on a random sample of five studies. The first and second author coded these studies, compared their codes, and, based on their comparisons, adjusted the scheme. Subsequently, all 48 studies were coded by both authors. Interrater agreement was, on average, 86.2% (range: 65.4% to 100%). Discrepancies were discussed until consensus was reached. For "activity type," "trainer type," "duration," "number of sessions," "other skills," "modeling," and "guided practice," interrater agreement was relatively low (range 65.4% to 78.8%), usually because of lack of clarity in the information provided by the primary studies. We thus decided to consult other resources about the intervention (e.g., the website or a different article about the intervention). Sometimes, we contacted authors for additional information.
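The percentage agreement reported above can be illustrated with a minimal sketch. The function name and the example codes are hypothetical and serve only to show the arithmetic; they are not taken from the coding data of this meta-analysis.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Share of items (in %) on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same items.")
    if not coder_a:
        raise ValueError("At least one coded item is required.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codes for one variable ("trainer type") across five studies.
first_author = ["professional", "both", "paraprofessional", "professional", "both"]
second_author = ["professional", "both", "professional", "professional", "both"]

# Four of the five codes match.
agreement = percent_agreement(first_author, second_author)
print(f"{agreement:.1f}%")
```

Percentage agreement does not correct for chance agreement, which is one reason low values on variables with few categories call for the kind of follow-up discussion described above.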

Data Analysis
Because some studies included multiple experiments, we used "experimental comparison" as the basis for our analysis (see also van Steensel et al., 2011). For instance, when a study included two samples or two treatments were compared with a control condition, these were reported as two experimental comparisons. For the analysis of overall effects, and the effects on comprehension- and code-related skills, we calculated a weighted effect size (Cohen's d) for each experimental comparison for the skills concerned, using available statistical information. When multiple effect measures were used to assess literacy skills within an experimental comparison, we calculated the mean of the effect sizes for these measures.
If possible, we used both pre- and posttest scores of the program and control group for the calculation of effect sizes. When pretest scores were unavailable, we calculated the effect sizes based on the between-group differences in posttest scores. In some studies, means and standard deviations were not reported. In such cases, we either used effect sizes reported by the authors or calculated the effect sizes based on other available information (e.g., t, F, or p values), combined with information about the sample size.
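As an illustration of two of the routes described above, the following sketch computes Cohen's d from posttest means and standard deviations, recovers d from an independent-samples t statistic, and averages several effect sizes within one experimental comparison. These are standard textbook conversions; the exact formulas applied in the analysis (including the pre- and posttest variant) are not specified in the text, so the function names and example values here are assumptions for illustration only.

```python
import math
from typing import Sequence


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def d_from_posttest(m_prog: float, sd_prog: float, n_prog: int,
                    m_ctrl: float, sd_ctrl: float, n_ctrl: int) -> float:
    """Cohen's d from posttest means and SDs of program and control group."""
    return (m_prog - m_ctrl) / pooled_sd(sd_prog, n_prog, sd_ctrl, n_ctrl)


def d_from_t(t: float, n_prog: int, n_ctrl: int) -> float:
    """Cohen's d recovered from an independent-samples t statistic."""
    return t * math.sqrt(1 / n_prog + 1 / n_ctrl)


def mean_effect(effect_sizes: Sequence[float]) -> float:
    """Unweighted mean of several effect sizes within one comparison."""
    if not effect_sizes:
        raise ValueError("At least one effect size is required.")
    return sum(effect_sizes) / len(effect_sizes)


# Hypothetical values, not data from any included study.
d_vocab = d_from_posttest(55.0, 10.0, 30, 50.0, 10.0, 30)
d_story = d_from_t(2.0, 32, 32)
print(d_vocab, d_story, mean_effect([d_vocab, d_story]))
```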
Data analysis was conducted using Comprehensive Meta-Analysis 2.0 (Borenstein et al., 2005). All analyses were based on the random effects model, taking both within- and between-study variance into account. To explain between-study variance, we conducted moderator analyses, based on subgroup analysis for categorical variables and meta-regression analysis for continuous variables. We applied Cohen's (1988) guidelines for distinguishing small (d > .20), medium (d > .50), and large (d > .80) effects.
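To make the random effects model concrete, the sketch below pools effect sizes with the DerSimonian-Laird method-of-moments estimator of between-study variance. This is one common estimator and is shown only as an assumption-laden illustration; it is not a reproduction of the software routine used for the analyses, and the input values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def random_effects_pool(effects: Sequence[float],
                        variances: Sequence[float]) -> Tuple[float, float, float]:
    """Pool effect sizes under a random effects model (DerSimonian-Laird).

    Returns (pooled mean, standard error of the mean, tau squared).
    """
    k = len(effects)
    if k == 0 or k != len(variances):
        raise ValueError("Need one variance per effect size.")

    # Fixed-effect weights use within-study variance only.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * d for wi, d in zip(w, effects)) / sum_w

    # Q quantifies observed dispersion; k - 1 is its expectation under homogeneity.
    q = sum(wi * (d - fixed_mean) ** 2 for wi, d in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add the between-study variance to each study.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical effect sizes and sampling variances for three comparisons.
pooled, se, tau2 = random_effects_pool([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(round(pooled, 3), round(se, 3), round(tau2, 3))
```

Because the between-study variance is added to every study's sampling variance, large studies dominate the pooled mean less than they would under a fixed-effect model.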

Overview of Program Characteristics
The 48 studies included in the meta-analysis cover a total of 42 different programs. Detailed information about program characteristics per study is in Appendix C (Tables C1 and C2 in the online version of the journal). Five programs were the subject of more than one study, none of which concerned replication research. DR and HIPPY (Home Instruction Program for Preschool Youngsters) were both evaluated in four studies (DR: Chow & McBride-Chang, 2003; Lonigan & Whitehurst, 1998; Reese et al., 2010; HIPPY: Brown & Lee, 2017; Kağıtçıbașı et al., 2001; Necoechea, 2007; Van Tuijl et al., 2001). In four other studies, effects of an adapted version of DR were examined (Aram et al., 2013; Cooper et al., 2014; Ergül et al., 2016; Murray et al., 2016). Three programs were evaluated twice: PCHP (Parent-Child Home Program; Allen et al., 2007; Manz et al., 2016), PRIMER (PRoducing Infant/Mother Ethnic Readers; Cronan et al., 1996; Cronan et al., 1999), and MEES (Migrant Education Even Start; St. Clair & Jackson, 2006; St. Clair et al., 2012). The remaining programs were the subject of one study. Four articles reported the effects of more than one program. Aram and Levin (2014), for instance, evaluated effects of a writing intervention and DR. Four studies evaluated different versions of the same program. In two studies, the effects of a more intensive version with more training sessions were compared with those of a less intensive version with fewer (Cronan et al., 1996) or without any (O'Farrelly et al., 2018) training sessions. In the other two studies, the effects of an FLP with and without a center-based component were evaluated (Ergül et al., 2016; Lonigan & Whitehurst, 1998). Because these examples show that the same program can be implemented in several ways, we base the overview below on program versions rather than programs. In total, we were able to distinguish 56 different (versions of) programs.
Most programs (n = 34, 61%) combined shared reading with other activities. In 14 cases (25%), shared reading was combined with other informal activities, in 13 cases (23%) with other informal and formal activities, and in seven cases (13%) with formal activities. Nineteen programs (34%) offered only shared reading. Two programs (4%) focused only on formal activities and in one case (2%) only other informal activities than shared reading were offered. Shared reading activities were often based on DR. Examples of other informal activities were singing nursery rhymes and oral storytelling. Formal activities often involved games that trained spelling, letter knowledge, or phonological awareness.
Nearly all programs focused on comprehension-related skills or a combination of comprehension- and code-related skills (n = 28, 50%, and n = 24, 43%, respectively). Only two programs (4%) exclusively targeted code-related skills, and in two cases, the targeted types of literacy skills were not clearly mentioned. Nine interventions (16%) provided the possibility to adapt program activities to families' individual situations. This was, for instance, the case in the study by Boyce et al. (2010): Trainers developed a book together with families, based on stories told by parents, family routines, and cultural background. The book was written in the families' home language and was used during program activities. Forty-six interventions (82%) lacked the possibility to differentiate, and in one case, the program description was unclear. Most interventions did not provide materials or training in other languages (n = 34, 61%). In 14 cases (25%), both materials and training were available in other languages, in six cases (11%), only training was provided in other languages, and in two cases (4%), only materials were available in other languages. In 21 programs (37%), other skills in addition to literacy were targeted. Seven programs (13%) offered activities stimulating other cognitive skills, such as emergent numeracy. In seven other programs (13%), socioemotional skills were promoted, mainly through parenting support. Seven programs (13%) addressed both types of skills. Most interventions included only activities executed within the family (n = 49, 88%), while in seven cases (12%), home activities were combined with corresponding teacher-child activities at centers or in schools. In the study by  for instance, parents were trained in using DR at home, while professionals incorporated DR in activities at daycare centers. Only five interventions (9%) offered digital materials, for instance, digital storybooks (Korat et al., 2013).
Most commonly, training sessions took place only at centers or in schools (n = 21, 38%); 17 programs (30%) only provided training at home, and in 15 cases (27%), training was provided in both contexts. In three cases (5%), parents did not receive training, because these interventions only provided materials (e.g., books, brochures). Professionals, often teachers or researchers, conducted the training sessions in 24 cases (43%). In 11 interventions (20%), only paraprofessionals (mostly parents from the community or students) carried out the training sessions, and in another 11 interventions, both types of trainers were employed. In three cases (5%), there were no trainers, and in six cases (11%), it was unclear who the trainers were. Most interventions made use of modeling and guided practice (n = 39, 70%, and n = 36, 64%, respectively). On average, programs lasted 38 weeks (SD = 64.40, range = 2-352), and the average number of training sessions was 23 (SD = 36.19, range = 1-184).

Main Effects
To answer the first research question, we first calculated the overall program effects without distinguishing comprehension- and code-related skills (see also Table 1). Analysis based on 65 experimental comparisons revealed a small, positive effect of Cohen's d = 0.41 (SE = 0.06). Subsequently, we examined the effects on comprehension- and code-related skills separately. In both cases, we found small effects: For the 51 comparisons involving comprehension-related skills, the mean effect size was Cohen's d = 0.40 (SE = 0.07) and for the 33 comparisons involving code-related skills, the mean effect size was Cohen's d = 0.40 as well (SE = 0.08). The results of z tests indicated that all three effect sizes deviated significantly from zero: For the overall effect sizes z Subsequently, we compared effect sizes on immediate posttests and follow-up tests (Table 1) (Lipsey & Wilson, 2001).
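A z test of whether a mean effect size deviates from zero divides the pooled effect by its standard error and refers the result to the standard normal distribution. The sketch below shows that calculation with hypothetical inputs; it does not reproduce the test statistics of this meta-analysis, and the function name is an assumption made for illustration.

```python
import math
from typing import Tuple


def z_test(mean_effect: float, standard_error: float) -> Tuple[float, float]:
    """Return (z, two-sided p value) for H0: mean effect size equals zero."""
    if standard_error <= 0:
        raise ValueError("Standard error must be positive.")
    z = mean_effect / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical pooled effect and standard error.
z, p = z_test(0.30, 0.10)
print(f"z = {z:.2f}, p = {p:.4f}")
```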

Moderator Effects
To answer the second research question, we performed four series of moderator analyses. We decided to only include the effect sizes of the immediate posttests, because only a few studies reported follow-up results. Moreover, the available information showed large variability in the amount of time between interventions and follow-up tests (range: 6-312 weeks), making valid analyses based on follow-up results problematic. Table 1 provides an overview of all significant and nonsignificant moderators.
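For categorical moderators, a subgroup analysis compares the pooled effects of the subgroups. One common way to do this, sketched here under the assumption that each subgroup has already been pooled separately, is a between-groups Q statistic that is referred to a chi-square distribution with (number of subgroups - 1) degrees of freedom. The function name and numbers are hypothetical and do not correspond to any moderator in Table 1.

```python
from typing import Sequence, Tuple


def q_between(subgroup_means: Sequence[float],
              subgroup_ses: Sequence[float]) -> Tuple[float, int]:
    """Q statistic for differences between subgroup mean effect sizes.

    Returns (Q_between, degrees of freedom).
    """
    if len(subgroup_means) != len(subgroup_ses) or len(subgroup_means) < 2:
        raise ValueError("Need at least two subgroups with one SE each.")
    # Each subgroup mean is weighted by the inverse of its squared SE.
    weights = [1.0 / se**2 for se in subgroup_ses]
    grand_mean = sum(w * m for w, m in zip(weights, subgroup_means)) / sum(weights)
    q = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, subgroup_means))
    return q, len(subgroup_means) - 1


# Hypothetical pooled effects for two levels of a moderator.
q, df = q_between([0.6, 0.2], [0.1, 0.1])
print(f"Q_between = {q:.2f}, df = {df}")
```

With one degree of freedom, a Q value above the chi-square critical value of 3.84 would indicate a moderator effect at the .05 level.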

Program Characteristics
Six program characteristics moderated intervention effects. Two characteristics moderated effects on all three outcome measures: "combined center-based" and "inclusion of other skills"; two moderated effects on comprehension-related skills: "activity type" and "setting"; and two moderated effects on code-related skills: "program focus" and "L2 materials/training." The nature of each moderator effect is explained below. The variables "differentiation," "digital material," "trainer type," "modeling," and "guided practice" did not demonstrate any significant moderator effects: Program effects appeared to be unrelated to these program characteristics.
Combined center-based (k overall = 48, k comprehension = 38; k code = 22). Programs without a center-based component yielded medium effects on all three outcome measures, while interventions that offered a center-based component did not demonstrate effects.
Inclusion of other skills (k overall = 48, k comprehension = 38; k code = 22). Programs only offering literacy-related activities yielded both larger overall effects and larger effects on comprehension-related skills than interventions also targeting other skills. For code-related skills, programs that only offered literacy activities and programs that offered literacy activities and other cognitive activities (e.g., emergent numeracy activities) yielded the largest effects.
Activity type (k comprehension = 38). Interventions only offering shared reading yielded the largest mean effect (medium) on comprehension-related skills. When shared reading was combined with either informal or formal activities, the effects were small, while a combination of the three types of activities did not show any effect on children's skills. A small effect was found for formal activities, but this result was only based on one study.
Setting (k comprehension = 36). Programs in which the training sessions only took place either at home or at a center/school yielded larger effects on comprehension-related skills than programs offering training activities in both contexts and programs without training.
Program focus (k code = 21). Programs focusing on code-related skills alone demonstrated larger effects on code-related skills than programs that (also) focused on comprehension-related skills.
L2 materials/training (k code = 22). When training in another language was provided, a large effect on code-related skills was reported. However, this outcome was based on only one study. Programs offering both materials and training in other languages demonstrated medium effects. Programs that did not provide materials and training in other languages and programs offering only materials in other languages showed small effects.

Sample Characteristics
In the next set of analyses, we examined the role of three sample characteristics: immigrant/minority status, L2 status, and age. The moderator analyses did not demonstrate any significant effects of these variables.

Study Characteristics
The variable "peer-reviewed" was nonsignificant, which implies there is no indication of publication bias. Research design was a significant moderator of effects on overall scores (k = 48) and comprehension-related skills (k = 38). Studies with an experimental design yielded larger effects than studies using a quasi-experimental design.

Measurement Characteristics
In the final set of analyses, we examined the effect of three measurement characteristics. First of all, we were interested in the question of whether researchers used instruments developed within the context of the study or preexisting, study-independent measures (e.g., standardized tests such as the PPVT). This variable was a significant moderator of all three outcomes (k overall = 48, k comprehension = 38; k code = 22): In all cases, effects were larger for study-specific than for study-independent instruments. Additionally, we analyzed the effects of "type of instrument" (test vs. observation) and of whether children's skills were tested in their home languages but found no significant moderator effects.

Discussion
The results of this meta-analysis indicate that investing in a stimulating home environment by means of FLPs can positively affect the literacy development of children from low-SES families, particularly in the short term. Moderator analyses revealed which types of programs are particularly beneficial for these children. Several of the moderator effects appear to indicate that children benefit more from a targeted than from a comprehensive approach, implying that "less is more." The foundation for this hypothesis is most solid for the variables "activity type," "combined center-based," "inclusion of other skills," and "setting." As shown in Table 1, programs that focus on a limited set of activities (particularly shared reading), that do not combine home activities with center- or school-based activities, that only target literacy skills, and that are restricted to one training setting (either home or center/school) yielded the largest effects. Other significant moderator effects (comprehension-related skills: "inclusion of other skills"; code-related skills: "program focus," "combined center-based," and "focus on other skills") also suggested a positive impact of a targeted approach, but because these were based on small numbers of studies (k < 5), they should be interpreted with caution (Kontopantelis & Reeves, 2010).
The conclusion that targeted interventions are most effective corresponds with the outcomes of a previous meta-analysis by Bakermans-Kranenburg et al. (2003), who examined the outcomes of parent-child sensitivity interventions in early childhood. They found that interventions with a clear focus (programs that only aimed to enhance parental sensitivity at the behavioral level instead of also trying to change parents' mental representations and providing social support) yielded the largest mean effect (Cohen's d = 0.48). Bakermans-Kranenburg et al. suggest that, in the case of the families that were the target group of these interventions (i.e., families with a minority status, families characterized by poverty and social isolation, and single-parent families), broad programs "take too much time and energy away from a potentially effective, goal-directed intervention approach" (p. 210). A similar mechanism might be at work in the case of FLPs: The programs included in our meta-analysis mainly targeted low-SES parents, who are characterized by a relatively high incidence of stress factors such as the absence of a partner or financial problems (McElvany & van Steensel, 2009). It is plausible that overloading these parents with a broad range of activities that aim to stimulate a variety of skills and that require parents to alternate between receiving trainers at home and going to a preschool center or school might be asking too much of them. Providing parents with more streamlined interventions makes these interventions easier to implement, which likely adds to their effectiveness.
Our meta-analysis further suggests that introducing shared reading in families in which this is not necessarily a common activity can be a powerful mechanism (Leseman & De Jong, 1998; van Steensel, 2006). It appears that exposing low-SES children to the rich language input of books enables them to enhance skills such as vocabulary knowledge and story comprehension. Although theoretically plausible (Sénéchal, 2006; Sénéchal & LeFevre, 2002), this outcome is remarkable as well, because in previous meta-analyses (Manz et al., 2010; Mol et al., 2008) only marginal effects of shared reading interventions were found for low-SES children. To explain this discrepancy, we compared our database with the databases from these earlier reviews. Apart from the obvious fact that we were able to include newer studies (k = 10), this comparison additionally revealed that we made other choices. In fact, only two of the 10 older studies selected by Manz et al. (2010) and Mol et al. (2008) were also included in our meta-analysis (Cronan et al., 1996; Lonigan & Whitehurst, 1998). There were two reasons for this. In some respects, our inclusion criteria were broader than those used in the previous meta-analyses, resulting in the selection of additional studies. The earlier reviews either included published research alone (Manz et al., 2010), focused on one specific program (Mol et al., 2008), or only selected studies using standardized tests (Mol et al., 2008). In the current meta-analysis, we also included unpublished studies, studies targeting a broader range of interventions, and studies that (also) used researcher-developed tests. In other respects, however, our inclusion criteria were more stringent. First of all, we focused on children from low-SES families and used stricter criteria for low education and income. Unlike Manz et al. (2010) and Mol et al. (2008), we therefore excluded studies that focused on children with language delays or from families with a history of reading disability without reference to socioeconomic criteria (Crain-Thoreson & Dale, 1999; Fielding-Barnsley & Purdie, 2002, 2003) and studies in which the samples were, on closer inspection, not characterized by low SES (Huebner, 2000; Huebner & Meltzoff, 2005). Additionally, we made sure to include only those studies in which the effects of a family program could be isolated. We thus excluded some of the studies selected by Manz et al. (2010) in which home- and center-based activities could not be separated (DeBaryshe & Gorecki, 2007; Hargrave & Sénéchal, 2000; Whitehurst et al., 1999).
It can be argued that our inclusion of unpublished studies and our use of stricter criteria, combined with a larger sample size, provide a more solid base for the conclusion that shared reading interventions have beneficial effects. However, this does not make irrelevant the possible impediments signaled in the meta-analyses by Manz et al. (2010) and Mol et al. (2008), that is, the potential mismatch between programs and parental beliefs and abilities. We would like to stress that we do not rule out the possibility that some parents prefer activities other than shared reading. Likely, parents with limited or no basic reading skills will have difficulty reading even simple storybooks to their children and possibly benefit from a more differentiated approach (see below). However, because hardly any studies provided information on parental literacy, we were not able to examine whether shared reading programs were indeed less effective for children of these parents. Future researchers might thus consider including a variable for parental literacy.
The observation that programs that only involved parent-child activities in children's homes were more effective than programs that also involved teacher-child activities in preschool centers or schools was contrary to initial expectations: It seemed more plausible that the higher dosage in combined programs and the synergy between home and school would lead to larger effects (Blok et al., 2005; Christenson & Sheridan, 2001; Mendez, 2010; Ramey & Ramey, 1998). A possible explanation is that in combined programs, the design of the family component is often based on that of the school component and, therefore, does not take sufficient account of the situation of families taking part, overlooking not only the needs of these families but also the knowledge and resources they have (González et al., 2005; Irvine & Larson, 2007; Moll et al., 1992; Street, 1995; Yosso, 2005). The REDI-P (REsearch-based Developmentally Informed-Parent) program, for instance, consists of a set of activities developed as a supplement to an existing center-based program (Bierman et al., 2015). The MEES program is also strongly oriented toward the preschool curriculum (St. Clair & Jackson, 2006; St. Clair et al., 2012). It might thus be that the parent-child activities in these programs are too "academic" and therefore lack cultural validity (Fantuzzo et al., 2003; Manz et al., 2010; Quintana et al., 2001). In this light, it is interesting to observe that although there were few programs that allowed for differentiation in program activities, those that did showed a large mean effect size (Cohen's d = 0.90). Some of these explicitly made use of families' "funds of knowledge" (e.g., Boyce et al., 2010; Hirst et al., 2010; Ijalba, 2015; Johnson & Walker, 1991). These outcomes combined indicate the relevance of further research into how programs can be adapted to the characteristics and needs of participating families as well as make use of the knowledge and resources available in these families (Irvine & Larson, 2007; Street, 1995; Van der Pluijm et al., 2019).
Compared with programs that only focused on literacy, programs encouraging parents to support a broad range of skills proved in most cases to be the least productive: Average effect sizes were around Cohen's d = 0.20 across the different measures. Partly, these broad programs targeted children's socioemotional development by supporting parenting skills. In these cases, it might be that training parents to apply techniques for improving children's social behavior and emotional regulation drew their attention away from supporting children's literacy development. Support for this hypothesis comes from the study by Leung et al. (2011), who compared the effects on literacy skills and socioemotional variables of the HOPE program for children of Chinese immigrants in Hong Kong. In addition to emergent literacy, HOPE targets child behavior management by providing techniques for increasing desirable behavior (e.g., praise, rewards) and decreasing undesirable behavior (e.g., ignoring). The researchers found positive program effects on (reported) child behavior problems and parenting stress, but none on children's vocabulary.
There appeared to be one exception to the conclusion that programs targeting more skills were least effective: Programs focusing on both literacy and other cognitive skills (or, more specifically, early numeracy) had the largest mean effect on children's code-related skills (Cohen's d = 0.65). While the small number (k = 3) makes this observation tentative, it is in line with a presumed theoretical mechanism. Korat et al. (2017), for instance, argue that the relationship between emergent code-related skills and emergent numeracy is reciprocal, because they are based on the same underlying proficiency: For both skills, children need to understand symbols (letters or numbers) and comprehend that operations that combine these symbols (reading or calculating) can create new symbols with new definitions. Stimulating children's early numeracy skills could thus have an additive effect on their code-related skills.
When training was only provided in children's homes, programs were more effective than when training was provided in both homes and centers/schools. This outcome was similar to the observation made in Manz et al.'s (2010) meta-analysis. Manz et al. explain this difference by proposing that training in institutional settings raises a threshold for low-SES parents due to their negative educational experiences in the past and mistrust of professionals. However, our finding of comparable effects of programs only providing training at home and programs only providing training in centers/schools (Cohen's d = 0.68 and 0.63, respectively) suggests otherwise. We hypothesize that training in more than one setting, requiring parents to alternate between settings, might detract from the consistency of a program, which negatively affects program implementation. What might have added to this inconsistency is that alternating between settings in some cases also implies alternating between trainers. In two of the studies showing particularly small effects, both evaluating (a variant of) the HIPPY program, home visitors and moderators of group meetings in schools were not the same persons (Necoechea, 2007; Van Tuijl et al., 2001).
The fact that other program characteristics (program duration, number of sessions, trainer type, modeling, and guided practice) did not yield significant moderator effects (see Table 1) is informative. The absence of effects for these variables suggests that longer interventions are not necessarily more effective than shorter interventions, that programs delivered by professional trainers are not necessarily more effective than those delivered by paraprofessionals, and that larger effects are not restricted to programs that make use of specific training techniques. Two other program variables (differentiation and the use of digital materials) did not show significant moderator effects either, although it might be that this is due to the small number of programs that included these elements. The fact that programs that allowed for differentiated support and included digital activities showed effects larger than average (Cohen's d of overall effects = 0.90 and 0.68, respectively) could be an encouragement for further research.
Moderator analyses of sample characteristics showed no effects (see Table 1), implying that programs were equally effective for majority and immigrant/minority children, for first and second language speakers, and for younger and older children. Although this does not preclude the possibility that certain program characteristics are more important to specific groups (e.g., providing the intervention in a safe space and the home language in the case of immigrant parents), the small number of studies did not allow moderator analyses at the subgroup level.
Analyses of study and measurement characteristics revealed two moderator effects (see Table 1). In line with our hypothesis, study-specific instruments yielded larger effects than study-independent instruments for all three categories of effect measures. This finding is explained by the fact that study-specific measures are generally more sensitive to identifying program effects than study-independent instruments (for similar outcomes, see De Boer et al., 2014; Okkinga et al., 2018; Sénéchal & Young, 2008). Contrary to expectations, however, experimental studies (in which individual random assignment was applied) showed larger mean effects than quasi-experimental studies. Further analysis revealed that this outcome was likely the result of a confounding of variables: The 12 programs that were evaluated in an experimental design were mostly those that had a targeted approach. All 12 were home-only programs and focused only on literacy skills. Additionally, in 11 of the 12 programs, training was provided in one context and eight were shared reading-only programs.

Limitations
As in many meta-analyses, a possible limitation of our study is the occurrence of a so-called apples and oranges problem (Kulik & Kulik, 1989; Lipsey & Wilson, 2001). First of all, we combined the outcomes of a variety of programs, which may hamper the generalizability of the overall effect size we found. However, since our goal was to identify the "effective ingredients" of FLPs by means of moderator analyses, we view this diversity more as a strength than as a weakness of our study. Additionally, we combined data based on effect measures of a range of skills. Similar to the meta-analysis by van Steensel et al. (2011), we dealt with this by dividing measures into two theory-based categories: comprehension- and code-related skills. This decision provided some nuance to our outcomes, as some moderator effects that were found for one type of skills were not found for the other. What we could not sufficiently accommodate, however, is the variability in the way socioeconomic status was defined: Whereas some researchers based their definition of low SES on parental educational level or income, others used indicators such as qualification for free/reduced lunch or eligibility for state-funded programs, or used combined measures. Though correlated, these indicators do not necessarily coincide. This variability could be taken as an encouragement for researchers to provide more comprehensive information on SES.
A second limitation of our meta-analysis is that the information necessary to analyze long-term effects was too limited and diverse, allowing no definite conclusions about the presence or absence of fading-out effects. First of all, of the 48 studies, 33 included only immediate posttests, nine included only follow-up measures, and only six included both immediate posttests and follow-up measures. Additionally, there was large variability in the timing of follow-up assessments: The time between the end of the intervention and the administration of the delayed posttest ranged from 6 weeks (Ford et al., 2003) to as much as 6 years (Kağıtçıbașı et al., 2001; St. Clair et al., 2012). Certainly, this limitation is an impetus for further research on long-term effects of FLPs. Given that longitudinal, correlational studies have shown that the early HLE has a profound impact on children's later reading development and because FLPs aim at making changes in parental daily routines, examining whether immediate effects are sustainable is highly relevant (Niklas & Schneider, 2017; Sénéchal & LeFevre, 2002).
A third limitation is that we were not able to analyze the effects of variability in implementation quality. Several authors have argued that implementation quality is essential to interpreting effects of FLPs (Bryant & Wasik, 2004; Powell & Carey, 2012; Raikes et al., 2006). However, previous meta-analyses (Sénéchal & Young, 2008; van Steensel et al., 2011) as well as a recent review on this topic (De la Rie et al., 2017) have shown that there is little consistency in how studies report implementation quality, which in the case of our meta-analysis made it impossible to include it as a variable in moderator analyses. Powell and Carey (2012) have suggested a framework for analyzing implementation quality, distinguishing three main variables: "delivery" (the way a program is transferred from trainer to parent), "receipt" (the way parents take in and engage in program activities), and "enactment" (transfer of program contents to nonprogram situations). Recently, De la Rie et al. (2017) further operationalized these variables. Two of the studies we analyzed thoroughly examined program delivery, receipt, and enactment. Boyce et al. (2010) registered the number of home visits families received, made video observations to assess the home visitor's ability to facilitate parent-child interactions as well as parent and child engagement during home visits, and observed maternal supportive behaviors during parent-child interactions. Necoechea (2007) recorded treatment intensity (i.e., the number and duration of home visits and parents' attendance at group meetings), had home visitors assess the quality of the home visits, and assessed treatment fidelity on the basis of home visit observations, a parent questionnaire on program engagement, and a test measuring parental compliance with program activities. Necoechea found positive relations between receptive vocabulary, treatment intensity, and treatment fidelity.
We advise future researchers to take account of frameworks such as those developed by Powell and Carey (2012) to examine the effects of implementation quality in a more structured way.

Implications
Our meta-analysis supports the hypothesis that children's literacy development can be promoted by making changes in their HLEs and that this is also true for children from low-SES families. This outcome is encouraging for practitioners, program developers, and policymakers, because it implies that efforts to support low-SES parents in stimulating children's literacy skills can be fruitful. An unanswered question is whether program impact is sustainable. Although it appeared that, in the long term, program effects fade out, we cannot be sure because of the aforementioned limitations in the measurement of delayed effects. This calls for more longitudinal effect studies, but probably also for more research that examines which types of interventions could aid parents in preserving positive changes made during program participation. An interesting perspective on the issue of sustainability was recently suggested by Macleod and Tett (2019). In a qualitative study of Family Learning Workers in Scotland and the parents they supported, these researchers observed how making use of the knowledge and resources available in families and approaching parents as "experts on their own children" (p. 181) resulted in long-term changes in parental behavior. They concluded that "where a funds of knowledge approach . . . is taken it has the potential to make a lasting difference to parents' sense of self and on their practices" (p. 181).
Some programs were more effective than others. Important for practitioners, program developers, and policymakers is the recommendation not to overload parents (Bakermans-Kranenburg et al., 2003): A targeted program, focusing on a limited set of activities and skills, and restricted to one (training) context appeared to be most beneficial for supporting children's literacy development. Having said that, some possibly effective program elements have been underexplored. First of all, we found medium effects of programs that offered digital activities, but the number of studies that examined their impact was small. Because of the large role digital devices play in the lives of young children, researching digital opportunities in FLPs is relevant (Kabali et al., 2015; Rideout, 2017). Also, the large mean effect of programs that allow differentiation seems highly promising, although, again, few studies examined this type of intervention. We therefore recommend that researchers invest more effort in testing the effects of programs that are culturally valid and that build on families' own knowledge and resources.

Notes
Suzanne Fikrat-Wevers is now at the Institute of Medical Education Research Rotterdam, Erasmus University Medical Center.