Measuring early child development across low and middle-income countries: A systematic review

The Sustainable Development Goals mandate that by 2030, all children should have access to quality early child development opportunities, healthcare and pre-primary education. Yet validated measures of ECD in low- and middle-income countries (LMICs) are rare. To address this gap, a Systematic Review (SR) of measures available to profile the development of children between the ages of 0–5 years in LMICs was undertaken. Drawing on education, psychology and health databases, we identified reliable, valid measures, or measures adapted for use in LMICs, for either assessments of children’s development or of their learning environments. The inclusion criteria were (1) peer-reviewed papers published between January 2009 and May 2019; (2) assessment tools used to measure cognitive/language development or the early years or home environment in at least one LMIC; (3) report of the psychometric properties (validity and reliability) of the tool, and/or description of the cultural adaptation/translation process undertaken before applying it to an LMIC. Two hundred and forty-nine available records published in the last decade in peer-reviewed journals and nine relevant systematic literature reviews were identified. Fifty-seven records were qualitatively synthesised based on their psychometric properties and cultural adaptation. Forty-three tools were reviewed utilising 12 criteria. Five elements of analysis presented in Tables 2 and 3 (study, population tested, validity, reliability and cultural adaptability/translation) focused on the tools’ psychometric properties and previous application in LMICs. A further seven dimensions outlined in Tables 4 and 5 identified specific characteristics of the tools: target age, administration method, domains, battery, accessibility, language and country/institution. We propose these 12 key considerations to guide the selection of measurement tools for effectively assessing ECD in LMICs.

old infants will have no expressive language, while other indicators of development, such as some motor skills, can reach ceiling effects relatively quickly. These developmental patterns indicate the importance of using concurrent measures of development to identify patterns of need, and of using developmental trajectories only when there is a population-based comparison as a benchmark. The ways in which a child's developmental competencies can be measured also vary. Profiling domains of development may be based on either criterion-referenced measures or normative data; as we shall see, normative data are often lacking in LMICs. Measures may involve direct assessment of the child's skills through standardised tests or observations, or be collected via a proxy, such as parent or teacher reports. Direct assessments of children's skills provide more robust and valid measures of development.
The challenges for drawing comparisons across populations vary by type of measure and response format. For example, norm-referenced tests, when reliable and valid, provide information about where an individual lies in comparison to peers of the same age. Norm-referenced tests can focus on hypothetical constructs such as non-verbal ability, or on specific abilities such as naming vocabulary. The basic principle of norm-referenced tests is to define a continuum of performance from lowest to highest; the measure assigned to a particular individual locates his/her position on that continuum relative to the standardisation sample. Tests can only provide appropriate norms if they are used with the population for which they were intended. Norms from high-income countries (often the USA, UK and Australia) will not be appropriate for LMIC samples, where children experience very different social contexts and languages and have access to different educational opportunities. By the same token, norms that are standardised on monolingual children may not be appropriate for bilingual or multilingual children. Norms must also be current, as they become outdated by about three points a decade (Trahan et al., 2014).
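The logic of norm-referencing can be made concrete with a short sketch. This is purely illustrative and not drawn from any of the reviewed tools: the normative sample is invented, and the mean-100/SD-15 scale is simply one common convention for standard scores.

```python
# Illustrative sketch: how a norm-referenced test locates an individual's raw
# score on the continuum defined by a standardisation sample. The normative
# data and the mean-100/SD-15 scale are assumptions for illustration only.
from statistics import mean, stdev

def standard_score(raw, norm_sample, scale_mean=100, scale_sd=15):
    """Convert a raw score into a standard score relative to the norm sample."""
    z = (raw - mean(norm_sample)) / stdev(norm_sample)
    return scale_mean + scale_sd * z

# Hypothetical raw vocabulary scores from a standardisation sample
norms = [18, 20, 22, 24, 26, 28, 30, 32, 34, 36]
print(standard_score(27, norms))  # a raw score at the sample mean maps to 100
```

The same raw score run against norms from a different population would yield a different standard score, which is the crux of why HIC norms mislocate children in LMIC samples.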
To augment child-level data in the early years, it is also important to capture the child's learning environment, profiling both the home and the early years settings (Fernald et al., 2009, 2017). Both environments have the potential to support and enhance ECD. The home learning environment includes the physical characteristics but also, importantly, the interactions which occur between the child and their families or primary caregivers within the home. These interactions offer both implicit and explicit learning opportunities for the child. The home environment is a key predictor of cognitive and socio-emotional development, and its effects are evident throughout formal education (Bradley and Caldwell, 1976; Olson et al., 1990). Home interactions, particularly maternal responsiveness, mediate the impact of social disadvantage on development (Evans et al., 2010; Foster et al., 2005). The impact of the home environment is complemented by the opportunities afforded by the early years environment.
The 'quality' of early years settings impacts on children's development (Sylva et al., 2006). Assessments of quality typically consider both structural factors (e.g. child ratios, group size, caregivers' qualifications and training) and process factors (e.g. caregiving practices, children's experiences and caregiver-child interactions) that promote learning and development (World Health Organization, 2004). Whilst the nature of the environment varies across different types of settings, there is a strong relationship between structural and observed process characteristics. For example, as with data from the home environment, process features such as caregivers' warmth and responsiveness (Perlman et al., 2016) directly impact on children's positive outcomes. Environments with high-quality processes offer children rich opportunities to interact with adults, peers and materials (World Health Organization, 2004). Key factors for maintaining quality in preschool settings include child-adult pedagogical interactions, the curriculum, learning materials, teachers' perceptions of learning and professional development opportunities (Mathers, 2021; Rao et al., 2019). Whether the same constructs generalise to LMICs is an important empirical question. Current understanding indicates that assessing learning environments needs to consider the specific cultural context of what makes a positive learning environment (Raikes et al., 2019). There are therefore strong empirical and theoretical reasons to profile children's development, identifying strengths and needs, as well as capturing the learning environment. Even in countries in the global North, where a wide range of assessment tools have been developed and standardised, there remain significant debates about which measures to use for which children at which point of development and in which settings.
While more than 80% of the global childhood population resides in LMICs, most ECD measures come from high income countries (Rao et al., 2019). Child Development Assessment tools (CDATs) in LMICs tend to follow one of four formats (Sabanathan et al., 2015): 1. a standard western CDAT with no adaptations; 2. a western CDAT translated (linguistic equivalence) and/or adapted for the local cultural environment (cultural equivalence); 3. an amalgamation of a number of translated and/or adapted items from several different western CDATs; or 4. a locally developed, culturally specific CDAT consisting of original items designed to be relevant to the population of interest.
Each of these approaches raises challenges for use and interpretation. Locally developed tools limit comparison across countries and settings, reducing our understanding of biodevelopmental niches. By contrast, measures designed and standardised in more affluent western settings and applied without appropriate adaptation will not be culturally appropriate: norms are likely to be inaccurate, and the developmental criteria identified in criterion-referenced assessments may not be culturally relevant.
There are thus a series of questions that need to be considered in any study aiming to profile the skills of children in LMICs (McCoy et al., 2018a, 2018b). In addition to the challenges with standardisation, measures which rely on self-completion by parents or professionals need to consider the literacy level of the respondents and the way in which items are interpreted within particular contexts. As McCoy et al. (2018b) argue, 'few valid and reliable tools exist for capturing ECD at scale across cultural contexts' (p. 58). What remains clear is that assessment tools developed in high-income countries (HICs) need to be modified before they are applied in LMICs. Cultural adaptation includes (a) establishing the appropriateness of target items, (b) translation/back-translation of the measure and the underlying construct(s), (c) adaptation of the content and the procedure of administration, and (d) piloting and iterative testing of the tool (Fernald et al., 2009). Without cultural adaptation, there is no guarantee that the same underlying abilities are being captured (Sabanathan et al., 2015). In sum, measuring ECD across LMICs poses significant challenges that need to be recognised when reporting child development profiles and profiling ECD environments. Accordingly, the following research questions guided our work: (1) What assessment tools have been used by peer-reviewed studies published in the last decade and conducted in LMICs to profile the cognitive development and learning environment of children aged 0-5 years? (2) What assessment tools have been recommended by relevant previous systematic reviews to measure the cognitive development and learning environment of children aged 0-5 years?

Methodology and methods
To answer RQ1, the SR aimed to identify reliable and valid tools which can be used in LMICs to profile children's cognition and their learning environment. To answer RQ2, we included previous relevant systematic literature reviews (see ** in References). The methods entailed systematic searching and screening of the published literature using a set of inclusion and exclusion criteria:

Selection criteria
The inclusion criteria were:
• peer-reviewed papers published between January 2009 and May 2019;
• assessment tools used to measure cognitive/language development or the early years or home environment in at least one LMIC;
• report of the psychometric properties (validity and reliability) of the tool, and/or description of the cultural adaptation/translation process undertaken before applying the tool to an LMIC.
We excluded studies that:
• included assessment tools that were developed, standardised and used only in HICs;
• applied the tool to age groups different from our study;
• did not provide information about the tool's psychometric properties (validity and reliability) and/or a description of the cultural adaptation/translation processes.

Search terms
Using the search terms provided in Table 1, 258 peer-reviewed journal articles were retrieved through the authors' university access system from relevant Education, Psychology and Health databases (ProQuest, PubMed, EconLit, PsycINFO, ERIC, Medline and Global Health). Two hundred and forty-six of these records were identified through database searching and 12 additional records were identified through hand searching. Of these, 68 duplicates were removed. From the 190 records screened, 68 were excluded based on their titles and abstracts, following the inclusion and exclusion criteria described above. After assessing 122 full-text articles, 65 were excluded as they did not meet the inclusion criteria. The remaining 57 full-text articles were included in our qualitative synthesis (see * in the References). In order to ensure the accuracy and reproducibility of the review, the fourth author replicated the screening and data extraction stages. The raw proportion of agreement between the two coders was very high (97%). After further evidence was provided to justify the inclusion/exclusion decisions for the nine studies where there was disagreement (3%), one previously excluded study was included. Figure 1 shows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart for article selection (Moher et al., 2015).
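The double-coding check above reports the raw proportion of agreement. As a purely illustrative sketch (not the authors' procedure, with invented coding decisions), the difference between raw agreement and chance-corrected agreement (Cohen's kappa, the statistic several reviewed tools report for inter-rater reliability) can be computed as follows:

```python
# Illustrative sketch: raw proportion of agreement vs Cohen's kappa for two
# coders making include/exclude decisions. The decision lists are invented.
from collections import Counter

def raw_agreement(a, b):
    """Proportion of items on which the two coders gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = raw_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Expected chance agreement from each coder's marginal label frequencies
    p_e = sum((ca[k] / n) * (cb[k] / n) for k in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

coder1 = ["include"] * 90 + ["exclude"] * 10
coder2 = ["include"] * 87 + ["exclude"] * 3 + ["include"] * 4 + ["exclude"] * 6
print(raw_agreement(coder1, coder2))          # 0.93
print(round(cohens_kappa(coder1, coder2), 2)) # 0.59
```

Note how kappa is markedly lower than raw agreement when one label dominates, which is why the heterogeneous reporting of "agreement" across the reviewed studies matters.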

Data extraction and coding
Data from the 57 selected studies (43 reporting measures of child development, 14 of the environment) were entered into a spreadsheet. We extracted information from the studies using 25 criteria included in previous relevant SRs and agreed between the team: (a) tool information (nine criteria); (b) study information (four criteria); and (c) tool application (12 criteria) (see Supplemental Appendix 1 for details). When information required for a full assessment of the feasibility of applying a tool was not provided, we recorded 'not reported' and interpreted it as an inconclusive area for future examination.

Results: Qualitative synthesis
Forty-two selected studies included in the SR reported 34 tools assessing children's development at age 0-5 years in 35 LMICs. Most of the tools reported validity (n = 15), but this was variably described; some studies mentioned that the tool had 'well established', 'satisfactory' or 'good' validity without providing further details. Studies also varied in which type of validity was considered, including concurrent, face, construct, content and convergent validity, without justifying these choices. When internal consistency was measured, Cronbach's alpha varied between 0.23 (CREDI) and 0.95 (IDELA and CDSC). For most of the tools (n = 16), reliability (inter-rater and test-retest) was also reported. Results varied greatly, from poor (κ = .00 for some CREDI items) to very good reliability (0.99 for the BSID-I, STBAPD and the MacArthur-Bates Communicative Development Inventory). Differential reliability (per domain rather than overall) was also reported, with the lowest coefficients for the social-emotional domain and the highest for the motor, cognitive and language domains. For most of the tools (n = 15), cultural adaptation involved translation, back-translation and adaptation of items to the new culture by the research team, informed by local and international staff. Nine environment tools were identified from the review. Ten studies reported five environment tools to measure the home environment at 0-5 years in LMICs. These were applied in Bangladesh, Colombia, India, Indonesia, Mexico and Pakistan. As with the developmental tools, there was marked heterogeneity in the way psychometric properties and cultural adaptation were reported. Again, there was variability in the type of validity reported, including concurrent, face, construct, content and convergent validity, without justification of these choices. When internal consistency was measured, Cronbach's alpha ranged from adequate (α = 0.71 for the FCI) to moderate (α = 0.46 for the HSQ).
Some studies simply referred back to previous research, stating that the tool had good test-retest reliability or high inter-rater reliability without providing specific details. Cultural adaptation sections tended to mention that the instrument had been used worldwide previously, but with limited information regarding the process undertaken. Only the HOME included detailed information regarding its cultural adaptation.
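The internal-consistency figures quoted above are Cronbach's alpha values. A minimal sketch of how alpha is computed for a multi-item scale, using an invented response matrix rather than data from any reviewed tool:

```python
# Illustrative sketch: Cronbach's alpha for a k-item scale,
#   alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
# The response matrix below is invented for illustration only.
from statistics import pvariance

def cronbach_alpha(items):
    """items: one inner list of scores per item, aligned across respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]       # per-child totals
    item_var = sum(pvariance(scores) for scores in items)  # sum of item variances
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Four hypothetical items scored 0-3 for six children
items = [
    [3, 2, 3, 1, 0, 2],
    [3, 3, 2, 1, 1, 2],
    [2, 2, 3, 0, 1, 3],
    [3, 2, 2, 1, 0, 1],
]
print(round(cronbach_alpha(items), 2))  # ~0.9 for this correlated item set
```

An alpha of 0.23, as reported for the CREDI, would indicate that item responses barely co-vary, whereas values around 0.9 (IDELA, CDSC) indicate that the items behave as a coherent scale.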
The five studies reporting four environmental tools to measure ECE settings applied them in four LMICs: China, Indonesia, Tanzania and South Africa. Only the Chinese Early Childhood Environment Rating Scale (CECERS) mentioned good content, concurrent and criterion-related validity, and the Early Childhood Environment Rating Scale-Revised (ECERS-R) referred to demonstrated predictive validity. The reliability of the three tools that reported it was good, ranging from 0.95 for the ITERS-R to 0.97 for the ECERS-R. Cultural adaptation sections tended to mention that the instrument had been used worldwide previously, but with limited information regarding the process undertaken, except for Measuring Early Learning Quality and Outcomes (MELQO) (MELE module), where the process undergone was outlined.
Focusing on the 34 tools to assess children's development at age 0-5 years in LMICs, their target ages were (1) 18-24 months (n = 9); (2) 25-60 months (n = 21); and (3) 0-60 months (n = 4). Regarding the administration method, the majority of assessments were direct assessments of the child (n = 23) and 11 were completed by caregivers. Developmental domains included language, cognition, motor skills and social-emotional development. However, the operationalisation of the domains varied by test and developmental phase. Overall, language (n = 22) and cognition (n = 17) were assessed in the majority of measures, while motor skills (n = 16) and socio-emotional development (n = 14) were less common. More than a third of the studies did not include information about accessibility (n = 11). Of those that did, 14 required payment and nine were free to use. The tools were primarily produced in English (n = 24), with five tools developed in local languages, such as French, Kigirima and Chinese. Five tools did not report the language of use. Most of the tools were developed in the USA (n = 13), while others were developed globally by international organisations such as the World Bank, UNICEF and UNESCO (n = 6). A few were developed in the UK (n = 5) and in countries such as India, Malawi, Kenya, Hong Kong and South Africa.
Focusing on the ten studies reporting five environment tools to measure the home environment at 0-5 years in LMICs, all were suitable for 0-60 months (n = 5). Regarding the administration method, the majority of the tools were completed by caregivers (n = 4) and one was a direct assessment. Three tools focused on cognitive and socio-emotional caregiving, with no information provided for the remaining two. Information about accessibility was often not provided (n = 3), while two tools required payment. The dominant language of the tools was English (n = 4), with the language of one tool not reported. Most of the tools were developed by international organisations such as UNICEF (n = 3), with the remaining two developed in the USA and India, respectively.
Five studies reported four tools to measure early learning environments at age 0-5 years in LMICs. Regarding the target age, one was suitable for 0-60 months (n = 1) and three for 25-60 months (n = 3). Three of the tools were direct assessments (n = 3), one was parent-reported and one was not reported. The environment tools also varied in terms of the domains assessed. All assessed the space and physical setting as well as the quality of interactions, curriculum planning and implementation, and personnel (n = 4), but they varied in terms of the other included dimensions, such as personal care routines (n = 2), inclusiveness (n = 1) and play (n = 1). Scales which examined the environment and physical setting were more common in the ECE settings measures than in the home environment measures, whereas the key feature included in every tool was the quality of interaction with the child. Regarding accessibility, two studies did not include this information, one tool required payment and one was free to use. Two tools were in English (n = 2), one was in Chinese and one was not reported. Two of the tools were developed globally (n = 2), one was developed in the USA (n = 1) and one was Chinese (n = 1).

Discussion: 12 considerations for selecting suitable measurements to effectively assess ECD in LMICs
The SR was undertaken to identify tools available to profile the development of children aged 0-5 years and their learning environments. It reviewed 43 tools (34 focusing on child development and nine on the environment) that have previously been used to assess early development in LMICs. The ongoing debate about which measure to use for which children in which setting remains a pressing one. This is of particular importance for childhood stunting and, as such, for the UKRI GCRF Action Against Stunting Hub.
Drawing on the synthesis of the 57 records included in this SR, we compared the tools' application by identifying five critical markers (study, population tested, validity, reliability and cultural adaptability/translation), outlined in Tables 2 and 3. Focusing on the psychometric properties, studies varied greatly in the way validity and reliability were reported, ranging from no reporting at all to wide variation in the way these characteristics were addressed. Studies also varied in which type of validity was considered, including concurrent, face, construct, content and convergent validity, without justifying these choices. Without valid and reliable tools, measuring the impacts of developmental challenges such as stunting, of interventions, and of ECD across different cultural contexts will not yield equivalent conclusions, making it harder to identify barriers, drivers of development and effective interventions. Here we used validity >.7 as a benchmark, but close attention to how reliability is reported is also important. Regarding cultural and contextual appropriateness and the potential to adapt, and in line with Sabanathan et al. (2015), we found that researchers typically translated tools from HICs (linguistic equivalence), with a minority adapting them in a systematic way for the local cultural environment (cultural equivalence). In some cases, an amalgamation of a number of translated and/or adapted items from several different HIC tools was used, but the validity of this approach was rarely examined. Researchers need to actively engage in developing robust measures which include cultural adaptation and translation/back-translation. These procedures, and any changes to the measures, should be reported.
We also compared the 43 tools (34 for assessing children's development, five to measure the home environment and four to measure ECE settings) on the key markers outlined in Tables 4 and 5. We suggest that these seven markers (target age, administration method, domains, battery, accessibility, language and country/institution) are also critical to address the implementation challenges that practitioners and researchers face when choosing tools. Crucially, information on a tool's accessibility (including licences, training and other operational aspects) is required in order to successfully apply the tool to a new context. Moreover, measures which require high levels of professional training will be challenging in contexts where psychologists, speech and language therapists or occupational therapists are not commonplace. Overall, our SR highlights a need for improvement in the way studies report a tool's psychometric properties and cultural adaptation. In line with McCoy et al. (2018b), we found few valid and reliable tools suitable for use in comparative studies of cognition and the environment across LMICs.
Finally, conducting the SR has raised important questions about how measures are selected. Reliability and validity are necessary dimensions in deciding appropriate measures, but equally important are considerations of cultural appropriateness and the suitability of the tool for its intended use.
Making an informed choice about which measure to use, and why, requires a nuanced understanding of the purpose and overarching objectives of the project and its research focus. Why, what and how to measure children's development at different ages are crucial decisions when choosing suitable ECD measures (Fernald et al., 2009). Our SR has served as a foundation for identifying relevant opportunities and challenges when choosing ECD measures in LMICs. Over 30 years of child development research has emphasised the ways in which children and contexts shape each other (Sameroff and MacKenzie, 2003), yet studies in LMICs have often been limited to child-level measures alone. Any attempt to measure and model development must include both the child and the different contexts in which they develop. The SR confirmed that to capture the child's biodevelopmental niche, measures at both child and environment level are needed.

Limitations
As with all SRs, the results were determined by our keywords and search parameters. The focus on the last decade meant that resources published before January 2009 were excluded. Although we compensated for this focus by including nine previous relevant SRs, there are limitations derived from our choices. In addition, the significant number of studies that did not report their psychometric properties or cultural adaptations limited our ability to synthesise the evidence from these sources.

Conclusion
Effective ECD measures are crucial for meeting the SDGs. Our SR illustrates a number of opportunities and challenges when identifying tools to measure ECD across LMICs. Selecting appropriate measures is a crucial step in tracking early development and learning to better understand a complex challenge such as childhood stunting. A poorly chosen measure can significantly compromise even the best research design and study. Overall, our SR puts forward 12 key considerations used to compare the tools. Five dimensions presented in Tables 2 and 3 (study, population tested, validity, reliability and cultural adaptability/translation) bring attention to previous applications of the tools in LMICs. Seven dimensions outlined in Tables 4 and 5 (target age, administration method, domains, battery, accessibility, language and country/institution) refer to the tools' characteristics. Together they can illuminate the process of selecting assessment tools. These key considerations extend beyond evaluating basic psychometric properties to consider the wider social context in which children are developing, to ensure a tool's suitability and validity for the study's purpose.
Finally, our contribution to the field of early childhood research is the review of 43 up-to-date tools (34 for assessing children's development, five to measure the home environment and four to measure ECE settings) for measuring ECD across LMICs. We suggest that the 12 key considerations used in our SR are critical, as they offer future researchers and practitioners a guide to the implementation challenges, psychometric properties and cultural appropriateness of different tools to assess ECD in LMICs.