Construct-related validity of the strengths and difficulties questionnaires with three and five dimensions: A multitrait-multimethod analysis

The Strengths and Difficulties Questionnaire (SDQ) is one of the most widely used questionnaires for evaluating children’s psychological adjustment; however, its internal structure has been the target of ongoing controversy. Recent studies have suggested a three-factor structure for the SDQ, but data are still scarce. The present study used multitrait-multimethod analysis to examine the construct-related validity of the SDQ with three and five dimensions, using ratings provided by children, their parents and teachers. A total of 415 participants were recruited from a Portuguese community sample. Both SDQ versions presented good convergent-related validity, with higher values for the five-dimension version. Findings from this study suggest that the SDQ with three dimensions could be more suitable as a screening measure of children’s psychological adjustment in a low-risk community sample. Nevertheless, the SDQ still needs further psychometric improvement in order to properly collect information from multi-source samples about the prevalence of children’s psychological adjustment.

dimensions, namely Emotional Problems, Peer Problems, Conduct Problems, Hyperactivity/Inattention and Prosocial Behaviours. Another SDQ version was later developed by Goodman and colleagues (2010), in which the SDQ is organized into three dimensions, namely Externalizing, Internalizing and Prosocial Behaviours. In this version, the Emotional Problems and Peer Problems dimensions merge into a subscale named 'Internalizing Problems', and the Conduct Problems and Hyperactivity/Inattention dimensions merge into an 'Externalizing Problems' subscale. The two merged scales are more suitable for assessing children's psychosocial adjustment in community samples, whereas the five separate scales could provide more useful information in high-risk and/or clinical samples (Goodman et al., 2010).
Nevertheless, the SDQ internal structure has been the target of ongoing controversy. Several studies with children/adolescents, parents and teachers as independent informants have reported adequate support for the five-factor structure (e.g., Stone et al., 2010), whilst other studies have found either only marginal support (e.g., Hill & Hughes, 2007) or no support for this SDQ version (e.g., Dickey & Blumberg, 2004). Some studies had difficulty confirming the SDQ five-factor structure and revealed poor construct validity (Pechorro et al., 2011). Other studies of the five-factor structure with three informants (self-report, teacher and parent) have found no evidence of discriminant-related validity (e.g., Marzocchi et al., 2004).
The three-factor SDQ structure has been supported in some exploratory analyses in the US (parents), Belgium (parents and children) and Finland (youth SDQ) (Dickey & Blumberg, 2004; Koskelainen et al., 2001; Van Leeuwen et al., 2006). In Portugal, the study by Costa and colleagues (2020) showed acceptable internal consistency for the three dimensions and evidence of discriminant-related validity, but poor convergent-related validity. The question of which SDQ version has the best psychometric properties should therefore be investigated in detail.
Another enduring issue is whether different informants are truly rating the same construct (Munkvold et al., 2009). A study with a multinational sample of children aged 6 to 11 years from seven European countries found only moderate agreement between parent and teacher SDQ ratings (Cheng et al., 2018), whilst Hill and Hughes (2007) found evidence of convergence among parent, teacher and peer ratings of first graders at risk of educational failure.
Construct validity is an important aspect of the validity assessment of a measure, and it can be evaluated through convergent- and discriminant-related validity (Byrne, 2010). Research on construct validity usually focuses on the extent to which data exhibit evidence of convergent-related validity (the extent to which different methods correspond in their assessment of the same trait), discriminant-related validity (the extent to which independent methods differ in their assessment of different traits) and method effects (an extension of the discriminant-related validity issue; Campbell & Fiske, 1959). Convergent- and discriminant-related validity can be evaluated more robustly by using the multitrait-multimethod (MTMM) approach (Widaman, 1985).
To date, eight studies (Table 1) have examined the construct-related validity of the SDQ five-factor structure, but none has assessed the three-factor structure with the MTMM approach. Although Goodman et al. (2010) recommended a three-factor structure, their analysis was performed only with the Internalizing and Externalizing dimensions. Regarding the SDQ five-factor structure, most of the studies found good convergent-related validity and discriminant-related validity problems for most dimensions.
Despite useful findings from the aforementioned studies, data are still inconsistent for the SDQ five-factor structure and scarce for the three-factor version. The main purpose of this study is to assess the construct validity of the SDQ with three and five dimensions, using ratings provided by children, their parents and teachers, within the framework of a multitrait-multimethod (MTMM) design. The specific goals are to evaluate the internal consistency and the convergent- and discriminant-related validity of the SDQ with three dimensions and with five dimensions. Both versions of the SDQ were explored because the three-dimension version could be more appropriate for evaluating psychological adjustment in low-risk community samples. Goodman et al. (2010) also stressed the importance of using multiple approaches to assess construct validity in order to obtain more complete information about SDQ performance; however, to date very few studies have explored SDQ construct-related

Measures
Before completing the study measures, parents were asked to complete a brief sociodemographic questionnaire, which included individual questions (age, relationship with the child, marital status, educational level, professional status, partner's educational level, partner's professional status and residential area) and family characteristics (child's age and gender, child's relationship with the partner and household composition).

Procedures
Participants were recruited through non-probabilistic intentional sampling in private schools, post-class educational centres and soccer learning centres in the Lisbon metropolitan area. Participants completed the questionnaires via paper and pencil (P&P) or online, as some schools and parents requested an online version. During sample recruitment, some schools and learning centres returned only the parents' questionnaires (n = 55); those participants were therefore excluded from the present study, since the purpose was to match children with their respective parent and teacher. Data from parents who did not respond to the sociodemographic measures that allowed us to match them with the child and teacher were also excluded from the study. In some cases it was not possible to recruit all three informants (14 participants excluded); however, the questionnaires from the two available informants were included. The final sample comprised 129 triads (children and their respective parents and teachers) and 14 dyads (6 children and their parents; 1 child and their teacher; 7 parents and teachers of the same children). Sample recruitment occurred between November 2018 and September 2019. Following the Declaration of Helsinki, all participants were given the opportunity to clarify any questions related to the study's content and procedures. All participants signed a consent form prior to their participation. The study was approved by *****'s Ethics Committee (approval number D/001/03/2018).

Data analysis
Sample size for the present study was determined according to Saris and Gallhofer (2014). These authors state that the sample size of a three-group design should be higher than 300 in order to obtain design efficiency, regardless of the method variance. Descriptive statistics were calculated for all SDQ items using SPSS (v. 25, SPSS Inc., Chicago, IL). Item sensitivity was evaluated through skewness (Sk) and kurtosis (Ku) analysis. Absolute values of Sk and Ku greater than three and seven, respectively, were considered a severe violation of the normality assumption (Marôco, 2014). SDQ internal consistency was evaluated through the standard Cronbach's alpha coefficient (Marôco, 2014).
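The item screening rule and the reliability coefficient described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: it uses simple moment-based estimators of skewness and kurtosis (SPSS reports bias-corrected versions that differ slightly in small samples), it reads the rule as flagging an item when either threshold is exceeded, and the function names and demonstration responses are hypothetical.

```python
from statistics import mean, variance


def _central_moment(xs, order):
    """Population central moment of the given order."""
    m = mean(xs)
    return sum((x - m) ** order for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness (0 for a symmetric distribution)."""
    m2 = _central_moment(xs, 2)
    return 0.0 if m2 == 0 else _central_moment(xs, 3) / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3 (0 for a normal distribution)."""
    m2 = _central_moment(xs, 2)
    return 0.0 if m2 == 0 else _central_moment(xs, 4) / m2 ** 2 - 3.0


def severe_nonnormality(xs, sk_limit=3.0, ku_limit=7.0):
    """Flag an item when |Sk| > 3 or |Ku| > 7 (assumed reading of the rule)."""
    return abs(skewness(xs)) > sk_limit or abs(excess_kurtosis(xs)) > ku_limit


def cronbach_alpha(items):
    """Cronbach's alpha; items holds one list of scores per item,
    all in the same respondent order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses on a 0-2 scale (not data from the study).
demo_items = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 2, 0, 1],
    [1, 1, 2, 1, 0, 2],
]
print(round(cronbach_alpha(demo_items), 3))
print([severe_nonnormality(item) for item in demo_items])
```

In practice the same checks would be run once per informant and per SDQ dimension, since internal consistency is reported separately for self, parent and teacher ratings.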
To test for evidence of SDQ construct-related validity, convergent- and discriminant-related validity were tested within the framework of a multitrait-multimethod (MTMM) design in which multiple traits (SDQ dimensions) were measured by multiple methods (children, parents and teachers). The MTMM analysis was conducted at both the matrix level and the individual parameter level, and was performed for the SDQ with three dimensions and with five dimensions. The following steps were performed according to Byrne's (2010) guidelines, and all MTMM analyses were conducted using AMOS (v. 18, SPSS Inc., Chicago, IL). Four models were created using the SDQ dimensions as traits and the three informants (children, parents and teachers) as methods. The first model (Model 1) includes both trait and method factors, with correlations among traits and correlations among methods. Model 1 represents the hypothesized model and was the baseline against which the other nested, more restricted models were compared. Model 2 has no trait factors. Model 3 includes trait and method factors, but the traits are perfectly correlated. As in Models 1 and 2, the correlations among method factors in Model 3 are freely estimated. Model 4 differs from Model 1 only in the absence of specified correlations between method factors. The specification of parameters of the SDQ with three dimensions for MTMM Models 1, 2, 3 and 4 is portrayed schematically in Figures 1-4, respectively.
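The four nested models can be summarized as a small data structure, which also makes it possible to count how many parameters each restricted model removes from Model 1. The sketch below is an illustration only, not the AMOS specification used in the study: the class, labels and function names are hypothetical, and the parameter counts assume that every trait and method loading is freely estimated and that no additional equality constraints are imposed.

```python
from dataclasses import dataclass
from itertools import product

METHODS = ["child", "parent", "teacher"]
TRAITS_3 = ["I", "E", "PB"]
TRAITS_5 = ["EP", "PP", "CP", "HY", "PB"]


@dataclass(frozen=True)
class MTMMModel:
    name: str
    has_traits: bool
    trait_correlations: str   # "free", "fixed at 1" or "none"
    method_correlations: str  # "free" or "fixed at 0"


MODELS = [
    MTMMModel("Model 1", True, "free", "free"),         # hypothesized baseline
    MTMMModel("Model 2", False, "none", "free"),        # no trait factors
    MTMMModel("Model 3", True, "fixed at 1", "free"),   # perfectly correlated traits
    MTMMModel("Model 4", True, "free", "fixed at 0"),   # uncorrelated methods
]


def indicators(traits, methods=METHODS):
    """One observed score per trait-by-method combination."""
    return [f"{trait}_{method}" for trait, method in product(traits, methods)]


def n_correlations(n_factors):
    """Number of distinct correlations among n_factors factors."""
    return n_factors * (n_factors - 1) // 2


def delta_df_vs_model1(traits, methods=METHODS):
    """Parameters removed from Model 1 by each restricted model."""
    return {
        "Model 2": len(traits) * len(methods) + n_correlations(len(traits)),
        "Model 3": n_correlations(len(traits)),
        "Model 4": n_correlations(len(methods)),
    }


print(len(indicators(TRAITS_3)), delta_df_vs_model1(TRAITS_3))
print(len(indicators(TRAITS_5)), delta_df_vs_model1(TRAITS_5))
```

Under these assumptions the counts are 12, 3 and 3 for the three-dimension SDQ and 25, 10 and 3 for the five-dimension SDQ, which agrees with the degrees of freedom of the Δχ² tests reported in the results.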
MTMM matrix-level analyses. The goodness-of-fit indices (χ² and CFI) of Model 1 were compared with those of the other MTMM models to evaluate evidence of convergent- and discriminant-related validity at the matrix level. To determine convergent-related validity, or the extent to which independent measures of the same trait are correlated (e.g., parent-rated and self-rated prosocial behaviours), Model 1 and Model 2 were compared using the differences in χ² and CFI values. A significant difference in χ² (Δχ²) together with a difference in CFI (ΔCFI) greater than .01 provides evidence of convergent-related validity (Byrne, 2010). Discriminant-related validity was evaluated for traits and methods. To examine the discriminant-related validity of the traits (the extent to which independent measures of different traits are correlated), Model 1 and Model 3 were compared. A large Δχ² and/or a substantial ΔCFI provides support for discriminant-related validity. To examine the discriminant-related validity of the methods, Model 1 and Model 4 were compared. A small Δχ² and/or a small ΔCFI indicates discriminant-related validity.
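The convergent-validity decision rule can be written out as a short Python sketch. It computes the p value of a χ² difference with the standard library only (using the closed-form survival function for integer degrees of freedom) and applies the ΔCFI > .01 criterion. The function names are hypothetical and the .05 significance level is an assumption, as the text does not state one; the example input is the Model 1 versus Model 2 comparison reported in the results for the SDQ with three dimensions.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def convergent_evidence(delta_chi2, delta_df, delta_cfi,
                        alpha=0.05, cfi_cutoff=0.01):
    """Significant chi-square difference AND delta CFI above the cutoff.
    alpha = .05 is an assumption, not a value taken from the study."""
    p = chi2_sf(delta_chi2, delta_df)
    return {
        "p": p,
        "significant": p < alpha,
        "cfi_above_cutoff": delta_cfi > cfi_cutoff,
        "evidence": p < alpha and delta_cfi > cfi_cutoff,
    }


# Model 1 vs Model 2, SDQ with three dimensions (values from the results).
result = convergent_evidence(delta_chi2=95.782, delta_df=12, delta_cfi=0.348)
print(result["evidence"], result["p"] < 0.001)
```

The discriminant-validity comparisons (Model 1 versus Models 3 and 4) rest on judgments of "large" and "small" differences rather than a single cutoff, so they are not reduced to a boolean here; `chi2_sf` can still be used to obtain their p values.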
MTMM parameter-level analyses. A more precise evaluation of trait- and method-related variance can be obtained by examining individual parameter estimates of the factor loadings and factor correlations of the hypothesized model (Model 1). Convergent-related validity was assessed through the factor loadings. The magnitude of the trait loadings reflects convergent-related validity, and an overall comparison of trait and method loadings reveals the proportion of indicators for which method variance exceeds trait variance. If this proportion is substantial, convergent-related validity could be weakened. Discriminant-related validity bearing on specific traits and methods is determined by examining the factor correlation matrices. To suggest discriminant-related validity, the correlations between traits should be negligible. Regarding the method factor correlations, discriminability is related to the extent to which the methods are maximally dissimilar (correlations should also be negligible).
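The comparison of trait and method loadings can be sketched as follows. The sketch assumes standardized loadings and trait factors that are uncorrelated with method factors, so that each indicator's variance splits into a trait part, a method part and a residual. The function name and all loadings below are hypothetical illustrations, not estimates from the study.

```python
def variance_split(trait_loading, method_loading):
    """Split a standardized indicator's variance into trait, method and
    residual parts, assuming uncorrelated trait and method factors."""
    trait_var = trait_loading ** 2
    method_var = method_loading ** 2
    return {
        "trait": trait_var,
        "method": method_var,
        "residual": 1.0 - trait_var - method_var,
        "method_exceeds_trait": method_var > trait_var,
    }


# Hypothetical (trait loading, method loading) pairs, for illustration only.
demo_loadings = {
    "E_teacher": (0.80, 0.30),
    "PB_parent": (0.25, 0.60),
    "I_child": (0.45, 0.40),
}

flagged = [
    name
    for name, (trait, method) in demo_loadings.items()
    if variance_split(trait, method)["method_exceeds_trait"]
]
print(flagged, f"{len(flagged)} of {len(demo_loadings)} indicators")
```

The fewer indicators are flagged, the stronger the case that the ratings reflect the trait rather than the informant.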
These analyses were performed for the SDQ with three dimensions and for the SDQ with five dimensions.

Missing data
A total of 84 scores, out of 8715, were randomly missing across different participants. Missing values therefore accounted for approximately 1% of the data. Expectation maximization (EM) was used to impute the missing data.
Convergent and discriminant-related validity: MTMM matrix-level analyses. Table 2 presents the summary of goodness-of-fit indices for Models 1, 2, 3 and 4. Model 1 showed a very good fit to the data, whereas the fit of Model 2 was very poor. Model 3 presented a marginally good fit, but not as good as that of Model 1. Model 4 showed a good fit to the data, but a slightly poorer one than Model 1.
To evaluate convergent- and discriminant-related validity at the matrix level, Model 1 was compared with Models 2, 3 and 4. For the comparison between Model 1 and Model 2, the Δχ² was highly significant (Δχ²(12) = 95.782, p < .001) and the difference in practical fit (ΔCFI = .348) was well above .01, which suggests strong evidence of convergent-related validity. The comparison between Model 1 and Model 3 yielded a Δχ² value that was statistically significant (Δχ²(3) = 27.308, p < .001), and the difference in practical fit was fairly large (ΔCFI = .101), suggesting modest evidence of discriminant-related validity for the SDQ traits. The comparison between Model 1 and Model 4 yielded a Δχ² value that was small but statistically significant (Δχ²(3) = 31.59, p < .001), and the difference in practical fit was also small (ΔCFI = .024), which indicates good discriminant-related validity for the methods.
Convergent and discriminant-related validity: MTMM parameter-level analyses. As indicated in Table 3, almost all trait loadings were statistically significant, except for children's self-ratings of PB and parent ratings of PB. Teacher ratings of E presented the highest trait loading, and self-ratings of E presented the lowest among the statistically significant trait loadings. The comparison between trait and method factor loadings revealed that method variance exceeded trait variance for two of the self-ratings (E and PB), for all parent ratings and for one of the teacher ratings (PB). This attenuation of traits by methods (mostly associated with parent ratings) indicated moderate convergent-related validity.
As shown in Table 4, the traits were significantly associated, except for I with PB. I and E presented a significant but low association, whilst E and PB presented a moderate, negative association. Finally, the method factor correlations were significant. Self-ratings presented low associations with both parent and teacher ratings; however, the association between teacher and parent ratings was moderate. These correlations indicated a modest level of discriminant-related validity for the methods.
Convergent and discriminant-related validity: MTMM matrix-level analyses. Table 5 presents the summary of goodness-of-fit indices for Models 1, 2, 3 and 4. Model 1 showed a good fit to the data, whereas the fit of Model 2 was not adequate. Although Model 3 fitted better than Model 2, its fit was only marginal and poorer than that of Model 1. Model 4 showed a good fit to the data, but a slightly poorer one than Model 1.
Regarding convergent- and discriminant-related validity, Model 1 was compared with Models 2, 3 and 4. For Model 1 and Model 2, the Δχ² was highly significant (Δχ²(25) = 169.297, p < .001) and the difference in practical fit (ΔCFI = .269) was well above .01, which suggests strong evidence of convergent-related validity. For Model 1 and Model 3, the comparison yielded a Δχ² value that was statistically significant (Δχ²(10) = 68.363, p < .001), and the difference in practical fit was large (ΔCFI = .109), suggesting modest evidence of discriminant-related validity for the traits. For Model 1 and Model 4, the comparison yielded a small but significant Δχ² value (Δχ²(3) = 10.974, p = .010) and a small difference in practical fit (ΔCFI = .015), which argues for good discriminant-related validity for the methods.
Convergent and discriminant-related validity: MTMM parameter-level analyses. All trait loadings were statistically significant, with magnitudes ranging from .286 (parent ratings of PB) to .870 (teacher ratings of PP). The comparison between trait and method factor loadings revealed that method variance exceeded trait variance for one of the self-ratings (PB), two of the parent ratings (CP and PB) and one of the teacher ratings (PB). Method loadings were thus higher than trait loadings in 4 of the 15 cases. The slight attenuation of traits by methods (mostly associated with parent ratings) and the significant trait loadings indicated good convergent-related validity (Table 6).
As shown in Table 7, most of the traits were significantly associated, except for CP with EP and HY with PB. PP presented moderate to high associations with CP and EP, respectively. PB presented moderate to high negative associations with PP and CP, respectively, and HY presented a high association with CP. These associations reduced trait discriminant-related validity. Finally, regarding the method factor correlations, the association between parent and teacher ratings was moderate, which diminished the discriminant-related validity of the methods. Nevertheless, discriminant-related validity was higher for the methods than for the traits.

Discussion
The present study examined the construct-related validity of the three- and five-dimension versions of the SDQ in a sample of children (10-15 years), parents and teachers, within the framework of a multitrait-multimethod (MTMM) design. The first goal was to explore evidence for the internal consistency and the convergent- and discriminant-related validity of the SDQ with three dimensions. Regarding the internal consistency of the SDQ dimensions, the main concern was the low to very low values for self-ratings. Findings suggested that, for this sample, the SDQ is more reliable for parents and teachers than for children. Similarly, an Italian study (Di Riso et al., 2010) with children (8-10 years) reported unacceptable to good reliability values for the self-reported SDQ with three dimensions. Results might differ for other age ranges. Convergent- and discriminant-related validity were moderate, mainly because of parents' and children's ratings of prosocial behaviours. Prosocial behaviours also seemed to interfere with discriminant-related validity, especially for parents' and teachers' ratings. Issues with this dimension have already been reported in other studies (e.g., Palmieri & Smith, 2007) and were anticipated by Goodman et al. (2010). The prosocial behaviour items are rated in a different response format and are interspersed with the items of the other dimensions (e.g., Achenbach et al., 2008).
Another goal was to explore evidence for the internal consistency and the convergent- and discriminant-related validity of the SDQ with five dimensions. Internal consistency values were unacceptable for parents' ratings and self-ratings. For teachers' ratings the values were very good, except for the conduct problems dimension. Since the children in this study belong to a community sample, problematic behaviours might not be present, or certain problems might not be identified because the children may lack sufficient self-reflection about their own behaviours. Teachers probably have more privileged access to children's behaviours across multiple life areas than parents, who usually observe them at home and in family settings. Therefore, the SDQ could be faithfully reflecting different child behaviours observed in different contexts from different informants' perspectives.
The findings revealed good convergent-related validity and low discriminant-related validity. As in Gomez (2014), conduct problems and hyperactivity/inattention contributed to the decrease in discriminant-related validity. Emotional and peer problems also contributed to this decrease, suggesting that these dimensions are confounded with each other, possibly because these children are not expected to manifest disruptive behaviours (lying or stealing) or clinical emotional problems.
In sum, although the reliability of the SDQ with three dimensions is not satisfactory for all informants, its internal consistency values were more suitable for this sample than those of the SDQ with five dimensions, which presented unacceptable values for self-ratings and parents' ratings. Also, combining SDQ traits into higher-order dimensions helps improve discriminant-related validity, indicating that the SDQ with three dimensions could be more appropriate for evaluating psychological adjustment in non-clinical samples. Goodman et al. (2010) also found higher convergent- and discriminant-related validity for internalizing and externalizing problems than for the five-factor SDQ version. These broader dimensions have the advantage of reducing measurement error, since they comprise a greater number of items. These findings seem to corroborate the literature suggesting that the three-factor SDQ could be more suitable for community samples with minimal risk and more appropriate as an explanatory or outcome variable in epidemiological studies (Costa et al., 2020; Goodman et al., 2010). Nevertheless, studies should further investigate whether these internal consistencies are due to cultural differences in parental and youth perceptions of the latent constructs. With appropriate psychometric modifications, the SDQ with three dimensions could be used to collect information from multi-source samples about the prevalence of children's psychological adjustment and to examine the need for prevention and intervention programs (Hill & Hughes, 2007).
Results should be considered in light of the study's limitations. The recruitment procedures and families' characteristics could have influenced the findings, since it is not possible to determine how representative our sample is of the Portuguese population. Future studies should aim to recruit more diverse samples. Future studies should also explore the SDQ's psychometric qualities in both clinical and low-risk community samples, to confirm whether the SDQ is suitable for supporting clinical diagnoses and/or for use as a screening instrument. The SDQ's external criterion validity should be assessed subsequently. As far as we know, this was the first study to simultaneously assess the construct-related validity of both SDQ versions through an MTMM design and to explore Goodman and colleagues' hypothesis about the most suitable SDQ version for low-risk samples.
Strengths and Difficulties Questionnaire (SDQ). Children, parents and teachers completed the self-report, parent and teacher versions, respectively, of the Portuguese version of the Strengths and Difficulties Questionnaire (Fleitlich et al., 2005). The SDQ assesses children's psychosocial adjustment, relationships, emotions and behaviours. It is composed of 25 items and was originally created with five scales, namely Emotional Problems [EP], Peer Problems [PP], Conduct Problems [CP], Hyperactivity [HY] and Prosocial Behaviours [PB]. Recent studies have suggested a three-scale version as more suitable for low-risk community samples, composed of Internalizing Problems [I] (combining the EP and PP scales), Externalizing Problems [E] (combining the CP and HY scales) and Prosocial Behaviours [PB]. Each item is scored on a three-point scale (0 = Not true; 1 = Somewhat true; 2 = Certainly true); each of the five scale scores ranges from 0 to 10, and the total difficulties score ranges from 0 to 40. Items 7, 11, 14, 21 and 25 were reverse-scored. The internal consistency of each SDQ version is reported in the results section.
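The scoring rules above can be sketched in Python. The item-to-scale key below is an assumption: it follows the commonly published SDQ scoring key and is not stated in this paper, although it is consistent with the reverse-scored items listed above (7, 11, 14, 21 and 25). The function name is hypothetical.

```python
# Assumed item-to-scale key (standard published SDQ key; not given in the text).
SCALES = {
    "EP": [3, 8, 13, 16, 24],
    "CP": [5, 7, 12, 18, 22],
    "HY": [2, 10, 15, 21, 25],
    "PP": [6, 11, 14, 19, 23],
    "PB": [1, 4, 9, 17, 20],
}
REVERSED = {7, 11, 14, 21, 25}  # reverse-scored items, as listed in the text


def score_sdq(responses):
    """responses maps item number (1-25) to a raw answer of 0, 1 or 2.
    Returns the five scale scores, the three broader scores and the
    total difficulties score."""
    def item_score(item):
        raw = responses[item]
        if raw not in (0, 1, 2):
            raise ValueError(f"item {item}: expected 0, 1 or 2, got {raw!r}")
        return 2 - raw if item in REVERSED else raw

    five = {scale: sum(item_score(i) for i in items)
            for scale, items in SCALES.items()}
    three = {
        "I": five["EP"] + five["PP"],   # Internalizing Problems
        "E": five["CP"] + five["HY"],   # Externalizing Problems
        "PB": five["PB"],               # Prosocial Behaviours
    }
    total_difficulties = three["I"] + three["E"]  # prosocial scale excluded
    return five, three, total_difficulties


# Hypothetical respondent answering "Somewhat true" (1) to every item.
five, three, total = score_sdq({item: 1 for item in range(1, 26)})
print(five, three, total)
```

Each five-scale score ranges from 0 to 10, so the merged Internalizing and Externalizing scores range from 0 to 20 and the total difficulties score from 0 to 40.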

Table 1. Summary of Studies Assessing Strengths and Difficulties Questionnaire with a Multitrait-Multimethod Approach.

Table 2. Summary of Goodness-of-Fit Indices for SDQ MTMM Models.
a Respecified model with an equality constraint imposed between E7, E8 and E9.

Table 5. Summary of Goodness-of-Fit Indices for SDQ.