Not One Sexual Double Standard but Two? Adolescents’ Attitudes About Appropriate Sexual Behavior

Popular belief holds that sexual behavior is evaluated more liberally for males than for females. However, the assessment of this “sexual double standard” is controversial. Therefore, we investigated the measurement equivalence of items commonly used in previous research to assess sexual double standards. Having established measurement equivalence, we investigated whether adolescents endorsed a sexual double standard. Using data from 455 adolescents (M age = 14.51, SD = 0.64), confirmatory factor analyses showed that the sexual double standard concept was measurement equivalent across sex, and partly across evaluations of the same and opposite sex. Factor analyses demonstrated that there was not one, but two sexual double standards. Male adolescents evaluated male sexual behavior more liberally than female sexual behavior, but female adolescents evaluated female sexual behavior more liberally than male sexual behavior. This contradicts the traditional notion of the existence of one sexual double standard that favors male and suppresses female sexuality.

Popular belief holds that human sexual behavior is generally evaluated more liberally (e.g., admiration for having sexual intercourse with multiple persons) for men than for women. Recent events in the Netherlands indeed seem to suggest the presence of this "sexual double standard." For example, a sorority was recently in the news because of "slutshaming," and a so-called "Bangalist" in which male sorority members evaluated female members negatively for their sexual behavior (Parool, 2017;Trouw, 2017;Volkskrant, 2017). According to some female sorority members, male adolescents were appreciated for being sexually active, while female adolescents were brought down for doing the same thing (BNNVARA, 2017). Although this recent news led to a heated public debate, studies have yielded mixed evidence about whether this sexual double standard actually exists for adolescents. The lack of clarity stems from the fact that most of the studies on sexual standards are conducted in the USA and are characterized by methodological limitations in the measurement of sexual double standards in male and female adolescents.
It is important to know more about sexual double standards because these are likely to influence adolescents' sexual development. For example, sexual double standards can increase female adolescents' fear of stigma (Hamilton & Armstrong, 2009) and increase male adolescents' risk of peer rejection (Kreager & Staff, 2009). In different ways, the sexual double standard decreases both female and male adolescents' sexual agency. Therefore, in this paper we examine if there is a sexual double standard among Dutch youth. To examine this, we first assessed measurement properties of commonly used items for assessing the sexual double standard. Based on this assessment of equivalent measurement properties, we then examined the existence of the sexual double standard in a sample of Dutch youth by comparing their attitudes about sexually appropriate behaviors of both male and female adolescents.

Origin of the Sexual Double Standard Concept
Originally, the concept of a "sexual double standard" was developed by Reiss (1960), who was the first to classify attitudes toward (premarital) sexual behavior into categories. These categories were: abstinence (premarital sexual behavior discouraged for men and women), double standard (premarital sexual behavior discouraged for women but not for men), permissiveness without affection (premarital sexual behavior encouraged for men and women regardless of emotional involvement), and permissiveness with affection (premarital sexual behavior encouraged for men and women when in a committed relationship). The original concept of the sexual double standard thus referred specifically to discouragement of premarital sexual behavior for women but not for men.
A sexual double standard can be broadly defined as a standard that judges sexual behavior differently for men and women, with men being more positively evaluated than women for showing the same sexual behaviors (Milhausen & Herold, 2001). How can we explain the existence of such a sexual double standard? Social learning theory proposes that a distinction is made between gender-typed behaviors that are favorable and those that are not (Bandura, 1977; Mischel, 1966). Traditional gender-typed behaviors define sexual agency as a male trait, whereas they define sexual passivity as a female trait (Eagly & Wood, 1999).
These gender-normed expectations might have originated from the belief that men hold more power than women (Eagly & Wood, 1999) or from the belief that men are evolutionarily "programmed" to be more sexually active (Trivers, 1972). Deviation from these gender-normed expectations can lead to negative evaluations from others, these "others" being both men and women (Bandura, 1977). According to social learning theory, individuals are more likely to repeat behavior that is rewarded (sexual agency for men and sexual passivity for women) than behavior that is punished (Bandura, 1977). Therefore, the prevailing gender-normed expectations in society are important for the manifestation and maintenance of a sexual double standard.

Manifestation of the Sexual Double Standard in Current Society
Studies investigating the traditional sexual double standard found evidence for its presence in young adults, mainly college students (e.g., Aubrey, 2004; Crawford & Popp, 2003; Hartley & Drew, 2001; Jonason & Marks, 2009; Kreager & Staff, 2009). Specifically, these studies found that male adolescents were generally evaluated more liberally than were female adolescents for showing the same sexual behaviors. However, these studies were conducted 10 to 15 years ago, which raises the question of whether this sexual double standard is endorsed as strongly in current society. Scholars suggest that male and female adolescents are becoming more liberal in their attitudes towards sexual behaviors (e.g., non-marital sex is now more accepted for both male and female adolescents) and that gender norms for sexuality have converged in the last decades (Petersen & Hyde, 2011). This has led scholars to challenge the veracity of the sexual double standard, resulting in a large number of more recent studies investigating the manifestation of sexual standards in adolescents (e.g., Allison & Risman, 2013; Kettrey, 2016; Kreager et al., 2016; Lai & Hynie, 2011; Milhausen & Herold, 2010; Papp et al., 2015). The evidence coming from these recent studies is mixed and inconclusive. Some studies found evidence for a traditional double standard. For example, Lai and Hynie (2011), in their study of Canadian undergraduate students aged 17 to 25 years old, found that traditional sexual behaviors in committed relationships were evaluated more positively by female than male adolescents, and that experimental sexual behaviors (not necessarily within committed relationships) were evaluated more positively by male than female adolescents. This outcome supported the general idea of a traditional double standard.
In contrast, other studies demonstrated a "reversed double standard" whereby male adolescents were evaluated more negatively for their sexual behavior than female adolescents (Milhausen & Herold, 2010; Papp et al., 2015). In one study, U.S. college students, aged 18 to 25, were asked to judge Facebook conversations between a "slut" (I am going out tonight, going to get some), and a "shamer" (I saw you last night, you are such a slut) (Papp et al., 2015). They found that the male slut was evaluated as less appealing than was the female slut, indicating a reversed sexual double standard. Other studies, however, found evidence for double standards that were different for male and female adolescents. These studies demonstrated that sexual behaviors of other-sex peers were evaluated more negatively than sexual behaviors of same-sex peers (Allison & Risman, 2013; Kreager et al., 2016; Soller & Haynie, 2017). For example, Allison and Risman (2013) found that among U.S. college students, more male than female adolescents endorsed a traditional double standard, and more female than male adolescents held a reversed double standard.

Methodological Limitations of Previous Research
The mixed and inconclusive evidence from previous research can perhaps partly be ascribed to methodological limitations. A problem in previous research is that measurement equivalence has not been assessed for differences between responses given by male and female adolescents, nor for differences between their responses about the behavior of male and female adolescents. Measurement equivalence entails that the observed differences in the latent mean (i.e., the underlying construct) are due to true differences in the underlying construct and not due to differences at the measurement level (i.e., differences in scaling; Grouzet et al., 2006).
The sexual double standard construct is a complex, multidimensional social construct that can be differently interpreted, manifested or expressed in various contexts, such as communities or gender contexts (Crawford & Popp, 2003). Therefore, it is questionable whether certain behaviors are interpreted in the same manner by female and male adolescents. Several studies have identified and expressed the need to carefully assess whether groups can be compared on sexual attitudes, before actual comparisons on latent means can be made (Constant et al., 2016;Crawford & Popp, 2003;Emmerink et al., 2017;Jardin et al., 2017;Zhou et al., 2014). Indeed, when a scale is used, the question often remains if the scale items are a good representation of the underlying construct (Sakaluk & Fisher, 2019), and if one can compare scale items between groups (e.g., cultural groups, religious groups, gender) or across repeated measures. One proposed solution for this problem is to assess measurement equivalence across groups and/or across repeated measures (Grouzet et al., 2006).
To date, not many studies included the evaluation of measurement equivalence in the field of sexual health (Zhou et al., 2014). However, one study did examine measurement equivalence of a scale designed to measure sexual double standards (Emmerink et al., 2017). Emmerink and colleagues (2017) concluded that their scale items reached acceptable equivalence with regard to gender differences. However, they relied on criteria that have been shown to have limited power to detect violations of measurement invariance (Chen, 2007;Jorgensen et al., 2018;Meade et al., 2008), which may raise doubts about the validity of their conclusions. In addition, Emmerink et al. (2017) only examined whether their items were equivalent for male and female adolescents. They did not test whether the evaluation for male adolescents' sexual behaviors and the evaluation for female adolescents' sexual behavior was equivalent for male and for female youths. According to sexual selection theory (Darwin, 1871), however, this is important. Indeed, an individual might judge the sexual activity of someone of their own sex (a competitor) differently than sexual activity of someone of the other sex (a target). Therefore, the intra-individual equivalence (the same individuals responding about the same and about the other sex) might be an important overlooked aspect.

The Current Study
In this study among 455 adolescents aged 13 to 17 years, we examined the sexual double standard by comparing differences in the evaluations of sexual behaviors for male and female adolescents. First, we assessed if the measurement properties of our scale items were equivalent across four possible contexts (i.e., male adolescents answering questions about male adolescents; male adolescents answering questions about female adolescents; female adolescents answering questions about male adolescents; female adolescents answering questions about female adolescents). Our specific aim here was to assess whether we could use our scale items for measuring adolescents' sexual double standards with adequate internal validity. Our second aim was to assess sexual attitudes of adolescents themselves, and to examine whether there is indeed evidence to support the notion of a sexual double standard across male and female adolescents. To our knowledge, measurement equivalence for scale items about sexual attitudes has not been evaluated before. Therefore, we were unsure whether measurement equivalence would hold, even partially. Also, given the mixed findings about adolescents' sexual double standards, we specified no a priori hypothesis on sex differences regarding this sexual double standard.

Participants and Procedure
We performed a secondary data-analysis on a sample of 455 Dutch adolescents who filled in questionnaires. Participants were male and female adolescents (male adolescents = 50%) whose ages ranged from 13 to 17 years old (M age = 14.51, SD = 0.64). Their educational level differed (lower vocational education = 66.5%, average or higher level secondary education = 33.2%). The majority of adolescents (81.7%) indicated that they were heterosexual, 11.8% indicated that they were gay or lesbian, 0.8% indicated that they were bisexual, and 5.8% indicated that they were unsure of their sexuality. Participants of 10 schools were asked to complete questionnaires. The adolescents and their parents provided informed consent, and adolescents participated in the study without any form of compensation. All study procedures were approved in full by the Faculty of Social and Behavioral Sciences of Utrecht University ethics board. Adolescents were informed that their information would not be shared with any third party, such as their lecturers or parents.

Measures
Sexual attitudes. Adolescents indicated their attitudes about sexual activities by responding to ten questions that concerned evaluations of sexual behavior of male and female adolescents. These were five questions concerning the evaluation of female sexual behavior, and five similar questions concerning the evaluation of male sexual behavior: "I admire a girl/boy who has sexual intercourse with multiple boys/girls," "I pity a girl/boy who is still a virgin at 18," "A girl/boy who has sexual intercourse on the first date, has no self-respect," "It is fine if a girl/boy has sexual intercourse with a boy/girl without being in love," and "I admire a girl/boy who is still a virgin when (s)he marries a boy/girl." The questions were based on similar item sets used in previous studies (e.g., Allison & Risman, 2013; Kettrey, 2016) and questions that were addressed in the review of Crawford and Popp (2003). The questions were answered on a 5-point Likert scale from 1 (totally agree) to 5 (totally disagree).

Statistical Analyses
Preliminary analyses. Preliminary inspection revealed that the data were not multivariate normally distributed. Furthermore, participants did not use all response categories (e.g., "I totally agree") for every item in the sexual attitudes questionnaire in both groups. Therefore, we collapsed answer categories 4 and 5 ("I agree" and "I totally agree," respectively) for two items. Given the small number of remaining ordinal categories, we subsequently treated the data as categorical rather than continuous in the factor analysis model (Rhemtulla et al., 2012), using diagonally weighted least squares (DWLS) estimation (Rosseel, 2012). We could not use multiple imputation to handle missing data, because no method has yet been proposed for pooling robust chi-squared difference test statistics across multiple imputations (Enders, 2010). Instead, missing data were handled with pairwise deletion. Because less than 1% of data points were missing on each variable, the missing data were assumed to have negligible impact.
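The recoding step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `collapse_top_categories` and the example responses are hypothetical, and the sketch only shows how two sparse adjacent categories (here 4 and 5) might be merged while leaving missing values untouched.

```python
from collections import Counter

def collapse_top_categories(responses, keep_max=4):
    """Recode every response above keep_max to keep_max; None (missing) stays None."""
    return [None if r is None else min(r, keep_max) for r in responses]

# Hypothetical responses to one item on a 1-5 scale (illustrative, not study data).
item = [1, 2, 2, 3, 4, 5, None, 3, 1, 5]
collapsed = collapse_top_categories(item)

# Frequency of each remaining category, ignoring missing values.
counts = Counter(r for r in collapsed if r is not None)
```

After recoding, the item has four ordered categories instead of five, which is the form in which it would enter a categorical factor model.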
Further inspection of the data indicated that not all items were correlated with each other. Item 3 (i.e., a girl/boy who has sexual intercourse on the first date, has no self-respect) and item 5 (i.e., I admire a boy/girl who is still a virgin when he/she marries a boy/girl) were uncorrelated with the rest of the items in both male and female adolescents. This suggests that these items are less relevant for use in current-day research on sexual attitudes. Further analyses were conducted with the remaining three items. Testing hypotheses for latent variables using three indicators is a well-accepted practice and has the advantage of resulting in a just-identified latent variable, which reduces spurious correlations and allows only one way to specify inter-item relations (e.g., Coffman & MacCallum, 2005; Little, 2013).
Measurement equivalence analyses. We tested measurement equivalence of adolescents' gender-based attitudes about sexually appropriate behavior across four contexts with structural equation modeling (SEM) in R (R Development Core Team, 2005). Three measured items were indicators of the evaluation construct in four contexts: "male adolescents' evaluations for male adolescents," "male adolescents' evaluations for female adolescents," "female adolescents' evaluations for male adolescents," and "female adolescents' evaluations for female adolescents." Measurement equivalence of these four contexts was assessed sequentially (i.e., configural equivalence, threshold equivalence, metric equivalence, scalar equivalence, and strict equivalence), with the theta parameterization (Muthén & Asparouhov, 2002). We used a 2 (male or female) by 2 (responses about males or about females) design, assessing between-group and cross-repeated measures equivalence, using the method suggested for polytomous items by Wu and Estabrook (2016).
First, we assessed configural equivalence, which means that the groups have the same factor structure and model fit is satisfactory. We specified several identification constraints: All intercepts and factor means were set to zero, all factor variances and residual variances were set to one, all factor loadings and thresholds were estimated, and the factor correlation and residual correlations between repeated measures within each group were estimated as well. To test the null hypothesis of exact model fit, we examined a mean- and variance-adjusted χ² test, with α = .05. To assess model fit, RMSEA and CFI were evaluated, with RMSEA values < .05 and CFI values > .95 indicating good fit. Second, a model was specified to investigate threshold equivalence (cf. Wu & Estabrook, 2016). In this model all thresholds were set equal across groups (i.e., male and female adolescents) and across repeated measures (i.e., responding about the same and responding about the other sex). For identification purposes, intercepts and residual variances were constrained in one context (male adolescents responding about male adolescents), but freely estimated in the other three contexts. Measurement equivalence was tested by comparing Model 1 (configural equivalence) and Model 2 (threshold equivalence) with a mean- and variance-adjusted χ² difference statistic. This comparison was possible because the models were nested. If there was no significant decrement in fit, then the null hypothesis of equal thresholds across contexts could not be rejected. If the test did show significant decrement in fit, then thresholds were sequentially freed to detect which contexts differed from others for each item.
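For readers unfamiliar with the two approximate fit indices named above, the following Python sketch shows their textbook point-estimate formulas together with the cutoffs stated in the text. It is an illustration under assumptions: the single-group RMSEA formula with N − 1 in the denominator is used, whereas SEM software may apply multi-group corrections and scaled test statistics, and all input values are hypothetical rather than results from this study.

```python
import math

def rmsea(chi2, df, n):
    """Single-group RMSEA point estimate from a model chi-square, its df, and sample size n."""
    if df <= 0:
        return 0.0
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def good_fit(rmsea_value, cfi_value):
    """Cutoffs as stated in the text: RMSEA < .05 and CFI > .95."""
    return rmsea_value < 0.05 and cfi_value > 0.95

# Hypothetical test statistics (not taken from the study).
example_rmsea = rmsea(chi2=30.0, df=20, n=455)
example_cfi = cfi(chi2_model=30.0, df_model=20, chi2_base=1000.0, df_base=30)
example_ok = good_fit(example_rmsea, example_cfi)
```

Both indices are functions of how far the model χ² exceeds its degrees of freedom, which is why a model can fail the exact-fit test and still show acceptable approximate fit.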
Third, metric equivalence (different items are equally indicative of the common factor for male and female adolescents responding about male and about female adolescents) was tested by running a model in which factor loadings were equal across groups and across repeated measures in addition to the previous threshold constraints. For identification purposes, the variance of the common factor was set to one in the reference context (male adolescents responding about male adolescents) and was freely estimated in the other contexts. If the test of measurement equivalence showed significant decrement in fit, then follow-up tests were conducted to test differences across particular contexts (e.g., only constrain males' loadings to equality for their responses about male and about female adolescents). After locating contexts that differ, omnibus tests were followed with tests of equivalence for loadings of each individual item.
The fourth step was to assess scalar equivalence (intercepts of the items are equal for male and female adolescents responding about male and about female adolescents). In addition to the previous constraints, here we also constrained the intercepts of the items to be equal across all four contexts (across groups and repeated measures) by setting them all to zero. Scalar equivalence means that we can make a valid comparison of the means of the latent variable between both groups. The mean of the common factor could then be interpreted, with a positive mean indicating that the mean of that group is higher than the mean in the reference group (i.e., male adolescents responding about male adolescents). If the null hypothesis of scalar equivalence was rejected, particular contexts were tested for equivalence, after which partial equivalence was tested by assessing which items' intercepts significantly differed from each other.
Strict measurement equivalence. Strict equivalence was assessed as a fifth and sixth step to assess the reliability of the scale. In the fifth model, residual variances were constrained to be equal across contexts by fixing them to 1 and residual covariances remained freely estimated. In the sixth model, equality of residual correlations across sexes was tested by constraining them to equality across sexes. When the model did not show a significant decrement of fit, then the null hypothesis of equal residual covariances could not be rejected. If there was a significant decrement of fit in Model 5 or 6, then follow up tests were performed where residual (co)variances were freed one by one.
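The sequence of nested models described in this section can be summarized as an ordered ladder of cumulative constraints. The Python sketch below is only a schematic aid: the step labels paraphrase the text, the function `highest_supported_level` and the example outcomes are hypothetical, and the simple stopping rule ignores the follow-up tests for partial equivalence that the procedure above allows.

```python
# Ordered steps and the constraint each one adds, paraphrased from the text.
STEPS = [
    ("configural", "same factor structure; loadings and thresholds freely estimated"),
    ("threshold", "thresholds equal across the four contexts"),
    ("metric", "factor loadings equal across the four contexts"),
    ("scalar", "item intercepts equal (fixed to zero)"),
    ("strict: residual variances", "residual variances equal (fixed to one)"),
    ("strict: residual covariances", "residual correlations equal across sexes"),
]

def highest_supported_level(significant_decrement):
    """Return the last step reached before a significant decrement in fit.

    significant_decrement maps each step name after 'configural' to True when
    the chi-square difference test against the comparison model was significant.
    """
    reached = STEPS[0][0]
    for name, _constraint in STEPS[1:]:
        if significant_decrement[name]:
            break
        reached = name
    return reached

# Hypothetical pattern of test outcomes (not the study's results).
outcomes = {
    "threshold": False,
    "metric": False,
    "scalar": False,
    "strict: residual variances": False,
    "strict: residual covariances": True,
}
level = highest_supported_level(outcomes)
```

In practice a significant decrement does not end the analysis; individual parameters are freed one at a time to locate the source of non-equivalence, which this sketch deliberately leaves out.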
Mean difference tests. When scalar equivalence held (at least partial equivalence had to be obtained), we were able to compare latent means between male and female sexual attitudes (i.e., evaluations about male and female sexual behavior). Significant differences would be evidence of a double standard, and double standards could also be compared between male and female adolescents. We assessed these mean differences with factor analyses using latent mean estimates from the scalar equivalence model. When the difference between the means of sexual standards (i.e., male adolescents' evaluations for male adolescents and male adolescents' evaluations for female adolescents, and vice versa for female adolescents) was significantly different from zero, then there was evidence of a sexual double standard. Next, the mean differences on the sexual standards were compared between male and female adolescents, and we assessed whether these differences were equivalent. Nonequivalence would indicate that sexual double standards differ between male and female adolescents. Effect sizes were obtained by standardizing the coefficients, which can be interpreted as Cohen's d. Finally, we assessed mean differences between sexes regarding how they evaluated their own sex versus the other sex (how male adolescents were evaluated by male adolescents versus how male adolescents were evaluated by female adolescents; and how female adolescents were evaluated by male adolescents versus how female adolescents were evaluated by female adolescents).
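The contrasts described in this paragraph can be written out explicitly. The Python sketch below is a hedged illustration: the function name, the dictionary keys, and the four latent means are all hypothetical, and the sketch computes only the raw contrasts, not their standard errors, significance tests, or standardized effect sizes.

```python
def double_standard_contrasts(means):
    """Compute the contrasts from four latent means.

    means is keyed by (respondent_sex, target_sex); higher values are assumed
    to represent more liberal evaluations.
    """
    male_ds = means[("male", "male")] - means[("male", "female")]
    female_ds = means[("female", "male")] - means[("female", "female")]
    return {
        # Positive: male targets judged more liberally (traditional standard).
        "male_double_standard": male_ds,
        # Negative: female targets judged more liberally (reversed standard).
        "female_double_standard": female_ds,
        "difference_between_sexes": male_ds - female_ds,
        "male_targets_male_vs_female_respondents":
            means[("male", "male")] - means[("female", "male")],
        "female_targets_male_vs_female_respondents":
            means[("male", "female")] - means[("female", "female")],
    }

# Hypothetical latent means (illustrative only, not estimates from the study).
example_means = {
    ("male", "male"): 0.5,
    ("male", "female"): 0.25,
    ("female", "male"): -0.5,
    ("female", "female"): -0.25,
}
contrasts = double_standard_contrasts(example_means)
```

With these invented values, the within-respondent contrasts have opposite signs for the two sexes, which is the pattern that would correspond to two distinct double standards.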

Measurement Equivalence Tests
First, measurement equivalence was tested across the four contexts (i.e., across gender and across repeated measures). The configural model, the threshold-invariance model, the metric-invariance model, and the scalar-invariance model were specified and fit to the data, and each restricted model was compared to the baseline (configural) model.
A configural fit was found for a model with six items (item 1, 2 and 4 for male and female adolescents), where fit indices (RMSEA and CFI) indicated acceptable fit. However, the chi-squared test of exact fit was significant, which indicates that the model did not fit perfectly. After inspection of the correlation residuals, we found no correlation residuals that were problematic (all residuals < 0.10), which indicates that model misspecifications did not appear to be large. Therefore, we concluded that the significant chi-squared test of exact fit reflected the accumulation of many small approximation errors, but that the approximate fit of the model was good. Subsequently, this model was retained as baseline model.
Next, the model that tested threshold equivalence did not show a significant decrement in fit compared to the baseline model; the null hypothesis of equal fit of the models could not be rejected. This finding indicates that the expected distribution of answer categories, controlling for the common factor, was similar across all four contexts.
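To make concrete what "the expected distribution of answer categories, controlling for the common factor" refers to, the following Python sketch computes category probabilities for one ordinal item under a standard latent-response formulation. It is an illustration under assumptions: the function names, loading, and threshold values are hypothetical, and the normal-ogive form shown is the generic model for ordered categories rather than output from the fitted models.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def category_probabilities(eta, loading, thresholds, intercept=0.0, residual_sd=1.0):
    """P(response = k | eta) for an ordinal item with ordered thresholds.

    The latent response is intercept + loading * eta + residual; the observed
    category is determined by which pair of thresholds the latent response
    falls between.
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    z = [(c - intercept - loading * eta) / residual_sd for c in cuts]
    return [normal_cdf(z[k + 1]) - normal_cdf(z[k]) for k in range(len(cuts) - 1)]

# Hypothetical item with four categories (three thresholds).
thresholds = [-1.0, 0.0, 1.0]
low = category_probabilities(eta=-1.0, loading=0.8, thresholds=thresholds)
high = category_probabilities(eta=1.0, loading=0.8, thresholds=thresholds)
```

If thresholds are equal across contexts, two respondents with the same factor score and the same loading have the same expected category distribution regardless of context, which is the property the threshold-equivalence test examines.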
The model that tested full metric equivalence had a significant decrement in fit compared to the configural model, indicating that full metric equivalence did not hold. Therefore, we first investigated the context (across gender or across repeated measures) in which the non-equivalence was manifest. We found evidence against the null hypothesis of equivalence across repeated measures (i.e., responding about male adolescents vs. responding about female adolescents), but not across groups (male vs. female respondents). Therefore, holding the loadings equal across groups, we freed constraints across repeated measures for one item at a time.
Results indicated that the model improved only when item 1 (i.e., I admire a boy/girl who has sexual intercourse with multiple girls/boys) was freely estimated across repeated measures (but still constrained across groups), and that this modification led to no significant differences between the configural model and the partial metric model. Therefore, the model in which the factor loading of item 1 was freely estimated across repeated measures was retained as the partial metric model. This finding indicates that all the items were equally indicative of the underlying construct for male and female adolescents.
However, the item "I admire a boy/girl who has sexual intercourse with multiple girls/boys" was not equally indicative for responding about the same and about the other sex. The item was more indicative for responding about male adolescents than for responding about female adolescents. We then specified a model where non-equivalence for this item was allowed, and achieved partial metric equivalence. Next, the model with partial scalar equivalence did not show a significant decrement in fit compared to the baseline model. This model could only test partial scalar equivalence; all the intercepts were constrained across all of the four contexts, except the intercepts across repeated measures for item 1 because those loadings were not equivalent. This model did not differ from the configural model, so partial scalar equivalence was found, indicating that overall it was likely that differences in the latent mean (sexual attitudes) reflected differences in the underlying construct and not differences at the measurement level.

Strict Measurement Equivalence
The additional analyses focusing on measurement equivalence are displayed in Table 1. A fifth model was specified to assess partial strict invariance, as we could not constrain item 1 across occasions because of differences between loadings. This model did not show a significant decrement in fit compared to the baseline model. Full invariance would indicate that if scale means or sums were analyzed instead of using a latent variable model, differences in variances between contexts could be attributed to differences in the underlying construct, rather than differences only with regard to a single item. However, because the factor loading of an item differed across repeated measures, a scale mean or sum would have different reliabilities across repeated measures, resulting in differential power to detect effects between these contexts.
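The point about differing reliabilities can be illustrated with the usual composite-reliability (omega) formula for a one-factor model. The Python sketch below rests on several assumptions: uncorrelated residuals, a linear factor model for the latent responses (so it is only an approximation for ordinal sum scores), and entirely hypothetical loadings that differ on a single item.

```python
def composite_reliability(loadings, residual_variances, factor_variance=1.0):
    """Omega for a one-factor model with uncorrelated residuals."""
    true_variance = factor_variance * sum(loadings) ** 2
    return true_variance / (true_variance + sum(residual_variances))

# Hypothetical three-item scale; only the first loading differs between contexts.
reliability_a = composite_reliability([0.8, 0.7, 0.6], [1.0, 1.0, 1.0])
reliability_b = composite_reliability([0.4, 0.7, 0.6], [1.0, 1.0, 1.0])
```

Even with identical residual variances, a single non-equivalent loading changes the reliability of the composite, so comparisons based on scale sums would not be equally precise in the two contexts.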
When the residual covariances also were tested for equality in the sixth model, results showed that they were not equal to each other. Freeing the residual covariances one by one still resulted in significantly worse model fit. This indicates that if scale means or sums were analyzed instead of using a latent variable model, true sex-differences in correlations between same and other-sex evaluations would be confounded with differences in individual items.

Mean Difference Test
Mean differences for male and female adolescents on their sexual attitudes were assessed by comparing means in the partial scalar model (Table 2) in which the latent means were identified by constraining their average to zero. Results indicated there was a significant mean difference between how male adolescents evaluated male adolescents and how male adolescents evaluated female adolescents (evidence of a sexual double standard among male adolescents). Specifically, we found a positive mean difference, indicating that male adolescents evaluated male sexual behavior significantly more liberally than female sexual behavior. Cohen's d indicated this was a small effect (d = 0.043). Second, results indicated there was a significant mean difference between how female adolescents evaluated male adolescents and how female adolescents evaluated female adolescents (evidence of a sexual double standard among female adolescents). Specifically, we found a negative mean difference, which indicated that female adolescents evaluated female sexual behavior significantly more liberally than female adolescents evaluated male sexual behavior. This difference also had a small effect size (d = 0.149). Thus, both sexes evaluated the sexual behavior of their own sex more liberally than they evaluated the sexual behavior of the other sex, but neither sexual double standard appeared substantial. Finally, our results indicated that there was a statistically significant but small difference between the sexual double standards of male and female adolescents (d = 0.192). Specifically, male adolescents held a traditional double standard, whereas female adolescents held a reversed double standard, but the difference again could be characterized as small.
Note to Table 1. Fit indices, including the Root Mean Square Error of Approximation (RMSEA), were assessed for all models assessing equivalence of two groups (i.e., female adolescents vs. male adolescents) and repeated measures (RM; i.e., respondent responding about male adolescents vs. responding about female adolescents). *significant at α = .05, **significant at α = .01, ***significant at α = .001.
Note to Table 2. The grand mean is set at zero; higher scores represent more liberal attitudes. *significant at α = .05, **significant at α = .01, ***significant at α = .001.

Additional Analyses on Mean Differences
Additional analyses are displayed in Table 2. Results indicated significant mean differences, showing that male sexual behaviors were evaluated in general more liberally by male adolescents than by female adolescents (d = 0.960), and that female sexual behaviors were also evaluated more liberally by male adolescents than by female adolescents (d = 0.768).

Discussion
Researchers have been involved in an ongoing debate about the actual existence of a sexual double standard in current society. Our current study, based on a stringent and successful test of measurement equivalence, demonstrated that nowadays adolescents do not have one sexual double standard, but instead have two. Specifically, male adolescents endorse a more traditional double standard, in which male sexual behavior is evaluated more liberally than female sexual behavior. In contrast, female adolescents endorse a reversed double standard, in which female sexual behavior is evaluated more liberally than male sexual behavior. These findings emerge against the backdrop of another clear sex difference: male adolescents hold more liberal attitudes towards sexual activity and behaviors than female adolescents for both male and female sexual behaviors.

Not One, But Two Sexual Double Standards
This study expands on the results of Emmerink and colleagues (2017). Our results specifically demonstrate that items referring to sexual attitudes can be compared across sex, and across evaluations of the same and the other sex. Still, we cannot exclude the possibility that the sexual double standard is actually a multidimensional construct (Crawford & Popp, 2003), and that adolescents may have been cautious in giving extreme answers, being more inclined to "lean towards the middle." Thus, our results show that before any data on female and male sexual activity or attitudes can be compared, the psychometric properties of the data should be thoroughly assessed and modified when necessary, preferably using SEM.
Our study contradicts the long-held notion of a traditional double standard for both male and female adolescents, and supports the double-double standard that has been reported by several other scholars before (e.g., Allison & Risman, 2013;Soller & Haynie, 2017). This indicates that there is inequality in what is perceived as sexually appropriate behavior by male and female adolescents. However, this inequality manifests itself differently than has previously been suggested. Specifically, we found that although male adolescents endorse a traditional double standard in which male sexual behavior is evaluated more liberally than female sexual behavior, female adolescents in contrast evaluate female sexual behavior more liberally than male sexual behavior.
Evolutionary theory is often used to explain inequality in male and female attitudes about appropriate sexual behavior (e.g., Darwin, 1871;Trivers, 1972). Although this theory is in line with our finding that male adolescents are overall more liberal toward sexual activity than are female adolescents, it cannot explain the so-called double-double standard, with female adolescents holding more positive evaluations of females' sexual behavior than of males' sexual behavior. Perhaps the double-double standard can be more readily explained by social role theory, which suggests that gender-typed expectations are formed by preferred characteristics and different roles that male and female adolescents fulfill within societies (Eagly & Wood, 1999). In the Netherlands, acceptance of female emancipation (e.g., more equality in education and work) has increased since 1970 (Neve, 1995). However, male adolescents under the age of 25 have been shown to favor females' emancipation less than do older men and female peers (Neve, 1995). Although this result dates from 1990, it could be that female adolescents still accept female emancipation more than do male adolescents in the Netherlands, which might explain a double-double sexual standard. However, this does not explain our finding that male adolescents tend to have more liberal attitudes (also about female sexual behavior) than female adolescents have.
Another, perhaps more fitting, explanation for the existence of a double-double sexual standard in male and female adolescents can be found in social psychological theory: the in-group/out-group opposition (Lévi-Strauss, 1967). Several previous studies underline that individuals categorize certain groups as an in-group (a group whose characteristics match their own), or an out-group (a group whose characteristics differ from their own), and that individuals tend to have a positive bias toward their in-group (e.g., Miller et al., 2010;Tajfel et al., 1971). From this perspective, female and male adolescents could be more inclined to evaluate sexual behavior that is carried out by someone of their in-group (in this case, someone of the same sex) more favorably.
We cannot exclude the possibility that our results reflect a specific cultural context. It is possible that the sexual double-double standard that we found is typical for Dutch adolescents and is not necessarily generalizable across cultures. According to erotic plasticity theory, for instance, individuals are subjected to external influences that determine what kind of sexual activity is desired (Baumeister, 2000). Gender-typed expectations are more traditional (e.g., male sexual behavior is evaluated more liberally than female sexual behavior) in more male-dominated cultures, whereas gender-typed expectations regarding sexuality might be more equal in a country where female sexuality is not suppressed. In the Netherlands, a more feminine society, gender roles and expectations may be more equal than in more male-dominated societies such as the USA (Hofstede, 1998).

Limitations and Strengths
Our findings should be interpreted with several study limitations in mind. First, our study had an exploratory scope, due to methodological challenges we wanted to overcome before hypothesizing about the specific nature of adolescents' sexual double standard(s). Therefore, it is important that our study is replicated in order to say more about the strength and ecological validity of our results. In addition, it would be insightful to replicate our current findings with an enhanced item set on sexual attitudes. Specifically, future research could include a wider variety of sexual behaviors of male and female adolescents (e.g., sexting, having sex with multiple partners) (Petersen & Hyde, 2011). Furthermore, we measured sexual attitudes with self-reports. It should be noted that implicit sexual attitudes might differ from explicit sexual attitudes, and that it might therefore be valuable to add implicit attitude measures in future research (Sakaluk & Milhausen, 2012). Notwithstanding these limitations, our methodological design and sophisticated analysis strategy (i.e., assessing measurement equivalence in four contexts with a latent-variable model) give us great confidence that our study is a valuable addition to current research on adolescents' sexual development.

Conclusion
Finding two sexual double standards instead of one might indicate that sexual inequality is decreasing. However, it still indicates that male and female adolescents judge the other sex slightly differently than they judge their own. Whether this is because adolescents are more conservative towards their future partners, or because they (implicitly) hold the belief that their own sex is entitled to be more sexually active, inequality still exists. Educating children about equal rights for men and women, and equal (sexual) needs of male and female adolescents may help to create less stereotyped and less judgmental gender expectations. Future scholars could expand on our results by investigating the underlying values of the double-double sexual standard; does this standard indicate that male adolescents are still as traditional and female adolescents became slightly more egalitarian? Or does it, instead, indicate that male and female adolescents became more egalitarian but male adolescents became slightly less egalitarian regarding female sexual behavior? In any case, our results show that gender-based differences in sexual standards are visible in today's society. Because such double standards have real consequences for the development of individuals' psychosexual identities and health, they deserve our continued scrutiny and attention.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.