Signs of Narcissism? Reconsidering a Widely Used Measure

Recent research on CEOs’ narcissism has mostly used unobtrusive measures, even though such measures have not been sufficiently validated. In two settings (Study 1 with 601 participants from various occupations and Study 2 with 97 managing directors), we analyze the construct validity of the commonly used narcissism index (NI). We find that the NI is only moderately correlated with the established and validated Narcissistic Personality Inventory (NPI), which calls into question the convergent validity of the NI. We further manipulate the company’s financial performance in our simulation to test whether performance affects the NI. Results show that individuals’ NI scores differ after a period of high compared with low financial performance. This casts doubt on previous findings in organizational research using the NI and other unobtrusive measures because it reverses the common assumption of cause and effect.

Unobtrusive measures (Webb et al., 1966; Webb & Weick, 1979) have gained much attention, especially from researchers who investigate how CEOs' personality traits and values affect strategic choices and firm performance (Carpenter et al., 2004). Because direct access to leaders of large companies is limited and CEOs rarely answer questions about their psychological traits (Carpenter et al., 2004; Cycyota & Harrison, 2006), many scholars rely on evidence that executives leave behind in their environment, such as public documents (Chin et al., 2013; Dahl et al., 2012; Finkelstein et al., 2009). One of the most prominent approaches is Chatterjee and Hambrick's (2007) narcissism index (NI), an alternative measure of CEO narcissism for cases in which established approaches (e.g., collecting self-reported data) are not feasible. Their unobtrusive measure includes the prominence of the CEO's photograph in annual reports (ARs), the prominence of the CEO's name in press releases, the CEO's use of first-person singular pronouns in interviews, and relative compensation. In a study of 111 CEOs, they found that narcissists, as measured by the NI, prefer bold actions such as acquisitions and tend to generate larger fluctuations in performance. Their measure has appealed to many scholars who study the impact of narcissistic CEOs, for instance, on risk taking or on different performance outcomes (Chatterjee & Hambrick, 2011; Engelen et al., 2013; Gerstner et al., 2013; Patel & Cooper, 2014; Tang et al., 2018; Zhu & Chen, 2015a, 2015b). In their recent meta-analysis, Cragun et al. (2020) highlight that the NI (in its original or a slightly modified version) has become the most frequently used method in CEO narcissism research, with 23 of 42 articles using it. However, despite its common use, the NI raises several concerns.
For example, findings from a recent review and replication study question the reliability (or more precisely, the internal consistency) of the NI (Van Scotter, 2019). In our study, we focus on two important concerns that cast doubt on whether the NI truly reflects a narcissistic personality.
First, research has neither provided a clear picture of the relationship between this measure and other established measures of narcissism (convergent validity), the most important of which is the predominant and extensively validated Narcissistic Personality Inventory (NPI; Emmons, 1984; Raskin & Hall, 1979, 1981), nor tested its relationship to other personality constructs (discriminant validity). This gap is all the more surprising because the lack of validation has been repeatedly criticized (Blettner, 2012; Hill et al., 2014; Wales et al., 2013) and is even acknowledged as a main limitation by scholars using the NI (Chatterjee & Hambrick, 2007, 2011; Ingersoll et al., 2017; Tang et al., 2018; Zhu & Chen, 2015a). The first objective of our work is therefore to test the construct validity of Chatterjee and Hambrick's (2007) NI. We design an online simulation that allows for simultaneous assessment of the NI and the NPI.
In two different studies (Study 1 with employees from various occupations and Study 2 with a sample of managing directors), we test convergent validity by examining correlations between both measures. In addition, we test discriminant validity in Study 1 and compare the pattern of correlations with a set of frequently studied personality constructs (e.g., psychopathy and Machiavellianism).
Second, we argue that the NI is also affected by external factors such as company growth or performance, and we highlight the general concern that empirical studies investigating CEO characteristics as determinants of corporate decisions might be endogenous to (expected) outcomes (Antonakis et al., 2010; Bascle, 2008). To illustrate this concern, we test in our studies whether the NI is robust to a company's financial performance. In a counterbalanced within-subjects design, we observe whether manipulations of financial performance (three levels in Study 1 and two levels in Study 2) influence participants' reactions as reflected in their NI scores (e.g., the chosen size of the picture in the AR).
With our work, we directly address the limitation noted by Chatterjee and Hambrick (2007) that characteristics other than CEO narcissism might confound the NI, as well as calls for further validation. The lack of empirical evidence supporting the construct validity of the NI and other unobtrusive measures is a serious concern because poor construct validity can render every other type of validity (statistical conclusion, internal, and external validity) meaningless. Measure deficiency may thus produce a series of subsequent failures in these studies and lead to flawed theoretical and practical implications. Drawing the right conclusions is crucial, since narcissism and similar attributes of leaders are important indicators of organizational behavior and outcomes and attract considerable scholarly interest (Grijalva et al., 2015; O'Reilly et al., 2018; Simonet et al., 2018; Wong et al., 2017).

Narcissism and Its Measurement
The psychological construct of narcissism consists of two elements: a grandiose self-view and self-regulatory strategies to uphold or inflate that view (e.g., Morf & Rhodewalt, 2001). The narcissist's grandiose self-view refers to an agentic and inflated sense of one's own capabilities, such as intelligence and leadership ability (Gabriel et al., 1994; Judge et al., 2006). To maintain and further bolster this exaggerated self-view, narcissists boast about their qualities, direct attention to the self (Buss & Chiodo, 1991), and seek applause and affirmation from others (Morf & Rhodewalt, 2001; Wallace & Baumeister, 2002). They also show a need for power and for dominance over other people (Bradlee & Emmons, 1992; Carroll, 1987).
Since Kets de Vries and Miller's (1985) research established the topic of narcissism in organizational contexts, narcissistic leaders have been studied extensively (W. K. Campbell et al., 2011; Hoffman et al., 2013; Rosenthal & Pittinsky, 2006). In empirical studies, the most common measure of narcissism is the NPI (Emmons, 1984; Raskin & Hall, 1979, 1981). The NPI is a forced-choice questionnaire whose original version was derived from the traits of narcissistic personality described in the Diagnostic and Statistical Manual of Mental Disorders, Third Edition (American Psychiatric Association, 1980). As the first measure of subclinical narcissism, the NPI has strongly shaped the construct's conceptualization and dominates social psychology research on narcissistic traits (Cain et al., 2008). The NPI has also been applied to measure the narcissistic personality of leaders (Grijalva et al., 2015; Hoffman et al., 2013; Rode et al., 2012). Other self-report measures of subclinical narcissism used in leadership contexts include the Bold scale of Hogan and Hogan's (2009) Hogan Development Survey and a scale derived from the California Personality Inventory by Wink and Gough (1990; Grijalva et al., 2015). The Bold scale is part of a longer personality measure and consists of 14 dichotomous items focusing on the agentic/extraverted components of narcissism. The California Personality Inventory narcissism measure consists of 49 dichotomous items and measures individuals' inflated self-views, authority, and attention seeking. However, the NPI in its original (e.g., Emmons, 1984; Raskin & Hall, 1979) or shorter versions (Ames et al., 2006) is "[by] far the most widely used measure of narcissism" (Grijalva et al., 2015, p. 7) and has been the only psychometric self-report assessment used in CEO research (Cragun et al., 2020).
Nonetheless, upper echelons researchers have mostly relied on unobtrusive measures of narcissism (Cragun et al., 2020) because top executives are disinclined to participate in surveys and rarely allow questions about their personalities (Carpenter et al., 2004; Cycyota & Harrison, 2006). In their initial study, Chatterjee and Hambrick (2007) computed the NI from information about CEOs in public documents: the prominence of the CEO's photograph in the AR, the prominence of the CEO's name in press releases, the ratio of the CEO's cash and noncash compensation to that of the second-highest-paid executive, and the CEO's use of first-person singular pronouns in interviews. The last indicator has been excluded from most subsequent studies (Chatterjee & Hambrick, 2011; Engelen et al., 2013; Gerstner et al., 2013; Tang et al., 2018; Zhu & Chen, 2015a, 2015b), because its inclusion weakened the internal reliability of the index, because CEO interviews follow a more rigid format in the post-Sarbanes-Oxley era, or because too few interviews were available. In a few recent studies, the CEO's name in press releases has also been excluded and the coding of the CEO's photograph has been modified (Ingersoll et al., 2017; Marquez-Illescas et al., 2019). While these scholars reduced the number of indicators, others used the CEO narcissism score, a broader composite measure of 15 indicators that adds, for instance, the number of awards or acquisitions (Buchholz et al., 2019; Rijsenbilt & Commandeur, 2013). Table 1 contains an overview of the studies using the NI in its original or a modified version. In all of these studies, CEO narcissism has a direct or moderating effect on company decisions and outcomes. Chatterjee and Hambrick (2007) aligned items from Emmons' (1984) NPI with their indicators to illustrate how the NI reflects a narcissistic personality.
For example, they argued that the NPI item "I like to look at myself in the mirror" can be translated into "I enjoy the visibility that comes with being CEO" and, thus, narcissistic CEOs would also enjoy more prominence in ARs and press releases (for more details, see Chatterjee & Hambrick, 2007, Table 1, p. 365).

Concerns About the Narcissism Index
Whereas this argument seems logical and provides some face validity, the NI might lack construct validity. Chatterjee and Hambrick (2007) and a few other researchers using the NI included some form of validity check. Table 1 displays how (if at all) these studies address construct validity concerns. Taken together, they may provide some evidence that the NI overlaps with the definition of narcissism. However, their results rely on very small subsamples (10 to 39 CEOs) and on third-party ratings that are mostly based on very short definitions and a 1-item measure of narcissism. In another study, Petrenko et al. (2016) used third-party ratings of video samples of CEOs to measure CEO narcissism and, in one of their robustness checks, found a much weaker correlation between their measure of narcissism and the NI (r = .40, p < .001). Still, no study to date has tested the convergent validity between the NI and the self-reported NPI in a larger sample.
Furthermore, we are not aware of any study examining discriminant validity. Ruling out alternatives is important to determine which construct a measure actually taps. For example, other work has used the same indicators as the NI to measure CEO self-importance, core self-evaluations, or dominance (Hambrick & D'Aveni, 1992; Hayward & Hambrick, 1997; Hiller & Hambrick, 2005). Hence, we ask whether the NI is highly correlated with other traits and thus also measures constructs other than narcissism.

Research Question 1: To what extent does the NI converge with narcissism as measured with the NPI (and diverge from other personality traits)?
In addition, relationships found between the NI and company outcomes might be subject to reverse causality or omitted variable bias (contextual factors that influence both the NI and outcome variables such as acquisition decisions or risk taking). Chatterjee and Hambrick (2007) and a few ensuing studies discussed similar issues and controlled for at least some potential antecedent and contemporaneous variables. Nonetheless, such controls can only mitigate these problems, and other scholars have not corrected for endogeneity at all. In the following, we use the company's financial performance as an example to show that external stimuli might affect the NI and that CEOs may show higher NI values without having a narcissistic disposition.
Individuals are inclined to reinforce the positivity of their self-concept through self-serving biases in causal attribution (Bradley, 1978; Miller & Ross, 1975). Furthermore, individuals use impression management (Baumeister, 1982) and present themselves to others selectively, revealing some aspects while concealing others (Goffman, 1959). In managerial contexts, executives attribute success to themselves and cite their own capabilities when explaining good outcomes, but disclaim responsibility and cite external causes when explaining bad outcomes (Bettman & Weitz, 1983; Clapham & Schwenk, 1991; Salancik & Meindl, 1984). As a consequence, we expect CEOs to present themselves more prominently following good years (e.g., by choosing a larger picture of themselves alone). Of course, other factors such as firm size, innovativeness, or risk may have similar effects. However, for ease of readability and interpretation, we limit our argumentation and empirical study to financial performance and discuss other factors in more detail in the future research section. Based on these considerations, we ask whether the NI is influenced by contextual factors.

Research Question 2: Does context influence the NI?
We test both questions in two studies with different samples.

Sample and Procedure
We conducted an online experiment on Amazon Mechanical Turk (MTurk) with 601 individuals (mean age = 34.02 years, 47% female, 53.7% with a bachelor's degree or higher, 69.7% with 6 or more years of work experience). The study was divided into two parts, and participation took about 25 minutes. In the first part, we used a simulation to reconstruct the NI. To this end, we developed a cover story that put participants in the position of CEO of one of the 500 largest U.S. companies. Participants then made decisions that capture the NI indicators over a period of 3 years. Each year started with a background story about the company's financial performance in the preceding year, which was high, medium, or low. All participants were confronted with all three manipulations (within subjects). To avoid order or learning effects, the order varied randomly between participants. In the second part, participants completed self-descriptive personality scales such as the NPI, provided demographic information, and answered questions that we used as manipulation checks.

Measures
We gathered the NI indicators (in the order described below) and computed the NI according to Chatterjee and Hambrick (2007) as follows. The variable picture reconstructed the indicator for the prominence of CEO photographs in companies' ARs. Chatterjee and Hambrick (2007) expected highly narcissistic CEOs to be more prominent in the AR because they want to express their vanity and signal that they are more important than others. Sifting through ARs, Chatterjee and Hambrick (2007) assigned four points if the CEO was alone in the photograph and the picture was larger than half a page; three points if the picture showed the CEO alone and was smaller than half a page; two points if the CEO was photographed with one or more other executives; and one point if there was no picture of the CEO. In our simulation, the head of the public relations department presented different picture alternatives that displayed silhouettes of the CEO alone or with the executive team. Participants selected their choice for the AR (including the alternative of not including a picture). In the next step, they chose the size of the selected picture (smaller than half a page, about half a page or larger). Following Chatterjee and Hambrick (2007), we assigned 1 point (no picture), 2 points (CEO with other executives), 3 points (CEO alone and smaller than half a page), and 4 points (CEO alone and larger than half a page) for this variable. The variable pay emulated the indicators for CEOs' relative cash and noncash compensation. Chatterjee and Hambrick (2007) argued that narcissistic CEOs see themselves as more valuable than their colleagues and thus demonstrate their higher value through their own pay relative to that of other executives. They distinguished between relative cash and noncash compensation and measured a CEO's relative cash pay as cash salary and bonus divided by the same cash components of the second-highest-paid executive in the firm.
Chatterjee and Hambrick (2007) applied the same logic to relative noncash compensation, calculated as deferred income, stock grants, and options (with Black-Scholes valuation) divided by the noncash compensation of the second-highest-paid executive in the firm. In our study, during a simulated meeting, a member of the compensation committee informed participants about the average total annual compensation of CEOs and second-highest-paid executives in comparable firms. Participants then indicated how much they as CEO and how much their second-highest-paid executive should receive. We calculated pay as the compensation participants assigned to themselves divided by the compensation they assigned to the second-highest-paid executive.
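To make the picture and pay coding concrete, here is a minimal Python sketch. The `PictureChoice` record, its field names, and the mapping of the simulation's "about half a page or larger" option to 4 points are illustrative assumptions, not the authors' materials; the scoring follows the 1-4 scheme and the self-to-second-highest pay ratio described above.

```python
from dataclasses import dataclass


@dataclass
class PictureChoice:
    """Hypothetical record of one participant's AR picture decision in one year."""
    included: bool             # False if the participant chose no CEO picture
    ceo_alone: bool            # True if the silhouette shows the CEO alone
    half_page_or_larger: bool  # assumed mapping of "about half a page or larger"


def picture_points(choice: PictureChoice) -> int:
    """Score the AR picture on the 1-4 scale used for the variable picture."""
    if not choice.included:
        return 1  # no picture of the CEO
    if not choice.ceo_alone:
        return 2  # CEO shown with other executives, regardless of size
    return 4 if choice.half_page_or_larger else 3


def pay_ratio(ceo_pay: float, second_highest_pay: float) -> float:
    """Pay assigned to oneself divided by pay assigned to the second-highest-paid executive."""
    if second_highest_pay <= 0:
        raise ValueError("second-highest pay must be positive")
    return ceo_pay / second_highest_pay


print(picture_points(PictureChoice(True, True, False)))  # 3
print(pay_ratio(3_000_000, 1_500_000))  # 2.0
```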
The variable press replicated the CEO's prominence in the company's press releases. Chatterjee and Hambrick (2007) expected narcissistic CEOs to push their name into press releases as often as possible in order to express their vanity and authority. To measure this dimension, they counted how many times a CEO was named in press releases and divided this count by the number of words (in thousands) in all of the firm's press releases. In our simulation, participants received an e-mail from the head of the public relations department with six drafts of different releases, each consisting of a heading, a short message, and three text modules, of which participants chose one to include in the release: one module always contained a quotation from the CEO, one consisted of a quotation from another person, and one named nobody but provided neutral information. To build the variable press, we counted the number of releases in which participants chose the module quoting the CEO.
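A similar sketch for press, assuming each chosen module is recorded with a hypothetical label (`"ceo"`, `"other"`, or `"neutral"`). The second function shows the archival per-thousand-words variant attributed to Chatterjee and Hambrick (2007) above; both function names are illustrative.

```python
CEO_QUOTE = "ceo"  # hypothetical label for the module quoting the CEO


def press_score(chosen_modules):
    """Number of releases in which the participant chose the CEO-quote module."""
    return sum(1 for module in chosen_modules if module == CEO_QUOTE)


def mentions_per_thousand_words(ceo_mentions, total_words):
    """Archival variant: CEO name mentions per 1,000 words of the firm's press releases."""
    return ceo_mentions / (total_words / 1000)


# Six hypothetical releases, one chosen module each.
print(press_score(["ceo", "other", "neutral", "ceo", "ceo", "neutral"]))  # 3
print(mentions_per_thousand_words(12, 8000))  # 1.5
```

With six releases per year, press ranges from 0 to 6 in this sketch.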
We did not include CEOs' use of first-person singular pronouns in interviews, as this indicator was excluded from almost all subsequent studies. In line with Chatterjee and Hambrick (2007), we computed the NI as the simple mean of the three standardized variables. Interitem correlations (displayed in Table 2) are low, which resulted in a low Cronbach's alpha of .17; this is in line with the reliability estimates of many NI studies reported by Van Scotter (2019).
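The NI aggregation and its reliability estimate can be sketched as follows. This assumes population standard deviations for standardization and computes Cronbach's alpha on the standardized indicators; the text does not say which variant was used, so treat both as assumptions. The hypothetical data include a reversed indicator to show how such a pattern pushes alpha below zero, as with the negative alpha reported for Study 2.

```python
from statistics import mean, pstdev, pvariance


def zscores(values):
    """Standardize values using the population SD (an assumption; sample SD is also common)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def narcissism_index(picture, pay, press):
    """Per-person NI: simple mean of the three standardized indicators."""
    standardized = [zscores(picture), zscores(pay), zscores(press)]
    return [mean(person) for person in zip(*standardized)]


def cronbach_alpha(items):
    """Cronbach's alpha for k indicators; items[i][p] is indicator i for person p."""
    k = len(items)
    sum_item_var = sum(pvariance(item) for item in items)
    totals = [sum(person) for person in zip(*items)]
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


# Hypothetical scores for four participants.
picture = [1, 2, 3, 4]
pay = [1.0, 1.5, 2.0, 2.5]
press = [6, 4, 2, 0]  # reversed ordering lowers reliability
ni = narcissism_index(picture, pay, press)
alpha = cronbach_alpha([zscores(picture), zscores(pay), zscores(press)])
```

Here alpha comes out at about -3 because press runs opposite to the other two indicators; with three perfectly aligned indicators it would be 1.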
We measured self-rated narcissism with Emmons' (1984) NPI, which consists of 37 dyadic items loading on four factors: leadership/authority, self-absorption/self-admiration, superiority/arrogance, and exploitativeness/entitlement. A sample item pair is "I insist on getting the respect that is due me" vs. "I usually get the respect that I deserve" (exploitativeness/entitlement). Participants selected the statement from each pair that best described them. We summed the responses to form a composite NPI score that can range from 0 to 37, with higher scores indicating higher levels of narcissism.
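Because one option in each NPI pair is keyed as narcissistic, the composite score is a count of keyed choices. The answer labels and the key below are hypothetical placeholders; the actual NPI key is not reproduced here.

```python
def npi_score(responses, narcissistic_key):
    """Count of forced-choice pairs where the chosen option is the narcissistic one."""
    if len(responses) != len(narcissistic_key):
        raise ValueError("need exactly one key entry per item")
    return sum(1 for chosen, keyed in zip(responses, narcissistic_key) if chosen == keyed)


# Hypothetical 4-item example; the real NPI has 37 pairs and its own key.
print(npi_score(["A", "B", "B", "A"], ["A", "A", "B", "B"]))  # 2
```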
For the nomological network, we included brief measures of related constructs that have proven important in prior studies. We measured self-esteem with Rosenberg's (1965) 10-item Rosenberg Self-Esteem Scale (RSE). We used Gosling et al.'s (2003) Ten-Item Personality Inventory (TIPI) to measure the Big Five personality dimensions (McCrae & Costa, 1987): extraversion, agreeableness, conscientiousness, emotional stability, and openness to experience. Because narcissism is part of the "Dark Triad" (Paulhus & Williams, 2002), we included the four items for each of the other two Dark Triad dimensions, Machiavellianism and psychopathy, from Jonason and Webster's (2010) Dirty Dozen. We used 7-point Likert-type scales ranging from 1 (disagree strongly) to 7 (agree strongly) for all of these scales.
We controlled for social desirability using Reynolds' (1982) psychometrically sound 11-item short form of the Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1960), again with 7-point Likert-type scales. We also controlled for the following demographic variables: highest level of education, years of work experience, gender, U.S. nationality, and age. Furthermore, to disguise the purpose of the experiment and to mitigate concerns about common method bias (Podsakoff et al., 2003), we included the variable random, which theoretically should be correlated with neither the NPI nor the NI. In the first part of the experiment, after participants chose their photograph for the AR, they could pick any number of additional pictures (e.g., showing workers, products, or sustainable resources) to include. The variable random was calculated as the total number of pictures chosen.
In addition, we included manipulation checks. At the end of the survey, we asked whether participants identified with their role as CEO, using a 7-point scale from 1 (not at all) to 7 (very much). We further assessed how participants perceived the financial situation in each year on a scale from 1 (poor) to 7 (excellent). Complete materials of the study are available from the first author on request.

Analytical Strategy
The first goal of the analysis was to assess the convergent and discriminant validity of the NI (Research Question 1). To do so, we followed D. T. Campbell and Fiske's (1959) multitrait-multimethod criteria and placed the NI and the NPI in the nomological network of narcissism. The second goal was to test whether the NI differs between the three financial situations (Research Question 2). Because every participant experienced all three conditions, we used a repeated-measures analysis of variance (ANOVA).

Results
Manipulation Checks. On average, participants identified with their role as CEO (M = 5.26 on a 7-point scale). A repeated-measures ANOVA on the manipulation check across the three conditions yielded a statistically significant within-subjects effect, F(1.59, 954.12) = 2088.32, p < .001, ηp² = .78. Planned comparisons were all statistically significant at the 5% level, with mean perceived financial performance of 6.46 in the high, 4.99 in the medium, and 2.25 in the low financial performance condition.

Convergent and Discriminant Validity. To answer Research Question 1, we followed D. T. Campbell and Fiske's (1959) four criteria to test the construct validity of the NI. In these analyses, we averaged and standardized the NI across the three conditions to obtain an overall NI for each individual, irrespective of financial performance. First, we tested convergent validity by correlating the NI and its single components with the NPI and its four factors. Table 2 displays the correlations among these variables. The analysis revealed a positive but only moderate correlation between the NPI and the NI (r = .26, p < .01). Correlations between the NPI and the single components of the NI were even smaller (between r = .13 and r = .19). The single factors of the NPI were not more highly correlated with the NI or its single components either. Thus, we cannot confirm convergent validity, which would require a sufficiently large correlation between the NPI and the NI. Nor did we find single indicators that might be better estimators of the NPI.
Second, we examined discriminant validity by looking at the pattern of correlations in the nomological network (Cronbach & Meehl, 1955). Table 3 displays correlations among the NI, the NPI, the other personality scales, and the controls. According to the second and third criteria, the monotrait-heteromethod correlation (NI and NPI) should be higher than all heterotrait-heteromethod (e.g., NI and RSE) and heterotrait-monomethod (e.g., NPI and RSE) correlations. The correlation between the NI and the NPI exceeded all heterotrait-heteromethod correlations. However, the NPI correlated more highly with some other personality dimensions measured with the same method, such as extraversion, openness to experience, Machiavellianism, and psychopathy. Thus, the third criterion was not fulfilled.
Note. Person-level data (n = 601). NPI = Narcissistic Personality Inventory; NI = Narcissism Index. a NI is the sum of the three standardized variables picture, pay, and press. b Picture, pay, and press are averaged across the three situations. *p < .05. **p < .01.
The fourth criterion from a construct validation perspective is that the pattern of correlations should be similar across methods. In our setting (see Table 3), the pattern of correlations for the NI matched that for the validated NPI only weakly. Significant correlations of the two narcissism measures with the personality dimensions diverged substantially (Δr > .10). From this pattern of correlations, we conclude that our results do not support the discriminant validity of the NI.
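The comparisons behind the second and third Campbell-Fiske criteria reduce to checking whether any heterotrait correlation is at least as large as the validity correlation. A minimal sketch, assuming comparisons on absolute values; apart from the reported NI-NPI correlation of .26, the numbers below are made up for illustration and are not the values in Table 3.

```python
def criterion_violations(validity_r, comparison_rs):
    """Names of comparison correlations whose absolute value reaches the validity correlation."""
    return [name for name, r in comparison_rs.items() if abs(r) >= abs(validity_r)]


# Illustrative values only (not taken from Table 3), except the NI-NPI r = .26.
validity = 0.26  # monotrait-heteromethod: NI with NPI
heterotrait_heteromethod = {"NI-RSE": 0.08, "NI-extraversion": 0.12}
heterotrait_monomethod = {"NPI-RSE": 0.20, "NPI-extraversion": 0.40}

criterion_2_met = not criterion_violations(validity, heterotrait_heteromethod)
criterion_3_met = not criterion_violations(validity, heterotrait_monomethod)
```

With these illustrative values, the second criterion holds while the third fails because of the hypothetical NPI-extraversion correlation, the same qualitative pattern the text describes.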
To test whether our findings were driven by common method bias (Podsakoff et al., 2003) and to provide another test of discriminant validity, we included a variable in our experiment (random) that theoretically should not correlate with either of these measures. As expected, participants' choice of random pictures was not related to the NI or the NPI.
In sum, our data do not show a consistent pattern of convergent and discriminant validity, which would require the validity correlation (NPI and NI) to be stronger than all other correlations. Instead, we found a stronger overlap between the NPI and other personality scales.

Effects of Financial Performance. To answer Research Question 2, we investigated within-person differences in the NI among the three financial conditions. To this end, we computed three NI scores (one for each year/condition) for each participant and compared the mean NI scores across the three conditions using a repeated-measures ANOVA. Table 4 displays means and standard deviations of the NI for each financial performance condition. The test of within-subjects effects was significant, F(1.93, 1159.30) = 14.38, p < .001, ηp² = .02. Pairwise comparisons with Bonferroni adjustment showed that all mean differences between conditions were significant at the 5% level. That is, the NI was higher (lower) when financial performance was high (low) compared with the medium condition, implying that context influences the NI.
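The logic of a one-way repeated-measures ANOVA and its partial eta squared can be sketched in plain Python. This sketch applies no sphericity correction; the fractional degrees of freedom reported above suggest that one (e.g., Greenhouse-Geisser) was used. The helper `partial_eta_from_f` uses the identity ηp² = F·df1 / (F·df1 + df2), which is unaffected by such a correction because it rescales both degrees of freedom by the same factor.

```python
from statistics import mean


def rm_anova(data):
    """One-way repeated-measures ANOVA without sphericity correction.

    data[p][j] is participant p's score in condition j.
    Returns (F, df_conditions, df_error, partial eta squared).
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    cond_means = [mean(row[j] for row in data) for j in range(k)]
    subj_means = [mean(row) for row in data]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant residual
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error, ss_cond / (ss_cond + ss_error)


def partial_eta_from_f(f, df1, df2):
    """Partial eta squared recovered from a reported F and its degrees of freedom."""
    return f * df1 / (f * df1 + df2)
```

As a consistency check, `partial_eta_from_f(14.38, 1.93, 1159.30)` is about .023 and `partial_eta_from_f(2088.32, 1.59, 954.12)` about .777, matching the reported .02 and .78 after rounding.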

Robustness of Results and Further Analyses. Demand characteristics due to the repeated-measures design could have caused the context effects found in our main analysis. To mitigate this concern, we ignored the repeated measures in an additional analysis that considered only the first year, which yielded qualitatively similar results. In addition, we found no evidence that the order of the manipulations influenced our results. Furthermore, we ran several multivariate analyses to model the effect of financial performance on the NI within individuals (tables are not displayed, but analyses are available on request), regressing the NI on the NPI, financial performance, and other control variables. In a two-stage multilevel model that takes the repeated measures of the NI into account, the NPI remained positively, but only moderately, related to the NI. These results provide further support for the NI being affected by the financial situation. We obtained similar results when we excluded all outliers or participants who provided incongruous answers in the manipulation checks.

Sample and Procedure
The aim of Study 2 was to replicate the main findings of Study 1 in a sample of CEOs and managing directors. We sent e-mail invitations to personal contacts and to contacts of colleagues and friends, and we recruited managing directors and CEOs, mainly from small and medium-sized companies, via a social network platform for business contacts. In total, we obtained a usable sample of 97 managing directors and CEOs (20.6% female, 80.4% with a bachelor's degree or higher, 67% with more than 6 years of experience as a company director) who lead companies of different sizes (37.1% with up to 10 employees, 34% with 11-50 employees, 14.5% with 51-250 employees, and 14.4% with 251 to 22,500 employees) in different industries (e.g., automotive, manufacturing, wholesale, finance, and services) in Germany.
We designed the experiment as a brief version of the Study 1 experiment to increase the likelihood of participation; participation took about 15 minutes. We translated the first part of Study 1 into German and made some minor adaptations to the German context. The procedure and tasks remained the same except for the reductions described below. In the second part, we measured only self-rated narcissism with the NPI and included the manipulation check and demographic questions, to keep completion time short.

Measures
We computed the NI following the same approach as in Study 1, with two time-saving modifications. First, we collected the NI and its single indicators (picture, pay, and press) over only 2 years, one with a high and one with a low financial performance condition; we again randomly varied the order between participants. Second, the variable press consisted of three instead of six releases each year. Everything else was identical to Study 1. Interitem correlations (displayed in Table 5) are quite low or negative. Consequently, Cronbach's alpha was negative (−.39), comparable to the alphas that Van Scotter (2019) found in three replicated CEO NI samples.
We used Schütz et al.'s (2004) German translation of Emmons' (1984) NPI, which has good internal consistency and satisfactory convergent and discriminant validity. Furthermore, we included the same manipulation checks as in Study 1. Complete materials of the study are available from the first author on request.

Analytical Strategy
The goal of the analyses was to replicate the main results of Study 1. First, we examined the convergence of the NI with the NPI (Research Question 1). Second, we tested whether financial performance influenced the NI (Research Question 2). Because Study 2 included only two within-subject conditions, we used a paired-samples t test, analogous to the repeated-measures approach in Study 1.
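A paired-samples t test works on each participant's difference between the two conditions. A minimal sketch, assuming differences are taken as high minus low, so that a negative t means lower scores in the high condition (the sign convention of the t = −2.27 reported for the NI in Study 2).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(high, low):
    """Paired-samples t on high-minus-low differences; returns (t, df)."""
    if len(high) != len(low):
        raise ValueError("each participant needs a score in both conditions")
    diffs = [h - l for h, l in zip(high, low)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # stdev uses n - 1 in the denominator
    return t, n - 1
```

This sketch returns only the statistic and degrees of freedom; a library routine such as `scipy.stats.ttest_rel` also returns the p value.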

Results
Manipulation Checks. We tested whether our manipulation was effective using a paired-samples t test. Participants perceived performance as significantly better in the high (M = 5.48, SD = 1.01) than in the low financial performance condition (M = 3.40, SD = 1.28), t = 12.07, p < .001.
Convergent Validity. As in Study 1, we averaged and standardized the NI across the two financial conditions to obtain an average NI for each individual. To answer Research Question 1, we tested convergent validity with the correlations displayed in Table 5. The correlation between the NPI and the NI was positive but even smaller than in the MTurk sample and not significant (r = .15, n.s.). Among the single components of the NI, the variable pay showed a slightly higher and significant correlation with the NPI (r = .21, p < .05), whereas the other two components showed smaller or even slightly negative correlations. The single factors of the NPI were not more highly correlated with the NI either. The highest correlation, between self-absorption/self-admiration and pay, was also only small to moderate (r = .26, p < .05). Thus, in line with Study 1, these results do not confirm convergent validity.
Effects of Financial Performance. We next compared the effects of the two financial situations on the NI with a paired samples t test to provide further evidence for Research Question 2. Table 6 displays means and standard deviations of the NIs for the two financial conditions. We found that the mean NI was significantly lower in the high (M = −0.08, SD = 0.50) than in the low performance condition (M = 0.08, SD = 0.56, t = −2.27, p < .05). This again shows that the NI reflects the context, although the effect was in the opposite direction to what we had hypothesized and found in Study 1.
Note to Table 5. Person-level data (n = 97). NPI = Narcissistic Personality Inventory; NI = narcissism index. a NI is the sum of the three standardized variables picture, pay, and press. b Picture, pay, and press are averaged across the two situations. *p < .05. **p < .01.
To reduce concerns that demand characteristics produced this result, we again considered only each participant's first year and tested whether the manipulation had different effects between participants. An independent samples t test showed that, in the first year, the NI was on average significantly lower for participants confronted with a high financial performance than for those confronted with a low financial performance (results not displayed).
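The first-year check can be sketched as follows: keep each participant's first simulated year, split participants by the condition they saw in that year, and compare the two groups with an independent samples t test. The record format, field names, and toy values are hypothetical, and the pooled-variance (Student) version of the test is an assumption, since the text does not state which variant was used.

```python
from math import sqrt
from statistics import mean, variance

def first_year_only(records):
    """Keep each participant's earliest year (hypothetical record format)."""
    first = {}
    for rec in sorted(records, key=lambda r: (r["pid"], r["year"])):
        first.setdefault(rec["pid"], rec)  # first record per pid = earliest year
    return list(first.values())

def independent_t(a, b):
    """Student's t with pooled variance; df = n_a + n_b - 2."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2

# Toy records: the order of the two conditions varies across participants.
records = [
    {"pid": 1, "year": 1, "condition": "high", "ni": -0.4},
    {"pid": 1, "year": 2, "condition": "low", "ni": 0.3},
    {"pid": 2, "year": 1, "condition": "low", "ni": 0.5},
    {"pid": 2, "year": 2, "condition": "high", "ni": 0.1},
    {"pid": 3, "year": 1, "condition": "high", "ni": -0.2},
    {"pid": 3, "year": 2, "condition": "low", "ni": 0.0},
    {"pid": 4, "year": 1, "condition": "low", "ni": 0.2},
    {"pid": 4, "year": 2, "condition": "high", "ni": -0.1},
]
first = first_year_only(records)
high = [rec["ni"] for rec in first if rec["condition"] == "high"]
low = [rec["ni"] for rec in first if rec["condition"] == "low"]
t, df = independent_t(high, low)
print(f"t({df}) = {t:.2f}")  # t(2) = -3.61
```

Because each participant contributes only one observation, no participant has seen both conditions when this score is recorded, which is what makes the comparison less susceptible to demand characteristics.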

Discussion of Results
Our work directly addresses Chatterjee and Hambrick's (2007) call to assess the correlation between the NI and Emmons' (1984) NPI. The results of our two studies indicate that the decisions that form the NI are positively related to the NPI. Nonetheless, we cannot establish strong convergent validity, because in two different samples the correlations between the two measures of narcissism are only small to moderate (r = .26 in a sample of participants from various occupations and r = .15 in a sample of managing directors and CEOs). Nor did we find the single indicators (picture, pay, press) to be better measures. The results also indicate that the NI reflects various factors, both endogenous (e.g., psychopathy) and exogenous to the CEO (here, financial performance).
Consistent with previous work, our results exemplify problems with the measurement of CEO personality and the resulting invalid inferences (Hollenbeck et al., 2006; Pitcher et al., 2000). We question the validity of the NI and find that it is only weakly related to the common and validated NPI. This extends results from Van Scotter (2019), who found that the reliability estimates of the NI are mostly insufficient. Furthermore, we find evidence that context indeed affects the NI. Interestingly, the effect of financial performance on the NI differs between our two studies. When simulated financial performance is high, the NI is significantly higher in the MTurk sample, whose participants were mostly from the United States (Study 1), and significantly lower in the sample of managing directors and CEOs from Germany (Study 2). In the theoretical part of our manuscript, we assumed, based on self-serving biases (Bradley, 1978; Miller & Ross, 1975), that the NI is higher following good financial performance. However, we found the opposite in Study 2: managing directors and CEOs in Germany were more inclined to give priority to their executive team members when performance was high, but were more likely to refer to themselves when performance was low. This result is surprising, and we can only speculate on its causes. First, cultural differences might affect how participants react to their company's performance. According to GLOBE, institutional collectivism is valued more in Germany than in the United States, which indicates that group loyalty and group cohesion are more strongly encouraged in German culture (House et al., 2004). Consequently, German participants might feel more strongly obliged to accept part of the blame for low company performance. Moreover, prior cross-cultural research has indicated some differences in the use of impression management practices between countries (Bolino et al., 2016).
In line with our findings, self-enhancement and emphasizing individual excellence have been shown to be rather common in the United States but relatively rare in Germany (Bye et al., 2011; Sandal et al., 2014). It has been argued that individuals in societies with lower economic inequality (such as Germany) depend less on self-presentation tactics (Sandal et al., 2014). Moreover, compared with the United States, Germany scores higher (lower) on Schwartz's (2006) cultural value orientations of autonomy (vs. embeddedness), harmony (vs. mastery), and egalitarianism (vs. hierarchy; Schwartz, 2008), all of which have been proposed to be negatively related to emphasizing individual excellence (Sandal et al., 2014). Second, the two samples also differed with regard to leadership experience. Managing directors and CEOs, who were surveyed in Study 2, might better understand the team dynamics that arise when leaders claim success but blame others for failures. In addition, CEOs might possess higher self-monitoring capacities, that is, the ability to control expressive behavior and self-presentation in line with social appropriateness (Snyder, 1974; Sosik et al., 2002). Overall, our studies could not disentangle how context affects the NI, but both results indicate that context matters. This finding is in line with Cragun et al. (2020), who provide further indications that the NI might be influenced by context. We therefore recommend interpreting the findings of studies that use the NI to measure CEO narcissism carefully and being cautious about concluding that the strategies and outcomes investigated result from a narcissistic personality. Our results have far-reaching implications, since the NI has become the most prominent measure in CEO narcissism research in the management literature (Cragun et al., 2020) and has recently also sparked interest in other research areas such as marketing and accounting (Kashmiri et al., 2017; Olsen et al., 2014).
Our work calls for attention to other studies using unobtrusive measures. If researchers use weak (unobtrusive) measures with low construct validity, their work, even if thoroughly conducted, may exhibit low statistical conclusion validity and, thus, low internal and external validity. Wrong conclusions due to weak measures may further lead to deficient practical implications (e.g., for the selection and placement of CEOs) and distort theoretical implications and frameworks that rely on findings from studies using the NI and other unobtrusive measures. This is all the more relevant since research on CEO personality in general (Araujo-Cabrera et al., 2017; Colbert et al., 2014; Wong et al., 2017), and narcissism in particular (Gupta & Misangyi, 2018; O'Reilly et al., 2018; Zhang et al., 2017), continues to engender interest among researchers. Our findings also cast doubt on previous studies showing that the NI and similar measures predict outcomes such as financial performance, because our findings reverse the common assumption of cause and effect. Of the 23 CEO studies using the NI that were included in the meta-analysis of Cragun et al. (2020), we found that only about half (12) discussed endogeneity concerns or controlled for potential antecedent and contemporaneous variables to reduce these concerns. While the meta-analysis of Cragun et al. (2020) indicates that the NI is influenced by firm size, our study demonstrates that financial performance affects the NI. Taken together, our findings highlight that endogeneity is a serious problem in these studies that must be addressed rather than neglected. We encourage future work to consider and critically reflect on the problems of reverse causality and omitted variable bias. Before making causal claims, researchers need to thoroughly identify the sources of endogeneity and follow the recommendations provided in the literature (e.g., Antonakis et al., 2010; Bascle, 2008).
Although we illustrate concerns about the NI, our intention is not to discourage researchers from using or developing unobtrusive measures. Nor do we intend to reflect poorly on studies investigating CEO narcissism. We acknowledge that Chatterjee and Hambrick (2007), who were very open about the limitations of the NI, provide a solid theoretical framework for how CEO narcissism may affect organizational outcomes and a starting point for addressing the difficulty of gathering personality data from CEOs. However, we believe that their study should not encourage researchers to use the NI in its current version or with slight modifications. Rather, we derive important implications for future research. We call for more, and more rigorous, validity tests, including tests of convergent and discriminant validity, before unobtrusive measures that have not been thoroughly validated are used. Most studies only provide theoretical arguments for why their measures should relate to the construct. Although these arguments are mostly logical, they do not by themselves justify the use of these measures. Only a few studies contain convergent validity tests, for example, for indicators of power (Finkelstein, 1992) and political ideologies (Chin et al., 2013). Assessments should, in addition, comprise different constructs that are relevant in the specific context. Future studies can transfer our approach to other personality measures and include similar simulations.
When developing a new unobtrusive measure, researchers should make sure that all relevant aspects of the concept are reflected in the measure and that the measure is not contaminated by components that do not belong to the concept. Chatterjee and Hambrick (2007) make use of publicly available documents that, if observed properly, could offer comprehensive data on exclusive, otherwise hard-to-reach individuals. Because these sources are usually produced for many different purposes, the NI and other unobtrusive measures should not simply be the result of a categorical algorithm applied to these public documents (e.g., size of picture or word count); rather, the documents need to be analyzed thoroughly. If self-ratings are not available, including a wide variety of sources that also show how individuals interact with others and how they react to criticism (e.g., public speeches, TV shows, and radio interviews) would provide deeper insights, as these are important aspects of personality and especially of narcissism.
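To illustrate why a purely mechanical word count can mislead, the sketch below computes a naive share of first-person singular pronouns among all first-person pronouns, the kind of text-based indicator discussed above. The pronoun lists, the tokenization, and both example sentences are assumptions for illustration; the exact operationalization in Chatterjee and Hambrick (2007) may differ.

```python
import re

SINGULAR = {"i", "me", "my", "mine", "myself"}
PLURAL = {"we", "us", "our", "ours", "ourselves"}

def singular_pronoun_share(text):
    """Naive share of first-person singular among all first-person pronouns."""
    words = re.findall(r"[a-z]+", text.lower())  # crude tokenization
    singular = sum(w in SINGULAR for w in words)
    plural = sum(w in PLURAL for w in words)
    total = singular + plural
    return singular / total if total else None  # None: no first-person pronouns

apology = "I was wrong, and I apologize to our team."
boast = "I alone built this; my vision saved our firm."
print(singular_pronoun_share(apology), singular_pronoun_share(boast))
```

A self-critical apology and a boast receive identical scores, so the count alone cannot capture how a CEO reacts to criticism, which is precisely the kind of information a thorough analysis of the sources would need to recover.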
Other researchers on CEO narcissism have used approaches that seem more promising. While some authors were able to collect NPI data from CEOs (Peterson et al., 2012; Reina et al., 2014; Wales et al., 2013; Zhang et al., 2017), others asked current employees, as internal informants, to rate their CEOs' narcissistic personality (O'Reilly et al., 2014; O'Reilly et al., 2018). Recent work has also used video-metric approaches (Gupta & Misangyi, 2018; Petrenko et al., 2016). For example, Petrenko et al. (2016) used the NPI, but instead of self-ratings, relied on expert ratings of video samples of CEOs. Resick et al. (2009) used a more complex approach in which assessors evaluated and rated 75 CEOs of Major League Baseball organizations on narcissism using very comprehensive biographical information packets (including direct quotes from the CEOs). Thus, we are optimistic that there are opportunities to capture narcissism and other personality constructs of CEOs.

Limitations and Future Research
When interpreting our results, several limitations must be considered. We did not run the experiment with CEOs of large U.S. companies, who we doubt would participate in this kind of study. In Study 1, we used MTurk, a platform that has been shown to provide reliable, high-quality data (Buhrmester et al., 2011; Sprouse, 2011); it allowed us to test our concerns in a large sample and to include several measures that would have reduced the likelihood of participation in a sample of CEOs. Previous research relied on similar samples to validate scales that were later successfully used in samples of CEOs (Carpenter & Golden, 1997; Peterson et al., 2012; Resick et al., 2009). In Study 2, we used a sample of managing directors and CEOs that is closer to the sample used by Chatterjee and Hambrick (2007). The results of both studies indicate that our main conclusions are robust across samples.
We collected the NI in a simulated setting rather than from participants' real-life decisions. To do so, we translated the unobtrusive measures into a controlled experimental setting, following past research (e.g., Carey et al., 2015). In line with previous work (Fazio et al., 1995; Paulhus et al., 2003), the NI in our study serves as an "unobtrusive" measure only in the sense that it is an indirect measurement (i.e., the true purpose of the measure is covert), even though collecting it involves interacting with the participants under study. Our experimental design was essential to show the cause-and-effect relationship (financial performance influences the NI) within subjects in a controlled setting. In addition, our approach guaranteed participants' anonymity. Since the risks of dishonest responses and experimenter effects are very low in such studies (Sprouse, 2011), we expect that participants provided truthful NPI responses and that their decisions in our controlled setting do not differ substantially from their real behavior. Nonetheless, we acknowledge that it would be desirable to conduct a study with CEOs in their actual settings.
Demand characteristics arising from our repeated measures design could have created the effects of financial performance on the NI. However, we obtain qualitatively the same results in both studies when we disregard the repeated measures and consider only each participant's first year. This reduces concerns that demand characteristics drive our results.
In our studies, we focused on financial performance as a contextual variable. However, there are many other potential confounders of the NI. For example, firm size has been shown to be significantly related to the NI (Cragun et al., 2020) and is also related to many of the outcomes under study. Additionally, CEOs might choose larger pictures of themselves and stress their importance in press releases (i.e., the picture and press dimensions of the NI) the more they have been criticized in the media or when they need to accumulate power to enforce major change projects. Similarly, the postulated effects of the NI on innovation and growth might be confounded, as innovations may increase the likelihood that CEOs appear (by choice) more often or more prominently in the news (press dimension of the NI). Furthermore, CEOs might choose higher relative cash and noncash compensation (pay dimension of the NI) when, for instance, their tenure ends in the foreseeable future or when they are exposed to greater risk. Our approach might offer a fruitful avenue for future research to assess how these factors affect the NI. For example, future studies may use our experimental approach and create different (e.g., high vs. low risk) scenarios. Examining field data to investigate whether these variables are antecedents of the NI would also be a promising direction for future endeavors.

Conclusion
Overall, our results cast doubt on the use of unobtrusive measures, but we do not call for abandoning these measures in organizational research. Rather, we acknowledge that unobtrusive measures can be valuable in many settings. However, they offer a fruitful avenue for further studies only if they adequately gauge the proposed construct. We call for the inclusion of validity tests and make suggestions for the development of unobtrusive measures and for measuring CEO narcissism.