This statement is an Expression of Concern regarding the article “Is It Light or Dark? Recalling Moral Behavior Changes Perception of Brightness” (Banerjee et al., 2012). The Expression of Concern is prompted by the observation that some of the mean values reported in the article are mathematically impossible, given the manner in which the data are reported to have been collected. The authors of the article attempted to locate the original data in an effort to resolve the errors, but they were unsuccessful. Because the errors cannot be resolved, we decided to issue an Expression of Concern about the confidence that can be held in the reported results. The corresponding author on the article, Promothesh Chatterjee, was invited to contribute to the Expression of Concern as a coauthor but declined the invitation.
In the two studies in this article, participants judged the brightness of the room in which they were tested, and in Study 2, they also indicated their preferences for three light-producing products: a lamp, a candle, and a flashlight. In Study 1, brightness was reported to have been judged on a 7-point scale (1 = low, 7 = high). In Study 2, product preferences were reported to have been judged on a 7-point scale (1 = low, 7 = high), and brightness was reported to have been judged in watts (whether on an integer or a continuous scale is not specified in the article). In both studies, the critical manipulation (between subjects) was that before making the judgments, some participants were tasked with retrieving a memory of a time they behaved in an ethical manner, and other participants were tasked with retrieving a memory of a time they behaved in an unethical manner. The expectations were that participants in the ethical condition would judge the testing room to be brighter and that participants in the unethical condition would express stronger preference for light-producing products.
The errors in the data reported in the article were brought to the attention of the Editor in Chief by Aaron Charlton, who conducted a statistical check known as a test for granularity-related inconsistency of means (GRIM; Brown & Heathers, 2017). The GRIM test assesses whether a reported mean could have been produced by any set of integer-valued scores from the reported number of participants. In the present case, the question was whether the reported mean values are mathematically possible given (a) the number of participants in the studies and (b) the reported scale of measurement of the variables. Charlton observed that, on the basis of his calculations, five of the 10 means reported in the article were mathematically inconsistent with the reported sample sizes and scales.
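The logic of the GRIM test can be illustrated with a minimal Python sketch. It is not the code used by Charlton or Francis; the function name `grim_consistent` and the example means are hypothetical and are not values from the article. The sketch assumes integer scores and treats a reported mean as consistent if any integer total falls within half a unit of the last reported decimal place, so rounding ties in either direction count as consistent.

```python
import math
from fractions import Fraction


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if some integer sum of n integer scores gives a mean
    that rounds to reported_mean at its reported precision.

    reported_mean is passed as a string (e.g. "3.47") so that the number
    of reported decimals is known and the arithmetic stays exact.
    """
    mean = Fraction(reported_mean)
    decimals = len(reported_mean.split(".")[1]) if "." in reported_mean else 0
    # Half of one unit in the last reported decimal place (0.005 for 2 decimals).
    half_unit = Fraction(1, 2 * 10 ** decimals)
    # Smallest integer total whose mean is not below the rounding interval.
    smallest_total = math.ceil((mean - half_unit) * n)
    # Consistent if that total's mean is also not above the interval.
    return smallest_total <= (mean + half_unit) * n


# Hypothetical examples (NOT the means reported in the article):
print(grim_consistent("3.45", 20))  # True: 69 / 20 = 3.45
print(grim_consistent("3.47", 20))  # False: no integer total of 20 scores works
print(grim_consistent("3.47", 17))  # True: 59 / 17 rounds to 3.47
```

Exact fractions are used instead of floating-point numbers so that borderline cases are not decided by binary rounding error.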
On receipt of Charlton’s message of concern, the Editor in Chief recruited the assistance of the coauthor of this statement, Gregory Francis, one of the Statistical Advisers for Psychological Science. Francis conducted an independent analysis of the values reported in the article. He concluded that, given the reported sample sizes and scale of measurement, two of the means reported in the article are mathematically impossible. The other inconsistencies noted by Charlton can be resolved with assumptions about how participants were apportioned to conditions. The results of this analysis are provided in Table 1.
Table 1. A Summary of the Granularity-Related Inconsistency of Means Analyses

The article reported that a total of 40 participants in Study 1 judged the brightness of a room on a 7-point scale (1 = low, 7 = high). The number of participants in the ethical and unethical conditions is not specified in the article, but the sample sizes must sum to 40. As indicated in the first two rows of Table 1, if it is assumed that equal numbers of participants were assigned to each condition, then the mean value for the unethical condition is mathematically impossible (this result was reported by Charlton) because there is no integer sum of 20 scores that will produce the reported mean. This particular inconsistency can be resolved by supposing that the sample sizes were 23 and 17 participants in the ethical and unethical conditions, respectively (Rows 3 and 4 of Table 1).
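The arithmetic behind this check can be written out briefly. The symbols $S$ (the sum of the scores in a condition) and $n$ (the number of participants in that condition) are notation introduced here for illustration and do not appear in the article.

```latex
\[
\bar{x} = \frac{S}{n}, \qquad S \in \mathbb{Z}
\quad\Longrightarrow\quad
\bar{x} \in \left\{ \tfrac{k}{n} : k \in \mathbb{Z} \right\}.
\]
\[
n = 20:\ \tfrac{1}{20} = 0.05; \qquad
n = 17:\ \tfrac{1}{17} \approx 0.0588; \qquad
n = 23:\ \tfrac{1}{23} \approx 0.0435.
\]
```

With 20 participants, every attainable mean is a multiple of 0.05, so a mean reported to two decimal places must end in 0 or 5. Changing the assumed sample size changes the set of attainable means, which is why a different apportionment of participants can remove an inconsistency.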
The analysis is more complicated in Study 2, in which a total of 74 participants took part. The article specifies that preferences for light-producing products were judged on a 7-point scale (1 = low, 7 = high). If we assume equal numbers of participants were in the ethical and unethical conditions (n = 37 per condition), then application of the GRIM test yields three means that are mathematically impossible (see Table 1). The number of inconsistencies reduces to two if the sample sizes are 35 participants for the ethical condition and 39 participants for the unethical condition, but an exhaustive search of all possible sample-size assignments reveals that the six reported means always produce at least two GRIM inconsistencies. We did not apply the GRIM test to the reported means of judged room brightness for Study 2 because it is not clear that the measurements were based on integer values.
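An exhaustive search of this kind could be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the analysis code used for this statement: the function names are hypothetical, the example means and total sample size are invented and are not the article's values, and the sketch assumes integer scores and two conditions whose sample sizes sum to a known total.

```python
import math
from fractions import Fraction


def grim_consistent(reported_mean: str, n: int) -> bool:
    """True if some integer sum of n integer scores rounds to reported_mean."""
    mean = Fraction(reported_mean)
    decimals = len(reported_mean.split(".")[1]) if "." in reported_mean else 0
    half_unit = Fraction(1, 2 * 10 ** decimals)
    return math.ceil((mean - half_unit) * n) <= (mean + half_unit) * n


def count_inconsistencies(means_a, means_b, n_a, n_b) -> int:
    """Count reported means that fail the GRIM check for a given split."""
    failures_a = sum(not grim_consistent(m, n_a) for m in means_a)
    failures_b = sum(not grim_consistent(m, n_b) for m in means_b)
    return failures_a + failures_b


def search_splits(means_a, means_b, total_n):
    """Try every split of total_n into two non-empty conditions and return
    the smallest number of inconsistencies and the splits that achieve it."""
    counts = {
        (n_a, total_n - n_a): count_inconsistencies(
            means_a, means_b, n_a, total_n - n_a
        )
        for n_a in range(1, total_n)
    }
    best = min(counts.values())
    return best, [split for split, c in counts.items() if c == best]


# Invented example (NOT the article's data): one mean per condition, 30 people.
best, splits = search_splits(["2.36"], ["4.19"], 30)
print(count_inconsistencies(["2.36"], ["4.19"], 15, 15))  # 2 with an equal split
print(best)  # 0: at least one unequal split is fully consistent
print((14, 16) in splits)  # True
```

If the smallest count returned by such a search is greater than zero, no apportionment of participants to conditions can reconcile all of the reported means with integer-valued data.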
In summary, if the scale of measurement involves integers, as described in the article, then the GRIM test reveals errors in some of the reported mean values. Either the reported numbers are incorrect, the reported sample sizes are incorrect, or the description of measurement is incorrect. The problems could reflect simple mistakes in reporting (e.g., typos or missing data), but because the original data for this article cannot be located, the errors cannot be corrected. With this in mind, the Editor in Chief decided not to change the official publication record of the article through a Corrigendum. Instead, we issue this Expression of Concern and note that the errors undermine confidence in these data and the conclusions drawn from them.
—Editor in Chief
Gregory Francis
—Statistical Adviser
References
Banerjee, P., Chatterjee, P., & Sinha, J. (2012). Is it light or dark? Recalling moral behavior changes perception of brightness. Psychological Science, 23(4), 407–409. https://doi.org/10.1177/0956797611432497
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology. Social Psychological and Personality Science, 8(4), 363–369. https://doi.org/10.1177/1948550616673876
