Psychological science relies on behavioral measures to assess cognitive processing; however, the field has not yet developed a tradition of routinely examining the reliability of these behavioral measures. Reliable measures are essential to draw robust inferences from statistical analyses, and subpar reliability has severe implications for measures’ validity and interpretation. Without examining and reporting the reliability of measurements used in an analysis, it is nearly impossible to ascertain whether results are robust or have arisen largely from measurement error. In this article, we propose that researchers adopt a standard practice of estimating and reporting the reliability of behavioral assessments of cognitive processing. We illustrate the need for this practice using an example from experimental psychopathology, the dot-probe task, although we argue that reporting reliability is relevant across fields (e.g., social cognition and cognitive psychology). We explore several implications of low measurement reliability and the detrimental impact that failure to assess measurement reliability has on interpretability and comparison of results and therefore research quality. We argue that researchers in the field of cognition need to report measurement reliability as routine practice so that more reliable assessment tools can be developed. To provide some guidance on estimating and reporting reliability, we describe the use of bootstrapped split-half estimation and intraclass correlation coefficients to estimate internal consistency and test-retest reliability, respectively. For future researchers to build upon current results, it is imperative that all researchers provide psychometric information sufficient for estimating the accuracy of inferences and informing further development of cognitive-behavioral assessments.
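The two estimators named above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes trial-level scores are stored per participant, that internal consistency is estimated by averaging Spearman-Brown-corrected correlations over many random half-splits, and that test-retest reliability is estimated with a two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1) in the Shrout and Fleiss notation). All function and variable names are hypothetical. For a task such as the dot-probe, the per-half score would normally be a difference between conditions rather than a plain mean.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes nonzero variance)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r):
    """Correct a half-length correlation to the full test length."""
    return 2 * r / (1 + r)


def split_half_reliability(trials, n_splits=500, seed=0):
    """Average corrected split-half correlation over random splits.

    trials: dict mapping participant id -> list of trial scores (assumed layout).
    Returns (mean estimate, lower, upper), where lower and upper are the
    2.5th and 97.5th percentiles of the split-wise estimates.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for scores in trials.values():
            shuffled = list(scores)
            rng.shuffle(shuffled)  # random assignment of trials to halves
            mid = len(shuffled) // 2
            half_a.append(mean(shuffled[:mid]))
            half_b.append(mean(shuffled[mid:]))
        estimates.append(spearman_brown(pearson(half_a, half_b)))
    estimates.sort()
    lower = estimates[int(0.025 * n_splits)]
    upper = estimates[min(int(0.975 * n_splits), n_splits - 1)]
    return mean(estimates), lower, upper


def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data: list of rows, one per participant, each holding k session scores.
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                # between-participant mean square
    ms_cols = ss_cols / (k - 1)                # between-session mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

In this sketch, `split_half_reliability` would be called on one session's trial-level data and `icc_2_1` on a participants-by-sessions table of task scores. Reporting the interval alongside the point estimate conveys how much the estimate depends on the particular split.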
