Psycholinguistic Norms for 3,783 Two-Character Words in Simplified Chinese

Over 70% of the more than 56,000 most frequently used words in simplified Chinese are two-character words (2C-words). The present study collected data on subtitle frequency, number of strokes, number of meanings, familiarity, concreteness, imageability, age of acquisition, subjective frequency, subjective number of meanings (sNOMs), compositionality, emotional experience rating, sensory experience rating (SER), and semantic transparency (ST) for 3,783 commonly used 2C-words. Correlative patterns were identified between the 13 variables, all of which, with the exception of imageability, SER, and sNOM, were significantly predictive of the changes in participants’ RTs in LDs on the target words. In conclusion, skilled readers’ awareness of 2C-words’ features such as concreteness, imageability, compositionality, sNOM, and ST is closely associated with their semantic perception of the constituent characters.


Chinese 2C-Words
In Chinese, the basic writing units are Chinese characters, almost all of which are meaningful in their own right. Many Chinese characters can be used as one-character words (1C-words), but most Chinese characters can join with one or more other Chinese characters to form words that are more than one Chinese character in length. Indeed, 72% and 22.3% of the 56,008 frequently used words (State Language Affairs Commission, 2008) are 2C-words and words of more than two Chinese characters, respectively.
On the one hand, the most prominent property of 2C-words seems to be their semantic relationships with their constituent characters (Lv, 1979). A corpus-based study suggests that a constituent character in a 2C-word has a probability of over .80 of meaning the same as it does in isolation (Yuan & Huang, 1998). According to Ge (2018), the meaning of a 2C-word can be related to its constituent characters in one of six ways: (a) the meaning of the 2C-word is the simple composition of its constituent characters; (b) its meaning can be inferred from its constituent characters; (c) it largely means the same as one of its constituent characters; (d) it is simply the semantic repetition of one Chinese character; (e) its meaning has nothing to do with its constituent characters; and (f) its constituent characters themselves are not meaningful at all.
On the other hand, as will be briefly reviewed in the next sub-section, Chinese characters, 2C-words, and words of more than two Chinese characters have normative features. Thus, in addition to the degrees of semantic overlap, there might be different strengths of association between a 2C-word and its constituent characters in features. For example, high-frequency 2C-words are likely to be composed of high-frequency Chinese characters, and low-frequency 2C-words of low-frequency Chinese characters.
The possibility of complex relationships between 2C-words and their constituent characters in semantic overlap and in feature association may lead to special correlations between the words' normative features. However, much of this concern remains to be settled at the present time.

Normative Studies in Chinese
Many studies have been conducted on the normative features of Chinese characters, 2C-words, and expressions of more than two Chinese characters in simplified and traditional Chinese (e.g., Chang et al., 2016; Li et al., 2016; Liu et al., 2007; Sun et al., 2018; Sze et al., 2015; Tsang et al., 2018; Tse & Yap, 2018; Tse et al., 2017; Wang et al., 2008, 2020; Yao et al., 2017; Yee, 2017). For example, Liu et al. (2007) investigated 15 features of 2,423 simplified Chinese characters. They suggested that for skilled readers, the reaction times (RTs) in naming may be predicted by cumulative frequency, phonological frequency, age of acquisition (AOA), number of word formations, number of components, number of strokes (NOSs), familiarity, concreteness, imageability, and regularity of Chinese characters. Similarly, Chang et al. (2016) examined 10 features of 3,314 traditional Chinese characters. The initial phonemes, semantic ambiguity, familiarity, frequency, consistency, and regularity of Chinese characters were significant predictors of naming RTs for skilled readers, but the semantic combinability, homophone density, phonetic combinability, and NOS were not. Wang et al. (2008) examined five features (valence, arousal, dominance, appulsion, and familiarity) of 1,500 emotional 2C-words in simplified Chinese and showed that valence, arousal, dominance, and familiarity communicated 98% of the emotional information of these words. Yao et al. (2017) further studied the relationships between the affective (valence and arousal) and semantic (concreteness, imageability, context availability, and familiarity) attributes of emotional 2C-words. Consistent with Guasch et al. (2016), they found that valence and arousal were not predictive of RTs in lexical decisions (LDs) for skilled readers. Adopting the mega-study methodology, Tse et al. (2017) conducted a project on more than 25,000 2C-words in traditional Chinese and concluded that frequency was the strongest predictor of LD RTs for skilled readers. Tse and Yap (2018) also examined the constituent characters' frequencies, NOS, phonological consistency, homophonic density, and semantic transparency (ST) of 18,983 2C-words in traditional Chinese. The orthographic and semantic variables were more important than the phonological variables of the constituent characters in their influences on the recognition of 2C-words.
It is noteworthy that Tsang et al. (2018) conducted an elegant mega-study on 1,020 1C-words, 10,022 2C-words, 949 three-character words, and 587 four-character words. They showed that objective indexes such as word length and frequency of words of more than one Chinese character and cumulative character frequency, number of meanings (NOMs), NOSs, and number of pronunciations of Chinese characters were significantly predictive of the changes in LD RTs. Furthermore, they revealed a U-shape relationship between word-length and LD RTs. Similarly, Sun et al. (2018) conducted a lexical database study on 3,913 1C-words, 34,233 2C-words, 7,143 three-character words, and 3,355 four-character words in a naming task as well as an LD task.
Chinese characters, 2C-words, and expressions of more than two Chinese characters all have normative features that are similar to those of words or fixed expressions in other languages such as English. Some studies, however, seem to have revealed aspects that are specific to Chinese expressions. In addition to features such as NOS and phonology-related attributes that are particular to Chinese characters, the U-shape correlation between word length and LD RT (Tsang et al., 2018) and the complex relations between the features and internal structures of Chinese four-character idioms promise to inspire studies of Chinese lexical processing. Considering the possible relationships between 2C-words and their constituent characters, one may argue that the normative features of 2C-words should also have aspects of their own. Studies in support of this argument will be of significance since most commonly used Chinese words are 2C-words.

The Present Study
Words are the basic units in the mental lexicon (Dronjic, 2011) and in sentence reading (Bai et al., 2008) for skilled Chinese readers, but a 2C-word can be perceived both as a whole entity and according to its constituent characters in the early stage of recognition (Du et al., 2013; Liu et al., 2010; Peng et al., 1999; Tong et al., 2014; Wang et al., 2017; Yang, 2013). Indeed, several models of 2C-word recognition (e.g., Peng et al., 1999; Taft & Zhu, 1997; Zhou & Marslen-Wilson, 2009) indicate that the recognition of a 2C-word can be influenced by character-level as well as word-level features. According to the multilevel interactive-activation framework (Taft & Zhu, 1997), for example, skilled readers are supposed to process information at the levels of strokes, components, constituent characters, and words in 2C-word recognition. Similarly, the Inter- and Intra-Connection Model and the Lexical Representation Framework for Chinese Compound Word (Zhou & Marslen-Wilson, 2009) assume inter-relationships between the representations for 2C-words and their constituent characters. In fact, it has been repeatedly indicated that feature manipulations at the word level and at the level of constituent characters are likely to yield new insight into the mechanism of 2C-word recognition (Huang & Lee, 2018; Miwa et al., 2014; Sun et al., 2018).
As reviewed in the previous sub-section, frequency, familiarity, NOM, AOA, imageability, and concreteness are among those that are often examined to indicate the normative characteristics of expressions consisting of one or more Chinese characters. Considering the relevant findings of both normative and psychological studies, an examination of NOS, compositionality, emotional experience rating (EER), sensory experience rating (SER), subjective frequency (sF), ST, and subjective number of meanings (sNOMs) would also be of a significant value.
NOS reflects the visual complexity of 2C-words. The efficiency of recognition of Chinese characters (Liversedge et al., 2014; Sze et al., 2015) and Japanese kanji (Tamaoka & Kiyama, 2013) tends to be influenced by NOS. Subjective frequency (sF) is often measured by asking participants to estimate word usage frequencies on a Likert scale (Desrochers et al., 2010). It is a powerful predictor of participant performance in lexical processing (e.g., Balota et al., 2004; Chen & Dong, 2019). Indeed, normative descriptions of sF are available for words in alphabetic languages (Balota et al., 2004; Desrochers et al., 2010; Gonthier et al., 2009). Similarly, sNOM is measured by asking participants to rate the NOMs of a word on a Likert scale. It is an assessment of how many meanings users believe a word to have, and it can also be adopted as a variable in cognitive tasks (Chang et al., 2016).
Compositionality is often used in studies on words of more than one morpheme, and is defined as the degree to which the constituent morphemes contribute to the overall meaning of the whole expression (Cieślicka, 2013). In addition to experimental studies (Cieślicka, 2013), compositionality has been regarded as a variable in normative studies on complex expressions (Dasgupta et al., 2016; Li et al., 2016). It was taken into consideration in the present study because nearly all 2C-words can be regarded as composed of two morphemes of Chinese characters.
EER refers to the degree of ease with which a word arouses the user's emotional experiences (Newcombe et al., 2012), and SER represents the degree of ease with which a word evokes sensory and perceptual experiences in the users (Juhasz et al., 2011). For example, "fountain" has a low EER score, whereas "relief" has a high score; "bag" scores low for SER, whereas "thirst" scores high. Both EER (Newcombe et al., 2012) and SER (Juhasz et al., 2011) have been shown to influence participants' performance in cognitive tasks and are potentially important in cognitive processing according to theories of embodied cognition (Barsalou, 2008; Goldman & de Vignemont, 2009). In addition, normative data on SER are available in English (Juhasz et al., 2011) and French (Bonin et al., 2015). ST means the degree to which the constituents are semantically related to the word itself (Tse et al., 2017). It indicates the degree to which the word can be understood literally through its constituent(s), and it has been shown to be significantly influential for word recognition.
In summary, NOS is a feature particular to Chinese characters, but the other 12 features (frequency, NOM, familiarity, AOA, sF, sNOM, concreteness, ST, imageability, compositionality, EER, and SER) have been commonly examined in many languages. One may therefore expect results similar to those of previous studies, both for the correlations between the 13 features and for the features' capacity to predict word recognition in LDs. Considering the possibility of complex relationships between 2C-words and their constituent characters, however, we propose two questions: Which correlations between these 13 features might be specific to 2C-words? Which of these features will be specific to 2C-words in predicting word recognition in an LD task?

Methods
This section consists of three sub-sections: word selection, feature scores collection and the implementation of an LD task. We expected to achieve an insight into the effects of these features on participants' LD performance and thus have a deeper understanding of the inter-relationships between these normative characteristics. This study was approved by the Ethics Committee of Zhejiang University.

Word Selection
To determine those that were representative of the most frequently used 2C-words in simplified Chinese, we first turned to Modern Chinese Frequency Dictionary (Language Teaching and Research Institute of Beijing Language and Culture University, 1986) and HSK Chinese Proficiency Test Vocabulary Guideline (A Dictionary of Chinese Usage: 8,000 words) (Chinese Proficiency Oral Test Center of Beijing Language and Culture University, 2009). Modern Chinese Frequency Dictionary is a reference of word frequency for 8,000 of the most commonly used words in the 1980s, while the 8,822 words in HSK Chinese Proficiency Test Vocabulary Guideline (A Dictionary of Chinese Usage: 8,000 words), which are ranked into four categories according to usage frequency, are presumably the most common in use at the present time. The initially selected target words were the 4,197 2C-words that appear in both of these two resources and the 183 2C-words that fall into the first two categories in HSK Chinese Proficiency Test Vocabulary Guideline (A Dictionary of Chinese Usage: 8,000 words) but are not listed in Modern Chinese Frequency Dictionary. All these 4,380 words are compound words and belong to the word classes of nouns, verbs, or adjectives according to Modern Chinese Standard Dictionary (Li, 2014). By a compound word we mean a 2C-word that consists of two Chinese characters each of which is meaningful in its own right according to Ge (2018).
A word that is frequently used as a response in a word-association task should be more common in use than a word that is not often used as a response in such a task. Thus, as a second step to determine the target words, we conducted a single-word association task with the initially selected words as the stimuli. We randomly divided them into 63 groups. Each group consisted of 69 to 75 words, which were printed on a single piece of paper. We obtained 50 copies of the 63 sheets of words, and 3,150 college students (1,727 females, 1,423 males; M = 20.3 years, age range: 18-25 years) participated in the survey. Each student received a list of stimuli and was required to write down the first word that they thought of as the response to each stimulus. In the end, we selected the 3,796 stimulus words that also appeared at least twice among the responses.
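The selection rule in this second step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_targets`, the toy stimuli, and the toy responses are hypothetical and are not the study's materials.

```python
from collections import Counter

def select_targets(stimuli, responses, min_count=2):
    """Keep the stimulus words that also occur at least `min_count`
    times among the pooled association responses."""
    counts = Counter(responses)
    return [word for word in stimuli if counts[word] >= min_count]

# Toy data (hypothetical): three stimuli and the pooled responses
# written down by all respondents.
toy_stimuli = ["朋友", "学习", "高兴"]
toy_responses = ["学习", "朋友", "学校", "朋友", "学习", "快乐", "高兴"]

selected = select_targets(toy_stimuli, toy_responses)
print(selected)
```

In this toy run, the third stimulus is dropped because it occurs only once among the responses.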

Feature Scores Collection
Objective features. The NOM and NOS scores were obtained from Modern Chinese Polysemous Words Dictionary (Yuan, 2001) and Xinhua Dictionary (http://zidian.cibiao.com/), respectively. Following Tse et al. (2017), we took the score for each word from the corpus of contextual diversity subtitle frequency (Cai & Brysbaert, 2010) as the representative of the word's objective frequency (referred to as subTF, hereafter).
Subjective features. Adults might be unable to remember the exact learning age for a specific word (Morrison et al., 1997), and they are likely to have acquired most words in written forms as the result of school education. Thus, similar to previous studies (Barca et al., 2002; Cortese & Khanna, 2008; Liu et al., 2007; Schock et al., 2012), we required participants to report in which period of school life they thought they learned the words. The 3,796 words were randomly divided into 44 groups, with each group consisting of 86 or 87 words. A seven-point Likert scale of AOA was designed for each group of words, which were listed on a single sheet with the instructions (see Table 1) printed at the top of the paper (see the Appendix for the original Chinese version of the instructions). Similarly, 44 scales were created for each of compositionality, concreteness, EER, familiarity, imageability, SER, ST, and sF.
According to Modern Chinese Polysemous Words Dictionary (Yuan, 2001), 62% and 96% of the 3,796 words have one meaning and not over three meanings, respectively. Thus, they were divided into 29 groups at random, with each group consisting of 130 to 133 words, and a three-point Likert scale of sNOM was designed for each group of words. In the end, we created 425 single-sheet scales.
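The sheet counts follow from the group sizes reported in the text, and the arithmetic can be checked with a short Python sketch. The variable names are illustrative only; the figure of 17 copies per sheet is the one reported in the Methods.

```python
# Nine features were rated on seven-point scales, each with 44 sheets.
seven_point_features = [
    "AOA", "compositionality", "concreteness", "EER", "familiarity",
    "imageability", "SER", "ST", "sF",
]
sheets_per_seven_point_feature = 44
snom_sheets = 29  # sNOM used a three-point scale and 29 larger groups

total_sheets = len(seven_point_features) * sheets_per_seven_point_feature + snom_sheets
print(total_sheets)  # 425

# Splitting 3,796 words into 44 groups gives 86 words per group with
# 12 left over, i.e. groups of 86 or 87 words.
words_per_group, remainder = divmod(3796, 44)
print(words_per_group, remainder)  # 86 12

# Seventeen copies of every sheet, one sheet per rater.
total_copies = total_sheets * 17
print(total_copies)  # 7225
```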
It appeared acceptable to obtain around 15 scores in the evaluation of an attributive feature of the materials in a word-recognition study in Chinese (Zhang et al., 2013). Thus, we made 17 copies of the 425 single-sheet scales. The 7,225 sheets were randomly sent to 7,225 students from seven universities

Table 1. Rating instructions
AOA: For each word, please indicate when you first saw, heard, or used it by selecting one of the seven numbers next to it. Numbers 1, 2, 3, 4, 5, and 6 refer to when you were a student of grades 3, 5, 8, 9, 11, and 12, respectively, and 7 indicates when you were enrolled as a university student. The bigger the number, the later you think you acquired it.
Compositionality: For each word, please decide the extent to which both the constituent characters contribute to the overall meaning of the word as a whole by selecting one of the seven numbers next to it. The bigger the number, the more you think the constituent characters contribute to the overall meaning of the word.
Concreteness: For each word, please decide the extent to which it makes you think of specific entities or actions by selecting one of the seven numbers next to it. The bigger the number, the more easily you associate it with specific things or actions.
EER: For each word, please rate the degree to which it evokes your emotional experiences (happiness, anger, sadness, pain, or fear) by selecting one of the seven numbers next to it. The bigger the number, the more easily your emotional experiences may be evoked.
Familiarity: For each word, please decide the extent to which you are familiar with it by selecting one of the seven numbers next to it. The bigger the number, the more familiar it is to you.
Imageability: For each word, please rate the degree to which it arouses mental images by selecting one of the seven numbers next to it. The bigger the number, the clearer the images it arouses in your mind.
SER: For each word, please decide the degree to which it arouses your sensory experiences (taste, touch, sight, sound, or smell) by selecting one of the seven numbers next to it. The bigger the number, the easier your sensory experiences may be aroused.
sF: For each word, please rate the extent to which it is frequently used in your daily life by selecting one of the seven numbers next to it. The bigger the number, the more frequently you think it is used in everyday life.
ST: For each word, please decide the extent to which it may be understood literally through its constituent character(s) by selecting one of the seven numbers next to it. The bigger the number, the more easily it may be understood through its constituent characters.
sNOM: For each word, please indicate the number of meanings it has by selecting one of the three numbers next to it. Numbers 1, 2, and 3 indicate that it has one, two, and three or more meanings, respectively.

The rating task was conducted collectively in classroom settings, with each sheet delivered to one student during the breaks between classes. The teachers were the experimenters who explained the instructions. Students were required to carefully read the instructions before starting to rate the stimuli. The raters did not participate in the abovementioned task of single-word association.

The LD Task
Participants. One hundred and seventy college students (113 females, 57 males; M = 22.0 years, age range: 18-28 years) were recruited on the campus of a university by means of a flyer advertisement. They were all native speakers of Chinese and had normal or corrected-to-normal vision. After the experiment, they each received RMB 30 (USD 4.5) as a reward. Participants gave informed consent in written form in accordance with the Declaration of Helsinki.
Materials. The 3,796 words were randomly divided into six groups of 632 to 636 words. For every group, we first transposed the constituent characters of each word to make the same number of pre-non-words. Then, we exchanged the left constituent character of a pre-non-word with the left constituent character of another pre-non-word and created the same number of two-character non-words as the word items in the group. The non-words as clusters of two Chinese characters did not exist as wholes in Modern Chinese Dictionary (Institute of Linguistics, Chinese Academy of Social Science, 1988). The first group of words was further divided into three sub-groups of roughly equal size, as was the corresponding non-word group. Every sub-group of words was mixed randomly with the same number of non-words to create one block of trials. The three blocks of trials were considered as materials group 1. Similarly, materials groups 2, 3, 4, 5, and 6 were created in the same manner.
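The two-step construction of the non-words can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `make_nonword_candidates` is hypothetical, and the left characters are exchanged by a simple rotation (each pre-non-word takes the left character of the next one), which is an assumption made here for determinism. The text only states that left characters were exchanged between pre-non-words.

```python
def make_nonword_candidates(words, lexicon=frozenset()):
    """Build two-character non-word candidates from two-character words.

    Step 1: transpose the two characters of each word (pre-non-words).
    Step 2: give each pre-non-word the left character of another
            pre-non-word (here: the next one in the list, cyclically).
    Candidates found in `lexicon` (a hypothetical set standing in for a
    dictionary check) are flagged so that they can be replaced by hand.
    """
    pre_nonwords = [w[1] + w[0] for w in words]
    lefts = [p[0] for p in pre_nonwords]
    rotated_lefts = lefts[1:] + lefts[:1]
    candidates = [left + p[1] for left, p in zip(rotated_lefts, pre_nonwords)]
    return [(c, c in lexicon) for c in candidates]

# Toy example with three words; the "lexicon" here is made up.
toy_words = ["朋友", "学习", "高兴"]
result = make_nonword_candidates(toy_words, lexicon={"兴学"})
for candidate, is_real_word in result:
    print(candidate, "needs replacing" if is_real_word else "ok")
```

Because every candidate reuses characters from the word list, the non-words are matched to the words in their constituent characters, and the number of candidates equals the number of words.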
Procedure. Six programs were designed for the six materials groups using DMDX (Forster & Forster, 2003). In a computer room, each program was run on five personal computers. No more than 30 students at a time collectively participated in the experiment, so that each of the six programs was run by roughly the same number of participants. Participants pressed the space bar to initiate a trial: after the red fixation cross was shown for 500 ms at the center of the screen, the mask ( ) appeared at the position where the fixation cross had disappeared, and remained there for 200 ms. The target stimulus was subsequently displayed at the center of the screen. Participants made lexical decisions on the target. Half of the participants pressed "Z" if the target was a word in Chinese and "/" if it was a non-word, and the other half pressed "/" for a word and "Z" for a non-word. The target remained on the screen for 2,500 ms, or until a key press was received. The inter-trial interval was 1,000 ms. Prior to the experiment, the experimenter orally delivered the instructions, which were also presented at the beginning of the programs. Participants were told not to start the programs before they had carefully read and understood the instructions. In each program, there were 12, 4, and 4 practice trials prior to the first, second, and third blocks of the experimental trials, respectively. Eventually, 28 or 29 participants were assigned to each of the six programs, and 28 or 29 responses were collected for each of the 3,796 2C-words.

Results
Six of the originally determined words were removed because they are not included in Cai and Brysbaert (2010). With reference to Liu et al. (2007), seven more words were removed because participants' LD error rates on them were higher than .50. Therefore, the results consisted of the data for the remaining 3,783 2C-words. Also, as suggested in Liu et al. (2007), two participants' LD data were excluded since their error rates were higher than 20%. The remaining 168 participants achieved an overall accuracy of .96. After the incorrect responses were removed, the LD RTs shorter than 200 ms or 3 SD above the overall average were excluded. Finally, 5.7% of the data was discarded. Table 2 summarizes the results.
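The RT trimming described above can be sketched as follows. This is an illustration under assumptions: the function name `trim_rts` and the toy trials are hypothetical, and the mean and SD are computed here over all correct responses pooled together, which is one reading of "the overall average".

```python
from statistics import mean, stdev

def trim_rts(trials, floor_ms=200, sd_cutoff=3):
    """Drop incorrect responses, then drop RTs below `floor_ms` or more
    than `sd_cutoff` SDs above the mean of the correct responses."""
    correct_rts = [t["rt"] for t in trials if t["correct"]]
    upper = mean(correct_rts) + sd_cutoff * stdev(correct_rts)
    return [rt for rt in correct_rts if floor_ms <= rt <= upper]

# Toy data: 20 typical correct responses, one very slow response,
# one anticipatory response, and one error.
toy_trials = (
    [{"rt": 600, "correct": True} for _ in range(20)]
    + [{"rt": 5000, "correct": True}]
    + [{"rt": 150, "correct": True}]
    + [{"rt": 650, "correct": False}]
)

kept = trim_rts(toy_trials)
print(len(kept), "of", len(toy_trials), "trials kept")
```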
The lowest Cronbach's alpha was over .96, suggesting a high reliability of the 10 subjective measurements. One might note that the mean RT on the 2C-words was 750 ms in Tsang et al. (2018) but was 598 ms in the present study. There are at least two reasons to account for the nearly 150 ms difference in LD RTs. Firstly, as indicated in the Word Selection sub-section, the 3,783 words in the present study are likely to be more common in use than those in Tsang et al. (2018). Moreover, it seems that the 2C-words in Tsang et al. (2018) were of a lower frequency (M = 303) than those in the present study (M = 600), with reference to Cai and Brysbaert (2010). Secondly, participants in the present study were from the same campus, while those in Tsang et al. (2018) appeared to be rather heterogeneous in background. A large degree of diversity among participants could result in a large increase in the averaged RT in LDs. Indeed, the SD in Tsang et al. (2018) (SD = 91 ms) appears to be much larger than the SD in the present study (SD = 51 ms).
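For readers unfamiliar with the reliability index mentioned at the start of the preceding paragraph, the standard formula for Cronbach's alpha can be sketched as below. The layout is an assumption made for illustration: rows are words and columns are raters, so that raters play the role of "items". The text does not describe how the ratings were arranged for the reliability analysis, and the function name and toy ratings are hypothetical.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a words-by-raters table.

    ratings[i][j] is the score that rater j gave to word i.
    alpha = k / (k - 1) * (1 - sum of rater variances / variance of row totals)
    """
    k = len(ratings[0])
    rater_variances = [variance([row[j] for row in ratings]) for j in range(k)]
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(rater_variances) / total_variance)

# Toy data: three raters who agree perfectly give alpha = 1;
# adding disagreement lowers the value.
perfect = [[1, 1, 1], [4, 4, 4], [7, 7, 7]]
noisy = [[1, 2, 1], [4, 3, 5], [7, 6, 7], [2, 4, 3]]
alpha_perfect = cronbach_alpha(perfect)
alpha_noisy = cronbach_alpha(noisy)
print(round(alpha_perfect, 3), round(alpha_noisy, 3))
```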
We conducted a Pearson-correlation analysis and a principal component analysis (PCA) for the feature scores and a regression analysis with the feature scores and the LD RTs as the predictors and the dependent variable, respectively.

Pearson-Correlation Analysis Results
To avoid the influence of distributional skewness, the subTF and NOM scores were log-transformed before the Pearson-correlation analysis. As indicated in Table 3, 64 of the 78 pair-wise correlations between the 13 variables were significant (ps < .05 or < .01). Examples include the positive correlations between NOM and sNOM, between subTF and sF, between sF and EER, between familiarity and sF, between compositionality and NOS, SER, and ST, between concreteness and ST and SER, between EER and SER, between SER and imageability, and between SER and concreteness, ST, and EER, and the negative correlations between subTF and AOA, between AOA and sF and ST, between familiarity and AOA, between concreteness and sNOM, between SER and AOA, and between ST and AOA and sNOM.
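The log transformation and the pair count can be illustrated with a small Python sketch. The base-10 logarithm, the function names, and the toy values are assumptions made here; the text does not state which logarithm was used.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log_transform(values):
    """Base-10 log transform for positively skewed counts (assumed base)."""
    return [math.log10(v) for v in values]

# Thirteen variables give 13 * 12 / 2 = 78 unordered pairs.
n_pairs = math.comb(13, 2)

# Toy data (hypothetical): a skewed frequency count and a rating that
# tracks its order of magnitude.
toy_sub_tf = [10, 100, 1000, 10000]
toy_sf = [2.0, 3.0, 4.0, 5.0]

r_raw = pearson_r(toy_sub_tf, toy_sf)
r_log = pearson_r(log_transform(toy_sub_tf), toy_sf)
print(n_pairs, round(r_raw, 3), round(r_log, 3))
```

In the toy data, the relationship is linear only on the log scale, so the correlation is higher after the transformation.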
We also revealed a few new correlations between the normative features in comparison with previous studies: (a) ST is positively correlated with compositionality and concreteness but negatively correlated with AOA and sNOM; (b) subTF is positively correlated with EER but negatively correlated with AOA; (c) SER is positively correlated with EER, compositionality, concreteness, and ST; (d) compositionality is negatively correlated with NOM, and concreteness is negatively correlated with sNOM.

Regression Analysis Results
Similar to previous studies (e.g., Liu et al., 2007), a multiple regression analysis was conducted with the 13 variables as the predictors and the RT as the dependent variable. In order to avoid the influence of subject variability, the analysis was conducted on the z-scored RTs. The result showed that 10 features (subTF, NOM, EER, familiarity, AOA, concreteness, ST, compositionality, sF, and NOS) were significantly powerful in predicting changes in participants' LD RTs to the 2C-words (see Table 5) (F(13, 3,772) = 181.320, p < .001, R² = .325). However, the other three features (imageability, SER, and sNOM) were not significantly predictive of the LD RTs. Moreover, we conducted the same analysis for the 2,510 2C-words common in the present study and in Tsang et al. (2018) and achieved the same pattern of results (see Table 5) (F(13, 2,496) = 72.074, p < .001, R² = .273).
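The z-scoring step can be sketched as follows. This is a minimal illustration under assumptions: RTs are standardized within each participant and then averaged per word, which is a common way to remove differences in overall speed between participants; the text does not spell out the exact procedure, and the function name and toy trials are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def item_mean_z_rts(trials):
    """Standardize RTs within each participant, then average per word.

    `trials` is a list of (participant, word, rt_ms) tuples.
    """
    rts_by_participant = defaultdict(list)
    for participant, _word, rt in trials:
        rts_by_participant[participant].append(rt)
    stats = {p: (mean(v), stdev(v)) for p, v in rts_by_participant.items()}

    z_by_word = defaultdict(list)
    for participant, word, rt in trials:
        m, sd = stats[participant]
        z_by_word[word].append((rt - m) / sd)
    return {word: mean(zs) for word, zs in z_by_word.items()}

# Toy data: a fast and a slow participant with the same relative pattern.
toy_trials = [
    ("p1", "w1", 500), ("p1", "w2", 600), ("p1", "w3", 700),
    ("p2", "w1", 800), ("p2", "w2", 1000), ("p2", "w3", 1200),
]
z_rts = item_mean_z_rts(toy_trials)
print(z_rts)
```

Although the second participant is much slower overall, both contribute the same z-scores, so the item means reflect only the relative difficulty of the words.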

Discussion
We obtained both results that were similar to those of previous studies and results that appeared to be specific to 2C-words, concerning the correlations between the 13 features and the features' capacity to predict LD RTs.

Similarities to Previous Studies
Pearson-correlations. As indicated in Table 3, the more meanings a 2C-word has in the dictionary, the more meanings the users have for it in everyday life. Similar to the case of English disyllabic words (Schock et al., 2012), 2C-words with high subTFs are likely to be acquired early in life. The more frequently a 2C-word is used in everyday life, the more likely it is to evoke emotional experiences in the users. The earlier a 2C-word is learned, the more frequently it is used in everyday life and the more semantically transparent it tends to be. Similar to the case of Chinese idioms, the more familiar 2C-words are to the users, the more commonly they are used in everyday life and the earlier they are likely to be acquired. The more often a 2C-word may be considered to be a composition of its individual characters, the more visually complicated it is, the more likely it is to arouse user sensory experiences and the more likely it can be understood literally through its constituent characters. The more concrete a 2C-word is in meaning, the more semantically transparent it is and the easier it is to evoke user sensory experiences. However, concrete 2C-words tend to have fewer meanings for the users in their daily lives. Similar to the case of Chinese characters (Liu et al., 2007), the more concrete a 2C-word is, the more likely it is to evoke mental images. However, this correlation between concreteness and imageability (r = .101) is seemingly weaker than in Liu et al. (2007; r = .796), which will be explained in the next sub-section.
A 2C-word that easily arouses emotional experiences in users also tends to evoke sensory experiences. Consistent with Bonin et al. (2015), 2C-words that more easily arouse sensory experiences in the users are typically acquired earlier in life. Furthermore, in agreement with Juhasz et al. (2011) and Bonin et al. (2015), 2C-words that more easily evoke mental images are also more likely to evoke sensory experiences. The 2C-words that more easily arouse sensory experiences in the users are usually more concrete in meaning, are more semantically transparent and are more likely to evoke emotional experiences. The more semantically transparent a 2C-word is, the earlier it is likely to be acquired and the fewer meanings it tends to have in the users' daily lives.
Note. 3,783 2C-words were the targets in the present study, and 2,510 2C-words were the common words in the present study and in Tsang et al. (2018). *p < .1. **p < .01. ***p < .001.
Principal components. According to Table 4, the most frequently used 2C-words in daily life, which are familiar to the users and are acquired early by them, also arouse emotional experiences in the users more easily in word recognition. Similar to this case is the way in which bilinguals recognize emotional words in the second language (L2). For example, those who began to learn L2 at a late age in classroom settings are likely to fail to process the affective information of emotional words in L2 during cognitive tasks. After all, one of the two important ways of enlarging one's vocabulary is to enrich one's life experiences (Andrews et al., 2009). It is understandable that the second and third principal components represented the dimensions of semantics and meaningfulness, respectively. However, it is interesting that compositionality clustered with NOS, which will be explained in the next sub-section.
Features significantly predictive of LD RTs. In agreement with previous findings, subTF (Cai & Brysbaert, 2010; Tsang et al., 2018; Tse et al., 2017), NOM (Peng et al., 2003), EER (Newcombe et al., 2012), and familiarity and AOA (Chen et al., 2004) predicted changes in participants' RTs to the 2C-words. That is, consistent with Liu et al.'s (2007) finding on Chinese characters, 2C-words are similar to 1C-words when measured in concreteness. Similar to the case of English compounds (Juhasz et al., 2015), the more semantically transparent a 2C-word is, the more easily it is recognized.
The simple correlation was not significant between compositionality and z-scored RT but the partial correlation was. That is, the effect of compositionality must have been affected by other variables such as NOS and ST, which were strongly correlated with compositionality. Compositionality might have a weak power in predicting changes in the LD RTs. However, the beta value was positive for compositionality but negative for ST, which will be explained in the next sub-section.
The cases of sF and NOS also tended to be somewhat controversial. SF has been shown to be a reliable determinant of participant performance in cognitive tasks concerning monosyllabic words in English (Balota et al., 2001, 2004). Similarly, it was significantly predictive of changes to participants' RTs in the present study. However, Brysbaert and Cortese (2011) held the view that the weight of sF predominantly relied on the frequency quality. The influence of sF could be omitted when frequency values were reliably measured. Furthermore, sF did not appear to be predictive of participants' RTs in picture naming in Liu et al. (2011). Thus, as indicated in Table 5, sF might have been weaker than subTF in predicting participants' RTs in the present study.
Similarly, the finding that NOS appeared to predict changes in participants' RTs is compatible with the findings of previous studies on Chinese expressions (Liu et al., 2007; Tsang et al., 2018; Tse et al., 2017) and Japanese kanji (Tamaoka & Kiyama, 2013). However, Chang et al. (2016) reported that NOS had a null effect on participants' naming of traditional Chinese characters. In other words, sF or NOS might be slightly more constrained than subTF, concreteness, familiarity, ST, NOM, EER, or AOA in terms of their influences on 2C-word recognition by skilled readers.
Features not significantly predictive of LD RTs. Similar to concreteness, imageability concerns words' reference to external objects. However, because it refers to the extent to which a word arouses mental pictures, sounds (Paivio et al., 1968), senses (Westbury et al., 2013), or sensory experiences (Paivio et al., 1968), imageability is also regarded as a composite measure (Dellantonio et al., 2014). It is possibly related to sensory experience ratings (Juhasz et al., 2011) or body interaction ratings (Siakaluk et al., 2008). It is similar to SER in indicating internal, bodily related sensory experiences (Juhasz et al., 2011; Siakaluk et al., 2008).
SER is associated with words' perceptual information, which is grounded in sensory experiences and can be construed as an indication of simulation (Zdrazilova & Pexman, 2013). When seeing the word "gloves," for example, the reader is likely to unconsciously simulate the action of putting gloves on and/or retrieve memories of feeling warm and comfortable with gloves on in cold winter. Due to its similarity and significant correlation with SER (Juhasz et al., 2011), imageability may also reflect words' simulation information to some extent. However, activation of simulation information is rather slow (Barsalou, 2008). For example, Zdrazilova and Pexman (2013) conducted an LD task on abstract-noun recognition and revealed the context availability effect, but failed to reveal effects of SER and valence. LD tends to involve a lexical pattern match (Compton & Bradshaw, 1975) and does not seem to be as capable as semantic categorization of tapping the sensory and emotional aspects of words' meanings (Zdrazilova & Pexman, 2013). In other words, changes in SER or imageability may simply not be sensitive enough to be reflected in LD RTs for a 2C-word in general. In fact, neither SER nor imageability was included in 2C-word normative studies by Sun et al. (2018), Tsang et al. (2018), Tse and Yap (2018), Tse et al. (2017), or Wang et al. (2008). Yao et al. (2017) did seem to reveal a significant imageability effect in LDs on 2C-words, but they did so on a sample of emotional words.
Words with high scores in valence or arousal are likely to have high scores in imageability (Altarriba & Bauer, 2004; Citron et al., 2014; Westbury et al., 2013).
SNOM was most strongly correlated with NOM. To understand why sNOM was not significantly predictive of 2C-word LD RTs, we conducted a regression analysis including all the variables except NOM. However, sNOM still did not contribute significantly to changes in the dependent variable (β = −.008, t = −0.537, p = .592), whereas all the variables listed in Table 5 except NOM did (F(9, 3773) = 199.244, p < .001, R² = .322). More discussion is provided in the next section.
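As a reading aid, the reported test statistic can be related to the reported R² through the usual identity for an ordinary least squares regression with an intercept, F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below assumes that the model had k = 9 predictors (the reported numerator degrees of freedom) and n = 3,783 items; the function name `f_from_r2` is introduced here only for illustration. Because R² is reported to three decimals, the recomputed F can only be expected to approximate the reported value.

```python
def f_from_r2(r2, n_predictors, n_obs):
    """Overall F statistic of an OLS regression (with intercept) from R^2.

    Returns the F value together with the model and residual
    degrees of freedom.
    """
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    f_value = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return f_value, df_model, df_resid


# Values taken from the text; k = 9 and n = 3783 are assumptions
# inferred from the reported degrees of freedom and item count.
f_value, df1, df2 = f_from_r2(r2=0.322, n_predictors=9, n_obs=3783)
print(df1, df2, round(f_value, 1))
```

Under these assumptions the degrees of freedom match those reported, and the recomputed F lies close to the reported 199.244, with the small gap plausibly due to the rounding of R².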

Findings Particular to 2C-Words
The significant correlations of AOA with subTF and SER, and that between SER and imageability, for 2C-words are similar to the cases with words in European languages. That is, consistent with the conclusion that words are the basic units in the mental lexicon (Dronjic, 2011) and in sentence reading (Bai et al., 2008) by skilled readers, the examined 2C-words were taken as whole entities by participants in the present study. The significant correlation between concreteness and imageability of 2C-words is similar to the case of Chinese characters (Liu et al., 2007), confirming the conclusion that 2C-words are similar to 1C-words in semantic representations (Ma et al., 2016).
In comparison with these two points, the present study seems to have yielded four more interesting results: (1) The correlation was relatively weak between concreteness and imageability. (2) Compositionality clustered with NOS in the PCA. (3) The beta values for compositionality and ST were positive and negative, respectively, in the regression equation. (4) SNOM diverged from NOM in predicting the changes in LD RTs.
On the one hand, the most important thing for a Chinese native speaker to do in literacy learning is to master a few thousand Chinese characters that are most often in use. He or she learns how to read and write a certain number of basic Chinese characters and how to use them in a few cases. Indeed, one meets the minimum requirement for literacy if he or she masters the 3,500 basic Chinese characters in simplified Chinese (State Language Affairs Commission, 1991). Even grade-three primary school students are clearly aware that a 2C-word consists of two Chinese characters (Jiang, 2011). On the other hand, it has been repeatedly confirmed that the constituent characters are inevitably perceived in the recognition of a visually presented 2C-word by skilled readers (Du et al., 2013; Liu et al., 2010; Peng et al., 1999; Tong et al., 2014; Wang et al., 2017; Yang, 2013). Therefore, it should have been participants' processing of the 2C-words both at the word level and at the constituent-character level that led to the above-mentioned four interesting results.
Concreteness and imageability. The relatively weak correlation between concreteness and imageability can be understood from two aspects. The recognition of a 2C-word is similar to that of a 1C-word in retrieving semantic representations, but 1C-words are more likely than 2C-words to be taken as wholes in cognitive tasks (Ma et al., 2016). Indeed, one has to assign cognitive resources to the processing of the individual characters in the early stage of 2C-word recognition.
Previous studies indicate that the more concrete a word is in meaning, the more imageable it is likely to be (Bonin et al., 2003; Guasch et al., 2016; Yao et al., 2017). Scholars even tend to treat concreteness and imageability as interchangeable (Westbury et al., 2016; Xia et al., 2016). However, the similarity between these two semantic features appeared to be of a lower degree for 2C-words than for Chinese characters. Of course, more studies are needed to elaborate this point of view.
Compositionality and NOS. The fact that compositionality clustered with NOS seemed to suggest participants' perception of the 2C-words' visual information. Firstly, the processing of configurational information of both 1C-words and 2C-words helps trigger activations of the semantic representations. For example, 两 (two) and 拳击 (boxing) are similar to 雨 (rain) and 泰山 (Mount Tai), respectively, only in configuration, and perceptions of 两 and 拳击 resulted in activations of semantic representations for 雨 and 泰山, respectively (Ma et al., 2016).
Secondly, the degree to which the constituent characters can be seen as consisting of components plays a role in 2C-word recognition by skilled readers. For example, Wang et al. (2019) conducted a lexical decision task on 2C-words, the first constituent characters of which were composed of two components with one vertically above the other. The researchers also manipulated whether or not the first constituent characters were presented upside-down (presentation mode). If the first constituent characters had a high score in being rated as compositions of two components, participants tended to recognize the 2C-words according to the constituent characters and the N400 amplitude was significantly influenced by presentation mode; if the first constituent characters had a low score in being rated as compositions of two components, however, participants tended to recognize the 2C-words as wholes and no significant change of N400 amplitude was observed under the influence of presentation mode. The N400 is an event-related potential component that is thought to index semantic processing.
Thirdly, the constituent characters' relative positions are also regarded as important in the representation of visual features for 2C-words (Tan & Perfetti, 1999). There is a probability of .97 of obtaining a non-word if the constituent characters' relative positions are exchanged in a 2C-word (Gu et al., 2015). For example, 学校 (school) is a word, but 校学 is not. Indeed, new learners of Chinese are found to be quite confused by the constituents' relative positions (Hao & Li, 2015). It seems very important for the learner to attend carefully to the constituent characters' relative positions in a word of two or more Chinese characters.
According to Taft and Zhu (1997), skilled readers should be unconsciously aware of the configurational features of the components, constituent characters, and words, and of the spatial positions of the components and constituent characters, in the perception of 2C-words. Just as the compositionality of 2C-words' first constituent characters affected participants' LD performance (Wang et al., 2019), the higher a 2C-word's compositionality score was, the longer its LD RT was in the present study. In other words, the compositionality scores might have been loaded with a large amount of the raters' awareness of 2C-words' visual characteristics. NOS indicates the visual complexity of a 2C-word at the stroke level (Liu et al., 2007; Wang et al., 2020), and compositionality seems to suggest the visual characteristics of configuration and intra-structure at the level of components, constituent characters, and words. This might be the reason why compositionality clustered with NOS as shown in Table 4.
Compositionality and ST. Both the words as wholes and the constituent characters are inevitably perceived in 2C-word recognition, and this perception may be a function of the relationship diversity between the two constituent characters in 2C-words (Jia et al., 2013). There is a copulative (e.g., 争夺-seize and take by force), determinative (e.g., 鸭毛-duck's feather), numeral (e.g., 四季-the four seasons), descriptive (e.g., 大雪-heavy snow), adverbial (e.g., 难看-ugly), or reciprocal (e.g., 白痴-idiot) relationship between the constituent characters of a 2C-word. Those with high scores in ST but low scores in compositionality (e.g., 争夺) might be more likely to be taken as whole entities than those with low scores in ST but high scores in compositionality (e.g., 难看). That is, participants were more likely to assign cognitive resources to the individual morphemes, and to consume cognitive resources in integrating their comprehension of the constituent characters, in LDs on a 2C-word of high than of low compositionality. However, ST might function to indicate the semantic features of 2C-words as wholes. Thus, RTs were shorter to the 2C-words that had high scores than to those that had low scores in ST. As a consequence, the beta value was positive for compositionality but negative for ST as indicated in Table 5. In other words, compositionality is different from ST in indicating the semantic relationship between a 2C-word and its constituent characters.
SNOM and NOM. Words in English usually have more than one meaning (Khanna & Cortese, 2011), but that might not be the case for Chinese 2C-words (Xiao, 1991). Indeed, most 2C-words have one meaning according to the dictionary, but participants thought that most words had one and a half meanings in the present study. The reason why sNOM diverged from NOM in variance probably lay in the semantically complex relationships between 2C-words and their constituent characters and in how participants behaved in 2C-word recognition.
On the one hand, the number of frequently used words is 2,000 to 5,000 in English (Schmitt, 2007, p. 828). However, a Chinese middle-school student needs to master over 3,000 of the most commonly used Chinese characters. We randomly selected 25 of the 3,500 most commonly used Chinese characters, and found that their mean NOM was 3.44 (SD = 1.73). A 2C-word usually has fewer meanings than a Chinese character (Peng et al., 2003). On the other hand, skilled readers have developed representations for commonly used 2C-words as wholes and also for the constituent characters. Both commonly used 2C-words as whole entities and their constituent characters are perceivable in the early stage of word recognition (Peng et al., 1994, 1999; Zhang & Peng, 1992). Therefore, participants' sNOM ratings of 2C-words might have incorporated their perceptions of several other features of the words. However, further research is needed to provide empirical evidence on this point.
Of course, there seem to be at least two limitations to the generalization of the present findings. First, a direct investigation into the semantic relationships between the 2C-words and their constituent characters could have been valuable in better understanding the data. Second, there might have been more factors than the 13 features that affected the data, since the four principal components accounted for less than 60% of the variance. Nevertheless, the present study has potentially significant reference value for further research into Chinese lexical processing. Furthermore, the original data for the 3,783 2C-words are available as Supplemental Material.

Conclusion
Using objective data obtained from dictionaries regarding subTF, NOS, and NOM, as well as subjective data obtained by means of questionnaires concerning familiarity, concreteness, imageability, AOA, sF, sNOM, compositionality, EER, SER, and ST, a normative description was provided for 3,783 2C-words in simplified Chinese. Correlative patterns were identified between the 13 variables, all of which, with the exception of imageability, SER, and sNOM, were significantly predictive of the changes in participants' RTs in LDs on the target words. In conclusion, skilled readers' awareness of 2C-words' features such as concreteness, imageability, compositionality, sNOM, and ST is closely associated with their semantic perception of the constituent characters.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Social Science Foundation of China under Grant 17BYY016.

Supplemental Material
Supplemental material for this article is available online.