Development of Ocean Literacy Inventory for 16- to 18-Year-Old Students

Ocean literacy is of particular importance to an island nation like Taiwan. In this study, a Chinese-language testing tool for ocean literacy was developed specifically for senior high students in Taiwan. The ocean literacy inventory can be administered in a group test format and consists of the following seven subscales: features of ocean, ocean and its life shape earth, weather and climate, earth habitable, diversity of life and ecosystems, ocean and humans are interconnected, and ocean largely unexplored. A valid sample of 1,027 participants was collected, and various psychometric assessments, including Cronbach's alpha, the multitrait-multimethod matrix (MTMM), confirmatory factor analysis (CFA), Multiple Indicators Multiple Causes (MIMIC) modeling, and multiple-group CFA, were performed to test the reliability and validity of the inventory. The results of these analyses confirmed that the ocean literacy inventory met the requirements for standardized tests. A conversion table listing the raw scores, standardized Z scores, and cumulative percentages for the various subscales and the full test was also compiled, so as to provide a quick reference for future users who assess ocean literacy ability with the inventory.


Original Research
This study was based on the ocean literacy scope and the English version of the Grades 9-12 ocean literacy knowledge scale released by the National Marine Educators Association (NMEA, 2010). The ocean literacy surveys developed by Fauville, Strang, Cannady, and Chen (2018), Markos, Boubonari, Mogias, and Kevrekidis (2015), and Guest, Lotze, and Wallace (2015) were also referenced to develop a Chinese-language ocean literacy inventory that possesses sound psychometric characteristics and meets the requirements for a standardized test.
The necessity of marine education has been attracting increasing attention since the 1970s (Charlier & Charlier, 1971; Fortner & Lyon, 1985; Markos et al., 2015; McFadden, 1973; Picker, 1980). However, ocean knowledge concepts appear only sporadically in classroom teaching and rarely show up in official K-12 evaluations of course materials, textbooks, or classes. In Taiwan, the new curriculum guidelines implemented in 2018 added ocean education as one of four important topics, and the subject was integrated into general courses (Ministry of Education, 2017). Nevertheless, because ocean-related scientific knowledge is largely absent from the curricula of the other sciences, students still have limited opportunities to learn about ocean knowledge concepts (Chang & Lwo, 2016).
Individuals form their attitudes toward the ocean environment during childhood. Given that elementary and junior high education form the foundation of Taiwan's education system, these two stages serve as key periods for inculcating lifestyle habits and values (Chang & Lwo, 2016). During the teaching process, teachers use different teaching methods and strategies to achieve ocean education objectives, and this has a crucial impact on the ultimate aim of cultivating environmentally responsible citizens. Ocean education should therefore begin at an early stage, which makes the elementary and junior high stages the ideal periods for its implementation. As the senior high stage (16-18 years old) is the final stage of Taiwan's compulsory education framework, it is necessary to develop a Chinese-language ocean literacy inventory for senior high students, and for senior high teachers to administer ocean literacy surveys to their students, so as to build a reference for designing ocean education courses and teaching approaches.
Taiwan is an island nation where the ocean plays an integral role in the lives of its people. It is thus crucial for students in Taiwan to understand and refine their ocean literacy, with ocean literacy being defined as follows: "Ocean literacy is an understanding of the ocean's influence on you and your influence on the ocean" (Ocean Literacy Network, 2013). Furthermore, the Ocean Literacy Network has also proposed seven basic principles of ocean literacy, namely, (a) the earth has one big ocean with many features, (b) the ocean and life in the ocean shape the features of earth, (c) the ocean has a major influence on weather and climate, (d) the ocean made earth habitable, (e) the ocean supports a great diversity of life and ecosystems, (f) the ocean and humans are inextricably interconnected, and (g) the ocean is largely unexplored. An ocean-literate individual is able to understand the ocean's basic origins and fundamental concepts, communicate ocean knowledge in meaningful ways, and draw clear and important conclusions about the ocean and its resources (Kean, Posnanski, Wisniewski, & Lundberg, 2004). At present, numerous studies on ocean literacy have also based their test construction frameworks on the aforementioned definition (e.g., Fauville et al., 2018; Greely, 2008; Guest et al., 2015; Markos et al., 2015; Plankis & Marrero, 2010; Steel, Smith, Opsommer, Curiel, & Warner-Steel, 2005).
A number of testing tools for ocean literacy and knowledge have already been developed (e.g., Greely, 2008; Markos et al., 2015; Plankis & Marrero, 2010; Steel et al., 2005). Greely (2008) developed an ocean literacy testing tool consisting of 57 multiple-choice questions (MCQ) that was designed to evaluate the ocean literacy (and other related knowledge) of students in the United States. The items of this tool are closed-ended questions, which facilitates statistical analysis, and distractors were used to measure the level of misconceptions about scientific ocean knowledge among students. Greely's (2008) results showed that the ocean knowledge content and environmental attitudes that students possess significantly affect their ocean literacy performance. Markos et al. (2015) subsequently revised the ocean literacy and experience testing tool developed by Greely (2008) and produced a Greek version of the Survey of Ocean Literacy and Experience (SOLE), which was designed for use with preservice elementary school teachers. In constructing the Greek version, Greely's (2008) analysis model was also followed: the Rasch analysis method was performed to ensure the internal consistency and construct validity of the test, and the items were examined for differential item functioning (DIF), so as to ensure the quality of the research results.
Despite the fact that Taiwan is surrounded by the ocean, there has been a relative lack of research on the development of Chinese-language testing tools for scientific ocean knowledge or ocean literacy. Lwo, Chang, Tung, and Yang (2013) utilized concept maps and open-ended questions to test the ocean concept knowledge and literacy of senior high students in Taiwan. The results of their study indicated that the average passing rate was only approximately 50%, with about half or more of the items having passing rates lower than 50%. These results demonstrated that these senior high students had not acquired, during the elementary and junior high school stages, a sufficient level of understanding and literacy with respect to scientific ocean concepts. They also revealed a relative lack of courses whose teaching materials contained ocean-related information. The researchers further compared the results across the schools that the students attended and uncovered significant differences among schools in their students' ocean literacy. Chang, Yang, and Lwo (2014) conducted a study involving vocational high school students as participants and utilized a questionnaire and the concept map approach to carry out qualitative rating. A total of 285 vocational high school students took part in the study, which aimed to understand the level of ocean literacy among vocational high school students and provide a reference for the development of ocean-related courses for vocational high schools. The results indicated that there was much room for improvement with regard to the ocean literacy of vocational high school students, and that the average passing rate for the ocean literacy questionnaire was less than 50%.
In short, Chinese-language versions of such testing tools have been developed and studies on ocean literacy have been conducted, and the tools and questionnaires used in these studies were shown to possess a certain degree of reliability and validity. However, those questionnaires were not broken down into subtests by construct or by specific ocean literacy sub-competencies. Moreover, the tools relied on open-ended tasks and thus could not measure the ocean literacy ability of students in a quick and simple manner. The scoring of open-ended tasks is also likely to be influenced by the scorer, which means that such tools are not well suited for testing the ocean literacy ability of large numbers of senior high students. It is therefore necessary to develop an ocean literacy inventory that meets the requirements for standardized tests.

Measures
The measurement framework for the ocean literacy inventory developed in this study was based on the definition of ocean literacy and knowledge and the seven basic principles of ocean literacy. Greely (2008) developed an ocean literacy testing tool called the SOLE, which was later modified by Markos et al. (2015) for use with Greek preservice teachers. Greely's instrument consists of 57 MCQ and was designed to evaluate the ocean literacy (and other related knowledge) of students in the United States. The full test in this study consists of the following subscales: P1-features of ocean, P2-ocean and its life shape earth, P3-weather and climate, P4-earth habitable, P5-diversity of life and ecosystems, P6-ocean and humans are interconnected, and P7-ocean largely unexplored. The inventory contains 48 items, of which 38 are single-answer items and 10 are multiple-answer items. The number of items for each subscale is shown in Table 1. The distribution of items across this inventory's constructs is similar to that of the test developed by Markos et al. (2015), although there are comparatively fewer items for P2, P4, and P7, and more items for P1.
Before the actual test was implemented, a pilot study was conducted to analyze the difficulty, discrimination, and DIF of the test items. The study applied item response theory (IRT) to the item analysis: the Rasch model and the partial credit model were used to estimate the difficulty of the single-answer items and the multiple-answer items, respectively. The overall item difficulty was found to be in the range of −3.55 to 2.89. Some items were more difficult, but this was necessary as the study aimed to determine whether the students managed to improve their ocean literacy competency via normal learning. Classical test theory was utilized to calculate item discrimination, which was found to be between .139 and .558. It is speculated that the lower discrimination of some items might have been due to the fact that they were MCQ, which made it possible to guess the correct answers. In addition, the study utilized a DIF-free-then-DIF strategy (Wang, Shih, & Sun, 2012) to perform a gender-oriented DIF test of the items. The results revealed that two of the items favored female respondents, whereas another two favored male respondents. With respect to these four DIF items, the research team invited two senior high earth science teachers and one university ocean education teacher to convene an expert meeting to discuss the stems and options of these items. These three experts came to the consensus that the four items were not affected by any DIF-related problems and did not contain stems or options worded specifically for a particular gender. They therefore recommended that the original 48 items be used in the actual test of ocean literacy that was to be administered to senior high students. Of the items used in the ocean literacy inventory for Taiwan students, 36 items were the same as or similar to Greely's original SOLE survey items. The original 48 items that were used for students in Taiwan are listed in the appendix.
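To make the classical-test-theory portion of this item analysis concrete, the following is a minimal sketch in Python. It assumes a hypothetical 0/1-scored response matrix and uses the high-low (upper/lower 27%) index as one common discrimination statistic; the Rasch and partial credit difficulty estimates reported above would come from a dedicated IRT package and are not reproduced here.

```python
import pandas as pd

def ctt_item_analysis(responses: pd.DataFrame) -> pd.DataFrame:
    """Classical test theory statistics for a 0/1-scored response matrix.

    responses: one row per student, one column per item (1 = correct).
    """
    total = responses.sum(axis=1)                      # each student's total score
    cut = max(int(round(len(responses) * 0.27)), 1)    # size of the upper/lower groups
    order = total.sort_values(ascending=False).index
    high, low = responses.loc[order[:cut]], responses.loc[order[-cut:]]
    return pd.DataFrame({
        "difficulty_p": responses.mean(),              # proportion answering correctly
        "discrimination_D": high.mean() - low.mean(),  # high-low discrimination index
    })
```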

Participants
The senior high students (aged 16-18 years) of Taiwan were the population from which the sample of this study was drawn. Three-stage stratified random sampling was performed. First, stratified random sampling was performed involving schools from the northern, southern, eastern, and central regions of Taiwan. Second, two classes were randomly selected from each of the schools selected in the first step. Finally, 40 classes consisting of 1,050 students in total were selected for the implementation of the actual test. The 50-min paper-and-pencil test was administered to the 16- to 18-year-old students by members of the test administration committee, who had received standardized testing training and visited each school to administer the test. The test was administered from May to June 2017. After the test was completed, the test results were compiled and double-checked to verify the data's accuracy. After 23 incomplete questionnaires were removed from the sample, the final valid sample comprised a total of 1,027 students. Of these participants, 604 (58.8%) were male and 423 (41.2%) were female; their average age was 16.79 years, with a standard deviation of 0.77 years.
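As an illustration of the school- and class-level selection stages, the sketch below uses pandas on an entirely hypothetical sampling frame (the four regions are the study's actual strata, but the school and class identifiers and counts are invented for the example).

```python
import pandas as pd

# Hypothetical sampling frame: one row per class in the population.
frame = pd.DataFrame(
    [(region, f"{region}_school{s}", f"class{c}")
     for region in ["north", "south", "east", "central"]
     for s in range(5) for c in range(8)],
    columns=["region", "school", "class_id"],
)

# Stage 1: stratified random selection of schools within each region.
schools = (frame[["region", "school"]].drop_duplicates()
           .groupby("region").sample(n=2, random_state=7)["school"])

# Stage 2: randomly select two classes from each sampled school.
sampled = (frame[frame["school"].isin(schools)]
           .groupby("school").sample(n=2, random_state=7))
print(sampled)
```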

Data Analysis
The data analysis of the actual test included the establishment of its reliability, validity, and norm. For the analysis of reliability, Cronbach's alpha was used as an indicator to evaluate the internal consistency of the full test and its subtests. For the analysis of validity, the multitrait-multimethod matrix (MTMM) was used to determine whether the test possessed a good level of convergent and divergent validity. Confirmatory factor analysis (CFA) was then performed to determine whether the latent traits of the test items are unidimensional constructs. The Multiple Indicators Multiple Causes (MIMIC) and multiple-group CFA (MG-CFA) models were used to analyze the scale scores for latent ocean literacy competency by gender and to determine whether there were any construct-related differences stemming from gender. The parameter estimation procedures of the MIMIC and MG-CFA models were implemented with the software Mplus (Muthén & Muthén, 1998-2012). Finally, the raw scores, standardized Z scores, and cumulative percentages of the full test and its subtests were calculated to establish the test's norm.
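As a concrete reference for the first of these analyses, Cronbach's alpha for a subtest can be computed directly from its item scores. The sketch below is a minimal Python implementation of the standard formula, assuming a hypothetical data frame with one 0/1-scored column per item.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Usage (hypothetical column names): alpha for the 15-item P1 subtest.
# alpha_p1 = cronbach_alpha(scores[[f"i{k}" for k in range(1, 16)]])
```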

Reliability and Validity
Reliability analysis. Cronbach's alpha was the indicator used to determine the internal consistency of the full test and its subtests, and the results of this analysis are shown in Table 2. The MTMM test was then performed to analyze the seven subtests that made up the test of ocean literacy competency, and the correlation coefficients between the total scores for each subtest are presented in Table 3. The correlations between the subtests ranged between .093 and .641 and reached the significance level of .01. This indicated that, even though each subtest was designed to target a different construct, the small to medium positive correlations may be explained by the fact that all of the subtests assess ocean literacy competency. Meanwhile, the correlation coefficients between each subtest and the full test ranged between .355 and .865, indicating a mid-to-high degree of correlation that reached the significance level of .01. These results showed that there was a considerable level of consistency between the subtests and the full test in terms of the ocean literacy competencies being evaluated. In summary, the test was shown to possess a good degree of convergent and divergent validity.
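The correlational core of this MTMM analysis can be reproduced in a few lines; the sketch below uses random placeholder data in place of the study's actual scores. Note that, as in the analysis above, each subtest is correlated with a full-test total that includes it, so the subtest-total coefficients partly reflect part-whole overlap.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
subtests = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]
# Placeholder subtest totals for 1,027 students (random; real scores would be loaded here).
scores = pd.DataFrame(rng.integers(0, 16, size=(1027, 7)), columns=subtests)

intercorrelations = scores.corr()             # subtest-by-subtest matrix (cf. Table 3)
full_total = scores.sum(axis=1)               # full-test raw score
with_full_test = scores.corrwith(full_total)  # each subtest vs. the full test
print(intercorrelations.round(3), with_full_test.round(3), sep="\n\n")
```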
The unidimensional validity test was performed to determine whether the latent traits of each subtest's items met the requirements for unidimensional constructs, or in other words, whether each item of a subtest measured a single ocean literacy competence trait. All of the items of each subtest were subjected to a single-factor CFA, during which model fit indices were used to determine whether the items of each subtest met the requirements for unidimensional constructs. As all of the item scores for the ocean literacy test were binary data indicating only true or false states, they were, statistically speaking, dichotomous variables. Based on the recommendations of Muthén and Muthén (1998-2012), the CFA for this study should utilize estimation methods and model fit indices that are well suited to such data; hence, the weighted least squares means and variance (WLSMV) estimation method, with its robust model fit indices covering the root mean square error of approximation (RMSEA) and the comparative fit index (CFI), was applied as the basis for assessing the goodness of fit of the study's structural equation modeling (SEM) models (Muthén & Muthén, 1998-2012). Due to space constraints, only the unidimensional factor analysis path diagram of the P1-features of ocean subtest is presented (Figure 1). In the figure, f1 represents the latent competency of the features of ocean subtest, and I1, I2, . . ., I15 represent the items of the subtest.

Table 4 presents the goodness of fit of the seven subtests' models. These indicators provide a basis for judging the fit between the models and the data. As the χ2 test of model fit is easily affected by sample size, the relatively more stable RMSEA and CFI indicators were used for this study. An RMSEA smaller than or equal to .05 indicates good fit, values from .05 to .08 indicate acceptable fit, values between .08 and .10 indicate mediocre fit, and values greater than .10 indicate poor fit (Browne & Cudeck, 1993). A CFI greater than .95 indicates good fit, whereas one greater than .90 is considered acceptable (Hair Jr, Black, Babin, & Anderson, 2010). As shown in Table 4, the RMSEA of the seven subtests ranged between .000 and .072, indicating that the models' goodness of fit was within the acceptable to good range. The seven subtests' CFIs ranged between .901 and 1.00, indicating that the requirements for goodness of fit were met. Based on these results, it was inferred that these models met the requirements for unidimensional constructs.
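For reference, the two indices can be computed from a model's chi-square statistics as shown below. This is a sketch of the conventional maximum-likelihood-based formulas only; the robust WLSMV versions reported by Mplus apply additional corrections and will not match these values exactly.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation from the model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model: float, df_model: int, chi2_baseline: float, df_baseline: int) -> float:
    """Comparative fit index: target model vs. baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    return 1.0 - d_model / d_baseline if d_baseline > 0 else 1.0

# Example with made-up values: chi2 = 120.5 on 90 df for N = 1,027.
print(rmsea(120.5, 90, 1027), cfi(120.5, 90, 2400.0, 105))
```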
Past studies have proposed that gender-related differences are one of the primary reasons why tests of ocean literacy competency produce differing results. To verify whether the ocean literacy competency test and its subtests differed in terms of scores and constructs due to gender, the MIMIC and MG-CFA models were used to determine whether the students' gender influenced their latent trait scale scores for ocean literacy and the tests' constructs. Due to space constraints, only the MIMIC model path diagram of the P1-features of ocean subtest is presented (see Figure 2). Compared with Figure 1, Figure 2 features the addition of the covariate X1 (gender), with the factor f1 regressed on X1. The results of the analysis are shown in Table 4: the seven models' RMSEA values were all below .058 and thus demonstrated an acceptable to good degree of fit, and the seven models' CFI values were all higher than .967, indicating that the requirement for the models' goodness of fit to be greater than .9 was fully met.
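A MIMIC specification of the kind shown in Figure 2 can also be written in lavaan-style syntax. The sketch below uses the open-source semopy package with hypothetical column names (i1-i15 for the P1 items, gender coded 0/1); the original analyses were run in Mplus with the WLSMV estimator, which semopy's default estimator does not replicate exactly.

```python
import pandas as pd
import semopy  # open-source SEM package; the study itself used Mplus

# Measurement model: the 15 P1 items load on one factor f1.
# Structural (MIMIC) part: f1 is regressed on the gender covariate.
ITEMS = " + ".join(f"i{k}" for k in range(1, 16))
DESCRIPTION = f"f1 =~ {ITEMS}\nf1 ~ gender"

def fit_mimic(data: pd.DataFrame):
    model = semopy.Model(DESCRIPTION)
    model.fit(data)  # default estimator; not the WLSMV used in the study
    # Parameter estimates plus fit statistics (RMSEA, CFI, and others).
    return model.inspect(), semopy.calc_stats(model)
```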
The MG-CFA model was used to verify whether differences would arise between the constructs of the latent competencies and the items. The factor loadings of each of the seven models were fixed and equated across the two gender groups, so as to ascertain whether the models' goodness of fit still met the requirements of the fit indicators. This procedure was also used to verify whether the two gender groups shared an identical measurement structure. As shown in Table 4, the seven models' RMSEA values were all below .058, indicating an acceptable to good fit, and their CFI values all exceeded the .90 threshold, indicating that the requirement for the models' goodness of fit to be greater than .9 was fully met. These results showed that the single-factor constructs of the subtests did not differ as a result of gender.

Norm Established
To provide first-line educators with a convenient and easy-to-use reference and basis for assessing the ocean literacy competency of students, raw scores, standardized Z scores, and cumulative percentages (which are among the easiest indicators to calculate) were used as the basis for establishing the norm. By doing so, even a student can make simple conversions and determine his or her raw score and relative ranking with respect to the full ocean literacy test or its subtests. Table 5 shows the conversion table for raw scores, standardized Z scores, and cumulative percentages, and Figure 3 shows the distribution of the standardized Z scores. In addition, this study also estimated all students' abilities with Rasch model measures. Figure 4 shows the distribution of the abilities of all 1,027 students, which ranged from −2.41 to 1.62; the average ability was .01, and the standard deviation was .78.

In this study, raw scores (the most commonly used indicator) were used to perform an independent-samples t test for gender, so as to understand the performance of male and female students in terms of overall ocean literacy competency and its various constructs, and to determine whether there is a significant difference between the two genders when this commonly used scoring method is applied. Table 6 presents the t test summary of the male and female students' ocean literacy scores. The results showed that the female students outperformed the male students in the P3 and P6 subtests, whereas the male students outperformed the female students in the P7 subtest. The P3 and P6 questions primarily covered ocean climate change and human-ocean interactions, topics that are brought up more frequently in daily teaching activities. Past studies have indicated that female students perform better on questions that are more closely related to what is covered in everyday teaching (e.g., Coley, 2001; Ding, Song, & Richardson, 2007; Lohman & Lakin, 2009; Willingham & Cole, 1997). The P7 subtest covered issues that were rarely brought up in textbooks and daily teaching activities and contained questions that reflected stronger elements of scientific exploration and inference. In this area, the male students performed better and demonstrated a stronger ability to solve less familiar problems based on scientific inquiry. As for the students' performance with respect to the other constructs and the full test, there were no gender-related differences.
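For users who want to rebuild the conversion table or the gender comparison from their own raw data, the sketch below shows both steps with pandas and SciPy, assuming a hypothetical series of full-test raw scores and a 0/1 male indicator.

```python
import pandas as pd
from scipy import stats

def norm_table(raw: pd.Series) -> pd.DataFrame:
    """Raw score -> standardized Z score and cumulative percentage (cf. Table 5)."""
    z = (raw - raw.mean()) / raw.std(ddof=1)
    cum_pct = raw.rank(method="max", pct=True) * 100  # % of students at or below each score
    table = pd.DataFrame({"raw": raw, "z": z.round(2), "cum_pct": cum_pct.round(1)})
    return table.drop_duplicates("raw").sort_values("raw").reset_index(drop=True)

# Independent-samples t test on raw scores by gender (cf. Table 6),
# with 'raw' and 'male' as hypothetical columns of the score data frame.
# t, p = stats.ttest_ind(df.loc[df.male == 1, "raw"], df.loc[df.male == 0, "raw"])
```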

Discussion
This study provided an easy-to-use and feasible standardized testing tool for assessing the ocean literacy competency of senior high students. In addition to overall ocean literacy competency, subtests were also developed to test students on their competency in the subtopics that were often emphasized in the past literature (features of ocean, ocean and its life shape earth, weather and climate, earth habitable, diversity of life and ecosystems, ocean and humans are interconnected, and ocean largely unexplored). A variety of statistical methods were used to evaluate the psychometric properties of these items, and the results indicated that the design of the test items was in line with the study's test development objectives. Moreover, the actual data also showed that these items met the reliability and validity requirements for test development. The psychometric properties of the overall test were also within a reasonable and acceptable range. The testing tools of this study can be used in academic research conducted for noncommercial purposes. Please contact us if you are a researcher who is interested in using the tool, and it will be provided at no cost.

Globally speaking, there has been a general lack of tools developed or designed for assessing the ocean literacy competency of senior high students (16-18 years old). In Taiwan, the only available tools that can be applied to senior high students are the Ocean Science Literacy Questionnaire developed by Chang et al. (2014) and the Ocean Science Misconception Assessment developed by Lwo et al. (2013). These two tests use open-ended questions to test the ocean literacy competency of students, and they are not broken down into subtests according to specific constructs or sub-competencies. As a result, their final assessment results can only serve as an indicator of overall ocean literacy competency. A clearer understanding of student performance with respect to specific constructs or sub-competencies of ocean literacy cannot be achieved through these testing tools, which means that they cannot fully cater to the different needs that may arise. To address this issue, a full set of tests for assessing various ocean literacy sub-competencies was developed.
With regard to the ocean literacy competency of students, research institutes tend to prefer large samples and focus on collecting large volumes of data, regardless of the research and testing methods they adopt; this preference is driven by the desire to facilitate the actual promotion and application of these tools. Chang et al. (2014) and Lwo et al. (2013) utilized the concept map method to investigate the ocean literacy of senior high students. However, as the tests they developed were made up of open-ended tasks, it was impossible for these tests to measure the ocean literacy competency of students in a swift and simple manner. Furthermore, the scoring of open-ended tasks is more easily affected by scorer bias, which meant that the responses of students could not be rated in a consistent manner. Therefore, testing tools based on open-ended tasks tend to incorporate simpler topics and fewer items, making them unsuitable for testing the ocean literacy competency of a large number of senior high students.
In response to the lack of related tests for ocean literacy, an ocean literacy inventory for senior high students (16-18 years old) was developed. This inventory possessed a good degree of reliability and validity and exhibited good item properties. It is hoped that the tool can be used to assess the ocean literacy competency of students and monitor their growth curves, such that teachers who teach ocean literacy can better understand the prior experience and competency levels of their students before they begin teaching them. With respect to practical teaching, effective measures may be used in response to the varying standards of competency among students, and the appropriate guidance and assistance can be provided to meet the educational goal of catering to the specific needs of each individual student.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.