Translation and Cross-Cultural Validation of Korean Version of the Menstrual Distress Questionnaire

Given the increase in cross-cultural studies, there is a need for adequately translated and validated study instruments. Using instruments translated into participants’ native language can lower barriers to study participation and increase study validity. The purpose of this study was to describe the translation and validation processes of the Korean version (MDQ-K) of the Menstrual Distress Questionnaire (MDQ). The MDQ was translated into Korean through a forward-and-backward translation process, followed by expert review and pilot testing among 100 bilingual Korean students. The equivalence of MDQ-K to MDQ was tested through bivariate Pearson’s correlations and paired t tests. The psychometric properties of the MDQ-K were evaluated through internal consistency and construct validity (confirmatory factor analysis). The reliability of the questionnaire was good (Cronbach’s α = .96). The results of confirmatory factor analysis revealed an acceptable model fit to the data. Overall, the MDQ-K demonstrated acceptable psychometric properties, although paired t tests found significant differences (p < .05) between the MDQ and MDQ-K in the means of three items: “restlessness” (Item 22), “bursts of energy and activity” (Item 40), and “blind spots and fuzzy vision” (Item 46). Possible explanations for these discrepancies include the participants’ varying English proficiency levels, issues with understanding medical terminology, and absence of words with the same meanings in different languages. We also discussed possible deletion of questionnaire items through further factor analysis.


Introduction
Once a month, most reproductive-aged women experience menstruation (Anjum & Jami, 2018). Women often have negative perceptions of menstruation due to multiple unfavorable physical and psychological symptoms as the menstrual period approaches (Gollenberg et al., 2010; Kollipaka et al., 2013). Many studies have examined menstruation-related distress among women with the aims of addressing poor perceptions of menstruation related to a lack of education and improving negative attitudes toward menstruation (Gollenberg et al., 2010; Kollipaka et al., 2013). As women's cultural circumstances play important roles in shaping menstrual experiences, the types and degrees of menstrual distress vary widely, depending on the specific culture (Anjum & Jami, 2018; Tan et al., 2017). Therefore, studies on menstrual distress in different countries have taken care to consider unique cultural factors and to use the participants' native language when assessing women to better capture the subtle nuances (Chang et al., 1999; Murakami et al., 2008). In conducting such studies, many researchers have translated preexisting instruments and used them in their studies. Some of the commonly used instruments for menstrual symptoms include the Menstrual Distress Questionnaire (MDQ; Moos, 2010), Shortened Premenstrual Assessment Form (SPAF; Allen et al., 1991), and Prospective Record of the Impact and Severity of Menstrual Symptoms (PRISM; Moher et al., 2009).

The MDQ
The MDQ is the most commonly used instrument in the United States to assess premenstrual symptoms (Y. Lee & Im, 2016). The MDQ was designed to assess menstrual cycle symptomatology (Moos, 2010). The initial evaluation of the MDQ was done with 839 women who were wives of graduate students at large Western universities (Moos, 2010). It is a 46-item scale focusing on eight symptomatic factors of reproductive-aged women: (a) pain (Items 1-6), (b) water retention (Items 7-10), (c) autonomic reaction, (d) negative affect, (e) impaired concentration (Items 23-30), (f) behavior change (Items 31-35), (g) arousal (Items 36-40), and (h) control (Items 41-46; Moos, 2010). The scale uses a 5-point Likert-type format. A summed total score, a total mean score, or scores for each of the eight factors can be derived from this scale (Moos, 2010). Although several modified Korean versions of the MDQ exist, they use only selected items from the original MDQ, which limits their use (e.g., in employing either the scoring system or the interpretation guideline of the original MDQ and in comparing findings with those of studies that used the original MDQ; Kim, 2004; Moos, 2010). Using the same validated instruments across cross-cultural and international studies makes it easier to compare study findings. However, well-validated translated instruments are often unavailable, and researchers may be faced with a multitude of translated versions of a single instrument (Chang et al., 1999).

Translation and Validation
Although most researchers agree on the importance of adequate instrument translation and verification processes, approaches to these processes are inconsistent and vague (Cha et al., 2007; Squires et al., 2013). To retain adequate cross-cultural validity, five criteria should be considered in the translation process: content equivalence, semantic equivalence, technical equivalence, criterion equivalence, and conceptual equivalence (Squires et al., 2013). Adherence to these criteria can be validated by the development of a translation guide, forward-and-back translation, and an expert panel review (Squires et al., 2013; Wild et al., 2005). In the forward-only translation method, an instrument in the original language is translated into the target language (C. C. Lee et al., 2009). In contrast, in forward-and-back translation, the instrument must be translated at least twice by different translators: from the original language to the target language and back to the original language (C. C. Lee et al., 2009). The original instrument and the back-translated instrument (also in the original language) are then compared with respect to equivalence, and the forward-and-back method is continued until the translators reach a consensus (C. C. Lee et al., 2009). The expert panel review can be done using a scoring process, such as a Likert-type scale (Squires et al., 2013).
The reliability and validity of an instrument can be tested using various measures (Mokkink et al., 2010). Its reliability is often based on internal consistency and test-retest correlations, whereas its validity is determined by content validity and construct validity (Che et al., 2017). This process can be done through monolingual or bilingual testing (Sperber, 2004). The bilingual testing method is considered more rigorous and precise, as the translated instruments are tested using individuals who understand both languages and compare the instruments in both languages (Son et al., 2000; Sperber, 2004). The committee approach, in which a number of experts discuss the translated instrument as a team or conduct pilot tests of the translated instrument, is often used in determining the reliability and validity of an instrument (Maneesriwongul & Dixon, 2004). Within the confines of time and resources, a combination of multiple techniques can help to establish equivalence between the original and translated versions of instruments (Maneesriwongul & Dixon, 2004). The purpose of this study was to describe the translation and validation process of a translated Korean version of the Menstrual Distress Questionnaire (MDQ-K).

Method
The development of the MDQ-K involved four steps: (a) obtaining permission to translate, (b) forward-and-back translation, (c) expert review, and (d) pilot testing with bilingual students. The content validity was established through an expert review. Through the pilot tests with bilingual students, the reliability and construct validity of the questionnaire were analyzed.

Obtaining Permission to Translate
The research team contacted the copyright holder (Mind Garden) of the MDQ and obtained permission to translate the instrument into Korean. No financial or other conflict of interest was incurred in this process.

Forward-and-Back Translation
Two research assistants, both Korean-English bilingual female doctoral students in nursing, conducted the forward-and-back translation (Chang et al., 1999). Translator A translated the original version of the instrument (English Version 1) into Korean (Korean Version 1). Translator B translated Korean Version 1 into English (English Version 2). Each translation process was blinded (i.e., Translators A and B performed their translations separately and did not communicate). English Versions 1 and 2 were compared item by item, and differences were identified, discussed, and resolved in the presence of the two translators and the research project investigator.

Expert Review
The agreed-upon translated version was sent to five experts to establish the instrument's content validity and to identify necessary revisions. The experts were female professors in Korean nursing schools whose research area was menstrual health. The content validity indexing technique was used in the evaluation (Squires et al., 2013). The experts rated each questionnaire item on a 4-point Likert-type scale (1 = inappropriate, 2 = somewhat appropriate, 3 = appropriate, and 4 = very appropriate; Squires et al., 2013; Yu, 2010). Items that were scored less than "2" by two experts were revised for clarity (Yu, 2010).
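The rating step above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual procedure: it assumes the commonly used item-level content validity index (the proportion of experts rating an item 3 or 4), and it reads the revision rule as "at least two experts scored the item below 2." The item names and ratings are hypothetical.

```python
from typing import List


def item_cvi(ratings: List[int]) -> float:
    """Item-level CVI: share of experts who rated the item 3 or 4 (assumed definition)."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def needs_revision(ratings: List[int], cutoff: int = 2, min_experts: int = 2) -> bool:
    """Flag an item when at least `min_experts` experts scored it below `cutoff`.

    This is one reading of the rule stated in the text, not a confirmed algorithm.
    """
    return sum(1 for r in ratings if r < cutoff) >= min_experts


# Hypothetical ratings from five experts (illustrative only, not study data).
ratings = {
    "item_A": [4, 4, 3, 4, 3],
    "item_B": [1, 1, 3, 4, 2],
}

for name, scores in ratings.items():
    print(name, round(item_cvi(scores), 2), needs_revision(scores))
```

With these made-up ratings, only the second item would be flagged for revision.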

Pilot Testing With Bilingual Students
Participants. The revised questionnaire from the expert review was named the MDQ-K and pilot tested with 100 bilingual Korean female students studying in the United States. Institutional Review Board approval was obtained from the primary investigator's affiliated institution (#820966) before initiating the study. The required sample size was calculated to be 90 using G*Power 3.1.9.2 software, with a power of .80, an alpha level of .05, and an effect size of 0.3. The inclusion criteria for the study were the following: (a) female students; (b) aged between 20 and 40 years; (c) South Korean identity; (d) ability to read, write, speak, and understand Korean and English; (e) enrolled in U.S. institutes, with an official school email address; (f) access to the internet; (g) presence of menstruation; and (h) experienced symptomatic changes in the menstrual cycle. Korean international students studying in the United States were targeted for their bilingual language abilities and because they are generally young adults, considering that menstrual distress is more often seen in young reproductive-aged women (Meers et al., 2020).
Patient and public involvement. Patients and the public were not involved in the design of this study. Korean international students (healthy women; public) were the participants in this study. Once the study has been published, the participants will receive a copy of this article through the email addresses they provided.
Data collection. The study data were collected between September and October of 2014. Participants were recruited using a convenience sampling method through online communities (e.g., Korean international students' associations, Korean Americans' online talk lounges). The study was advertised in online communities for Korean international students studying in the United States. To minimize possible selection bias, the study advertisement was posted on free bulletin boards that any Korean international student could read without logging in. The participants were asked to use their school email addresses for the study so that the researchers could verify their student status.
A total of 100 potential participants were asked to sign electronic informed consent and answer screening questions, which were available through the online study advertisement link. The eligible participants were assigned a study ID and asked to answer sociodemographic and baseline questions and to complete the original English version of the MDQ and the MDQ-K on the same day. Completion on the same day was required because symptom reports vary naturally from day to day across the menstrual cycle. To minimize possible recall bias, the sequence of items in the translated version of the questionnaire was mixed (Son et al., 2000). For example, whereas the MDQ items were presented in order from Item 1 to Item 46, the MDQ-K items were presented in a mixed order. After completing the study questionnaires, which took about 10 min, each participant received a US$15 online gift card.
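One way to present the translated items in a mixed order and still compare them with the original item by item is sketched below. The text does not describe how the order was mixed, so the seeded shuffle, the seed value, and the function names are assumptions made purely for illustration.

```python
import random

N_ITEMS = 46  # the MDQ has 46 items


def mixed_order(seed: int) -> list:
    """Return item numbers 1..46 in a reproducible shuffled order."""
    order = list(range(1, N_ITEMS + 1))
    random.Random(seed).shuffle(order)
    return order


def realign(order: list, responses: list) -> dict:
    """Map responses recorded in presentation order back to original item numbers."""
    return {item: response for item, response in zip(order, responses)}


order = mixed_order(seed=2014)        # hypothetical seed
presented_answers = [0] * N_ITEMS     # placeholder responses, not study data
by_item = realign(order, presented_answers)
```

Keeping the presentation order alongside the responses is what makes the later paired, item-level comparison between the two language versions possible.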
Study variables and instruments. In addition to completing the original English version of the MDQ and the MDQ-K, the participants were asked to answer questions associated with menstrual health (e.g., age, gravidity, age at menarche, and duration and regularity of menstruation) and language/cultural proficiency (e.g., educational status, major, and degree of acculturation; Y. Lee & Im, 2016; Sadler et al., 2010) to help interpret the findings of the study. The participants' degree of acculturation was assessed using the 21-item Korean version of the Suinn-Lew Asian Self-Identity Acculturation Scale, which includes questions about language, self-identity, community, and cultural preferences (Suinn et al., 1992). The scale has been tested among Korean Americans and demonstrated good construct validity and internal consistency, with a Cronbach's alpha of .91 (Jackson et al., 2006; Shim & Schwartz, 2008).
Data analysis. The participants' answers were automatically coded through the REDCap survey system. SPSS 22.0 and Mplus 7.3 were used for the data analysis. Two participants missed a question on smoking, and one participant missed a question on caffeine intake. The missing data were assessed to determine whether they were missing at random. Without imputation or deletion for missing data, all analyses were conducted with full information maximum likelihood (FIML; Enders & Bandalos, 2001).
Descriptive analyses were reported using frequency, percentage, mean, and standard deviation. The equivalence of the MDQ-K to the MDQ was tested through bivariate Pearson's correlations and paired t tests. The reliability of the MDQ-K was calculated by analyzing its internal consistency using Cronbach's alpha, and the construct validity was assessed using confirmatory factor analysis to confirm the underlying dimensions of the instrument (Son et al., 2000). The model fit was evaluated using chi-square statistics, the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA; Browne & Cudeck, 1993; Elavsky & Gold, 2009; Kilbride et al., 2003; Reid et al., 2015). Generally, an acceptable model is indicated by a χ²/df below 4 and by CFI and TLI values above 0.90 (Browne & Cudeck, 1993; Elavsky & Gold, 2009). An RMSEA below 0.05 represents close model fit, and an RMSEA below 0.08 represents reasonable model fit (Browne & Cudeck, 1993; Elavsky & Gold, 2009).
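The equivalence and reliability statistics named above follow standard textbook formulas, which can be sketched as follows. The study itself used SPSS and Mplus; this pure-Python version is only an illustration, the function names are hypothetical, and the small data sets are invented. The paired t function returns the statistic and its degrees of freedom only; a p value would require a t distribution from a statistics package.

```python
import math
from statistics import mean, stdev, variance


def pearson_r(x, y):
    """Pearson correlation between paired scores x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom (n - 1) for the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented scores for one item answered in both languages by five people.
english = [1, 2, 0, 3, 2]
korean = [1, 1, 0, 3, 1]
r = pearson_r(english, korean)
t, df = paired_t(english, korean)
```

A positive t here means the first argument (the English scores in this toy example) has the higher mean.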
In addition to the main purpose of this study, we attempted to screen out relatively insignificant items. The factor loading of each of the 46 items and the construct reliability (CR) and average variance extracted (AVE) of each factor were assessed (Pedrosa et al., 2016). In previous studies, a factor loading above 0.5, a CR above .7, and an AVE above 0.5 have commonly been considered acceptable (Chen et al., 2015; Han et al., 2015; Pedrosa et al., 2016).
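The screening criteria above can be illustrated with the widely used formulas for CR and AVE based on standardized factor loadings: CR = (Σλ)² / [(Σλ)² + Σ(1 − λ²)] and AVE = Σλ² / k. The text does not state which formulas or software settings were used, so this sketch is an assumption, and the loadings are invented.

```python
def composite_reliability(loadings):
    """CR from standardized loadings, with error variance taken as 1 - loading**2."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def weak_items(loadings_by_item, cutoff=0.5):
    """Items whose loading falls below the cutoff named in the text."""
    return [item for item, l in loadings_by_item.items() if l < cutoff]


# Invented standardized loadings for one factor (not study data).
factor = {"item_1": 0.7, "item_2": 0.8, "item_3": 0.6, "item_4": 0.4}
kept = [l for l in factor.values() if l >= 0.5]

cr = composite_reliability(kept)
ave = average_variance_extracted(kept)
flagged = weak_items(factor)
```

In this toy case the retained items give a CR just above .7 but an AVE just under 0.5, which shows that the two criteria can disagree for the same factor.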

Results

Expert Review: Content Validity
On the initial version of the translated questionnaire, two items received a score of less than "2" from two experts: "affectionate" (Item 36) and "orderliness" (Item 37). The term "affectionate" was translated as "da-jung-han" by the translators, a word that in Korean is used to describe someone who is kind and pleasant. However, the experts pointed out that the term "affectionate" has a stronger connotation than "da-jung-han." Based on the experts' suggestions, we selected the expression "ae-jung-i-num-chi-nun," which means very affectionate. The word "orderliness" was translated as "yu-soon-ham," which is used to describe someone who is pleasant and well-mannered. However, the experts suggested that "orderliness" in English can mean organized and neat, a meaning that needed to be included in the translated version. Thus, we added this meaning by including the expression "jil-so-jung-yeon-ham" in the Korean translated version.

Pilot Testing With Bilingual Students
Sociodemographic data. After the expert review, the questionnaire was named the MDQ-K and pilot tested with 100 bilingual students. The sociodemographic and baseline data on the participants are summarized in Table 1. The mean age of the participants (N = 100) was 25.94 years. More than half of the participants were enrolled in a graduate program, and the participants represented more than 20 different majors, including applied science (e.g., bioengineering, computer science, environmental science), health professions (e.g., nursing, pharmacology, public health), and business/management/finance/economics. The participants' degree of acculturation score was 2.29, with a score of 1 indicating "Asian identified" and a score of 3 denoting "bicultural." The majority of the participants were single, had never been pregnant, were Christian, had regular menstrual cycles, and had a positive perception of their general health.
Equivalence of MDQ-K to MDQ. The results of the bivariate Pearson's correlations for the paired items showed significant correlations between the English version and the Korean translated version for each of the 46 items and for the total scores (p < .001), with coefficients ranging between .572 ("Dizziness and faintness") and .985 ("Total score"). The results of the paired t tests used to test the equivalence of the eight factors showed that the factor "Arousal" was significantly higher in the Korean translated version (M = 2.41, SD = 3.01) than in the English version (M = 2.68, SD = 3.25); t(100) = −2.038, p < .05 (Table 2). The results of the paired t tests comparing each item between the MDQ and MDQ-K showed that the means of three items were significantly different between the English and Korean versions (Table 3). The mean of Item 22 "Restlessness" was significantly higher in the English version (M = 1.10, SD = 1.17) than in the Korean translated version (M = 0.75, SD = 1.03); t(100) = 3.697, p < .001. The mean of Item 40 "Bursts of energy and activity" was significantly higher in the Korean translated version than in the English version. The mean of Item 46 "Blind spots and fuzzy vision" also differed significantly between the two versions (Table 3).
Reliability and construct validity. The reliability of the instrument was determined using Cronbach's alpha (Table 4). The reliability coefficient alpha of the eight factors of both the MDQ and MDQ-K ranged between .76 and .92. The reliability coefficient alpha of both the total MDQ and the total MDQ-K was .96. A confirmatory factor analysis of the eight factors of the MDQ-K was performed to estimate construct validity. The data from the MDQ-K showed an acceptable overall fit: χ²/df was 1.546, CFI was .933, TLI was .911, and RMSEA (90% confidence interval) was .074 [.060, .087]. Based on the additional analysis of each item's factor loading and each factor's CR and AVE, 18 items with factor loadings below 0.5 and the factor "control" were removed. Each of the remaining seven factors showed a CR above .7 and an AVE above 0.5 (Table 5).
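The fit indices reported for the confirmatory factor analysis can be compared with the cutoffs given under Data analysis using a short check such as the one below. The function names and the label for RMSEA values of 0.08 or above are hypothetical; the cutoffs and the reported values are taken from the text.

```python
def meets_cutoffs(chi2_df: float, cfi: float, tli: float) -> bool:
    """Acceptable model: chi-square/df below 4, CFI and TLI above 0.90."""
    return chi2_df < 4 and cfi > 0.90 and tli > 0.90


def rmsea_label(rmsea: float) -> str:
    """Close fit below 0.05, reasonable fit below 0.08 (cutoffs cited in the text)."""
    if rmsea < 0.05:
        return "close"
    if rmsea < 0.08:
        return "reasonable"
    return "outside the cited ranges"  # hypothetical label, not from the text


# Values reported for the MDQ-K.
reported = {"chi2_df": 1.546, "cfi": 0.933, "tli": 0.911, "rmsea": 0.074}

acceptable = meets_cutoffs(reported["chi2_df"], reported["cfi"], reported["tli"])
fit = rmsea_label(reported["rmsea"])
print(acceptable, fit)
```

Under these cutoffs the point estimate of the RMSEA indicates reasonable rather than close fit, and the upper bound of the reported 90% confidence interval (.087) lies above the 0.08 threshold.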

Discussion
The overall MDQ-K demonstrated good reliability and construct validity, as shown by the pilot test with the bilingual students. However, the scores for three of the 46 items, "Restlessness" (Item 22), "Bursts of energy and activity" (Item 40), and "Blind spots and fuzzy vision" (Item 46), were significantly different in the MDQ-K versus the original English version. There are several possible reasons for these differences.
First, it is possible that the participants' English proficiency levels differed and that some of the participants found the meanings of some words difficult to understand. After completion of the questionnaires, some of the participants stated that they found the English version of the instrument slightly difficult to understand. Although the inclusion criteria for the study required that the participants could read, write, speak, and understand Korean and English and that they were enrolled in U.S. institutes, these criteria do not necessarily mean that all the participants were fluent in English. It is possible that the participants found the expression "Bursts of energy and activity" (Item 40) relatively difficult to understand, as many of the other items were single words. Previous studies have emphasized the importance of considering study participants' reading comprehension level when administering a pre-existing survey questionnaire, as this level does not always match the targeted literacy level of the questionnaire (Flory & Emanuel, 2004; Suka et al., 2014). Assessing the readability grades of the MDQ and MDQ-K would have been helpful. Second, some of the medical terminology could have confused the participants. For example, the word "Restlessness" (Item 22) may have been more easily understood if more commonly used terminology, such as nervousness or anxiety, had been used. Many Koreans may understand "Restlessness" as having no rest (i.e., "rest" plus "less") and being tired. Another possibility is that, ironically, the translators' efforts to avoid the use of medical terminology could have contributed to the difference in understanding of certain items between the two languages. For example, many English-Korean dictionaries describe "blind spots" (Item 46) as "meng-jum," a word that is derived from Chinese characters and is used more often in medical settings than in daily life. The translators added a brief explanation of the word in parentheses. This effort could have resulted in the difference in understanding the item between the two languages.
Third, the absence of words with the same meanings in different languages may have generated differences in items between the two languages. When translating a single word into another language, it is almost impossible to find a word with the exact same meaning and nuance (Khalaila, 2013). For example, the Korean translation of "Orderliness" (Item 37) required two expressions, "jil-so-jung-yeon-ham" and "yu-soon-ham." There was no single word in Korean that included both concepts, as described in the "Results" section.
The overall reliability and validity, including the results from the confirmatory factor analysis, indicated that the MDQ-K can be used as it is. However, the additional analysis of each item's factor loading and each factor's CR and AVE has suggested research questions for further studies. Although this study was conducted to retain all 46 items from the original MDQ in creating the MDQ-K, it is possible that the 18 items screened out for having weak factor loadings are relatively unimportant to Korean women. In addition to addressing the aforementioned three items (Items 22, 40, and 46), further refinement of the MDQ-K through exploratory factor analysis and confirmatory factor analysis would be helpful for enhancing the quality of the questionnaire.
This study has several limitations. First, we did not assess the participants' English proficiency level prior to the study. We could have achieved more reliable study findings had we included participants with a language level matched to that of the participants in the original MDQ. A second limitation is that the study consisted of only Korean international students enrolled in U.S. institutes. Although we intentionally limited the study participants to this population because their Korean and English proficiency levels were inferred to be suitable for comparing questionnaires in two languages, the homogeneous characteristics of the study population (e.g., the participants' educational level) limit the generalizability of the study findings. Third, culturally specific menstrual symptoms or factors affecting the symptoms were not explored in this study. Future qualitative studies can explore these points to further test the adequacy of the MDQ-K for Korean women. Finally, the small sample size of this study limits the conclusions that can be drawn from the factor analysis. An adequate sample size for factor analysis is determined by the ratio of the number of items to the number of participants (i.e., 1:3 means that three participants are needed for each item; Bujang et al., 2012). For factor analysis in the medical field, where recruitment of participants may be difficult, statisticians recommend a ratio of 1:2 (Bujang et al., 2012). Therefore, our sample size of 100 participants for a 46-item questionnaire may be considered adequate. However, this is the minimum acceptable sample size. Especially for confirmatory factor analysis, when there are six to 12 factors in a measure, a sample size of approximately 500 should be used whenever possible (Koran, 2016). As an inadequate sample size can reduce the power of the study findings, we recommend further studies with a larger sample size to confirm the results of the factor analysis of the MDQ-K. Moreover, there is no single analytic method for testing a hypothesis. We recommend that future studies test the MDQ-K using various analytic methods (e.g., congruence coefficient analyses, concurrent validity tests, content validity tests from women with menstrual distress; Mokkink et al., 2010).
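The item-to-participant ratios discussed above translate into minimum sample sizes by simple multiplication, as the following sketch shows. The function name is hypothetical; the ratios, the item count, and the sample size come from the text.

```python
def minimum_sample(n_items: int, participants_per_item: int) -> int:
    """Minimum sample size implied by an item-to-participant ratio of 1:participants_per_item."""
    return n_items * participants_per_item


N_ITEMS = 46        # items in the MDQ-K
SAMPLE_SIZE = 100   # participants in this study

needed_1_to_2 = minimum_sample(N_ITEMS, 2)   # 92
needed_1_to_3 = minimum_sample(N_ITEMS, 3)   # 138

print(SAMPLE_SIZE >= needed_1_to_2, SAMPLE_SIZE >= needed_1_to_3)
```

The sample of 100 therefore satisfies the 1:2 ratio (92 participants) but not the 1:3 ratio (138 participants), which is consistent with describing it as the minimum acceptable size.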

Conclusion
Many reproductive-aged women experience multiple menstrual symptoms. A good scale can support the adequate assessment of these symptoms. This study introduced a rigorous four-stage process for the translation and validation of a pre-existing instrument (the MDQ) into a new language, Korean. Overall, the MDQ-K demonstrated acceptable psychometric properties. However, some of the items may be open to further modification (Items 22, 40, and 46). Possible explanations for discrepancies between some of the items in the MDQ and MDQ-K and possible deletion of items through further factor analysis were discussed. This study has a number of implications for future research and practice. First, when administering a translated questionnaire, it is important to determine whether the equivalence of the translated version to the original has been established before administering the instrument to the study participants. Second, the translation process of a questionnaire into a different language needs to be further explored. The rigorous four-stage questionnaire translation process introduced in this study was based on the literature (C. C. Lee et al., 2009; Squires et al., 2013). Although the development of the MDQ-K carefully followed the steps described in the literature, some items may require further modification. We suggest repeating some of the steps to improve the equivalence of the translated version of the questionnaire to the original. Third, more studies are necessary to ensure a comprehensive validation of the MDQ-K. In addition to calculating the internal consistency of the instrument and conducting paired t tests or confirmatory factor analysis, other validation techniques, including validation against other instruments or assessment methods (e.g., biomarkers), could be useful. Moreover, as the original MDQ is based on symptoms experienced by Western women, it would be useful to conduct studies to screen out MDQ items irrelevant to non-Westerners, regroup the items through exploratory factor analysis, or explore potential ethnic-specific symptoms that could be added.
As we are living today in an era of supraterritoriality, where physical place is becoming less and less meaningful, the importance of cross-cultural and international studies will continue to grow. We expect this study to be of help to researchers who intend to administer either English or previously translated questionnaires to non-native English speakers, or who plan to translate an English questionnaire themselves for use in their studies.