Advance Translation—The Remedy to Improve Translatability of Source Questionnaires? Results of a Think-Aloud Study

Advance translation is a method of source questionnaire development for multilingual survey projects to enhance translatability and (inter)cultural portability. The aim is to minimize translation issues in the final translation stage. I empirically tested the results of a previously conducted advance translation in a think-aloud study and analyzed the utterances made in a mixed-method approach, calculating chi-square statistics and cross-checking these against observational notes of the think-aloud sessions. My study confirms the usefulness of advance translation in making source items easier to translate, thus improving final translation quality. It appears to be particularly useful for comprehensibility issues of the source text, irrespective of the target language. I recommend that advance translations be carried out into all languages and cultures into which the final source questionnaire is to be translated. This will improve source questionnaire translatability and, thus, final translation and overall cross-cultural data quality.


Introduction
Questionnaire translation errors are known to be a potential source of errors in cross-cultural survey projects. As such they do, for instance, form part of the Cross-National Error Source Typology developed by Fitzgerald et al. (2011). To avoid such errors, much effort has been invested in optimizing the methodology for carrying out such translations. Above all, the committee or team approach, with its more elaborate form of TRAPD (Translation-Review-Adjudication-Pretesting-Documentation), as developed by Harkness and colleagues, needs to be cited as the state-of-the-art approach when translating questionnaires. TRAPD consists of (1) a multidisciplinary team of translators/linguists and survey methodologists/social scientists; and (2) a multi-step translation process in which the translations run through several steps, allowing enough time and different constellations for being fine-tuned (Behr and Shishido 2016; Harkness 2003; Harkness et al. 2004, 2010). The importance of correct questionnaire translations for the comparability of the resulting data in cross-cultural survey projects is discussed, for instance, by Mohler and Johnson (2010).
In the translation industry as well as in translation studies it is known that, for the translation process and the quality of the resulting translations, the source text (the text out of which one translates) plays an important role. Source text issues are frequently detected when translating these source texts (Hauck 2004). Janet Harkness and Alisú Schoua-Glusberg noted in the context of questionnaire translation that "many translation problems linked to source text formulations only become apparent, even to experienced cross-cultural researchers, if a translation is attempted" (Harkness and Schoua-Glusberg 1998: 105), and Harkness recommended carrying out advance translations as a "problem-spotting tool" (Harkness 2007:89).
Consequently, survey researchers have become increasingly aware of the need to take into account the source questionnaires' translatability at their development stage to minimize translation problems and errors and thus increase the overall quality of the final questionnaire translations: If the difficulty or even impossibility (if only for one language) of translating a term or expression, or even a concept, can be detected during the questionnaire design stage, a solution needs to be found at that same stage (Dorer 2020:35). Measures are needed to detect the problematic elements in the source questionnaire before its finalization, that is, before the source questionnaire is translated into multiple target languages (languages to translate into).
The two methods that have been implemented for detecting and minimizing issues related to translatability and (inter)cultural portability of source questionnaires are Translatability Assessment (see, e.g., Acquadro et al. 2018 or Stathopoulou et al. 2019) and Advance Translation (Dorer 2020). Both methods use the activity of translating while the source questionnaire is developed. Although the implementation of both methods differs between projects, the overall difference is that advance translation, as applied, for instance, in the European Social Survey (ESS), is based on the full interdisciplinary team approach involving at least two translators and a reviewer, whereas translatability assessment covers more languages, but each is assessed by only one person, mostly a translator or linguist, and no social scientists are involved.
Enhancing translatability is understood as making a text less problematic to translate.
Issues of "cultural portability" refer to situations where the concept being measured does not exist in all countries. Or the concept exists but in a form that prevents the proposed measurement approach from being used (i.e., you can't simply write a better question or improve the translation). For example, to measure religiosity a different question might be needed in a Christian country compared to a Muslim one. (Fitzgerald et al. 2011:570)
Advance translation has been applied during source questionnaire development by several cross-cultural surveys, the most prominent one being the ESS, where advance translations have been implemented systematically since round 5 (i.e., 2009-2010) (Dorer 2015, 2020). Advance translation consists of translations of pre-final versions of a source questionnaire, with the purpose of detecting and minimizing translation problems and mistakes in the final translation step. In the ESS, a small number of national teams translate a pre-final version of the source questionnaire into their mother tongues. These translations follow the interdisciplinary and multi-step team or committee approach (TRA in TRAPD) (see above). The advance translation teams receive a translation template in which they write down their translations and the comments they have in terms of translatability and (inter)cultural portability of the source text (Lyberg et al. 2021:55). They are asked to comment both in their own words and by selecting from a preselected list of problem categories. The comments made are more important than the translations delivered. All team members agree on comments in a Review meeting, and these are considered in the further source questionnaire development process. Modifications mainly take three forms: (1) adding annotations to explain the source text; (2) rewording source text expressions; and (3) restructuring the source text by modifying the design of source items. In some items, advance translation triggers only one change; in other items, more than one.
An example of a change triggered by advance translation from the ESS: an item before advance translation read: "To what extent do you receive help and support from other people when you need it?" Advance translation teams asked whether financial or personal help or support should be referred to. In the final source questionnaire, an annotation was added explaining that "help and support" was to be understood "whether emotional or material." Had this clarification, provided after advance translation, not been made, some of the 30+ language versions might have translated help and support in the financial sense, and others in a personal sense, and the resulting survey data would not have been comparable. This example shows how important advance translation is for the quality of the final survey data. For a detailed explanation of advance translation carried out in the ESS, see Dorer (2020).
While translatability assessment in the medical field has been subjected to usability checks (Acquadro et al. 2018;Conway et al. 2014), prior to my study in the field of social sciences, neither translatability assessment nor advance translation had been subject to empirical testing. And not much had been known about the mechanisms determining the outcome of advance translations: Is the type of changes triggered by advance translation decisive for the method's success (success of advance translation understood as the success in making source questionnaires less problematic to translate and increasing their cultural portability)? Do the languages and/or cultures of the advance translation need to be the same as those of the final translations of the survey project? My study was supposed to fill this gap.
My main research question was whether advance translation indeed contributes to enhancing the translatability of source questions. My secondary research questions, which should help determine the features that contribute to the success or failure of advance translations, were to study (1) to what extent the number and type of changes made after advance translation contribute to the success of this method; and (2) to what extent the choice of languages of advance translation and think-aloud study contributes to the success of advance translation.
Linked to these research questions, the following hypotheses guided my study:
Main research hypothesis: Advance translation enhances translatability of source items of a cross-cultural survey.
Sub-hypothesis 1: The success of advance translations depends on the type and/or number of changes made after the advance translation.
Sub-hypothesis 2: The choice of languages and cultures of the advance translation and of the think-aloud study affects the results of a think-aloud study and the success of advance translations.
It should be remembered that the function of think-aloud in my study can be compared to a placeholder. The reactions of the translators to the texts in my think-aloud study can be seen as a proxy for how translators would react to these texts in the actual translation of questionnaires in the real survey projects.

Methods and Data
I applied the method of thinking-aloud in combination with retrospective probes. Thinking-aloud is a method for gaining insight into humans' thought processes by asking them to verbalize everything that passes through their minds, irrespective of whether this is related to a particular research question or activity or not. It is a method of gaining concurrent insight into test persons' thought processes (Ericsson and Simon 1993 [1984]; Jääskeläinen 2017).
Ideally, in thinking-aloud sessions, the study supervisor should not intervene in the test persons' verbalization flow. However, Lörscher (1991) noted that in cases where introspective (i.e., think-aloud) data do not yield sufficient data, retrospective procedures should be added for receiving the level of information sought (Lörscher 1991:279f.). In my case, the test persons (i.e., the translators in my think-aloud study) were under high cognitive burden, given the relatively long thinking-aloud sessions (between two and five hours, including breaks). Thus, where I saw the need for more in-depth verbalization, I applied targeted retrospective probes, asking the test persons retrospectively for thoughts about specific source text features. Retrospective means I asked these probes in the natural breaks that occurred within the think-aloud sessions between two items. I only probed where I saw the need and did not interfere in the translation flow of one item (Dorer 2020).
In my think-aloud study, experienced questionnaire translators (the "test persons") translated 22 questionnaire items from ESS rounds 5 and 6 where the changes recommended by advance translations had been incorporated in the final source questionnaires. They translated these 22 items in their versions before and after advance translation (i.e., the final source questionnaire versions of ESS5 and ESS6).
The languages of these advance translations had been French (Switzerland) and Polish (Poland) in ESS5, and German (Germany), Czech (Czech Republic), and Turkish (Turkey) in ESS6.
The languages of the think-aloud study were French and German: Out of 12 translators, six translated into French and six into German. They were asked to translate the selected items out of the ESS questionnaires' source language English into their mother tongues, translating as they would translate items for a professional assignment. And they were asked to think aloud while translating. These think-aloud sessions took place in the cognitive lab at GESIS, were video- and audio-recorded and then transcribed, resulting in 264 think-aloud protocols (TAPs). These TAPs were coded in MaxQDA (Bazeley 2010) by me and a second coder according to a coding scheme developed inductively by me. An inter-coder reliability check yielded a Cohen's Kappa of 0.709, which was assessed to be sufficiently reliable.
My coding scheme included six levels of hierarchy, with the second level differentiating between "problematic" and "non-problematic" codes (level 1 differentiates between different aspects of the translation processes, such as methods applied or problems). Problematic codes were attributed to utterances in which the translators mentioned any type of problem during the think-aloud translation sessions. Non-problematic codes were attributed to utterances that did not express a problem or that even expressed a positive observation. Examples of problematic codes: footnote too vague or problem in answer category; examples of non-problematic codes: source clear/easy to understand or translation sounds nice/good (for details about the coding and the coding scheme, see Dorer (2020)).
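The inter-coder reliability check described above can be illustrated with a minimal sketch. It computes Cohen's kappa for two coders from first principles. The label lists are invented toy data, not the study's codings, and `cohens_kappa` is a hypothetical helper written for this illustration; it is not a MaxQDA function and not necessarily how the reported value of 0.709 was obtained.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments that received identical labels.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of both coders' marginal shares, summed over labels.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = freq_a.keys() | freq_b.keys()
    p_expected = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (hypothetical): "P" = problematic code, "N" = non-problematic code.
coder_a = ["P", "P", "P", "P", "P", "N", "N", "N", "N", "N"]
coder_b = ["P", "P", "P", "P", "N", "N", "N", "N", "N", "P"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

With these toy labels the coders agree on 8 of 10 segments and chance agreement is 0.5, so kappa is 0.6.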
The analysis of my think-aloud data followed a mixed-methods approach. The quantitative analysis consisted of a chi-squared test of the codings made (i.e., the frequencies of problematic versus non-problematic codes). Only the differentiation between problematic and non-problematic codes was considered in my quantitative analysis, as this information corresponded best to my main research question: Poor translatability of source items would trigger more problematic codes, whereas a better level of translatability would lead to more non-problematic codes.
The qualitative analysis consisted of observational notes, that is, evaluations of the translation process in which I summarized, for each of the 264 think-aloud protocols, whether the translation process was problematic or not.

Results
Regarding my main hypothesis (advance translation enhances translatability of source items of a cross-cultural survey), the chi-squared statistic calculated across all 22 items was χ²(1, N = 1289) = 107.9786, p < .01. The result is significant, and my main research hypothesis was thus supported: the think-aloud study confirmed that the advance translations in ESS5 and ESS6 had made the source items that were the object of this study easier to translate. Table 1 (see Supplemental Material) shows the result of the chi-squared test calculated across the 22 items: Thinking-aloud findings on translations of the source items before advance translation were compared to thinking-aloud findings on the translations of the source versions after advance translation; problematic codings were compared to non-problematic codings. Overall, the effect of advance translation is positive, as the improvement was significant when all 22 items were pooled. The observed frequencies confirm the usefulness of the changes made after advance translation: The number of problematic codings decreased from 514 to 271; the number of non-problematic codings increased from 181 to 323.
I also calculated chi-squared statistics for each of the 22 items, and this yielded a mixed picture: In some items, the chi-squared statistics showed an improvement in translatability; for other items such an improvement could not be shown (the chi-squared statistics for each of the 22 items are included in Table 2 in the Supplemental Material).
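The pooled test can be retraced from the four frequencies reported above. The following minimal sketch assumes a plain Pearson chi-squared test on a 2x2 table (source version before/after advance translation by problematic/non-problematic codings) without continuity correction; the text does not state which variant was used, and the function name `chi_squared_2x2` is a hypothetical helper.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of version and code type.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, n


# Rows: source version before / after advance translation.
# Columns: problematic / non-problematic codings (frequencies reported in the text).
observed = [[514, 181],
            [271, 323]]
chi2, p_value, n = chi_squared_2x2(observed)
print(f"chi2(1, N = {n}) = {chi2:.4f}, p = {p_value:.3g}")
```

Under these assumptions the statistic comes out at about 107.98 with N = 1289, which agrees with the reported value and suggests that no continuity correction was applied.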
For explaining the quantitative results at the level of individual items, I use the findings from my qualitative analysis (see Creswell and Creswell 2018). This should also help us gain a deeper insight into the mechanisms of advance translation and determine which factors may contribute to its success.
When looking at both the quantitative and qualitative results at the level of individual items, I differentiate three groups:
Group 1: For nine items, a clear improvement of translatability by advance translation is revealed both by the quantitative and the qualitative analyses.
Group 2: For four items, the advance translation clearly did not trigger any improvement of the translatability in my study, as an improvement was shown neither in the qualitative nor in the quantitative analysis.
Group 3: For nine items, the results from the quantitative and qualitative analysis contradict each other: the quantitative analysis did not show a significant improvement of the translatability, but the qualitative result suggested an improved translatability.
I had two secondary research questions: (1) Did the type and/or number of changes have an effect on the success of advance translation; and (2) Did the choice of language(s) and/or culture(s) of the advance translation and of the think-aloud sessions have an effect on the results?
Regarding the first secondary research question, my study resulted in a negative answer. The main types of changes triggered by advance translation were (1) footnotes added; (2) rewording; and (3) structural changes: changes of the structure or design of an item, such as changing the answer scale, or turning a gradation (such as "How satisfied are you with… ?") into a direct question (such as "Are you satisfied with… ?"). All three types are present in Groups 1 and 3, and Group 2 includes (1) and (2). As Group 2 is very small (only four items), it is not possible to determine whether the type of changes made had an effect or not. Also, no direct link between the number of changes made and the success of advance translation could be determined: In all three groups, there are both items with only one change and items with two or more changes.
When looking at the second secondary research question, I could provide evidence that the distance between the languages and/or cultures of advance translation and thinking-aloud was decisive for the success of advance translation. In my study, the languages of the advance translations were Czech, French, German, Polish, Turkish, and the languages of the think-aloud study were French and German. So, in part, the same languages were used for advance translation and thinking aloud (French and German), whereas Czech, Polish, and Turkish were only used in the advance translation.
Regarding language groups and cultural similarity, these five languages can be grouped into three blocks. French and German are West European languages (one Romance and one Germanic language) with similar political systems (West European democracies, at least since after World War II). Czech and Polish are Slavic languages spoken in post-communist countries. Turkish is a Turkic language spoken in a country with a different position regarding freedom of opinion, which was particularly relevant for the results on items about democracy.
In Group 1, where both the qualitative and the quantitative analyses supported the usefulness of advance translation, most of the changes made to the source after advance translation had been triggered by the French-language team, or the translatability issues detected can be classified as language-independent.
An example of a change triggered by the French-language advance translation team is an annotation added to the expression "doing the right thing," as this had not been clear enough for translation into French. When the explanation was added that this meant doing the right thing in a moral sense, from a personal point of view, its usefulness was confirmed in my study (translating into French and German).
Many of the changes triggered by advance translation and confirmed in the think-aloud study, pertaining to Group 1, can be classified as universal comprehensibility issues of the source text, independent of the target language. Some examples: The addition of the word "coalition" to the expression "two or more parties in government," as adding this signal word simplifies immediate understanding; or in the expression "to what extent do you take notice of…," advance translators found it incorrect to ask about a certain "extent" to which one takes notice of certain things. Thus, in the final source questionnaire, this expression was rephrased into "How often do you take notice of… ?"
In Group 2, neither the qualitative nor the quantitative analysis could determine an improvement of the translatability. Here, the advance translation comments had mainly been made by the teams from the Czech Republic, Poland, and/or Turkey, which are linguistically and culturally/politically more distant from the French- and German-speaking translators in my think-aloud study (from Germany, France, and Belgium). In addition, in three of these four items, it was mainly the cultural rather than the linguistic distance between both country groups (Czech Republic, Poland, Turkey versus Belgium, France, Germany, and Switzerland) that played a role. An example is the translation of the expression "media provide citizens with reliable information to judge the government" into Turkish. Here, the Turkish advance translation team asked to rephrase the verb "judge": not because the source text would be difficult to understand, but because the direct translation of the verb would sound harsh and intimidating to Turkish native speakers and a word meaning "evaluate/assess" would be better suited in this context. In these cases, my think-aloud study involving French and German translators could not confirm the success of advance translation in improving source text translatability or cultural portability.
Had the languages of the think-aloud study been closer or identical to the languages of the advance translation, the result may have been different. Here further research is needed.
In Group 3, the qualitative analysis showed evidence of an improved translatability whereas the quantitative analysis did not.
In these cases, the distance between advance translation and thinking-aloud languages and cultures was less relevant, but two interpretations seem possible. First, they could be classified as difficult, but not problematic terms. There are translation tasks (whether individual words or whole expressions) that will require a relatively long deliberation process. For instance, where extensive research would be required or where a decision between several correct options needs to be considered carefully, perhaps involving external experts, any translator will need a more or less long process to decide on the final translation. However, in such cases, a satisfactory or even very good translation can be found. In my think-aloud study in these cases, the quantitative result did not suggest that advance translation had improved translatability because all the steps required to find the final translation would be problematic codes and thus increase the negative side in the chi-squared test calculation. However, in the qualitative analysis (my observational notes), my overall assessment was not that these items were problematic, as a certain level of difficulty simply cannot be excluded for these translation tasks. An example: The question before advance translation was "To what extent do you think it is always your duty to accept the decisions made by the police in [country]?" In the final version, an annotation on "duty" was added ("Duty" in the sense of a citizen's moral duty to the state) and "accept" was replaced by "back," explained in a footnote. The qualitative analysis of the think-aloud study only showed a slight improvement in the source text following the advance translation, and the quantitative result was not significant. The changes made after advance translation did make the source item easier to translate into French and German, but less clearly than in the items of Group 1.
This may be linked to the fact that, on the one hand, duty was not too difficult to translate into German, even without an explanation, and on the other hand, one test person also found the final version (the verb "to back a decision" with the footnote) difficult to translate. As the whole item was complex and demanding to translate, the number of problematic codes remained relatively high in the post-advance-translation version. The second possible interpretation of Group 3 is that my study's results may have been more positive had the same type of change (rewording and annotating) been applied differently, for example by replacing "accept" with a word other than "back" or by adding a different annotation for "duty." Overall, the usefulness of advance translation could not be proven for this item as the results from both analytical approaches were contradictory.

Discussion
My study was the first empirical test of advance translation. Overall, I could show the usefulness of advance translation for enhancing the translatability of source questionnaires. When looking closer at the mechanisms that determine advance translation, I detected the following: I could not show a direct effect of either the type or the number of changes made on the success of advance translation. This suggests that all types of changes (annotations added, rewordings, and restructuring) have similar potential for improving the final text's translatability.
However, I could show that the choice of languages and/or cultures of advance translation and think-aloud study did have an effect on the final result.
The two major aspects of advance translations resulting from my study are (1) issues of general comprehensibility of the source text, independent of the target language and/or culture, are especially likely to be resolved by advance translations; (2) where changes triggered by advance translation are very language- and/or culture-specific, the success of these changes in the final source text is not easily confirmed when translating into other languages (e.g., when advance translation comments were made in French, the changes they trigger in the source text will not necessarily be useful for translations into Russian). This means that, in these cases, the success of advance translation will be noted in the final translation step for the same or similar languages and/or cultures. The greater the distance between the language and/or culture of advance translation and think-aloud, the lower the likelihood that the same problems are replicated, and the solutions confirmed, when translating into these more distant languages.
With regard to (1), according to my study, the general comprehensibility of source texts, independent of the target language and/or culture, was successfully improved by advance translation. This highlights the importance of advance translation for general survey quality beyond mere translation: On the one hand, it is known that comprehensibility plays an important role in translation in general, both of the source and of the target texts (Dorer 2015; Maksymski et al. 2015). A high level of comprehensibility is known to be an important aspect for any text. Text comprehensibility can be modeled by, for instance, the "Karlsruhe comprehensibility concept" developed by Göpferich (2009) (for an application of the Karlsruhe comprehensibility concept to the comprehensibility issues resolved by advance translation, see Dorer 2020).
On the other hand, in survey research, Fowler (1992) and Lenzner (2012) showed in experimental studies that comprehension problems in survey questions had negative effects on the resulting data quality, in monolingual contexts. So the role of advance translations in enhancing the language-independent comprehensibility both of source and of target texts is a strong argument in favor of advance translations.
Regarding (2), where changes triggered by advance translation prove to be language- and/or culture-specific, it is not surprising that such changes have weak or no effects when translated into other target languages and/or cultures. In my study, this applied, for instance, where advance translators from Turkey expected that a close translation of the expression "to judge the government" would be felt to be less acceptable and should be softened by "assess/evaluate" for use in Turkey.
With regard to Group 3, a trait known to be valid for any translation seems to be confirmed for questionnaire translation as well: there are cases where a certain level of translation difficulty has to be accepted, and the possibilities of measures like advance translation to facilitate the translation process are limited. This comes back to the observation that the activity of translation is often underestimated by those setting up cross-cultural survey projects (de Jong et al. 2020:244; Lyberg et al. 2021:58-60). Translation can easily become a challenging and complex task where it is not possible to massively facilitate this process by simplifying the source text. This is why it is a requirement to work with appropriately trained and prepared translation staff. In the case of the particular text type (Reiss 1981) of a questionnaire that needs to work as intended in all target populations, it is of the utmost importance to have a team of experienced experts both in translation and in survey design/development. The residual difficulty observed in Group 3 is therefore not a weakness of advance translation as a method.
From these main findings, we see that advance translation not only contributes to enhancing the translatability of source questionnaires, it also contributes to the overall quality of the final translations, and thus data. The more precisely the source questionnaire expresses what should be asked across all participating countries, the better the individual translations will express the concepts that individual items are intended to measure. Thus, overall comparability of the entire cross-cultural survey data also depends on a clearly and unambiguously formulated source questionnaire that has been cross-checked for its translatability and cultural portability into all target languages and cultures.

Conclusions
Overall, my results confirm the usefulness of advance translation for enhancing translatability of source questionnaires, albeit only for the language combinations and items selected for my think-aloud study.
In my study, the success of advance translation depended less on the number or type of changes made and more on the type of issues detected by advance translation. Advance translation proved to be an effective method for detecting and resolving general, language-independent comprehensibility issues. Depending on each project's questionnaire development approach, other methods, such as cognitive interviewing, may also help to detect general comprehensibility issues. Any recommendations as to which methods to choose will be case-specific. However, advance translation has the additional advantage of also addressing the source questionnaire's translatability. For language- and/or culture-specific issues, advance translation mainly improved the translatability into the same or similar languages and/or cultures. In addition, there seem to be source text constellations where a certain level of difficulty cannot be reduced even by advance translation. But for the latter, further research would be needed to understand to what extent the quality of the changes made contributes to the results, for instance, by modifying the explanations or rewordings of source text.
Based on my study, I recommend that advance translations ideally be carried out into all languages and cultures into which the source questionnaire is to be translated in the end. This increases the likelihood of ironing out at least the major translatability issues for as many final target languages and cultures as possible. A limitation is that this may be challenging to implement, both in terms of operations and financial resources. Even if it is not possible to have advance translation into all final target languages, at least all linguistic groups (such as Romance or Slavic languages) and cultural/political groups (such as post-communist countries) of the final questionnaire's target populations should be covered in the advance translation step.
Further research is needed to gather more data to assess as many different combinations of languages and language groups as well as cultures and political systems between the advance translation and the think-aloud study as possible. This may involve testing the items of my study being translated into other languages, modifying the changes triggered by advance translation (annotations, rewordings or structural changes), and studying these again in think-aloud sessions into French and German; or setting up completely new studies, selecting different items, and having different combinations of advance translation and testing languages and/or cultures.
A growing database would improve understanding of the mechanisms of advance translation (e.g., whether different types of translatability issues are particularly relevant for specific language combinations, whether certain topics are more prone to translatability issues than others, or which comprehensibility or translatability issues can be classified as language-independent).
Enhanced translatability not only leads to less problematic translations but also to better ones. In multilingual surveys, advance translation not only contributes to the overall quality of translated questionnaires but also to the overall comparability and quality of the resulting data. In my study, I found that it is worth making advance translation a systematic step within source questionnaire development for cross-cultural survey projects.