Interviewer-Observed Paradata in Mixed-Mode and Innovative Data Collection

In this research note, we address the potential of using interviewer-observed paradata, typically collected during face-to-face-only interviews, in mixed-mode and innovative data collection methods that involve an interviewer at some stage (e.g., during the initial contact or during the interview). To this end, we first provide a systematic overview of the types and purposes of the interviewer-observed paradata most commonly collected in face-to-face interviews (contact form data, interviewer observations, and interviewer evaluations) using the methodology of evidence mapping. Based on selected studies, we illustrate the main purposes of interviewer-observed paradata that we identified, including fieldwork management, propensity modeling, nonresponse bias analysis, substantive analysis, and survey data quality assessment. Building on this overview, we discuss the possible use of interviewer-observed paradata in mixed-mode and innovative data collection methods. We conclude with thoughts on new types of interviewer-observed paradata and the potential of combining paradata from different survey modes.


Introduction
Face-to-face interviewing has long been considered the 'gold standard' among data collection methods in market and social research, mainly because of its historically higher response rates, its greater success in reaching hard-to-reach target groups, and thus its less biased and more representative survey data (Schober, 2018; Villar & Fitzgerald, 2017). Another advantage of face-to-face interviewing is the unique opportunity for the interviewer to collect additional data, so-called interviewer-observed paradata, about the respondents and nonrespondents, their living environment, and the interview situation itself. These paradata allow researchers and practitioners to learn more about improving fieldwork processes and ensuring high-quality survey data (Groves & Heeringa, 2006; Kirchner et al., 2017; Kreuter, 2013).
In recent years, and further reinforced by the COVID-19 pandemic, there have been increasing calls in market and social research to switch from face-to-face-only interviewing to mixed-mode designs (Luijkx et al., 2021; Wolf et al., 2021) or other innovative survey data collection methods (Conrad et al., 2022; Endres et al., 2022; Jeannis et al., 2013; Schober, 2018; West et al., 2022). While the idea of mixed-mode data collection and its benefits are not new (de Leeuw, 2005, 2018; Dillman, 2005; Scherpenzeel, 2017), they have become even more important in the post-pandemic era (Cleary et al., 2021; Kuenzi et al., 2022; Kantar Public, 2021). In addition, innovative methods that involve an interviewer in some way, such as knock-to-nudge contact strategies or remote video interviewing, have gained prominence (Cornick et al., 2022; West et al., 2022).
Mixed-mode and innovative data collection methods allow for rapidly adapting fieldwork processes to changing conditions and for responding more flexibly to unforeseen events (Cornick et al., 2022; SHARE-ERIC, 2022). Moreover, they enable the collection of rich interviewer-observed paradata at each step in which interviewers are actively involved. Even though many survey researchers and practitioners are already familiar with the common interviewer-observed paradata from face-to-face-only interviews, little is known about the meaningful use of these paradata in mixed-mode and innovative data collection methods. Therefore, this research note provides a systematic overview of the most common types and purposes of interviewer-observed paradata in face-to-face-only interviews. Based on this, we discuss their potential uses in mixed-mode and innovative data collection methods and provide initial suggestions for academic research and practice.

Interviewer-Observed Paradata in Face-to-Face-Only Interviewing
We systematically searched for previous empirical studies dealing with interviewer-observed paradata in face-to-face interviews and compiled them using the evidence-mapping methodology (Saran & White, 2018; Snilstveit et al., 2013). We included 102 articles and coded the types and purposes of interviewer-observed paradata (see the Supplementary Appendix for details of the search, screening, and coding process).
The rows of Figure 1 show the main types of interviewer-observed paradata in face-to-face studies, namely contact form data, interviewer observations, and interviewer evaluations, and their subtypes (see Table A4 in the Supplementary Appendix for a complete list of paradata types coded in our studies, including examples). The columns list the five primary purposes of interviewer-observed paradata that we identified based on our studies: fieldwork management, propensity modeling, nonresponse bias analysis, substantive analysis, and survey data quality assessment. The size of the circles corresponds to the frequency with which the paradata occurred as (in)dependent variables in the analyses of the studies. The paradata types most often used for specific purposes are highlighted in light gray and are briefly described below based on selected studies.

Types of Interviewer-Observed Paradata
Contact history information is collected during the recruitment and contact phase of an interview; it includes call record data, which are usually available for each contact attempt (Durrant et al., 2011). Examples are the time and mode of contact, the outcome of each contact attempt, or the reasons for noncontact. Doorstep interactions are available for each contact attempt that results in contact with a household member and include the household members' initial reactions and concerns about participating in the interview (Loosveldt & Joye, 2016). Interviewer observations are usually collected only once, at the first contact attempt, for all sample units, including noncontacts and refusals (Durrant et al., 2011). They include observations on the neighborhood (e.g., the condition of houses in the area) and the housing unit of the sample unit (e.g., physical barriers to accessing the house). Interviewer observations also refer to the members of a housing unit (e.g., single or couple household); this information is collected only for successfully contacted households (Olson, 2013). Interviewer evaluations are usually recorded after the completion of the interview and are therefore only available for interviewed respondents (Kirchner et al., 2017); they relate to characteristics of the respondent (e.g., sociodemographic status) and characteristics of the interview situation (e.g., respondents' engagement during the interview).

Purposes of Interviewer-Observed Paradata
Fieldwork management generally aims to improve contact and cooperation during the field phase to increase response rates and sample representativeness while reducing survey costs. In particular, call record data are used in fieldwork management to optimize, tailor, or even initiate targeted solutions for making successful contact (e.g., prioritizing or stopping calls in case of unsuccessful call sequences, increasing the number of active interviewers in the field) (Durrant et al., 2019; Kennickell, 2017; Konicki & Adams, 2016; Purdon et al., 1999; Safir & Tan, 2009; Vandenplas et al., 2017; Zelenak & Davis, 2013). Doorstep interactions are used as part of fieldwork management to gain insights into reasons for nonparticipation, such as lack of time (e.g., "too busy") or privacy concerns (e.g., "don't trust surveys") (Bates & Piani, 2005; Vercruyssen et al., 2011). This information can be used, among other things, to adjust contact timing or to guide interviewer training on refusal conversion.
Propensity modeling can provide helpful information for effective fieldwork management by predicting the likelihood of respondent contact and cooperation in a survey. Additional information about potential (non)respondents, obtained in advance or in the early stages of contacting, can help determine the best timing and strategies for contact. In the studies we reviewed, call record data are most often used for this purpose. Findings show, for example, that the likelihood of contact is highest for the first call and for calls made in the evening or on weekends, while it decreases with the number of calls made previously (Blom, 2012; Durrant et al., 2011). Concerning the likelihood of cooperation, findings are more mixed; some studies indicate higher cooperation with later contact attempts (Durrant et al., 2013), while others show that more contact attempts mean lower cooperation (Groves & Heeringa, 2006; Kreuter & Kohler, 2009; West & Groves, 2013). Doorstep interactions reveal higher refusal rates among households that express concerns more frequently; refusal rates also differ by type of concern (Bates et al., 2008; Bates & Piani, 2005; Vercruyssen et al., 2011; West & Groves, 2013). Identifying the types of concerns that cause interviewers major problems in obtaining cooperation can help develop appropriate interviewer training and other strategies to address these concerns successfully (e.g., transferring cases to more experienced interviewers). Interviewer observations are also commonly used for propensity modeling. Findings show that observations on the neighborhood and housing of sampled units help predict contact (e.g., contact is less likely in areas where the interviewer would feel unsafe after dark or for houses in poor condition) (Blom, 2012; Durrant et al., 2011; Durrant & Steele, 2009; Steele & Durrant, 2011), whereas which observations are appropriate for predicting cooperation appears to be highly context-dependent (Blom et al., 2011; Casas-Cordero, 2010; Durrant et al., 2013; Durrant et al., 2017; Groves & Heeringa, 2006; Krueger & West, 2014; Vercruyssen & Loosveldt, 2017; West, 2013; West & Groves, 2013).
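As a minimal illustration of the contact propensity models discussed above, the following Python sketch fits a logistic regression to a handful of invented call records. The data, the variable names (`prior_calls`, `evening`), and the plain gradient-ascent fitting routine are assumptions made for illustration only; they are not taken from any of the cited studies, which typically rely on richer (often multilevel) models and real call record data.

```python
import math

# Hypothetical call records: (call number, evening call?, contact made?).
# The values are invented for illustration and are not real survey data.
CALLS = [
    (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 0, 0),
    (2, 1, 1), (2, 1, 0), (2, 0, 0), (2, 0, 1),
    (3, 1, 1), (3, 0, 0), (3, 1, 0), (3, 0, 0),
    (4, 1, 0), (4, 0, 0), (4, 1, 1), (4, 0, 0),
]


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_contact_model(calls, steps=5000, rate=0.1):
    """Fit logit P(contact) = b0 + b1 * prior_calls + b2 * evening.

    Uses plain gradient ascent on the mean log-likelihood.
    """
    # Design rows: intercept, number of prior calls, evening indicator.
    rows = [((1.0, call_no - 1.0, float(evening)), contact)
            for call_no, evening, contact in calls]
    beta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in rows:
            err = y - sigmoid(sum(b * v for b, v in zip(beta, x)))
            for j in range(3):
                grad[j] += err * x[j]
        beta = [b + rate * g / len(rows) for b, g in zip(beta, grad)]
    return beta


def contact_propensity(beta, call_no, evening):
    """Predicted probability of contact for a given call number and time slot."""
    return sigmoid(beta[0] + beta[1] * (call_no - 1) + beta[2] * evening)


beta = fit_contact_model(CALLS)
print("coefficients:", [round(b, 2) for b in beta])
print("first evening call:", round(contact_propensity(beta, 1, 1), 2))
print("fourth daytime call:", round(contact_propensity(beta, 4, 0), 2))
```

In these toy data, contact is more frequent for evening calls and less frequent after several prior attempts, so the fitted coefficient on `evening` is positive and the one on `prior_calls` is negative. Predicted propensities of this kind could, in principle, inform which cases to prioritize, but any real application would need to account for interviewer and area effects.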
Nonresponse bias analysis generally aims to assess the consequences of nonparticipation of sampled cases for survey estimates and to adjust for possible nonresponse bias due to systematic differences between respondents and nonrespondents (Groves, 2006). Call record data and doorstep interactions are helpful for nonresponse bias assessment. For example, respondents with multiple contact attempts (hard-to-reach) or respondents who express initial concerns (hard-to-persuade) serve as proxies for true nonrespondents to examine the extent of bias (Boniface et al., 2017; Lee et al., 2018; Lynn & Clarke, 2002). These paradata are also used to examine the effect of increased fieldwork efforts (e.g., repeated contact attempts, refusal conversion) on reducing nonresponse bias (Lynn & Clarke, 2002; Moore et al., 2018). In contrast, call record data and doorstep interactions proved to be of little use in improving the quality of nonresponse adjustments (Biemer et al., 2013; Maitland et al., 2009; Peytchev & Olson, 2007; Wagner et al., 2014), either due to low correlations between paradata-derived indicators and key survey variables (Hanly et al., 2016; Kreuter & Kohler, 2009; Peytchev & Olson, 2007; Wagner et al., 2014) or due to underreporting of contact attempts by interviewers (Biemer et al., 2013). Interviewer observations about the housing unit and its members (e.g., type of household, presence of children, receipt of unemployment benefits) also do not have substantial utility in nonresponse adjustment, as they do not predict response outcomes and key survey variables well (Kreuter et al., 2010; West et al., 2014).
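The proxy logic described above, in which hard-to-reach respondents stand in for nonrespondents, can be sketched in a few lines of Python. The respondent records, the threshold of three contact attempts, and the function name `proxy_bias_check` are hypothetical choices made for this example; the cited studies use their own definitions and real survey data.

```python
from statistics import mean

# Hypothetical respondents: (contact attempts needed, value of a survey variable).
# The numbers are invented for illustration only.
RESPONDENTS = [
    (1, 10.0), (1, 12.0), (2, 11.0), (2, 11.0), (4, 16.0), (5, 18.0),
]


def proxy_bias_check(respondents, hard_threshold=3):
    """Compare easy-to-reach and hard-to-reach respondents on a survey variable.

    Respondents needing at least `hard_threshold` contact attempts are treated
    as proxies for nonrespondents (an assumption of this sketch). Both groups
    must be non-empty.
    """
    easy = [y for attempts, y in respondents if attempts < hard_threshold]
    hard = [y for attempts, y in respondents if attempts >= hard_threshold]
    overall = mean(easy + hard)
    return {
        "easy_mean": mean(easy),
        "hard_mean": mean(hard),
        "overall_mean": overall,
        # How far the estimate would shift if fieldwork had stopped early.
        "shift_without_hard": mean(easy) - overall,
    }


result = proxy_bias_check(RESPONDENTS)
for name, value in result.items():
    print(name, value)
```

A large value of `shift_without_hard` would suggest that stopping fieldwork after few contact attempts changes the estimate noticeably. Whether hard-to-reach respondents actually resemble true nonrespondents is an assumption that the sketch cannot test.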
Survey data quality assessment relies primarily on interviewer evaluations of respondents' behavior and their engagement and understanding during the interview to detect potential measurement problems and poor data quality (Bricker, 2014; Mellinger et al., 1982; Weissman et al., 1996; West et al., 2018). In this context, Perales and Baffour (2018) found that interviewer evaluations of respondent behavior during the interview are good predictors of data quality because they point in the same direction as results based on objective indicators of survey engagement (e.g., panel dropout and item nonresponse).

Interviewer-Observed Paradata in Mixed-Mode and Innovative Data Collection Methods
First, we briefly describe three data collection methods with interviewer participation that gained prominence in market and social research during the COVID-19 pandemic: CAPI-plus, video interviewing, and knock-to-nudge (Cornick et al., 2022). Second, we discuss the use of interviewer-observed paradata for these three methods. These uses are anecdotal and are not intended to be exhaustive.
CAPI-plus, as a type of sequential mixed mode, means that computer-assisted personal interviewing (CAPI) is the default mode of data collection. If respondents decline to participate in a face-to-face interview, they are offered an alternative mode, often telephone interviews or self-administered web surveys. Computer-assisted video interviewing (CAVI) is a form of remote interviewing in which the interviewer and respondent communicate via video call. The video interview usually does not take place during the initial contact; instead, an appointment is made for a specific time slot (Schober et al., 2020). Knock-to-nudge (KtN) is a contact method in which face-to-face interviewers visit sampled households and ask respondents at the doorstep to participate in a non-face-to-face survey. An appointment is made for the survey, which is conducted later, usually by telephone, video interview, or mixed mode (Cornick et al., 2022; Kastberg & Siegler, 2022).

Challenges in Contact and Cooperation
A major objective of mixed-mode and innovative data collection methods is to improve contact and cooperation to increase response rates and sample representativeness. For example, offering an alternative non-face-to-face mode in CAPI-plus can make the survey attractive to those concerned about face-to-face interaction or who want to avoid an interviewer in their home (Cornick et al., 2022). Similarly, face-to-face recruitment for a non-face-to-face survey through KtN can increase response rates. However, it also affects the distribution of respondent characteristics (e.g., younger, unmarried, living in larger households and the most deprived areas), presumably due to different likelihoods of respondents being at home and responding to the interviewer's knock on the door (Kastberg & Siegler, 2022). In addition, KtN requires comprehensive call scheduling due to the postponement of the interview. Concerning CAVI, not all respondents have access to an Internet-enabled device with a camera and microphone. Even if the technical requirements are met, not all respondents are ready for and comfortable with a video interview. Like KtN, CAVI involves comprehensive scheduling (Endres et al., 2022; Schober et al., 2020). Respondents' varying ability and willingness to participate and the more complex call scheduling, particularly for CAVI and KtN, underscore the importance of tailored fieldwork management, propensity modeling, and nonresponse bias analysis.
In all three data collection methods presented, contact history information can be usefully applied to fieldwork management and propensity modeling to better understand the mechanisms of successful contact and cooperation and to develop an effective call scheduling and recruitment strategy, ultimately increasing response rates and sample representativeness. For example, in CAPI-plus and KtN, call record data help optimize contact timing and prioritize the cases that are most difficult to reach at home or most likely to refuse in face-to-face mode. When different modes are combined, call sequence outcomes can improve the recruitment strategy by tailoring the timing of mode switching (e.g., after how many contact attempts in CAPI mode it is advisable to switch to another mode) and the number and type of reminders (e.g., call reminders, postal reminders, or email follow-ups). We also encourage gathering interviewer observations on the sampled unit's neighborhood and housing unit in CAPI-plus and KtN during (initial) face-to-face contact. As they have proven helpful for propensity modeling in face-to-face-only studies, they are promising for deriving tailored treatments before or early in the field phase in CAPI-plus and KtN (e.g., assigning cases to the appropriate mode). In addition, we recommend paying particular attention to doorstep concerns. It is crucial to understand respondents' concerns about and barriers to data collection methods that are new and unfamiliar to many of them. KtN and CAVI may involve concerns other than those from face-to-face-only interviews (e.g., unwillingness to provide a phone number during KtN, inadequate technical equipment, or discomfort with using video in CAVI). Only when we know the specific concerns can appropriate strategies be developed to encourage respondents to participate (e.g., sending experienced interviewers specially trained in refusal conversion, conducting brief doorstep training on the use of video). In addition, contact history information and interviewer observations on all sampled cases, including nonrespondents, help assess the extent of nonresponse and the consequences of nonparticipation for sample composition and survey estimates. For example, interviewer observations of (non)respondents' sociodemographic characteristics (e.g., age, ethnicity, language spoken) or household type and composition (e.g., single-person household, presence of children) may explain why some respondents are more likely than others to refuse in CAVI or to prefer one mode over another in CAPI-plus and KtN. These paradata can also provide insight into how switches in survey mode and increased fieldwork effort counteract nonresponse (bias). Particularly in mixed-mode data collection, the success of a measure (e.g., number of reminders, amount of incentives) may vary by survey mode, so measures should be tailored to the mode (e.g., different number of reminders or incentives depending on the mode in CAPI-plus or KtN).

Challenges in Data Quality
As with all data collection methods, a challenge with mixed-mode and innovative methods is ensuring the quality of the survey data. Mixing modes results in survey data being collected under very different conditions (e.g., interviewer presence or absence, verbal or visual presentation of question stimuli, differing question formats); thus, mode effects can affect data quality and survey estimates (Conrad et al., 2022; de Leeuw & Hox, 2015; Endres et al., 2022; Lugtig et al., 2011; West et al., 2022). Moreover, when relatively new data collection methods are used that are unfamiliar to both interviewers and respondents, such as CAVI, little is known about the problems that may occur during the interview, such as interrupted speech and frozen or distorted video (Conrad et al., 2022), and about the impact of the new interview situation and the problems encountered on response behavior and data quality. These technical and other issues make it even more essential to take a closer look at the conditions under which the survey data are collected and to evaluate their quality thoroughly.
One advantage of CAPI-plus (when CAPI mode is selected) and CAVI is that interviewers and respondents can usually see each other, and interviewers can thus perceive respondents' attributes, facial expressions, and nonverbal cues. The visual interviewer-respondent interaction allows for an extensive collection of interviewer evaluations of respondent characteristics that can be used as proxy information for substantive analyses. Most importantly, we recommend the collection of detailed interviewer evaluations of the interview situation and respondent behavior to enable an informed survey data quality assessment. Especially in CAVI mode, new and unexpected interactions and problems may occur, which should be documented through comprehensive interviewer evaluations (e.g., screen sharing not working, technically related interruptions, acoustically related difficulty understanding questions, distractions from incoming emails and notifications on the respondent's device) to identify low-quality data and explain differences in data quality between survey modes. In addition, interviewer evaluations can help identify groups of respondents for whom CAVI is particularly problematic (e.g., less technically savvy respondents, elderly) and for whom another mode is preferable. Due to the lack of immediate proximity between interviewer and respondent, interviewers should be specifically trained to collect interviewer evaluations in CAVI mode so that they know exactly what to look for in the interview situation and how to interpret respondents' (non)verbal behaviors appropriately.

Conclusions and Considerations for Future Research
The range of interviewer-observed paradata in face-to-face interviewing is diverse, as are their purposes, as we have shown through a systematic overview of the previous literature. Moreover, we found that the usefulness of interviewer-observed paradata is often highly dependent on the interview context. Using CAPI-plus, CAVI, and KtN as examples, we have discussed the applicability of interviewer-observed paradata, typically collected in face-to-face-only interviews, in mixed-mode and innovative data collection methods. We have shown that it is necessary to develop modified and new interviewer-observed paradata tailored to the specific needs of a data collection method to realize its full potential. Modified and new paradata require additional interviewer training and a thorough assessment of the quality and applicability of these paradata in the context of mixed-mode and innovative data collection methods, as the collection conditions may differ significantly from those of face-to-face-only interviews.
A worthwhile endeavor from our perspective is to combine interviewer-observed paradata with paradata from other survey modes. Mixed-mode and innovative methods that involve web-based data collection can profit from web paradata (e.g., response times, questionnaire navigation, and device information) that can be used to better understand question-answer processing on the part of respondents and to assess survey data quality (for a comprehensive overview of web paradata and their uses, see, for example, Callegaro, 2013; Kunz & Hadler, 2020; McClain et al., 2019). For example, like interviewer evaluations, response time data can indicate whether respondents have comprehension problems with individual questions or how much effort they put into answering them. These automatically collected web paradata can substitute for at least some interviewer evaluations. They allow for the economical collection of paradata by saving the interviewer time otherwise needed to record interviewer-observed paradata, and they increase standardization by eliminating interviewer variability in the collection of these paradata. Alternatively, they can be collected in addition to interviewer evaluations so that the two can be compared, their respective quality assessed, and a decision made about which type of paradata will be most useful in future data collection.
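To make the response time example concrete, the following Python sketch flags unusually fast or slow answers relative to the median response time per question. The response times, the respondent and question identifiers, and the cutoff factors (0.3 and 3.0 times the median) are all assumptions chosen for illustration; appropriate thresholds would need to be established empirically for a given survey.

```python
from statistics import median

# Hypothetical response times in seconds, per question and respondent ID.
# The values are invented for illustration only.
RESPONSE_TIMES = {
    "q1": {"r1": 4.0, "r2": 5.0, "r3": 6.0, "r4": 1.0, "r5": 20.0},
    "q2": {"r1": 8.0, "r2": 9.0, "r3": 10.0, "r4": 2.0, "r5": 9.5},
}


def flag_response_times(times, fast_factor=0.3, slow_factor=3.0):
    """Flag answers that are much faster or slower than the question median.

    "fast" may hint at low effort and "slow" at comprehension problems;
    both interpretations are assumptions of this sketch.
    """
    flags = []
    for question, by_respondent in sorted(times.items()):
        typical = median(by_respondent.values())
        for respondent, seconds in sorted(by_respondent.items()):
            if seconds < fast_factor * typical:
                flags.append((question, respondent, "fast"))
            elif seconds > slow_factor * typical:
                flags.append((question, respondent, "slow"))
    return flags


flags = flag_response_times(RESPONSE_TIMES)
for question, respondent, kind in flags:
    print(question, respondent, kind)
```

Flags of this kind could be compared with interviewer evaluations of the same respondents, where both are available, to see whether the two sources of paradata point in the same direction.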
Survey researchers and practitioners have recognized in the wake of the COVID-19 pandemic that future survey data collection will likely include multiple modes and different approaches to best meet respondents' needs. It is therefore necessary to further develop the practice of paradata collection and use and to adapt it to the new data collection conditions, particularly mixed-mode settings. We would like to stimulate future research to provide evidence-based insights into how paradata from different survey modes can be usefully supplemented and combined to improve the efficiency of data collection and the quality of survey data.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figure 1. Evidence map on main types and purposes of interviewer-observed paradata in face-to-face interviewing.