Cheater Detection Using the Unrelated Question Model

Randomized response techniques (RRTs) are useful survey tools for estimating the prevalence of sensitive issues, such as the prevalence of doping in elite sports. One type of RRT, the unrelated question model (UQM), has become widely used because of its psychological acceptability for study participants and its favorable statistical properties. One drawback of this model, however, is that it does not allow for detecting cheaters—individuals who disobey the survey instructions and instead give self-protecting responses. In this article, we present refined versions of the UQM designed to estimate the prevalence of cheating responses. We provide explicit formulas to calculate the parameters of these refined UQM versions and show how the empirical adequacy of these versions can be tested. The Appendices contain R-code for all necessary calculations.

Throughout the social sciences, many findings are based on surveys of various groups of individuals. Most such surveys rely on the assumption that respondents will provide honest answers to survey questions. However, this assumption falters when asking respondents sensitive questions (see Tourangeau and Yan 2007), that is, questions that are perceived as intrusive, stigmatizing, socially undesirable, or even legally incriminating (Tourangeau, Rips, and Rasinski 2000). Faced with sensitive questions, respondents may refuse to participate in the survey or may simply answer dishonestly (Tourangeau et al. 2000), especially if they are carriers of the sensitive attribute being assessed. Thus, direct questioning has frequently been found to underestimate the true prevalence of sensitive attributes, such as having had an abortion (Fu et al. 1998), having been convicted of driving while intoxicated, having engaged in doping in athletics, and many other issues.
To address this problem, several indirect questioning techniques have been developed throughout the last half-century (see Chaudhuri and Christofides 2013). One of these methods, the Randomized Response Technique (RRT), developed by Warner (1965), introduced the idea of creating anonymity by employing random encryption of the respondents' answers. In Warner's model, the respondent receives one of two questions about a sensitive issue. For example, the survey instrument might be designed so that respondents will receive the question S: Have you ever used illicit drugs? with probability p (where p ≠ .5), or they will receive the negation of this question, S̄: Have you never used illicit drugs? with the complementary probability 1 − p. A random element (e.g., the throw of a die) determines which of the two questions the respondent receives. The survey is designed so that only the respondent knows the outcome of the randomization (e.g., the respondent is asked to throw the die out of the sight of the investigator). Since only the respondent knows which question he or she has answered, the investigator cannot infer the respondent's status when the respondent answers "yes" or "no" to the survey instrument. However, even though the investigators cannot infer the status of any individual respondent, they can nevertheless estimate the prevalence of the sensitive attribute in a large survey population because the probability p underlying the randomization is known, and hence the estimated prevalence of the sensitive attribute can be derived from the proportion of "yes" answers.
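The last step, recovering the prevalence from the proportion of "yes" answers, can be illustrated with a short Python sketch (the function name is ours; the article's own code, in the appendices, is written in R). Under Warner's model the "yes" probability is λ = p·π + (1 − p)·(1 − π), which the estimator inverts:

```python
def warner_estimate(lam_hat: float, p: float) -> float:
    """Prevalence estimate under Warner's model, where the probability of
    a "yes" response is P("yes") = p*pi + (1 - p)*(1 - pi)."""
    if p == 0.5:
        raise ValueError("p must differ from .5; otherwise pi is not identifiable")
    return (lam_hat + p - 1) / (2 * p - 1)

# With true prevalence 0.2 and p = 0.7, the expected "yes" proportion is
# 0.7 * 0.2 + 0.3 * 0.8 = 0.38, and the estimator recovers the prevalence:
print(warner_estimate(0.38, 0.7))  # 0.2 (up to floating point)
```

The guard against p = .5 reflects Warner's identifiability condition: at p = .5 the "yes" probability no longer depends on π.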
Several revisions and modifications of Warner's (1965) model have been proposed over the years (e.g., Kuk 1990; Mangat 1994). One of these is the well-established unrelated question model (UQM; Greenberg et al. 1969; see Figure 1). In the UQM, as in the original Warner model, a randomization procedure determines whether the respondent is instructed to answer the sensitive question S. The alternative question, however, is not the reversed sensitive question S̄, but instead an unrelated innocuous question, the neutral question N (e.g., "Think of someone close to you whose birthdate you know, and answer "yes" if that individual was born on an odd-numbered day"). Thus, the UQM is potentially more psychologically acceptable to survey respondents than the original Warner method because question N is obviously not related to the sensitive attribute and is therefore clearly not incriminating.
With the UQM, as with Warner's original method, the investigator cannot determine any individual respondent's status on the sensitive attribute. However, given a large sample of respondents, the investigator can still estimate the prevalence π of the sensitive attribute, provided that the randomization probability p and the prevalence of the neutral attribute q are known. Specifically, the prevalence π can be estimated from the observed proportion λ̂ of "yes" responses by the formula π̂ = [λ̂ − (1 − p)·q]/p. In several studies, the UQM has elicited prevalence estimates substantially exceeding estimates derived from direct questioning (see Lensvelt-Mulders et al. 2005), such as the prevalence of induced abortion (Abernathy, Greenberg, and Horvitz 1970) and doping in elite athletics (e.g., Ulrich et al. 2018).
Figure 1. Probability tree of the unrelated question model. The sensitive question S and the neutral question N are randomly received by respondents with probability p and 1 − p, respectively. The probabilities of responding "yes" and "no" to the neutral question N are q and 1 − q, and the probabilities of responding "yes" and "no" to the sensitive question S are π and 1 − π.
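As an illustration, the UQM estimator simply inverts the "yes" probability λ = p·π + (1 − p)·q. A minimal Python sketch (the function name is ours):

```python
def uqm_estimate(lam_hat: float, p: float, q: float) -> float:
    """Prevalence estimate under the UQM, where
    P("yes") = p*pi + (1 - p)*q."""
    return (lam_hat - (1 - p) * q) / p

# With pi = 0.2, p = 0.75, q = 0.5, the expected "yes" proportion is
# 0.75 * 0.2 + 0.25 * 0.5 = 0.275; the estimator inverts this relation:
print(uqm_estimate(0.275, 0.75, 0.5))  # 0.2 (up to floating point)
```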
However, by introducing an unrelated question N, the UQM opens the possibility that some respondents ("cheaters") will be tempted to answer a self-protective "no" to either of the two alternative questions on the survey regardless of the true answer to the question. Even though a "yes" response does not necessarily imply having the sensitive attribute, a "no" response greatly reduces the possibility of that conclusion. Specifically, under the standard version of the UQM, the conditional probability P(A|"yes") of being a carrier given a "yes" response is generally larger than the conditional probability P(A|"no") of being a carrier given a "no" response, when p is less than one. For example, for p = 0.75, q = 0.5, and π = 0.2 one computes P(A|"yes") = 0.636 and P(A|"no") = 0.034 using Bayes's theorem. Correspondingly, the odds that one is a carrier of the attribute would be 49 times greater given a "yes" response than given a "no" response. Interestingly, this odds ratio does not depend on π. As a consequence, this difference in conditional probabilities may encourage cheating behavior in the form of answering "no" under all circumstances.
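These conditional probabilities, and the fact that the resulting odds ratio does not depend on the prevalence, follow directly from Bayes's theorem and can be verified with a short Python sketch (the function name is ours):

```python
def carrier_posteriors(p: float, q: float, pi: float):
    """P(A | "yes") and P(A | "no") under the standard UQM via Bayes's
    theorem. A carrier answers "yes" with probability p + (1 - p)*q,
    a noncarrier with probability (1 - p)*q."""
    p_yes = p * pi + (1 - p) * q
    a_yes = pi * (p + (1 - p) * q) / p_yes
    a_no = pi * (1 - p) * (1 - q) / (1 - p_yes)
    return a_yes, a_no

a_yes, a_no = carrier_posteriors(0.75, 0.5, 0.2)
odds_ratio = (a_yes / (1 - a_yes)) / (a_no / (1 - a_no))
print(round(a_yes, 3), round(a_no, 3), round(odds_ratio))  # 0.636 0.034 49

# The odds ratio is the same for any prevalence pi:
b_yes, b_no = carrier_posteriors(0.75, 0.5, 0.05)
print(round((b_yes / (1 - b_yes)) / (b_no / (1 - b_no))))  # 49
```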
Another modification of the RRT, the cheater detection model (CDM; Clark and Desharnais 1998), addresses this drawback by dividing respondents into three mutually exclusive categories: (a) honest respondents who are carriers of the sensitive attribute, who will respond "yes" if they receive the sensitive question, (b) honest respondents who are noncarriers of the sensitive attribute, who will respond "no" if they receive the sensitive question, and (c) cheaters who choose the safe option by always responding "no" to any question regardless of whether they are carriers or noncarriers. For illustration, let A denote a carrier and Ā a noncarrier. Furthermore, let H denote an honest respondent and H̄ a cheater. Then, the probabilities of the three subgroups can be expressed as compound probabilities. These probabilities are P(A ∩ H) = (1 − γ)·ε = π for subgroup (a), P(Ā ∩ H) = (1 − γ)·(1 − ε) for subgroup (b), and P(H̄) = γ for subgroup (c), where γ denotes the proportion of cheaters and ε the proportion of carriers among honest respondents. Note that these three probabilities add to one. The CDM is based on another RRT variant, the forced response model (Boruch 1971). This model modifies Warner's model by replacing the inverted question S̄ by the instruction to simply say "yes." In other words, the forced instruction to say "yes" simply replaces the neutral question N in the UQM. Hence, if no cheating is assumed and one is therefore not attempting to assess for cheating, the forced response model is mathematically equivalent to a special case of the UQM, namely when the prevalence q of the neutral attribute equals 1. This situation is depicted in the upper part of Figure 2 starting at node H, representing honest respondents only.
Figure 2. Probability tree of the cheater detection model. Respondents are either cheaters C with probability γ or honest respondents H with probability 1 − γ. All respondents randomly receive either the sensitive question S or the instruction F to respond "yes" with probability p_i and 1 − p_i, respectively.
Cheaters C always answer "no" regardless of their carrier status and regardless of whether they receive question S or instruction F. Honest respondents H respond honestly under all conditions. Specifically, if instructed to say "yes," honest participants always answer "yes." If instructed to answer the sensitive question S, honest participants answer "yes" with probability ε and "no" with probability 1 − ε. Thus, participants can be divided into three groups: (a) carriers of the sensitive attribute who will honestly respond "yes" with probability (1 − γ)·ε = π when receiving S; (b) noncarriers of this attribute who will honestly respond "no" with probability (1 − γ)·(1 − ε) when receiving S; and (c) cheaters who will respond "no" with probability γ regardless of receiving S or the instruction F to respond "yes." However, note that the temptation to cheat may be especially pronounced in the forced response model because the respondent can completely eliminate any suggestion of being a carrier of the sensitive attribute by simply answering "no." Expressed more formally, in the forced response model, the conditional probability P(A|"yes") must always be larger than the conditional probability P(A|"no") because P(A|"no") = 0 (except in the implausible case where P(A|"yes") is also 0). For example, for p = 0.75 and π = 0.2, one computes P(A|"yes") = 0.5 and P(A|"no") = 0. Correspondingly, the odds that the respondent is a carrier of the attribute would be infinitely greater given a "yes" response than given a "no" response. In other words, answering with "no" is a completely safe option. Therefore, the CDM includes a parameter to assess the extent of cheating. This is depicted in the lower part of Figure 2 starting at node C and representing cheaters. In this diagram of the CDM, the proportion of cheaters is γ, whereas the proportion of honest respondents is 1 − γ.
The proportion of respondents carrying the sensitive attribute cannot be estimated because only the proportion π of honest carriers in the overall respondent population, but not the proportion of carriers who are cheaters in the overall population, can be identified by the model. Importantly, π in the CDM is, therefore, not equivalent to π in the UQM because in the former it is defined as the proportion of honest carriers and in the latter as the total proportion of carriers. Nevertheless, in the CDM, the total proportion of carriers in the population must lie within the range that is defined by the lower bound π = (1 − γ)·ε and the upper bound π + γ. The proportions π and γ thus represent two of the above introduced categories, namely (a) honest carriers and (c) cheaters, respectively. Therefore, the proportion of respondents in the remaining category (b), the honest noncarriers, is simply given by 1 − (π + γ). In order to estimate the parameters π and γ for computing the two bounds, two probabilities λ_1 and λ_2 of responding "yes" are required. They can be estimated by the observed proportions of "yes" responses in two independent samples with p_1 ≠ p_2. The resulting equation system can then be solved for π and γ.
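For illustration, under the CDM the "yes" probability in sample i is λ_i = p_i·π + (1 − p_i)·(1 − γ), since honest carriers answer "yes" to S and all honest respondents say "yes" when forced. A Python sketch of solving the two-sample system (the function name and numbers are ours):

```python
def cdm_solve(lam1: float, lam2: float, p1: float, p2: float):
    """Solve the CDM system lam_i = p_i*pi + (1 - p_i)*(1 - gamma)
    for the proportion of honest carriers pi and of cheaters gamma."""
    pi_hat = ((1 - p2) * lam1 - (1 - p1) * lam2) / (p1 - p2)
    gamma_hat = 1 - (p1 * lam2 - p2 * lam1) / (p1 - p2)
    return pi_hat, gamma_hat

# Yes-probabilities generated with pi = 0.2, gamma = 0.3, p1 = .75, p2 = .25:
lam1 = 0.75 * 0.2 + 0.25 * 0.7   # 0.325
lam2 = 0.25 * 0.2 + 0.75 * 0.7   # 0.575
print(cdm_solve(lam1, lam2, 0.75, 0.25))  # recovers (0.2, 0.3)
```

In practice, λ_1 and λ_2 are replaced by the observed proportions of "yes" responses, so the solution is an estimate subject to sampling error.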
Several empirical implementations of the CDM (e.g., Elbe and Pitsch 2018; Moshagen et al. 2010; Ostapczuk 2011; Ostapczuk et al. 2009; Pitsch, Emrich, and Klein 2007; Schröter et al. 2016) have provided evidence of cheating behavior, showing the importance of including a cheating parameter in RRTs. However, studies utilizing the forced response model (e.g., Höglinger, Jann, and Diekmann 2016; Kirchner 2015; Wolter and Preisendörfer 2013) have raised doubts about the validity of this particular method. Specifically, it has been shown (Coutts and Jann 2011; Höglinger et al. 2016) to elicit lower estimates than other indirect questioning techniques, and respondents have reported greater difficulties in understanding this technique. Respondents also seem to express less trust that the technique guarantees anonymity. For example, Lensvelt-Mulders and Boeije (2007) reported that respondents perceived being forced to give a "yes" response as being "forced to be dishonest" (p. 600), which seemingly triggered reluctance. Ostapczuk et al. (2009) proposed a method to reduce this problem by adding a forced "no" response to the forced "yes" response. In this symmetric design, neither response option is conclusive of the respondents' status. Specifically, it is not only possible to be forced to respond "yes" even though one is a noncarrier but also to be forced to respond "no" even though one is in fact a carrier. This should increase compliance with the instructions, and indeed, the authors found cheating to be reduced in an empirical comparison to the original design. Still, it is plausible that a forced response can feel like an implicit response to the sensitive question, something that even this approach does not address.
In summary, although it appears important to account for possible cheating when using RRTs, a technique based on the forced response model may not be ideal. By contrast, the UQM is conceptually and mathematically similar without potentially triggering reluctance by forcing responses. Here, responses to the neutral question are clearly not responses to the sensitive question because the neutral question has content of its own. Thus, in the next section, we propose a model combining the greater psychological acceptability of the UQM's design with the CDM's concept of cheating.

Unrelated Question Model-Cheating Extension (UQMC)
Below, we introduce the UQMC, a model combining the basic idea of the CDM (Clark and Desharnais 1998) with the standard version of the UQM (Greenberg et al. 1969). The setup of the UQMC resembles that of the UQM, in that respondents receive the sensitive question S with probability p and the neutral question N with probability 1 − p. As in the CDM, participants are categorized as being either honest respondents or cheaters. Figure 3 depicts the resulting probabilities. The same parameters generated in the CDM can be estimated using this model. Specifically, γ corresponds to the probability of being a cheater, and π = (1 − γ)·ε depicts the probability of being an honest carrier of the sensitive attribute. As in the CDM, the prevalence of the sensitive attribute cannot be inferred because the proportion of carriers can only be estimated among honest respondents and not among cheaters. However, it is still possible to compute an estimated range for the prevalence, which is defined by the bounds π and π + γ.
As in the CDM, two independent samples of respondents are required to estimate π and γ. Again, different values of p_i must be used with the two samples, i = 1, 2. Thus, the probability of responding "yes" in sample i is given by
λ_i = p_i·π + (1 − p_i)·(1 − γ)·q.    (5)
Figure 3. Probability tree of the unrelated question model-cheating extension. The prevalence of cheaters C is γ and the prevalence of honest participants H is 1 − γ. In both cases, the sensitive question S and the neutral question N are received by participants with probability p_i and 1 − p_i, respectively. Cheaters always say "no" regardless of the question received. Honest participants respond "yes" with probability q and "no" with probability 1 − q if instructed to answer the neutral question N. They answer "yes" with probability ε and "no" with probability 1 − ε, if instructed to answer the sensitive question S. Thus, there are three groups of participants: (a) honest participants who are carriers of the sensitive attribute, who will respond "yes" with probability (1 − γ)·ε = π if they receive S; (b) honest noncarriers of this attribute who will respond "no" with probability (1 − γ)·(1 − ε) if they receive S; and (c) cheaters who will respond "no" with probability γ regardless of whether they receive S or N.
As λ_1 and λ_2 can be estimated from the corresponding observed proportions λ̂_1 and λ̂_2 of "yes" responses in each sample, the resulting equation system can be solved for π and γ:
π̂ = [(1 − p_2)·λ̂_1 − (1 − p_1)·λ̂_2] / (p_1 − p_2)    (6)
and
γ̂ = 1 − (p_1·λ̂_2 − p_2·λ̂_1) / [q·(p_1 − p_2)].    (7)
The corresponding sampling variances of the two estimates are
Var(π̂) = [(1 − p_2)²·Var(λ̂_1) + (1 − p_1)²·Var(λ̂_2)] / (p_1 − p_2)²    (8)
and
Var(γ̂) = 1/q² · [p_2²·Var(λ̂_1) + p_1²·Var(λ̂_2)] / (p_1 − p_2)²,    (9)
with Var(λ̂_i) = λ̂_i·(1 − λ̂_i)/n_i. The covariance of these estimators is
Cov(π̂, γ̂) = [p_2·(1 − p_2)·Var(λ̂_1) + p_1·(1 − p_1)·Var(λ̂_2)] / [q·(p_1 − p_2)²].    (10)
Table 1 provides a numerical example to illustrate the UQMC. This example assumes that the estimates λ̂_1 and λ̂_2 of "yes" responses were obtained from two independent samples. The observed proportions of "yes" responses in this table were simulated with π = 0.2 and γ = 0.3. Inserting the values of Table 1 into equations (6-9) yields parameter estimates π̂ and γ̂ with their standard errors, which are depicted in Table 2. These estimates can be used to generate the possible range of the prevalence of the sensitive attribute. The lower bound of this range (i.e., the lowest possible estimate of the prevalence) is π̂, and the upper bound is π̂ + γ̂; their confidence intervals follow from equations (8-10). Accordingly, the 95 percent confidence interval of the upper bound ranges from 0.341 to 0.648. Hence, even though the prevalence of carriers among cheaters remains unknown, one can conclude from this model that the estimated total proportion of carriers is at least 0.190 and at most 0.495 with 95 percent confidence intervals ranging from 0.149 to 0.648.
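The estimators and their sampling variances can be sketched in Python as follows (the function name and the illustrative input proportions are ours and do not reproduce Table 1):

```python
from math import sqrt

def uqmc_estimates(lam1, lam2, n1, n2, p1, p2, q):
    """Point estimates and standard errors for the two-sample UQMC,
    based on lam_i = p_i*pi + (1 - p_i)*(1 - gamma)*q."""
    d = p1 - p2
    pi_hat = ((1 - p2) * lam1 - (1 - p1) * lam2) / d
    gamma_hat = 1 - (p1 * lam2 - p2 * lam1) / (q * d)
    v1 = lam1 * (1 - lam1) / n1      # variances of the observed proportions
    v2 = lam2 * (1 - lam2) / n2
    var_pi = ((1 - p2) ** 2 * v1 + (1 - p1) ** 2 * v2) / d ** 2
    var_gamma = (p2 ** 2 * v1 + p1 ** 2 * v2) / (q ** 2 * d ** 2)
    cov = (p2 * (1 - p2) * v1 + p1 * (1 - p1) * v2) / (q * d ** 2)
    return pi_hat, gamma_hat, sqrt(var_pi), sqrt(var_gamma), cov

# Hypothetical proportions generated from pi = 0.2, gamma = 0.3, q = 0.7:
lam1 = 0.75 * 0.2 + 0.25 * 0.7 * 0.7   # 0.2725
lam2 = 0.25 * 0.2 + 0.75 * 0.7 * 0.7   # 0.4175
pi_hat, gamma_hat, se_pi, se_gamma, cov = uqmc_estimates(
    lam1, lam2, 1000, 1000, 0.75, 0.25, 0.7)
print(pi_hat, gamma_hat)  # recovers 0.2 and 0.3

# A 95% CI for the upper bound pi + gamma uses Var(pi)+Var(gamma)+2*Cov:
se_upper = sqrt(se_pi ** 2 + se_gamma ** 2 + 2 * cov)
print(pi_hat + gamma_hat - 1.96 * se_upper,
      pi_hat + gamma_hat + 1.96 * se_upper)
```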
It is important to note that the size of this range is in large part due to the true cheating proportion, which is 0.3 in this example, and not merely due to random sampling error. A model that does not take cheating into account, such as the original UQM, would therefore yield an estimate with a smaller confidence interval. At first sight, this may look preferable. However, this estimate would be biased, as it disregards the true prevalence of cheating. As such, there is uncertainty in both cases, but only the UQMC makes the degree of this uncertainty explicit by taking cheating into account. If, on the other hand, there is in fact no cheating, the UQMC can capture this as well (with γ̂ approximating 0), and the confidence interval of the prevalence estimate range will decrease correspondingly. By way of illustration, if one changes the true cheating prevalence in the above example to γ = 0.1, the estimates resulting from simulation are π̂ = 0.239 with 95 percent confidence interval ranging from 0.193 to 0.284 and π̂ + γ̂ = 0.239 + 0.082 = 0.321 with 95 percent confidence interval ranging from 0.193 to 0.449. As can be seen from this example positing a lower rate of cheating, the 95 percent confidence interval for the estimated range of the carrier proportion is much smaller, namely 0.193 to 0.449. In addition to estimating the above parameters, the UQMC can test whether a substantial amount of cheating is present. Indeed, Clark and Desharnais (1998) introduced a likelihood ratio test for this purpose in their initial presentation of the CDM.
Table 1 Note. n_i = size of sample i; p_i = probability of being assigned to the sensitive question in sample i; q = prevalence of the neutral attribute; o_yi = observed frequency of "yes" responses in sample i; o_ni = observed frequency of "no" responses in sample i; λ̂_i = proportion of "yes" responses in sample i.
This test utilizes the ratio of the maximum likelihood of a model setting cheating to γ = 0 and the maximum likelihood of a model allowing for cheating. It can be applied to the UQMC in a similar manner, where it is formalized as
χ² = −2·ln[ L(π̂_0, γ = 0) / L(π̂, γ̂) ],
where L(π̂_0, γ = 0) denotes the maximized likelihood under the restriction γ = 0; the statistic asymptotically follows a chi-square distribution with one degree of freedom. In the above example, this likelihood ratio test supports the hypothesis that cheating is present, with χ²(1) = 41.119, p < .001. Appendix A (which can be found at http://smr.sagepub.com/supplemental/) contains R-code that can be used for applying the calculations to one's own data.
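A sketch of this likelihood ratio test in Python, using a simple grid search for the restricted (γ = 0) model; the function names and example counts are ours, not the article's data:

```python
from math import log

def loglik(o_yes: int, n: int, lam: float) -> float:
    """Binomial log-likelihood (up to a constant) of o_yes "yes" responses."""
    if lam <= 0.0 or lam >= 1.0:
        return float("-inf")
    return o_yes * log(lam) + (n - o_yes) * log(1 - lam)

def uqmc_lr_test(o_yes, n, p, q):
    """LR test of gamma = 0 in the two-sample UQMC; o_yes, n, p are
    length-2 sequences. Returns the chi-square(1) statistic."""
    lam_hat = [o / m for o, m in zip(o_yes, n)]
    # Full model: two parameters, two samples, so it fits the observed
    # proportions exactly (assuming the implied estimates are admissible).
    ll_full = sum(loglik(o, m, l) for o, m, l in zip(o_yes, n, lam_hat))
    # Restricted model (gamma = 0): lam_i = p_i*pi + (1 - p_i)*q.
    ll_null = max(
        sum(loglik(o, m, pi_ * pp + (1 - pp) * q)
            for o, m, pp in zip(o_yes, n, p))
        for pi_ in [k / 2000 for k in range(2001)]
    )
    return 2 * (ll_full - ll_null)

# Hypothetical counts simulated with pi = 0.2, gamma = 0.3 (p = .75/.25, q = .7):
chi2 = uqmc_lr_test([273, 418], [1000, 1000], [0.75, 0.25], 0.7)
print(chi2 > 3.84)  # True: cheating is detected at the 5 percent level
```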
As is true for all indirect questioning techniques, the sampling variance of the estimates is quite high. Due to the additional estimation of the cheating parameter, this variance becomes even higher than in one-parameter RRMs, such as the original UQM. An optimized choice of p_i and q, and an optimized division of the sample into the two subsamples, can minimize this drawback. Appendix B (which can be found at http://smr.sagepub.com/supplemental/) illustrates the influence each of these parameters has on the sum of standard errors and power of the model estimates. In short, more extreme values of p_i and larger values of q make the sum of standard errors smaller, whereas the relative size of the two subsamples within the overall sample has only a small impact, as long as the difference is not too extreme. Thus, a division of the sample into two equal subsamples is desirable. However, minimizing the standard error cannot be the only consideration when choosing the values for p_i and q because in the case of values for p_i and q close to 0 or 1, the responses become more indicative of the respondents' status and thus anonymity protection decreases. Therefore, the applied values must be chosen to represent a compromise between efficiency and anonymity protection. Recommended values would therefore be 0.75 and 0.70 for p_1 and q, respectively. Different parameter combinations might be advantageous if the focus of the study is mainly on prevalence estimation or mainly on cheater estimation. In the former case, p_i should be more extreme, q should be smaller, and the larger part of the sample should be allocated to the subsample with the higher p_i. In the latter case, p_i should be closer to 0.5, q should be higher, and the larger part of the sample should be allocated to the subsample with the lower p_i.
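The effect of the design parameters on precision can be illustrated with a Python sketch based on the sampling variances of the two estimators; the assumed true values π = 0.2 and γ = 0.3 and the function name are ours:

```python
from math import sqrt

def se_sum(p1, p2, q, pi=0.2, gamma=0.3, n=1000):
    """Sum of the standard errors of pi-hat and gamma-hat for given design
    parameters, at assumed true values (two equal subsamples of size n)."""
    lam1 = p1 * pi + (1 - p1) * (1 - gamma) * q
    lam2 = p2 * pi + (1 - p2) * (1 - gamma) * q
    v1 = lam1 * (1 - lam1) / n
    v2 = lam2 * (1 - lam2) / n
    d = p1 - p2
    var_pi = ((1 - p2) ** 2 * v1 + (1 - p1) ** 2 * v2) / d ** 2
    var_gamma = (p2 ** 2 * v1 + p1 ** 2 * v2) / (q ** 2 * d ** 2)
    return sqrt(var_pi) + sqrt(var_gamma)

# More extreme randomization probabilities reduce the standard errors ...
print(se_sum(0.85, 0.15, 0.7) < se_sum(0.60, 0.40, 0.7))  # True
# ... and so does a larger prevalence q of the neutral attribute:
print(se_sum(0.75, 0.25, 0.8) < se_sum(0.75, 0.25, 0.5))  # True
```

Of course, such a comparison addresses efficiency only; the anonymity-protection side of the trade-off discussed above is not captured by this calculation.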
The above recommendations are based on the influence that the design parameters have on the standard error and statistical power, together with an intuitive evaluation of the influence that these parameters have on perceived privacy protection. In specific applications, the parameters should be informed by the specific sensitive question at hand and the implementation of the questioning design. In doing so, one can refer to theoretical as well as empirical work on the optimal choice of design parameters in RRMs with respect to efficiency and perceived privacy protection (e.g., Greenberg et al. 1977; Lanke 1975; Leysieffer and Warner 1976; Ljungqvist 1993; Soeken and Macready 1982). An overview on this topic is given by Fox (2016).

Partial Cheating
As explained above, the UQMC utilizes the cheating concept as initially defined in the CDM, where "cheaters" are assumed to always choose the safe option of a "no" response, regardless of the question presented. However, this may be an unduly restrictive assumption, as there might be respondents who would cheat when confronted with the sensitive question but would answer the neutral question truthfully, since they do not feel threatened by this latter question. Allowing for cheating in this broader and probably more realistic sense implies that the original categories (completely honest respondents and complete cheaters) should be extended by the category "partial cheaters" (i.e., cheating only if presented with the sensitive question). In the following, we refer to the original group of cheaters, who always respond "no," as "complete cheaters." Figure 4 depicts how partial cheating affects the probabilities for "yes" and "no" responses. Honest respondents still answer honestly to whichever question they are assigned. Complete cheaters, as before, respond "no" to whichever question they are assigned. In this figure, we add partial cheaters, who answer honestly if assigned to the neutral question, but always respond "no" to the sensitive question, regardless of whether they are carriers of the sensitive attribute. Thus, there is a new branch of the probability tree leading to a "yes" response, γ_p·(1 − p_i)·q. The resulting total probability for answering "yes" if there is partial cheating can be reduced to
λ_i = p_i·(1 − γ_c − γ_p)·ε + (1 − p_i)·(1 − γ_c)·q.    (13)
It should be stressed that not all three parameters γ_c, γ_p, and ε can be estimated from empirical data. In other words, the same value of π can be
Figure 4. Probability tree of the unrelated question model-cheating extension including partial cheating. Participants are (a) honest H with probability 1 − γ_c − γ_p, (b) partial cheaters P with probability γ_p, or (c) complete cheaters C with probability γ_c.
All types of participants receive the sensitive question S and the neutral question N with probability p_i and 1 − p_i, respectively. (a) Honest participants respond "yes" with probability q and "no" with probability 1 − q, if instructed to answer the neutral question N. They answer "yes" with probability ε and "no" with probability 1 − ε, if instructed to answer the sensitive question S. (b) Partial cheaters always say "no" if they are instructed to answer the sensitive question S, regardless of whether or not they are carriers, but if instructed to answer the neutral question N, they answer honestly by saying "yes" with probability q and "no" with probability 1 − q. (c) Complete cheaters always answer "no" regardless of the question that they receive and regardless of whether or not they carry the attribute.
achieved by an infinite number of combinations of γ_p and ε, which would give rise to the same probability λ_i. Therefore, this extension can be only partially solved for the parameters π = (1 − γ_c − γ_p)·ε and γ_c. As such, π can be inserted into equation (13), resulting in
λ_i = p_i·π + (1 − p_i)·(1 − γ_c)·q.    (14)
It is clear that equation (14) is equivalent to equation (5), except that γ is replaced by γ_c. Thus, the lower bound for the estimated prevalence of the sensitive attribute is still defined by π when allowing for partial cheaters. However, the upper bound of the estimated prevalence, which was formerly given by π + γ, may no longer be given by π + γ_c after allowing for partial cheaters because the remaining category now comprises not only the proportion of honest noncarriers but additionally γ_p. Since partial cheaters can be carriers of the attribute, γ_p should be added to the possible prevalence range. This results in an increased upper bound of π + γ_c + γ_p, which cannot be determined because γ_p is not identifiable.
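The nonidentifiability can be verified numerically: the "yes" probability of the partial-cheating model depends on γ_p and ε only through π, so any combination of the two that leaves π fixed produces the same response probabilities. A short Python check (the variable names are ours):

```python
from random import random

# Check numerically that the "yes" probability with partial cheaters,
#   (1 - gc - gp) * (p*eps + (1 - p)*q) + gp * (1 - p) * q,
# equals p*pi + (1 - p)*(1 - gc)*q with pi = (1 - gc - gp)*eps,
# so gamma_p is not separately identifiable from the responses.
for _ in range(1000):
    gc, gp, eps, p, q = (random() * 0.3, random() * 0.3,
                         random(), random(), random())
    lhs = (1 - gc - gp) * (p * eps + (1 - p) * q) + gp * (1 - p) * q
    pi = (1 - gc - gp) * eps
    rhs = p * pi + (1 - p) * (1 - gc) * q
    assert abs(lhs - rhs) < 1e-12
print("partial-cheating model reduces to the complete-cheating form")
```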
For the above numerical example, this would mean that the estimate for the lower bound of the prevalence range would remain at π̂ = 0.188. The estimate for the upper bound, however, would potentially exceed π̂ + γ̂_c = 0.188 + 0.304 = 0.492 because there could be an additional unknown proportion of partial cheaters. In other words, if one computes the prevalence of the sensitive attribute using the UQMC, which formally assumes only the possibility of complete cheating, the estimate of the lower bound of carrier prevalence is not affected by the presence of partial cheaters, but the upper bound of this range may be underestimated if partial cheaters are present. This consideration should be kept in mind when interpreting the results of a study using the UQMC. That is, if one wants to address partial cheating within the UQMC framework, the same estimates can be calculated but need to be interpreted differently concerning the upper bound of the prevalence estimate.
It is worth mentioning that the same line of reasoning would apply to the CDM. That is, the possibility of partial cheating would involve a reinterpretation of the parameters estimated by the CDM. Specifically, as before, in the presence of partial cheating, the lower bound of the prevalence would remain at π. However, the upper bound could exceed π + γ if partial cheaters are present.

A Survey Design for Testing the UQMC
A limitation of RRTs in general is that their empirical adequacy cannot be tested because the number of unknown parameters usually equals the number of independent samples, and therefore, there are no degrees of freedom left for testing empirical adequacy. Thus, empirical adequacy must simply be assumed. Fortunately, this drawback can be resolved in the UQMC by varying the prevalence of the neutral attribute q. In the basic UQMC, p_1 and p_2 are applied to two independent samples in order to generate two independent equations for λ_1 and λ_2, allowing for two parameters to be identified. However, if q_j is varied orthogonally to p_i, four independent samples can be drawn, each with a unique combination of these design parameters, (p_1, q_1), (p_1, q_2), (p_2, q_1), and (p_2, q_2). The resulting model with four independent equations for λ_ij (λ_11, λ_12, λ_21, and λ_22) provides two degrees of freedom, allowing for an empirical test of adequacy. Table 3 illustrates what the setup of the UQMC with four samples could look like, including exemplary estimates λ̂_ij. Like in the first example, the observed proportions of "yes" responses in this table were simulated with π = 0.2 and γ = 0.3. In this case, there is no explicit solution for the estimation of the model parameters. Parameter estimates π̂ and γ̂ can be obtained by numerical maximum likelihood estimation. Furthermore, the standard errors of the estimated parameters can be numerically evaluated using the observed Fisher information. For the example in Table 3, these estimates are depicted in Table 4. The likelihood ratio test can also be conducted in the four-sample extension. In the numerical example here, the results are in favor of the hypothesis that cheating is present, with χ²(1) = 55.029, p < .001. The exemplary results shown so far are equivalent to those obtainable by the UQMC with two samples.
However, the four-sample extension additionally enables testing of the model's adequacy using Pearson's χ² goodness-of-fit test.
Table 3 Note. n_ij = size of sample ij; p_i = probability of being assigned to the sensitive question in samples i; q_j = prevalence of the neutral attribute in samples j; o_yij = observed frequency of "yes" responses in sample ij; o_nij = observed frequency of "no" responses in sample ij; λ̂_ij = proportion of "yes" responses in sample ij.
In the UQMC, this test is formalized as
χ² = Σ_ij [ (o_yij − e_yij)²/e_yij + (o_nij − e_nij)²/e_nij ],
where o_yij and o_nij are the observed frequencies of "yes" responses and "no" responses, respectively, in each sample with p_i and q_j. Likewise, e_yij and e_nij are the corresponding expected frequencies. The test supports the fit of the UQMC in the numerical example, χ²(2) = 0.080, p = .961. Appendix C (which can be found at http://smr.sagepub.com/supplemental/) contains R-code for parameter estimation and the goodness-of-fit test that can be applied to one's own data.
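For illustration, the maximum likelihood estimation and the goodness-of-fit test for the four-sample design can be sketched in Python; a simple grid search stands in for the numerical optimizer, and the counts below are ours, generated from π = 0.2 and γ = 0.3 (they do not reproduce Table 3):

```python
from math import log

def fit_uqmc4(o_yes, n, p, q):
    """Grid-search ML estimation of (pi, gamma) in the four-sample UQMC,
    followed by the Pearson chi-square goodness-of-fit test (df = 4 - 2 = 2).
    o_yes, n: "yes" counts and sizes of the four samples; p, q: the design
    parameters (p_i, q_j) of each sample, in matching order."""
    def lam(pi, g, pp, qq):
        return pp * pi + (1 - pp) * (1 - g) * qq

    best = (float("-inf"), 0.0, 0.0)
    steps = [k / 200 for k in range(201)]        # grid of width 0.005
    for pi in steps:
        for g in steps:
            if pi + g > 1:                        # requires eps <= 1
                continue
            ll, ok = 0.0, True
            for o, m, pp, qq in zip(o_yes, n, p, q):
                l = lam(pi, g, pp, qq)
                if l <= 0 or l >= 1:
                    ok = False
                    break
                ll += o * log(l) + (m - o) * log(1 - l)
            if ok and ll > best[0]:
                best = (ll, pi, g)
    _, pi_hat, g_hat = best
    chi2 = 0.0
    for o, m, pp, qq in zip(o_yes, n, p, q):
        e_yes = m * lam(pi_hat, g_hat, pp, qq)    # expected "yes" frequency
        e_no = m - e_yes                          # expected "no" frequency
        chi2 += (o - e_yes) ** 2 / e_yes + ((m - o) - e_no) ** 2 / e_no
    return pi_hat, g_hat, chi2

# Hypothetical counts generated with pi = 0.2, gamma = 0.3 for the four
# design cells (p, q) = (.75,.7), (.75,.4), (.25,.7), (.25,.4), n = 500 each:
pi_hat, g_hat, chi2 = fit_uqmc4([136, 110, 209, 130], [500] * 4,
                                [0.75, 0.75, 0.25, 0.25],
                                [0.7, 0.4, 0.7, 0.4])
print(pi_hat, g_hat, chi2 < 5.99)  # estimates near 0.2 / 0.3; model fits
```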

Discussion
The present article extends the UQM to allow it to assess cheating while still ensuring respondents' anonymity. This extension incorporates the basic idea of the CDM (Clark and Desharnais 1998) while preserving the more psychologically acceptable design of the UQM. Such an extension seems appropriate because there is ample evidence that many respondents cheat by always answering "no" in randomized response surveys (e.g., Elbe and Pitsch 2018; Moshagen et al. 2010; Ostapczuk 2011; Ostapczuk et al. 2009; Pitsch et al. 2007; Schröter et al. 2016), probably because a "no" response reduces the fear of embarrassment or other negative consequences. In particular, when a respondent is administered the UQM, such cheating would greatly diminish the conditional probability of being deemed a carrier of the sensitive attribute. For example, as noted earlier, Bayesian analysis reveals that for the design parameters p = 0.75 and q = 0.50, the odds of carrying the sensitive attribute would be 49 times higher in the presence of a "yes" response as opposed to a "no" response, if respondents were to obey the UQM's instructions. Therefore, disobeying these instructions by cheating with uniform "no" responses is potentially attractive as a self-protecting strategy.
In the present article, we have first introduced an extension of the UQM utilizing the standard assumptions of the CDM-namely the assumption that cheaters will always respond "no" regardless of whether they are directed to the sensitive or to the neutral question. For this extension of the UQM, which we have termed the UQMC, we provide explicit formulae to compute the lower and upper bound of the prevalence estimate range, together with a likelihood ratio test to statistically assess the presence of cheating.
Second, we have discussed the possibility of partial cheating in addition to complete cheating, a perhaps more realistic scenario. Partial cheaters answer honestly if directed to the neutral question but always respond "no" if directed to the sensitive question, even if they are in fact carriers of the sensitive attribute. The parameters of a model including partial cheating are only partially identifiable. Currently, we are not aware of a mathematical or experimental solution for this limitation. However, we have shown that even if partial cheating is disregarded, as in the UQMC, the lower prevalence limit is not affected if partial cheaters are present, although the upper limit may be higher than that estimated by the UQMC. Importantly, even such a lower bound can be informative. For example, in a study on the prevalence of doping in elite athletics using the UQM (Ulrich et al. 2018), the UQM estimates of more than 30 percent were clearly much higher than the prevalence estimates from physical doping tests, which indicated a prevalence of about 2 percent at the time (World Anti-Doping Agency 2012). Consequently, even if this estimate only represents a lower bound on the prevalence, the implications are considerable. In addition, the UQMC can account for a very likely type of nonadherence, namely complete cheating. Thus, even if one wants to avoid overconfident conclusions and allows for partial cheating, UQMC estimates can have important implications.
Third, we have also shown how the adequacy of the UQMC can be empirically tested. Finally, we have performed power analyses to show that reliable parameter estimates can be obtained even with modest total sample sizes.
The described RRT cheating models assume the presence of "no" cheating for self-protective reasons. Nevertheless, it is at least conceivable that some respondents could cheat with a false "yes" response. For example, a clean athlete might be tempted to cheat with "yes" in order to inflate the prevalence estimate of doping in the hope that this would lead to stricter antidoping policies (Elbe and Pitsch 2018). In light of this possibility, Feth et al. (2017) extended the CDM to address not only "no" cheating but also "yes" cheating. These authors recast the idea of the CDM within a more general variant of the forced response method, which includes a forced "no" response in addition to the forced "yes" response. The authors provide an in-depth discussion of the estimation of "yes" and "no" cheating within this framework and also mention the possibility of transferring this idea to the UQM. This CDM extension was recently applied to estimate the prevalence of doping among elite Danish athletes (Elbe and Pitsch 2018). Although the model revealed a high proportion of "no" cheaters, the proportion of "yes" cheaters was virtually nil. A similar conclusion was reached in a recent experimental individual-level validation study (Höglinger and Jann 2018), which examined whether cheating in a dice game could be accurately assessed by several indirect questioning techniques, and, if not, in which direction respondents misreport their actual behavior. In the case of the UQM, these investigators found a substantial prevalence of false-negative responses (i.e., "no" cheating), but not of false-positive responses (i.e., "yes" cheating). These findings are consistent with several lines of evidence indicating that misreporting usually occurs in the socially desirable direction (see Tourangeau and Yan 2007).
In the present article, we have extended the standard UQM only for "no" cheating, but future extensions of the UQM could include the possibility of "yes" cheating (including, at least in theory, the possibilities of both complete and partial "yes" cheating). However, assessing for "yes" cheating would likely be useful only in rare situations where social desirability plays a subordinate role, or where there might be a plausible motivation for "yes" cheating.
In the UQMC, the estimation of two parameters requires independent subsamples. A possible limitation of this approach is that it relies on the assumption that these subsamples do not differ with respect to the true parameter values. In the case of the cheating parameter, this assumption could be violated because different probabilities of receiving the sensitive question might induce different levels of trust and hence different levels of cheating. There are alternative approaches to estimating nonadherence parameters that do not rely on independent subsamples (e.g., Böckenholt and van der Heijden 2007; Böckenholt, Barlas, and van der Heijden 2009; Cruyff, Böckenholt, and van der Heijden 2016). However, these approaches usually involve the assessment of multiple randomized response questions instead of using independent subsamples, and are therefore not equally suited to the same research questions as approaches using subsamples. When applying the UQMC, the risk of violating the above-mentioned assumption can be minimized by defining the design parameters such that the motivation to cheat would not be expected to strongly differ between subsamples. Additionally, and most crucially, the model test proposed in this article allows one to assess the adequacy of these assumptions.
In this article, we have focused on the UQM and CDM. The Crosswise Model (Yu, Tian, and Tang 2008) provides an alternative to these two models. An advantage of this model is that it does not necessitate a randomization device, nor does it require a "yes"/"no" response. Thus, a response cannot be interpreted as a direct response to the sensitive question, which seems to increase perceived anonymity (Hoffmann et al. 2017). Despite these advantages, this model also has drawbacks. First, the sampling variance of this model's prevalence estimate is relatively high and thus samples much larger than those typically used in the original UQM are required (Ulrich et al. 2012). Second, the Crosswise Model has been shown to be susceptible to other types of instruction nonadherence, which may distort the prevalence estimate (e.g., Höglinger and Diekmann 2017;Höglinger and Jann 2018).
In summary, the present article attempts to enrich the RRT toolbox by extending one of the most common RRT models, the UQM, to allow for the estimation of the prevalence of cheating. This extended model is relatively easy to implement in surveys. Therefore, we recommend that cheating and model adequacy be routinely taken into account in future RRT surveys that employ the UQM.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Deutsche Forschungsgemeinschaft (DFG), grant 2277, Research Training Group "Statistical Modeling in Psychology" (SMiP).

Supplemental Material
Supplemental material for this article is available online.