High-throughput genotyping assays for identification of glycophorin B deletion variants in population studies

Glycophorins are the most abundant sialoglycoproteins on the surface of human erythrocyte membranes. Genetic variation in the glycophorin region of human chromosome 4 (containing the GYPA, GYPB, and GYPE genes) is of interest because the gene products serve as receptors for pathogens of major public health interest, including Plasmodium sp., Babesia sp., influenza virus, Vibrio cholerae El Tor hemolysin, and Escherichia coli. A large structural rearrangement and hybrid glycophorin variant, known as Dantu, which was identified in East African populations, has been linked with a 40% reduction in risk for severe malaria. Apart from Dantu, other large structural variants exist, the most common being deletions of the whole GYPB gene and its surrounding region, which occur in multiple different forms. In West Africa particularly, these deletions are estimated to account for between 5 and 15% of the variation in different populations, mostly attributed to the forms known as DEL1 and DEL2. Due to the lack of specific variant assays, little is known of the distribution of these variants. Here, we report a modification of a previous GYPB DEL1 assay and the development of a novel GYPB DEL2 assay as high-throughput PCR-RFLP assays, as well as the identification of the crossover/breakpoint for GYPB DEL2. Using 393 samples from three study sites in Ghana as well as samples from the HapMap and 1000 Genomes projects for validation, we show that our assays are sensitive and reliable for genotyping GYPB DEL1 and DEL2. To the best of our knowledge, this is the first report of such high-throughput genotyping assays by PCR-RFLP for identifying specific GYPB deletion types in populations. These assays will enable better identification of GYPB deletions for large genetic association studies and functional experiments to understand the role of this gene cluster region in susceptibility to malaria and other diseases.


Introduction
Malaria is still an important public health issue worldwide and the leading cause of death among children in sub-Saharan Africa (sSA). It is estimated that every year about 216 million cases of malaria and 445,000 deaths occur globally, with sSA being the most affected [1]. Plasmodium falciparum, which is responsible for most of these deaths, has evolved complex machinery for invading erythrocytes. Invasion is mediated by multiple redundant parasite ligands and specific human host receptors on the surface of erythrocytes [2-4]. Many studies have shown that host-pathogen factors influence malaria outcomes, and the parasite has almost certainly shaped the evolution of the human genome over the years [5,6]. Several host genetic factors, from single nucleotide polymorphisms to large structural variants, are known to influence an individual's susceptibility or resistance to malaria [7-10].
Glycophorins (GYP) are sialoglycoproteins found on the surface of human and animal erythrocytes [11]. Human GYPA and GYPB are determinants of the major MNS blood group system, while GYPC and GYPD are determinants of the Gerbich blood group. Some of these glycophorins have also been shown to be receptors on erythrocytes used by P. falciparum to invade these cells. These include GYPA, which interacts with the P. falciparum protein erythrocyte binding antigen (EBA)-175 [16-18], GYPB, which interacts with erythrocyte binding ligand 1 (EBL-1) [19,20], and GYPC, which interacts with EBA-140, all in a sialic acid-dependent manner [21-23]. In addition, glycophorins serve as receptors for other pathogens such as Babesia sp., influenza virus, encephalomyocarditis virus, Vibrio cholerae El Tor hemolysin, and E. coli [11-15]. The GYPE gene is not known to be expressed as a protein on the erythrocyte surface [24].
The genes GYPE, GYPB, and GYPA are located in a gene cluster on the long arm of chromosome 4 (4q28-q31) that is approximately 360 kb long, with each gene segmental duplication unit (SDU) spanning ~120 kb and comprising a gene region of ~30 kb and an intergenic region of ~90 kb (Figure 1). GYPC and GYPD are located on chromosome 2 and are not discussed here. The GYPA, GYPB, and GYPE genes are evolutionarily related, with at least 95% sequence homology between them resulting from duplication events, whereby GYPB evolved from GYPA, and GYPE evolved from GYPB [9,25,26].
Recently, a large structural variant in the chromosome 4 GYP region that gives rise to the Dantu glycophorin (DUP4) has been associated with about a 40% reduction in risk for severe malaria [9,27]. Interestingly, this Dantu variant was found predominantly in East African populations, but there are many other common structural variants across all the West and East African populations that have been studied [9]. The most common variants identified were deletions of the whole GYPB gene and the surrounding region, known as GYPB DEL1 and GYPB DEL2 (Figure 1). However, little research has been conducted on these variants and their population distributions because of the lack of high-throughput methods for genotyping these structural variants. Given the difficulty of screening for these deletions and other variants, there is also a lack of functional data on the effect of GYPB deletions on erythrocyte invasion, on the growth of P. falciparum, and on changes in protein expression at the erythrocyte surface. Here, we show the development of two separate high-throughput assays for reliably detecting and genotyping the GYPB deletions DEL1 and DEL2. These assays can be used to determine the distribution of the deletions in populations and to identify phenotypes for functional investigations of their effect on P. falciparum erythrocyte invasion and growth.
Using data from 393 samples from different ethnic populations in southern Ghana, as well as DNA samples from the HapMap and 1000 Genomes projects, we show that the GYPB DEL1 and DEL2 assays are sensitive and reliable for screening population samples, with little interference from other glycophorin structural variants.

Materials and methods
Location of putative breakpoints for the GYPB whole-gene deletions DEL1 and DEL2
The breakpoints for DEL1 have previously been located on GRCh37 at chr4:144835160-144835280 (4:143914007-143914127 in GRCh38) in the 3′ region of the GYPE unit of the SDU, and chr4:144945398-144945517 (4:144024245-144024364 in GRCh38) in the 3′ region of the GYPB unit of the SDU [9], while the predicted location of the DEL2 breakpoint was given as 206,000 bases from the 5′ end of the GYP region (GRCh37:4:144706830, GRCh38:4:143785677) [9]. We downloaded 4 kb of sequence surrounding the DEL1 or DEL2 putative breakpoint coordinates for each of the GYPE, GYPB, and GYPA SDUs from Ensembl (http://grch37.ensembl.org/Homo_sapiens/). The sequences were aligned using Clustal Omega (www.ebi.ac.uk/Tools/msa/clustalo/) and manually finished where required (Figure 1 and Supplementary Files 1 and 2). Although initial designs were made using GRCh37, we have since compared our analysis with GRCh38 and provide coordinates with respect to GRCh38, or both, where appropriate.
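As a quick consistency check on the coordinates quoted above, the following minimal sketch compares each GRCh37 position with its stated GRCh38 equivalent. The dictionary keys and function name are illustrative labels introduced here, not identifiers from the study. A constant difference is expected only within a local region that was shifted as a block between assemblies; in general, coordinate conversion should be done with a dedicated liftover tool.

```python
# Breakpoint coordinates quoted in the text, as (GRCh37, GRCh38) pairs on chromosome 4.
# The key names are illustrative labels, not identifiers from the study.
COORDS = {
    "DEL1_GYPE_unit_start": (144835160, 143914007),
    "DEL1_GYPE_unit_end": (144835280, 143914127),
    "DEL1_GYPB_unit_start": (144945398, 144024245),
    "DEL1_GYPB_unit_end": (144945517, 144024364),
    "DEL2_predicted": (144706830, 143785677),
}


def build_offsets(coords):
    """Return the GRCh37 minus GRCh38 difference for every named coordinate."""
    return {name: g37 - g38 for name, (g37, g38) in coords.items()}


offsets = build_offsets(COORDS)
unique_offsets = set(offsets.values())

# Distance between the start of the two DEL1 breakpoint intervals (GRCh37).
del1_span = COORDS["DEL1_GYPB_unit_start"][0] - COORDS["DEL1_GYPE_unit_start"][0]

print("distinct GRCh37-GRCh38 offsets:", unique_offsets)
print("DEL1 span between breakpoint intervals (bp):", del1_span)
```

All five coordinate pairs differ by the same amount, and the distance between the two DEL1 breakpoint intervals is about 110 kb, in line with the deletion size reported for DEL1.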

Assay design for DEL1 and DEL2 putative breakpoint regions
We developed a PCR-RFLP version of the published DEL1 assay [9] using a similar primer strategy. A forward primer was positioned in the unique sequence 3′ to the GYPE gene, with a common reverse primer positioned in the GYPE-GYPB and GYPB-GYPA regions (Figure 2(a) and (b)), but placed to generate a shorter PCR amplicon (~2 kb) than that published. From the human genome reference sequence alignments (Supplementary File 1) of the equivalent SDUs, a restriction enzyme (AciI) site was identified that distinguished between the wild-type and DEL1 sequences (Figure 2(a) and (b)).
The assay for GYPB DEL2 used a strategy similar to that of the GYPB DEL1 assay, but with the unique primer placed at the GYPA end of the sequence (reverse primer) and a common primer placed at the 5′ end (forward primer) (Figure 2(c) and (d)). From the human genome reference sequence alignments (Supplementary File 2) of the equivalent SDUs, a restriction enzyme (BsrBI) site was identified that distinguished between the wild-type and GYPB DEL2 sequences (Figure 2(c) and (d)). For both GYPB DEL1 and DEL2, the restriction enzymes used were non-palindromic and therefore strand oriented (Table 1, Supplementary Files 2 to 5).
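The design logic above (same-length amplicons that differ by the presence of a restriction site) can be illustrated with a small in-silico digest. This is a sketch only: the toy sequences are hypothetical, the recognition motif CCGC for AciI is an assumption that should be checked against the supplier's datasheet, and the cut is approximated at the start of each recognition site rather than at the enzyme's true cleavage position. Because the enzymes are non-palindromic, both the motif and its reverse complement are searched.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_sites(seq, motif):
    """0-based start positions of a motif on either strand of seq."""
    seq = seq.upper()
    motif = motif.upper()
    # A non-palindromic motif on the bottom strand appears as its
    # reverse complement when reading the top strand.
    targets = {motif, reverse_complement(motif)}
    hits = []
    for target in targets:
        start = seq.find(target)
        while start != -1:
            hits.append(start)
            start = seq.find(target, start + 1)
    return sorted(set(hits))


def fragment_lengths(seq, motif):
    """Approximate fragment lengths, cutting at the start of each site."""
    edges = [0] + find_sites(seq, motif) + [len(seq)]
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]


ACII_MOTIF = "CCGC"  # assumed recognition sequence; verify before use

# Hypothetical 30-base toy amplicons of equal length.
reference_like = "A" * 19 + "CCGC" + "T" * 7   # carries the site
deletion_like = "A" * 19 + "CTGC" + "T" * 7    # site lost by one base change

print(fragment_lengths(reference_like, ACII_MOTIF))
print(fragment_lengths(deletion_like, ACII_MOTIF))
```

The amplicon that carries the site yields two fragments, while the amplicon without it remains uncut, which is the distinction read from the gel.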

Generation of GYPB variant control DNA
Cell lines from the 1000 Genomes and associated projects with known GYPB deletions or wild-type (identified from the Leffler et al. [9] study and Table 2) were identified and these ... Table 3a and 3b, with primers purchased from IDT (Leuven, Belgium). Restriction enzymes AciI (catalog number R0551L, NEB, UK) and BsrBI (catalog number R0102L, NEB, UK) were purchased (Table 4a). Reactions were prepared in 96-well plates (#AB-800, ThermoFisher Scientific, UK) and cycled on an MJ Tetrad (BioRad, UK) as described in Table 3a and 3b. The PCR products were digested with the relevant restriction enzymes (NEB, UK) for 2 hours at 37 °C, and the digestion fragments were then separated by electrophoresis on a 1% agarose gel containing ethidium bromide (3 ng/µL) for 2 to 2.5 hours. Products were visualized under UV light and photographed to allow genotype assignment. All plates contained control samples obtained from the NHGRI repository (Table 2).
Figure 2. Schematic representation of strategies for amplifying and testing for the GYPB DEL1 and DEL2 structural variants. (a) Schematic representation of the alignment for the GYP SDUs showing the location of PCR primers (blue rectangles), putative breakpoint (gold rectangle), and AciI restriction site (yellow rectangle). The forward primer GYP_DEL1_F10 is specific to the region upstream of GYPE in the GYPE-GYPB region. The reverse primer GYPB_DEL1_R2B5 binds upstream of the GYPB gene in the GYPB-GYPA region. In a normal (wild-type) individual, GYPB_DEL1_R2B5 in the GYPE-GYPB region and the GYP_DEL1_F10 forward primer form a PCR product made of sequences in the GYPE-GYPB region. In the GYPB DEL1 state, the PCR product is made of sequences in the GYPE and GYPA regions because GYPB is deleted. (b) Alternate schematic representation of the GYPB DEL1 RFLP assay showing a normal chromosome and the GYPB DEL1 chromosome aligned. Genes (green, orange, and purple rectangles) and primers (blue rectangle) are indicated, as well as the AciI restriction site and PCR-digestion fragment lengths. (c) Schematic representation of the alignment for the GYP SDUs showing the location of PCR primers, putative breakpoint (gold rectangle), and BsrBI restriction site (yellow rectangle). The forward primer GYP_DEL2_F3 is common to the GYPA downstream of the GYPB-GYPA region. The reverse primer GYPB_DEL2_R3 binds specifically upstream of the GYPE gene in the GYPE-GYPB region. In a normal (wild-type) individual, GYPB_DEL2_F3 in the GYPB-GYPA region and the GYP_DEL2_R3 primer form a PCR product made of sequences in the GYPB-GYPA region. In the GYPB DEL2 state, the PCR amplicon is made of sequences in the GYPE and GYPA regions because GYPB is deleted. (d) Alternate schematic of the GYPB DEL2 RFLP assay showing a normal chromosome and the GYPB DEL2 chromosome aligned. Genes (green, orange, and purple rectangles) and primers (blue rectangle) are indicated, as well as the BsrBI restriction site (red dotted rectangular area) and PCR-digestion fragment lengths. Coordinates of sequences are given with respect to GRCh38.

Genotyping cell lines
We tested the DEL1 and DEL2 assays on several cell lines (Table 2) that were identified from whole-genome sequence analysis as having different GYP variants [9] to check for cross-reactions or aberrant products.

Screening for GYP Dantu (DUP4)
Samples were screened for the glycophorin variant Dantu (DUP4) using the assay described by Leffler et al. [9]

Ethical approval and screening population in Ghana for GYPB DEL1 and DEL2
Ethical approval for two ongoing studies on glycophorins and malaria was granted by the Ethics Committee for Basic and Applied Sciences, College of Basic and Applied Sciences, University of Ghana (CPN: ECBAS 037/18-19), and the Noguchi Memorial Institute for Medical Research IRB, University of Ghana (CPN 004/11-12). Written informed consent was obtained from all the study participants or, in the case of children, their parents/guardians. The assays developed were used to genotype DNA samples obtained from volunteers who had been enrolled in various ongoing studies in three areas of Ghana, namely Accra, Kintampo, and Hohoe.
Venous blood samples were collected and, following curation of self-reported ethnicity to include only individuals of Ghanaian origin, the sample set comprised Kintampo (n = 147), Hohoe (n = 43), and Accra (n = 203) (Figure 6, Table 5). Genomic DNA was extracted using the Qiagen QIAamp Blood Mini Kit or Chelex-100 as described for different batches of samples [28]. The DNA samples were quantified using Picogreen as described above. The gDNA samples were diluted to 20 ng/µL and stored in 96-well PCR plates at -20 °C until ready for genotyping. GYPB DEL1 and DEL2 genotyping assays were undertaken as given in Tables 3 and 4.
Table 1. Samples selected for testing and sequencing to identify the breakpoints for GYPB DEL1 and DEL2. Note: Primers were designed using Primer3 and purchased from IDT (see methods). Alignments are shown in the GYP region. One primer from each PCR (GYP_DEL1_F10 and DEL2_GYPBAs_R3) was unique to the GYPB DEL1 and GYPB DEL2 breakpoints, respectively, while the other primers were designed against homologous sequence (GYP_DEL1_R2B5 and DEL2_GYPEBAc_F3). Two primers were designed to internal regions of the GYPB DEL2 amplicon to aid with sequencing (DEL2_BP_seq_REV1 and DEL2_BP_seq_FWD).
Table 2 (partial; column headings and some cells were not recoverable).
GM06985  CEU  CEPH            USA       N/N      II  II
GM06986  CEU  CEPH            USA       N/N      II  II
GM06994  CEU  CEPH            USA       N/N      II  II
GM18522  YRI  ...                                II  II
GM12829  CEU  CEPH            USA       DUP23/N  XX  XX
GM12249  CEU  CEPH            USA       DUP28/N  II  II
HG02554  ACB  Afro-Caribbean  Barbados  DUP4/N   II  II
HG02585  GWD  Mandinka        Gambia    DUP6/N   II  II
GM18545  CHB  Han             China     TRP1/N   XX  II
GM18620  CHB  Han             China     TRP1/N   XX  II
GM12341  CEU  CEPH            USA       TRP13/N  II  II
GM11894  CEU  CEPH            USA       TRP5/N   XX  II
GM18852  YRI  Yoruba          Nigeria   UNK      XX  II
GM19221  YRI  ...

Sequence analysis of GYPB DEL1 and DEL2 PCR products
For Sanger sequencing, PCR amplicons were separated on agarose gels and extracted using the Qiagen QIAquick Gel Extraction kit as described by the manufacturer. The concentration of the recovered DNA was determined by the Quant-iT PicoGreen assay (Invitrogen, UK). Samples were prepared following the instructions of the sequencing company and sent for Sanger sequencing by Eurofins Genomics (Ebersberg, Germany; https://www.eurofinsgenomics.eu/en/custom-dna-sequencing/eurofins-services). The sequence data were inspected and curated using Chromas (https://technelysium.com.au/wp/chromas/) to generate FASTA files for the different sequencing reactions. These data were aligned using the multiple-sequence-alignment tool Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/), after which pile-ups were manually curated and residues annotated according to the consensus sequence with respect to the PCR amplicon primers. Paralogous sequence differences between the SDUs of the three genes were used to confirm the PCR products for both DEL1 and DEL2 and also to identify the putative breakpoint regions for GYPB DEL1 and DEL2.
Note: 2 hours digest at 37 °C, then 5 minutes at 65 °C to inactivate the enzyme (optional). Add 5 µL loading gel to the full reaction. Load 10 µL onto a 1% agarose gel with ethidium bromide. Run at 100 V for 2 to 2.5 hours using Bioline Hyperladder 1 kb (BIO-33053).
Amuzu et al. Genotyping assays for identification of glycophorin B deletion variants
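The use of paralogous sequence differences to bracket a breakpoint can be sketched as follows. The function and the three toy sequences are hypothetical and assume that the two reference sequences and the amplicon read have already been aligned to equal length (for example, from a Clustal Omega alignment). Only positions where the two references differ are informative. The breakpoint is taken to lie between the last informative position matching the first reference and the next informative position matching the second.

```python
def breakpoint_window(ref_left, ref_right, read):
    """Return (last_left_pos, first_right_pos) bracketing a single switch.

    All three inputs are pre-aligned strings of equal length.
    Returns None when no clean left-to-right switch is seen.
    """
    if not (len(ref_left) == len(ref_right) == len(read)):
        raise ValueError("sequences must be pre-aligned to equal length")

    calls = []  # (position, "L" or "R") at informative positions
    for pos, (a, b, r) in enumerate(zip(ref_left, ref_right, read)):
        if a == b or "-" in (a, b, r):
            continue  # not a paralogous difference, or a gap
        if r == a:
            calls.append((pos, "L"))
        elif r == b:
            calls.append((pos, "R"))

    labels = "".join(label for _, label in calls)
    switch = labels.find("R")
    # Require at least one "L" before the switch and no "L" after it.
    if switch <= 0 or "L" in labels[switch:]:
        return None
    return calls[switch - 1][0], calls[switch][0]


# Hypothetical toy alignment: the references differ at positions 2, 8, 14, 18.
ref_left = "ACGTACGTACGTACGTACGT"
ref_right = "ACATACGTTCGTACCTACTT"
hybrid = "ACGTACGTACGTACCTACTT"  # matches ref_left first, then ref_right

print(breakpoint_window(ref_left, ref_right, hybrid))
```

In this toy case the switch is bracketed by positions 8 and 14 (0-based); the breakpoint cannot be resolved more finely than the spacing of the paralogous differences, which is why the text reports a breakpoint interval rather than a single base.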

Development of novel GYPB DEL1 and DEL2 PCR-RFLP assays
We have successfully designed a PCR-RFLP assay for GYPB DEL1 using an AciI restriction enzyme digest to differentiate GYPB DEL1 from the reference ("normal"/wild type) or non-DEL1 forms. A schematic representation of the strategy for the restriction digest is presented in Figure 2(a) and (b) and Supplementary Figure 2. The DEL2 deletion assay was designed in a similar way to the DEL1 assay, using the BsrBI restriction enzyme (Figure 2(c) and (d) and Supplementary Figure 3). Several cell lines identified as DEL1 or DEL2 homozygous or heterozygous [9] were used to test both assays, and the expected banding patterns were observed (Figure 3). The non-DEL1 homozygous reference samples gave two visible PCR amplicons on agarose gel electrophoresis after AciI digest (1.9 kb and 0.3 kb), while the DEL1 deletion homozygote samples were not cut and gave a single visible 2.2 kb amplicon. Samples that are heterozygous for DEL1 gave a combination of all three bands (0.3 kb, 1.9 kb, and 2.2 kb) (Figures 2(a), 2(b), and 3(a) and Table 4). Four other smaller products (all less than 50 bp) are ...
It is worth noting that samples that show a negative result for either deletion should be classified as non-DEL1 or non-DEL2, respectively, since each assay only identifies samples that are homozygous or heterozygous for its specific deletion. The PCR-RFLP assays that we developed were also used on non-DEL1 and non-DEL2 cell lines that carried other GYP structural variants to confirm the specificity of the reactions and assays (Figure 4). Across the other cell lines tested, only non-DEL1 and non-DEL2 banding patterns were observed after restriction digest and gel electrophoresis, thus confirming the specificity of the assays.
... in GRCh38). The 5′ boundary was identified by a tandem repeat motif made up of CA and AT repeats (XXXXXXXX in Figure 5(a)), while the 3′ end was marked by a single paralogous base difference (A/G, marked Y in Figure 5(a)) between the reference sequences. A further 62 bases upstream from the 3′ boundary, there is also a 2-base paralogous difference between the reference sequences (ZZ in Figure 5(a)). The region bounded by these distinguishing motifs identifies a 111-base sequence within which the putative breakpoint occurs; it lies ~8.5 kb from the GYPE ATG start site and ~4.9 kb from the GYPB ATG start site, and the deletion removes 110 kb to form DEL1 (Supplementary Files 4 and 6).
... Figure 5(b)). The 3′ end is marked by 3 separate single paralogous base differences between the reference sequences, all within 23 bases of each other (marked as Y in Figure 5(b)). This identifies a 129-base sequence within which the putative breakpoint occurs; it is located ~86 kb from the GYPE ATG start site and ~76 kb from the GYPB ATG start site, and the deletion removes 103 kb to form DEL2 (Supplementary File 5).
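The DEL1 banding patterns reported above (0.3 kb and 1.9 kb for non-DEL1, a single 2.2 kb band for DEL1 homozygotes, and all three bands for heterozygotes) map naturally onto a lookup table. The sketch below is illustrative only: the function name, the genotype labels, and the 0.1 kb size tolerance are assumptions introduced here, and the sub-50 bp products are ignored because they are not scored on the gel.

```python
# Visible band sizes (kb) reported for the AciI digest of the DEL1 amplicon.
EXPECTED_BANDS_KB = (0.3, 1.9, 2.2)

DEL1_PATTERNS = {
    frozenset({0.3, 1.9}): "non-DEL1",
    frozenset({2.2}): "DEL1/DEL1",
    frozenset({0.3, 1.9, 2.2}): "DEL1/non-DEL1",
}


def call_del1(band_sizes_kb, tolerance_kb=0.1):
    """Assign a DEL1 genotype from estimated band sizes, or 'uncallable'.

    tolerance_kb is a hypothetical allowance for gel sizing error.
    """
    matched = set()
    for band in band_sizes_kb:
        nearest = min(EXPECTED_BANDS_KB, key=lambda size: abs(size - band))
        if abs(nearest - band) > tolerance_kb:
            return "uncallable"  # unexpected band, repeat the assay
        matched.add(nearest)
    return DEL1_PATTERNS.get(frozenset(matched), "uncallable")


for bands in ([1.9, 0.3], [2.2], [2.25, 1.85, 0.3], [1.0]):
    print(bands, "->", call_del1(bands))
```

Any pattern outside the three expected combinations is returned as uncallable rather than forced into a genotype, which mirrors the caution that a negative result only means non-DEL1.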

Distribution of GYPB DEL1 and DEL2 genotypes in Ghana
The assays developed were used to genotype DNA samples obtained from volunteers who had been enrolled into various ongoing studies in three areas of Ghana, namely Accra, Kintampo, and Hohoe. Individuals from Accra and Kintampo were all children between the ages of 1 and 15 years (mean 5.4 years in Accra [47:53% male:female ratio] and 3.25 years in Kintampo [41:59% male:female ratio]), while individuals from Hohoe were all adults (>18 years) with a 39:61% male:female ratio (Table 5 and Figure 6). In total, 393 individuals were included for genotyping after curating for self-reported ethnicity to include only those of Ghanaian origin. There were 21 distinct self-reported ethnicities (Supplementary Table 1), of which 6 had N ≥ 20 individuals (≥ 40 chromosomes, so that a single observed copy corresponds to a minimum detectable allele frequency of 2.5%; Table 5).
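The 2.5% figure follows from one observed copy among 2N chromosomes. The short sketch below makes that arithmetic explicit and, as an added illustration, estimates how likely it is that a group of N individuals contains at least one copy of an allele at a given frequency. The function names are introduced here, and the second calculation assumes that chromosomes are sampled independently.

```python
def min_detectable_frequency(n_individuals):
    """Allele frequency represented by a single copy among 2N chromosomes."""
    chromosomes = 2 * n_individuals
    return 1 / chromosomes


def prob_at_least_one_copy(allele_freq, n_individuals):
    """Chance of sampling one or more copies, assuming independent chromosomes."""
    chromosomes = 2 * n_individuals
    return 1 - (1 - allele_freq) ** chromosomes


print(min_detectable_frequency(20))            # one copy in 40 chromosomes
print(round(prob_at_least_one_copy(0.025, 20), 2))
print(round(prob_at_least_one_copy(0.025, 100), 2))
```

With 20 individuals, an allele at the 2.5% threshold is seen at least once in roughly two out of three samples, so its absence from a small group is weak evidence that the allele is absent from that population.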
The two new GYPB deletion assays, as well as the assay for detecting the Dantu variant, were used to measure the frequency of the DEL1, DEL2, and Dantu variants among the selected individuals at the three locations in Ghana (Table 6 and Supplementary Table 2). GYPB DEL1 was present at all three sites, with an overall GYPB DEL1 allele frequency of between 4.1% and 6.2%, while GYPB DEL2 was estimated at an overall allele frequency of between 1.3% and 5.0%. In Accra, the frequency of DEL1 was 4-fold higher than that of DEL2, while in Hohoe and Kintampo they were similar (4.1% vs. 5.0%, and 5.7% vs. 4.2%, respectively). As expected, we did not identify any individuals carrying the Dantu variant [9] (Supplementary Table 2).
... influenced the overall allele frequencies, which skewed the overall estimates. The data were therefore analyzed using the main ethnic groups represented in the dataset, defined as those with at least 20 individuals. Of the 393 samples across the 3 study sites in Ghana, 325 came from 6 ethnic groups (Akan, Ewe, Ga, Konkomba, Mo, and Dagarti; Tables 1 and 2). GYPB DEL1 varied between 1.8% and 9.0% overall, while GYPB DEL2 varied between 0.7% and 11.9% overall. When analyzed by study site, the DEL1 and DEL2 frequency estimates became less reliable because of small sample sizes, but where site-ethnic group combinations had N ≥ 20, the estimates ranged from 2.17% to 7.4% (DEL1) and 0.7% to 11.9% (DEL2) (Table 6). In total, 21 ethnic groups were represented, and DEL1 was detected in 11 of them (the other 10 groups, in which DEL1 was not detected, all had 4 or fewer individuals). DEL2 was detected in 7/21 groups (not necessarily the same groups as DEL1; Supplementary Table 2). GYPB DEL2 was not detected in 14/21 groups (of which 10 had 4 or fewer individuals and 4 had between 8 and 19 individuals).
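For readers reproducing frequency estimates of this kind from assay calls, the sketch below shows the standard allele-counting calculation from genotype counts. The genotype counts in the example are hypothetical and are not taken from the study data.

```python
def allele_frequency(n_hom_del, n_het, n_non_del):
    """Deletion allele frequency from genotype counts (allele counting).

    Each homozygote contributes two deletion alleles and each
    heterozygote one, out of two alleles per genotyped individual.
    """
    total = n_hom_del + n_het + n_non_del
    if total == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_hom_del + n_het) / (2 * total)


# Hypothetical counts for illustration only: 1 homozygote, 8 heterozygotes
# and 91 non-deletion individuals among 100 genotyped samples.
print(allele_frequency(1, 8, 91))
```

Samples that fail to amplify or give uncallable banding patterns should be excluded from all three counts, so that the denominator reflects only successfully genotyped individuals.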

Discussion
Glycophorins on the surface of erythrocytes are used by malaria parasites to mediate invasion [11]. As such, variants of these genes may protect against malaria parasite infection through mechanisms such as slowing parasite growth or reducing the chances of developing severe malaria [8,9]. To better assess the effects of these structural variants on resistance to malaria, there is a need for population surveys in malaria-endemic regions across sSA to generate prevalence data and identify phenotypes for conducting functional studies. However, surveys have been limited by the lack of reliable high-throughput assays for the identification of such phenotypes. The challenge of designing high-throughput PCR-based screening assays is due to the high sequence homology (>96%) between the GYPE, GYPB, and GYPA genes [29,30]. In this study, we overcame this challenge and successfully designed two high-throughput PCR-RFLP protocols that detect the presence of the two most common GYPB deletion variants in West African populations, GYPB DEL1 and DEL2. The assays were confirmed by Sanger sequencing and analysis of their PCR products. Furthermore, these assays can be performed in a 96-well plate format, followed by agarose gel electrophoresis, making it possible to run over 90 samples in a single experiment. The assays were validated in the field by comparing their performance with an existing assay for the detection of the GYP Dantu variant (DUP4) in screening 393 individuals from three study sites in Ghana.
The specificity and sensitivity of the assays in the field demonstrate their applicability as less time-consuming and less expensive alternatives to long-read sequencing for conducting large-scale studies on the population distribution of these variants. Details of the putative location of the breakpoints for GYPB DEL1 and DEL2 originally came from a previous study that used whole-genome sequencing of the 1000 Genomes samples, plus additional African samples [9]. This initial information was used to align the reference sequences for GYPA, GYPB, and GYPE across the putative breakpoints and identify the paralogous differences. Because of the high homology between the three GYP regions, identifying primers that specifically amplify each region was challenging. It was, however, straightforward to design a single primer that could anchor in all three genes and their surrounding regions. These features were used to overcome the challenge of developing an assay to differentiate between the three genes and their surrounding regions. For GYPB DEL1, a forward primer was designed to bind to the unique sequences near the GYPE transcription start site, while the counterpart (reverse primer) was common to the three regions. In the case of DEL2, designing a specific primer was more challenging because the breakpoint was not close to any entirely unique region. To overcome this, the DEL2-specific primer was placed at a location on the gene cluster close to the GYPB DEL2 breakpoint with sufficient sequence variation to allow the PCR conditions to discriminate between the sequences. Because the reference and deletion amplicons for each assay are the same length, restriction enzymes were used to distinguish between them. The restriction enzymes AciI, for digestion of the GYPB DEL1 PCR product, and BsrBI, for the GYPB DEL2 PCR products, were identified and selected. Other restriction enzymes may work well but have not been explored or tested in this study.
One problem with using restriction enzymes is the possibility that the recognition sites themselves may contain population variation that could complicate the interpretation of the assays; however, from current information in genome browsers and variation databases (dbSNP153), this variation appears to be uncommon for the restriction sites used here.
The two assays were used to analyze DNA from HapMap and 1000 Genomes cell lines that had known GYPB types identified in the Leffler et al. study [9]. This allowed further optimization of the PCR conditions and also the use of Sanger sequencing to validate the amplicons produced by the PCRs. When the homozygous GYPB DEL1 or DEL2 samples were amplified, the sequences obtained could be seen to change from one reference sequence to another, and the paralogous sequence differences were used to identify the region where the switch occurred from GYPA to GYPE. Further sequence data would be required to identify whether these breakpoint boundaries are the same for all GYPB DEL1 and DEL2 chromosomes and to begin to understand whether the flanking sequences were important for the mismatches during chromosome replication. In both GYPB DEL1 and DEL2 deletions, the equivalent of a whole SDU was removed, amounting to ~100 kb each.
The performance of the two assay systems we have developed was evaluated by screening nearly 400 individuals from three different sites in Central to Southern Ghana. Considering the ethnic diversity at each study site and the ethnic group sample numbers, the overall allele frequency of GYPB DEL1 varied between 4.1% and 6.2%, while that of GYPB DEL2 varied between 1.3% and 5.0%. Gassner et al. [31] reported that allele frequencies of the other three distinct deletions within African ethnicities varied greatly, to the extent that among the Congolese Mbuti Pygmy populations, cumulative allele frequencies were as high as 23.3%. The frequencies reported in our current study are similar to frequencies reported in other West African populations [9]. In this study, analysis of the main ethnic groups with at least 20 individuals showed that the allele frequencies of DEL1 and DEL2 varied between ~1% and ~12%, with GYPB DEL2 mostly lower than DEL1. It is worth noting that frequency estimates within ethnic groups where the sample sizes are below 100 (n = 200 chromosomes) will require confirmation by screening larger sample sizes to achieve adequate statistical power and confidence. In general, larger sample sizes would be required for DEL2 surveys as the allele frequency was less than that of GYPB DEL1. In all the ethnic groups with more than 20 individuals, we detected GYPB DEL1 and DEL2; however, non-DEL1 and non-DEL2 individuals could only be identified by increasing the number of study participants across all the ethnic groups. These two new assays will thus allow surveys on a larger scale to determine the distribution of these two main variants in other West African populations and also identify phenotypes for functional assays to investigate the effects of GYPB DEL1 and DEL2 on the susceptibility of erythrocytes to invasion by P. falciparum and the resulting impact on disease pathogenesis.
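The point about small groups can be made concrete with a confidence interval on the allele frequency. The sketch below uses the Wilson score interval, which is a choice made here for illustration and is not a method reported in the study. It treats chromosomes as independent draws, and the counts are hypothetical. It compares the same 5% point estimate observed in 40 and in 200 chromosomes.

```python
import math


def wilson_interval(copies, chromosomes, z=1.96):
    """Wilson score interval for a proportion (default z gives ~95%)."""
    if chromosomes <= 0:
        raise ValueError("chromosomes must be positive")
    p = copies / chromosomes
    z2 = z * z
    denom = 1 + z2 / chromosomes
    centre = (p + z2 / (2 * chromosomes)) / denom
    half = z * math.sqrt(
        p * (1 - p) / chromosomes + z2 / (4 * chromosomes * chromosomes)
    ) / denom
    return centre - half, centre + half


# Hypothetical counts: the same 5% estimate at two sample sizes.
small = wilson_interval(2, 40)     # 20 individuals
large = wilson_interval(10, 200)   # 100 individuals

print("2/40 chromosomes:  ", tuple(round(x, 3) for x in small))
print("10/200 chromosomes:", tuple(round(x, 3) for x in large))
```

The interval from 40 chromosomes is more than twice as wide as the one from 200 chromosomes, which illustrates why estimates from the smaller ethnic groups need confirmation in larger samples.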
The ability to identify genotypes of interest from large populations accurately and rapidly has been an important goal in high-throughput genetic screening assay development [32,33]. The current DEL1 and DEL2 assays therefore offer an opportunity for rapid screening of populations for these GYPB polymorphisms, which are common especially in West Africa. Furthermore, for any malaria-related studies, it is important to collect, alongside the GYP variants, other key genetic information such as the sickle variant (HbS; rs334), which is present throughout Africa [34], HbC (rs33930165; present in Ghana and other West African countries) [35], and G6PD [36], as these may act as confounders in studies examining associations with susceptibility to malaria or effects on malaria parasite invasion and growth.
Developing less expensive high-throughput assays targeting the less common GYP variants will provide a better understanding of the distribution and functional effects of these variants on susceptibility to, and pathogenesis of, malaria and other diseases caused by pathogens that also use GYPB as a receptor. Such understanding may prove useful in guiding the design of vaccines or other therapeutic interventions targeting pathogen interactions with these GYP proteins on the erythrocyte surface. These assays are also important for identifying individuals with the various genotypes of GYPB (homozygous, heterozygous, and wild type), which is necessary for investigating the functional significance of these gene variations.