DNA copy number analysis in gastrointestinal stromal tumors using gene expression microarrays.

We report a method, Expression-Microarray Copy Number Analysis (ECNA), for the detection of copy number changes using Affymetrix Human Genome U133 Plus 2.0 arrays, starting with as little as 5 ng input genomic DNA. An analytical approach was developed using DNA isolated from cell lines containing various X-chromosome numbers, and validated with DNA from cell lines with defined deletions and amplifications in other chromosomal locations. We applied this method to examine the copy number changes in DNA from 5 frozen gastrointestinal stromal tumors (GIST). We detected known copy number aberrations consistent with previously published results using conventional or BAC-array CGH, as well as novel changes in GIST. These changes were concordant with results from Affymetrix 100K human SNP mapping arrays. Gene expression data for these GIST samples had previously been generated on U133A arrays, allowing us to explore correlations between chromosomal copy number and RNA expression levels. One of the novel aberrations identified in the GIST samples, a previously unreported gain on 1q21.1 containing the PEX11B gene, was confirmed in this study by FISH and was also shown to have significant differences in expression pattern when compared to a control sample. In summary, we have demonstrated the use of gene expression microarrays for the detection of genomic copy number aberrations in tumor samples. This method may be used to study copy number changes in other species for which RNA expression arrays are available (e.g. other mammals and plants) and for which SNPs have not yet been mapped.


Introduction
Chromosomal aberrations are frequently observed in cancer, and whole-genome analysis of copy number change in tumor cells has become a useful tool for tumor classification, tumor marker discovery, and for studying tumorigenesis. The initial application of chromosomal comparative genomic hybridization (CGH), co-hybridizing differentially labeled tumor and normal genomic DNA to normal metaphase spreads, identified genomic regions of deletions and amplifications in various tumor samples and cell lines (1), allowing copy number estimation at around 10 megabase resolution. Recent advances in microarray technology have provided higher resolution tools for genome-wide analysis of copy number estimations. An early array-based study used a spotted chromosome-specific library or cloned genomic fragments to investigate copy number changes in tumor samples (2). Later developments, using microarrays derived from genomic clones (3), cDNA (4), BAC clones (5) and oligonucleotides (6)(7)(8)(9)(10)(11)(12) provided higher resolution analyses. By using high-density SNP oligonucleotide microarrays, Bignell et al. (8) described an assay and algorithm for copy number analysis on various cancer cell lines to identify homozygous deletions and high-level amplification. Other oligonucleotide-based microarray studies used longer oligos, 60- or 70-mers, to identify copy number changes in cancer cells (6,7).
Gene expression profiles have been used successfully to classify tumors (13,14), including gastrointestinal stromal tumors (15). To better understand the role of DNA copy number aberration in tumorigenesis, efforts have been made to correlate gene expression patterns to specific genomic alterations (16)(17)(18)(19)(20)(21)(22). While genes in the altered genomic regions are not necessarily regulated by DNA dosage, copy number aberrations may influence genome-wide gene expression patterns. If both genomic DNA and RNA are available from the same sample, both copy number analysis and RNA expression analysis can be performed on the same arrays. Thus, it is possible to assess whether a probeset is in a region that is both amplified and over-expressed. Such regions may be of greater interest for further study, both to understand the pathogenesis of disease and to explore the possibility of discovering diagnostic biomarkers.
Gastrointestinal stromal tumor (GIST) is the most common mesenchymal tumor of the intestinal tract (23). GISTs express KIT protein and, in a significant number of cases, show activating mutations in either the KIT or PDGFRA genes, which encode class III receptor tyrosine kinases (24,25). Cytogenetically, GISTs show a rather simple karyotype, with losses of chromosomes 14 and 22 and of 1p in most cases, regardless of the KIT genotype. Since these simple genomic copy number aberrations have been previously confirmed by metaphase CGH (26)(27)(28) and BAC-array analysis (29), GIST represents an ideal tumor model for evaluating array-based methods for copy number analysis.
In this study, we describe an assay for detecting copy number changes by hybridizing genomic DNA to oligonucleotide microarrays designed for RNA expression profiling. We applied this approach to examine the genomic copy number changes among various cell lines and GIST tumors. Algorithm and method development were performed on cell lines containing various numbers of X chromosomes and known deletions and amplifications. This method, Expression-Microarray Copy Number Analysis (ECNA), allowed us to readily identify genes that showed copy number alterations starting with as little as 5 ng genomic DNA. ECNA was validated on GIST tumors in which previously described as well as novel copy number aberrations were identified.

Materials and Methods
Cell lines and DNA
DNA samples used in this study fall into 3 categories: DNA extracted from cell lines, normal human blood, and GIST tumors. DNAs from cell lines containing different copy numbers of the X chromosome, 1X (NA01723A), 2X (NA09899), 3X (NA04626), 4X (NA01416) and 5X (NA06061), and from a chromosome 4 deletion cell line (NA04126) were purchased from the National Institute of General Medical Sciences (NIGMS) Human Genetic Cell Repository, Coriell Institute for Medical Research (Camden, NJ). A human breast cancer cell line, SK-BR-3, was obtained from the American Type Culture Collection (ATCC, Manassas, VA). DNA was extracted from the cultured cells using the DNA Maxi Kit (Qiagen, Inc., Valencia, CA). DNA from normal blood was obtained from AllCells, LLC (Emeryville, CA). GIST sample DNA was obtained using a standard organic phenol-chloroform procedure. There were 5 GIST samples from 4 patients. Three of the samples were taken from the primary tumor resection, and in one patient two abdominal recurrences removed at different time points were analyzed (GIST#159, 199). The diagnosis was confirmed by pathologic review and immunoreactivity for KIT. Three samples had a KIT exon 11 mutation (GIST#159, 198, 199) and 2 had a PDGFRA exon 18 deletion (GIST#171, 204). All samples used for ECNA are listed in Supplementary Table 1.
Whole genome amplifications, purification, fragmentation and labeling
5-25 ng genomic DNA was amplified using QIAGEN's REPLI-g® kit (Qiagen) for 16 hours at 30 °C, according to the protocol provided by the

RNA hybridization to Affymetrix U133A arrays
RNA from 10 tumor samples (4 GISTs and 6 leiomyosarcomas) was analyzed on Affymetrix Human Genome U133A arrays. Leiomyosarcomas, which are malignant mesenchymal neoplasms of smooth muscle derivation that closely resemble GIST morphologically but are genetically distinct from it, were used as a control reference. RNA was isolated using the protocol accompanying the RNAwiz™ RNA Isolation Reagent from Ambion (Austin, TX) and all samples were treated on the column with RNase-free DNase (Qiagen, Valencia, CA) according to the manufacturer's instructions. Twenty-five to 50 nanograms of total RNA were tested for quality on an RNA 6000 Nano Assay (Agilent, Palo Alto, CA) using a Bioanalyzer 2100. RNA with an OD260/280 ratio greater than 1.8 was chosen for expression profiling experiments. Two micrograms of high quality total RNA was then labeled according to protocols recommended by the manufacturer. Briefly, after reverse transcription with an oligo-dT-T7 primer (Genset), double-stranded cDNA was generated with the Superscript double stranded cDNA synthesis custom kit (Invitrogen Life Technologies, Carlsbad, CA). In an in vitro transcription step with T7 RNA polymerase (MessageAmp™ RNA kit from Ambion) the cDNA was linearly amplified and labeled with biotinylated nucleotides. Ten micrograms of labeled and fragmented cRNA were then hybridized onto a test array and a Human Genome U133A expression array (Affymetrix, containing probesets representing 18,000 transcripts and variants). Post-hybridization staining and washing were processed according to the manufacturer (Affymetrix). Finally, chips were scanned with a GC3000 laser confocal scanner.

Data analysis of copy number change
The copy number analysis workflow is summarized in Figure 1. The following steps were carried out sequentially: data normalization, data filtering, chromosome data mapping, reference and validation data set selection, DNA copy number estimation by computing a Z score and Stouffer Z score for each probe set, method validation and GIST samples copy number estimation. Details of the method are described in the Supplementary Materials and Methods.
Data analysis of RNA samples hybridized to U133A arrays
Analysis was performed using the Affymetrix PLIER algorithm (Affymetrix Technical Note 1) (30). Principal Component Analysis (PCA) and one-way ANOVA were performed using Partek software. Data visualization was rendered either

was ethanol-precipitated, and resuspended in hybridization buffer. The probe mix was then denatured at 70 °C for 10 minutes, followed by preannealing at 37 °C for 30 minutes. The reference probe was denatured separately, without preannealing, and combined with the denatured reference probe on the slide for overnight incubation at 37 °C. After standard post-hybridization washes, the slides were stained with 4',6-diamidino-2-phenylindole (DAPI) and mounted in VECTASHIELD® antifade mounting medium (Vector Laboratories). Analysis was done using a Nikon E800 epifluorescence microscope with MetaSystems Isis 3 imaging software. A minimum of 100 cells was scanned over separate regions for each slide. Image z-stacks were captured using a Zeiss Axioplan 2 motorized microscope controlled by Isis 5 software (MetaSystems).

Detection of copy number changes
To confirm that differences in signal are proportional to the differences in copy number, we performed the assay on cell lines with variable numbers of X chromosomes ranging from 1 to 5 copies. The probesets on the X chromosome show a proportional increase in signal (Fig. 2A) when each cell line is compared to a 1X cell line. The Z score, which provides a point estimation of copy number for each probeset, is derived by comparing the signal of each probeset in a sample to that of a reference sample set (Fig. 2B). Chromosomes other than the X chromosome were analyzed and found not to have copy number variation in the samples tested (data not shown).
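The per-probeset Z score described above can be illustrated with a minimal sketch. The exact formula is given only in the Supplementary Materials and Methods, so this example assumes the conventional form (sample signal minus reference mean, divided by reference standard deviation) applied to log2 signals. The function name, the probeset identifier and the numbers are hypothetical.

```python
import statistics


def probeset_z_scores(sample, reference):
    """Z score per probeset against a reference sample set.

    sample:    {probeset_id: log2 signal in the test sample}
    reference: {probeset_id: [log2 signals across the reference samples]}
    Assumption: z = (signal - reference mean) / reference SD.
    """
    z = {}
    for probeset, signal in sample.items():
        ref_values = reference[probeset]
        mean = statistics.mean(ref_values)
        sd = statistics.stdev(ref_values)  # sample standard deviation
        z[probeset] = (signal - mean) / sd
    return z


# Hypothetical X-chromosome probeset: three reference samples and one test sample
reference = {"ps_X1": [10.0, 10.2, 9.8]}
sample = {"ps_X1": 10.6}
scores = probeset_z_scores(sample, reference)
```

A positive score indicates signal above the reference set and a negative score indicates signal below it; how a score maps to an absolute copy number depends on the reference composition.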
Figure 2. Detection of copy number changes in cell lines with variable numbers of X chromosomes. The X chromosome copy number of each sample is known and is indicated on the x-axis. The box and whisker plot (35) represents the interquartile range (between 25% and 75%) and the line within the box denotes the median. The whiskers extend to the last observation before the outliers, which are plotted individually as dots. Outliers with values greater than 7 are not plotted. A. The observed pair-wise signal ratio of each probe set on the X chromosome against a 1X sample. B. The Z score, calculated using a reference set of 37 samples, shows similar results compared to the single sample reference in A.

In chromosomal copy number estimations, a range of values is typically seen. It is important to choose thresholds or cut-off values, above or below which a region may be called amplified or deleted. The 69 samples that passed the 67% present call rate cut-off value and were used in these analyses (Supplementary Table 1) have known numbers of X chromosomes (Fig. 3). In this study, Z scores in windows of 500,000 bp were used to compute Stouffer Z values. The tighter distribution seen with the Stouffer Z sliding window approach reflects the reduction of noise obtained (Figs. 3A, B), and this approach was used for final copy number estimations. A clear separation was observed between the median Stouffer Z scores for each of the 5 sample sets, bearing 1 to 5 X-chromosome copies, and 2-fold changes could be distinguished by this method (Fig. 3C). In this model, a 2-fold change between 2X and 4X was easier to determine than the change between 1X and 2X, and 3-fold or greater changes can be distinguished much more easily than smaller changes. When assessing copy number changes in unknown samples, it is important to use thresholds with defined levels of confidence. Since the median and mean Stouffer Z values were highly similar (Fig. 3D), in subsequent analyses we chose to use the mean values, plus or minus 2 S.D., as the threshold values to identify chromosomal deletions and amplifications. These threshold values are highlighted in Figure 3D.
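The windowed Stouffer Z and the mean plus or minus 2 S.D. thresholds described above can be sketched as follows. Stouffer's method combines k Z scores as their sum divided by the square root of k. The 500,000 bp window size is taken from the text; centring the window on each probeset, the function names and the example values are assumptions made for illustration, and the published procedure may differ in detail.

```python
import math
import statistics

WINDOW_BP = 500_000  # window size stated in the text


def stouffer_z(z_scores):
    """Combine k Z scores with Stouffer's method: sum(z) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def sliding_stouffer(probesets, window_bp=WINDOW_BP):
    """One Stouffer Z per probeset on a single chromosome.

    probesets: list of (position_bp, z_score) tuples.
    Assumption: each window is centred on the probeset it is reported for.
    The nested scan is quadratic, which is acceptable for a sketch.
    """
    half = window_bp / 2
    combined = []
    for pos, _ in probesets:
        in_window = [z for p, z in probesets if abs(p - pos) <= half]
        combined.append(stouffer_z(in_window))
    return combined


def call_thresholds(reference_values, n_sd=2.0):
    """Return (low, high) cut-offs as mean -/+ n_sd standard deviations."""
    mean = statistics.mean(reference_values)
    sd = statistics.stdev(reference_values)
    return mean - n_sd * sd, mean + n_sd * sd


def call(value, low, high):
    """Label a Stouffer Z value relative to the cut-offs."""
    if value < low:
        return "loss"
    if value > high:
        return "gain"
    return "no change"


# Hypothetical probesets: four close together, one isolated and strongly negative
probes = [(100_000, 1.0), (200_000, 1.0), (300_000, 1.0),
          (400_000, 1.0), (2_000_000, -3.0)]
windowed = sliding_stouffer(probes)
low, high = call_thresholds([-1.0, 0.0, 1.0])  # hypothetical reference values
labels = [call(v, low, high) for v in windowed]
```

Because the combined score grows with the square root of the number of concordant probesets in a window, consistent small shifts across neighbouring probesets are reinforced while isolated noisy probesets are damped, which is consistent with the tighter distributions reported for the sliding window approach.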

Validation of known deletions and amplifications
Applying the cutoffs listed above, the known deletion on chromosome 4 (4p16.3) from the NA04126 cell line, derived from an individual with Wolf-Hirschhorn syndrome, was detected with a number of probesets falling below the 2 S.D. line (Fig. 4). Of these, 4 probesets (shown as blue dots in the inset image) map to the WHCR gene, known to cause Wolf-Hirschhorn syndrome. In contrast, in the breast cancer cell line SK-BR-3, in which portions of chromosome 8q are known to be amplified, a number of probesets are shown to be highly amplified, including those representing the c-myc oncogene (Fig. 5). c-Myc is commonly amplified in breast cancers and is known to be amplified in SK-BR-3. These data demonstrate that known deletions and amplifications in cell lines can be accurately detected using ECNA.

Copy number changes in GIST
The next step was to apply this methodology to tumor samples. GIST is an ideal tumor model for testing the sensitivity of this system, since it has relatively few copy number changes, and these are well-documented using both low and high resolution approaches (29,36). We therefore analyzed 5 GIST genomic DNA samples for a global assessment of segmental gains and losses. Figure 6 shows the genomic view of the ECNA data for 3 of these tumors. There are chromosomal regions identified as clearly changed in all three GIST samples compared to the control sample (Fig. 6A). For example, in all tumors, the majority of probe sets in 1p are well below the 2X copy number line (0 on the y-axis). The 1p arm appears to fall at about a 1X copy number, indicating loss of 1p, while 1q appears to be gained in these samples. The 2 samples (GIST#159, 199) originating from the same patient from two subsequent recurrences, at 14-month intervals, showed very good concordance overall between the copy number changes (Figs. 6C and 6D, and Supplementary Table 2). Interestingly, GIST#199, the later recurrence, showed additional losses at 5q23-35 and 8p12-23 as compared to GIST#159, suggesting the possibility of deletions of candidate tumor suppressor genes involved in tumor progression.
The copy number changes detected by ECNA are listed in Supplementary Table 2. Briefly, the majority of GIST tumors showed losses of 14q (4/5 samples), 22q (3/5 samples) and 1p (5/5 samples). Furthermore, smaller regions of loss were consistently noted, such as 1p36 (seen in 4/5 GIST samples), 13q34 (4/5 samples) and 21q22 (seen in 3/5 GISTs). The two GIST samples harboring mutations in PDGFRA exon 18 did not show distinct findings compared to the 3 samples harboring mutations in exon 11 of the KIT gene. GIST#171 showed the lowest number of alterations, while GIST#204 had the most copy number changes, more similar to the samples with KIT mutations. A summary of our findings in comparison with other copy number analysis methods, such as CGH (1) and BAC-array CGH, for which results have been previously reported in GIST, is shown in Table 1.
To further validate our assay, we performed a comparison with another copy number technique with similarly high resolution. Figure 7 shows the concordance of our results on chromosome 1 with copy number analysis performed on the GeneChip Human Mapping 100K arrays. Results are similar between the two methods; both reveal deletions and gains on chromosome 1p and 1q, respectively (Figs. 7A and B). The U133 Plus 2.0 arrays are gene-centric, whereas the SNP arrays span coding as well as non-coding regions of the genome. This complementarity of coverage is evident in the distribution of probesets or SNPs in the respective arrays. The SNP arrays additionally provide allele-specific information, as illustrated by the loss of heterozygosity (LOH) results in Figure 7C.

Comparison of copy number changes with expression data and validation using FISH
One-way ANOVA identified probesets that were significantly over- or under-expressed in GIST samples compared to leiomyosarcoma tumor samples. Some regions were identified that showed both copy number change and a corresponding difference in expression as indicated by a significant p-value. Some of these low p-value probesets were mapped back to the chromosomes and regions were selected that showed both copy number change and a significant difference in expression. Of these, two genes, PRKAR2B on chromosome 7 and PEX11B on chromosome 1, showed gains of approximately 5-fold by their respective Stouffer Z scores, and expression levels that were significantly higher in GIST compared to leiomyosarcomas. FISH analysis showed increased copies of PRKAR2B in the majority of the cells of GIST#198 (4-5 signals/nucleus; Fig. 8A), GIST#199 (4 signals/nucleus) and GIST#159 (2-3 signals/nucleus). However, the reference centromeric chromosome 7 probe also showed increased copy number in all 3 GIST cases tested, with the PRKAR2B to chromosome 7 centromere ratio close to 1 (range 0.8-1.2). This result correlates with the whole chromosome 7 gains observed in ECNA (Table 1). Similar findings were obtained with PEX11B on 1q21.1, with extra copies of both PEX11B and chromosome 1, and a ratio close to 1. One tumor (GIST#199) showed 3-4 PEX11B signals/nucleus (Fig. 8B) and the other (GIST#159) 2-3 copies.
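The reasoning behind the gene-to-centromere ratio can be made concrete with a small sketch: when the gene probe and the centromeric reference probe increase together, the ratio stays near 1 and the gain is more consistent with extra copies of the chromosome than with a focal amplification. The function name, the labels and the example counts are hypothetical, and using the observed 0.8-1.2 range as a decision band is an assumption made for illustration rather than a criterion stated in the text.

```python
def classify_fish(gene_signals, centromere_signals,
                  normal_copies=2.0, ratio_band=(0.8, 1.2)):
    """Interpret mean FISH signals per nucleus for a gene and its centromere.

    ratio_band is an assumed decision band, borrowed from the range of
    ratios reported above; it is not a published cut-off.
    """
    ratio = gene_signals / centromere_signals
    if gene_signals <= normal_copies:
        return "no gain"
    if ratio_band[0] <= ratio <= ratio_band[1]:
        # Gene and centromere are gained together
        return "gain with centromere"
    if ratio > ratio_band[1]:
        # Gene is gained out of proportion to the centromere
        return "focal gain"
    return "relative loss versus centromere"


# Hypothetical mean signal counts per nucleus
whole_chromosome_like = classify_fish(4.5, 4.5)
focal_like = classify_fish(8.0, 2.0)
```

Under these assumptions, a tumor with 4-5 gene signals and a matching number of centromere signals falls in the "gain with centromere" class, in line with the whole chromosome 7 gains described above.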

Discussion
In this study we designed a method to estimate chromosomal copy number by hybridizing genomic DNA to Human Genome U133 Plus 2.0 arrays, typically used to study levels of RNA expression. An important advantage of our novel assay is that it requires only a very limited amount of DNA, i.e. as little as 5 ng starting material. Additionally, many of the advantages of an established array platform such as whole genome representation, probe set annotations and algorithms to estimate probe set signals were available to us by using this expression array approach.
We developed this method by taking samples with known differences in X-chromosome copy number. We chose a sliding window approach to generate Stouffer Z scores that were used to estimate copy number changes. The approach was then applied to and confirmed on cell lines with known chromosomal abnormalities. Finally, we used this approach to assess copy number changes in gastrointestinal stromal tumor (GIST) samples. GISTs are known to have copy number aberrations, some of which have been identified by other techniques.
In a recent study, Auer et al. (37) used the Affymetrix U133 Plus 2.0 arrays for a similar gene-resolution analysis of copy number variation. The authors likewise conclude that this approach provides more reproducible results than custom-made BAC CGH arrays, and that the results can be compared among different laboratories and combined with gene expression data from the same platform. Their results show good concordance between the copy number changes detected by a 19k BAC high-density microarray platform and the Affymetrix expression arrays. As in our approach, the authors chose various cell lines with known amplifications/deletions, such as neuroblastoma cell lines, to validate the variations in gene copy number. However, they did not extend the application to routine clinical tumor samples.
The most common findings in GIST, which have been reported by both conventional and BAC-array CGH, include losses of part or all of chromosome 14, loss of chromosome 22, and loss of 1p (26,29) (Table 1). Our method confirmed these results, showing a high incidence of 14q, 22q and 1p losses. Furthermore, we provide evidence that the increased resolution of the current platform facilitated the identification of small alterations that were missed by a lower resolution BAC-array CGH platform. Three areas of interest were pinpointed by this method: losses of 1p36, 13q34 and 21q22. The first two were previously highlighted by BAC-array CGH, while the third is a novel finding. In addition, novel gains of PEX11B on 1q21 and PRKAR2B on 7q22 were confirmed in this study via FISH, and these chromosomal regions were shown to have significant differences in expression patterns when compared to a control sample.
Several copy number analysis methods are now available. The amount of DNA needed varies among the different methods, from as little as 5 ng as used in this study, to 400 ng (8) or 2 µg (7). ECNA uses whole genomic DNA amplified with φ29 DNA polymerase, without complexity reduction, in contrast to the WGSA method used on SNP arrays (8,10,31). We analyzed the same GIST samples with the SNP array method. Despite the fact that the assay and array designs are distinct, the results obtained are highly similar (Fig. 7). While the SNP arrays have very dense genomic coverage, the HG U133 Plus 2.0 arrays are gene-centric, and so may have representation in regions where SNP coverage is limited or absent. Thus, in addition to showing good concordance, these methods are complementary. Most copy number analysis methods described thus far generate a list of genomic regions undergoing copy number alterations, without further details on their impact on gene expression. ECNA promises to link the areas of loss or gain with information related to the expression level of their corresponding probe sets, as RNA from the same sample can be used to analyze levels of RNA expression on the same platform.
A clear advantage of this approach is that copy number alterations may be studied in other species, such as mouse, rat, and other model organisms for which expression arrays are available and for which SNPs have not yet been mapped. This assay has been successfully used by other researchers to detect copy number changes on HG U133 Plus 2.0 arrays (38). Additionally, experimental evidence has shown that this assay may be used on Affymetrix tiling arrays (K. Wu, unpublished data) to assess copy number changes. We also believe that this approach will prove valuable in studying copy number aberrations in clinical samples, for which only relatively small amounts of starting material are often available.

Table S1. Experimental samples are listed by experiment name, sample type, number of X chromosomes, gender, percent Present calls, and representation in the Reference set. All replicates (indicated by a, b or c) were at the sample preparation level, except for the Normal Blood DNA samples, which were at the array hybridization level.