Genes with relevance for early to late progression of colon carcinoma based on combined genomic and transcriptomic information from the same patients.

BACKGROUND
Genetic and epigenetic alterations in colorectal cancer are numerous. However, it is difficult to judge whether such changes are primary or secondary to the appearance and progression of tumors. Therefore, the aim of the present study was to identify altered DNA regions whose changes covary significantly with transcriptional alterations during colon cancer progression.


METHODS
Tumor and normal colon tissue were obtained at primary operations from 24 patients selected by chance. DNA, RNA and microRNA were extracted from the same biopsy material in all individuals and analyzed by oligonucleotide array-based comparative genomic hybridization (CGH) and by mRNA and microRNA oligonucleotide arrays. Statistical interactions (correlations, covariations) between DNA copy number changes and significant alterations in gene and microRNA expression were assessed using appropriate parametric and non-parametric statistics.


RESULTS
Major DNA alterations were located on chromosomes 7, 8, 13 and 20. Tumor DNA copy number gain increased with tumor progression and was significantly related to increased gene expression. Copy number loss was not observed in Dukes A tumors. There was no significant relationship between the number of differentially expressed genes and tumor progression across Dukes A-D tumors, and no relationship between tumor stage and the number of microRNAs with significantly altered expression. Interaction analyses identified 41 genes in total that discriminated early (Dukes A plus B) from late (Dukes C plus D) tumors; 28 of these genes retained correlations between genomic and transcriptomic alterations in Dukes C plus D tumors, and 17 in Dukes D tumors. One microRNA (microR-663) showed interactions with DNA alterations in all Dukes A-D tumors.


CONCLUSIONS
Our modeling confirms that colon cancer progression is related to genomic instability and altered gene expression. However, early invasive tumor growth seemed related more to transcriptomic alterations, in which changes in microRNA may be an early phenomenon, than to DNA copy number changes.


Introduction
An overwhelming amount of information on genetic alterations associated with invasive colorectal cancer is appearing in the literature. 1 It is so far unclear to what extent such findings are primary causes of neoplastic transformation and tumor progression, or rather represent events secondary to genetic instability. Unfortunately, discriminating and validating both genetic and epigenetic information by traditional multivariate analyses would require hundreds of thousands of patients with defined colon cancer disease and controlled follow-up. [2][3][4][5] This became evident to us in previous evaluations of genome-wide DNA alterations in progressive colon cancer, based on BAC CGH analyses in patients with different survival, 6 as also emphasized by others. 7 Thus, it seems practically impossible to rank emerging DNA sequence alterations in relation to progressive disease and clinical outcome while accounting for defined and undefined standard elements, including epigenetics. 3,4 A major part of the observed correlations and relationships may, after all, only represent indirect or secondary phenomena of underlying critical cellular events, despite sufficient statistical power or information on complete genome-wide alterations. 8,9 Therefore, simplistic models are required as alternatives to traditional statistics in order to efficiently screen for, and suggest, candidate DNA regions of primary importance for the appearance of invasive growth and subsequent progression of colorectal cancer. In line with this reasoning, we chose to relate significant DNA copy number changes to either significantly changed gene expression or posttranscriptional control of RNA in colon cancer tumor biopsies, all processed from the same patients.
The present study provides such in silico analyses on well-defined and quality-controlled tumor material from selected patients with colorectal cancer of Dukes A, B, C and D tumor stage, used as surrogate markers for clinical outcome, in order to filter genes within regions with copy number gain and loss by statistical modeling in a limited number of patients. 1

Patients and clinical details
Intentionally, the patient material comprised a limited number of patients (n = 24) operated on for primary colon carcinoma at Uddevalla Hospital, Sweden, between 2001 and 2003 (Table 1). These patients were selected by chance from a cohort of 486 consecutive patients with colorectal cancer to represent 6 patients with each of tumor stages Dukes A, B, C and D. (Modified Dukes A-D stages correspond to TNM I-IV in current histopathological evaluation.) All Dukes D tumors were diagnosed at operation and subsequent histopathological staging. Patient selection also depended on the presence of a particular surgeon, patient consent to take part in the study, quality control of RNA extracted from tissue, and the absence of any preoperative pharmacological treatment deemed important for the investigation. Thus, none of the patients had received any specific treatment besides surgery at the time of operation. Patients with rectal and very low sigmoid tumors were not considered for inclusion. There was no overall difference between the patients grouped according to Dukes A, B, C and D stages with respect to gender and tumor location (Table 1), but Dukes D patients were younger, as also observed in the entire cohort of 486 patients (p < 0.05). Six patients for each Dukes group were finally available according to the above criteria, with a comparatively even distribution of patient characteristics and disease stage.

Tissue samples and extraction of DNA and RNA
Biopsies from primary tumors and normal colon tissue were collected from each patient at operation, snap frozen in liquid nitrogen and stored at -80°C. Tissue biopsies were crushed in a mortar, and two aliquots of powdered tissue were used for DNA and total RNA extraction, respectively. Genomic DNA and total RNA thus came from the same tissue source in each patient. DNA was extracted with the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer's instructions, and total RNA was extracted with the mirVana total RNA isolation kit (Ambion/Applied Biosystems). All material was quantified by NanoDrop ND-1000 spectrophotometry (NanoDrop Technologies), and total RNA samples were run on a Bioanalyzer (Agilent Technologies) to confirm appropriate quality. mRNA expression arrays and oligonucleotide CGH arrays were run in triplicate. MicroRNA expression arrays were run in duplicate (from each patient, 167 or 307 ng DNA depending on array format, 33 ng RNA and 20 ng microRNA were used). Tumor tissue comprised around 80% malignant cells. 6

CGH analysis
Genomic DNA from tumor and normal colon tissue from the 24 patients was separately pooled for analyses, with 6 patients in each group according to Dukes A-D. Hybridization of tumor versus normal colon DNA was performed in competition to either 44 […] 2 set. All labeled samples were checked by NanoDrop spectrophotometry prior to hybridization, and arrays were scanned (Agilent scanner G2565 AA, Agilent Technologies). Scanned images from two-color oligonucleotide CGH arrays were analyzed in Feature Extraction 9.1.3.1 (Agilent Technologies). Feature Extraction result files were imported into the statistical language R 2.7.2, 10 where both channels were normalized using median normalization as implemented in the Bioconductor 11 package LIMMA. The technical replicates were averaged and then segmented with the DNAcopy package using the circular binary segmentation (CBS) algorithm with default parameter values. 12 Minimal common regions (MCRs, as defined previously 13) between the different Dukes stages were identified using the cghMCR package. 13
Briefly, gained and lost regions were defined as segments of contiguous probes with log2 values above or below a cut-off level, defined as one standard deviation of the probe variation calculated from all arrays. The cut-off values for gained and lost segments were estimated at +0.1 and -0.1 (log2), which corresponded approximately to the 80th and 20th percentiles of the segment alteration values, respectively. 12

mRNA expression analysis
Total RNA from tumor and normal tissue was separately pooled as described for CGH analyses; 200 ng of pooled total RNA was labeled with the Agilent Two-Color RNA Spike-In Kit (Agilent Technologies), linearly amplified and converted to cRNA. Labeled products were checked on a NanoDrop and hybridized in competition to Agilent Whole Human Genome Oligo Microarrays (Design 014850) with the Gene Expression Hybridization Kit (Agilent Technologies). Arrays were washed with the Gene Expression Wash Buffer Kit (Agilent Technologies) and scanned (Agilent scanner, Agilent Technologies).
Scanned images from two-color mRNA expression arrays were analyzed in Feature Extraction 9.1.3.1 (Agilent Technologies). Feature Extraction result files were imported into the statistical language R 2.7.2, where replicated probes were averaged. 10 Each array was then normalized using Lowess normalization as implemented in the Bioconductor package LIMMA. 11,14 A moderated t-statistic based on an empirical Bayes model was calculated for each gene, and the corresponding p-value was adjusted […] 15,16 An absolute log fold-change of 1 and an FDR of 0.05 were used as cut-offs for subsequent analyses. Trends in mRNA expression according to Dukes stage were tested with linear regression within the empirical Bayes model. 15
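The gene-level selection rule above (absolute log fold-change above 1 and FDR below 0.05) can be sketched as follows. This is a minimal illustration and not the LIMMA workflow used in the study. It assumes that the moderated p-values have already been computed and that the FDR is the Benjamini-Hochberg adjusted p-value, because the exact adjustment is not legible in this section. The function names and toy values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (step-up FDR) in input order.

    Assumption: the study's FDR corresponds to this adjustment.
    """
    m = len(pvalues)
    # Indices ordered by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pvalues[i] * m / rank)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


def select_altered(genes, log_fc, pvalues, fc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep genes with |log fold-change| > fc_cutoff and adjusted p < fdr_cutoff."""
    fdr = benjamini_hochberg(pvalues)
    return [g for g, lfc, q in zip(genes, log_fc, fdr)
            if abs(lfc) > fc_cutoff and q < fdr_cutoff]


# Hypothetical toy values, not data from this study.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log_fc = [2.1, -1.4, 0.3, 1.2]
pvalues = [0.001, 0.01, 0.0005, 0.2]
print(select_altered(genes, log_fc, pvalues))  # ['GENE_A', 'GENE_B']
```

GENE_C fails the fold-change criterion even though its p-value is the smallest, and GENE_D fails the FDR criterion. This shows why both cut-offs are applied together.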

MicroRNA expression analysis
Total RNA from tumor and normal colon tissue was separately pooled as described; 120 ng of pooled total RNA was labeled with Agilent Cyanine 3-pCp reagent for direct labeling using the Agilent microRNA Labeling Reagent and Hybridization Kit (Agilent Technologies). Labeled products were hybridized to Agilent Human microRNA single-color microarrays (G4470A, Agilent Technologies, with 470 human and 64 viral probes), washed and scanned on an Agilent scanner. Scanned images from single-color microRNA expression arrays were analyzed in Feature Extraction 9.5 (Agilent Technologies). The one-channel Feature Extraction 9.5 result files were imported into R. Identical probes were averaged, and the data were normalized using quantile normalization as implemented in the Bioconductor R package affy. 17 As for the mRNA expression data, a moderated t-statistic was calculated for each microRNA, together with a p-value and the FDR. Cut-off values used in subsequent analyses were an absolute log fold-change of 0.5 and an FDR of 0.05.
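Quantile normalization, as applied here to the single-channel microRNA arrays, gives every array the same intensity distribution. The pure-Python sketch below is for illustration only. The study used the implementation in the Bioconductor affy package, which differs in details such as tie handling. The function name and the input matrix are hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity vectors (one per array).

    Each value is replaced by the mean, across arrays, of the values that share
    its rank. Ties are broken by probe position, which is a simplification.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Sort each array and average the sorted values rank by rank.
    sorted_cols = [sorted(a) for a in arrays]
    rank_means = [sum(col[r] for col in sorted_cols) / len(arrays)
                  for r in range(n_probes)]
    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity give each probe its rank.
        order = sorted(range(n_probes), key=a.__getitem__)
        out = [0.0] * n_probes
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical log2 intensities for three probes on two arrays.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization, both arrays contain exactly the same set of values. Only the assignment of values to probes, which reflects each array's ranking, differs between them.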

Statistics and mathematical interactions
Group analyses were performed by t-tests or ANOVA, and frequency analyses by the χ² test. Statistical interaction analyses (correlations, covariations, significant alterations) were based on DNA segments with copy number changes and significant alterations in expression of defined transcripts. Statistical interactions between altered DNA sequences and mRNA/microRNA expression for a specific region were calculated as follows. First, probes from the microarray were mapped to NCBI Entrez (build 18) genes within the region. The proportion of differentially expressed genes within the region was then compared with that of the entire genome, and enrichment was tested using Fisher's exact test. The test of interactions was performed for all significant DNA alterations over the entire genome, for each chromosome and for each aberrant segment identified by the CGH analysis. DNA events that correlated significantly with altered expression and differed between Dukes A plus B and Dukes C plus D tumors were regarded as candidate DNA sequences that may explain tumor progression. p < 0.05 was regarded as statistically significant in two-tailed tests.
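The region-level enrichment test described above can be sketched with an exact two-sided Fisher test on a 2×2 table. In this minimal sketch, the region is compared with the rest of the genome. That choice is an assumption, because the text states only "compared to the entire genome". The helper names and gene counts are hypothetical, and a library routine such as SciPy's `fisher_exact` would normally be used instead.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def pmf(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = pmf(a)
    # A small relative tolerance treats floating-point near-ties as ties.
    total = sum(p for p in (pmf(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def region_enrichment(de_in_region, genes_in_region, de_genome, genes_genome):
    """Compare differentially expressed (DE) genes in a region with the rest of the genome."""
    a = de_in_region                              # DE, inside region
    b = genes_in_region - de_in_region            # not DE, inside region
    c = de_genome - de_in_region                  # DE, outside region
    d = (genes_genome - genes_in_region) - c      # not DE, outside region
    return fisher_exact_two_sided(a, b, c, d)


# Hypothetical counts: 10 of 40 genes in a segment are DE, 1200 of 20000 genome-wide.
print(region_enrichment(10, 40, 1200, 20000))
```

The same function can be called once per genome, chromosome or aberrant segment, matching the three levels at which the interaction test was run.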

DNA alterations
Tumor tissue vs. normal colon tissue
Significant tumor DNA copy number changes increased with tumor progression, defined as early (Dukes A plus B) versus late (Dukes C plus D) tumors (Fig. 1, Fig. 2, Table 2). Dukes A, B, C and D tumors displayed DNA alterations in 4%, 4%, 21% and 16% of the entire genome, respectively, compared with normal colon tissue (p < 0.05) (Table 2). Four chromosomes displayed alterations in Dukes A, 6 in Dukes B, 15 in Dukes C and 14 in Dukes D tumors (X and Y excluded). Copy number loss was not observed in Dukes A tumors. Early-stage DNA alterations were gains on 7p, 13q and 20p/q and loss on 18q. Late-stage alterations were gains on 7p/q, 8q, 13q and 20p/q and losses on 8p, 17p/q, 18p/q and 21q.
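The genome percentages per Dukes stage come from two steps: calling each DNAcopy segment as gained or lost against the ±0.1 log2 cut-off given in the CGH methods, and then summing the lengths of the called segments. The minimal sketch below assumes that segments are given as hypothetical (chromosome, start, end, mean log2 ratio) tuples with half-open coordinates, with X and Y already excluded. The segments and genome length are toy values, not study data.

```python
def call_segments(segments, cutoff=0.1):
    """Label each (chrom, start, end, mean_log2) segment as 'gain', 'loss' or 'neutral'."""
    calls = []
    for chrom, start, end, mean_log2 in segments:
        if mean_log2 > cutoff:
            label = "gain"
        elif mean_log2 < -cutoff:
            label = "loss"
        else:
            label = "neutral"
        calls.append((chrom, start, end, label))
    return calls


def fraction_altered(segments, genome_length, cutoff=0.1):
    """Fraction of the genome (in bases) covered by gained or lost segments.

    Assumes non-overlapping, half-open [start, end) segments.
    """
    altered = sum(end - start
                  for _, start, end, label in call_segments(segments, cutoff)
                  if label != "neutral")
    return altered / genome_length


# Hypothetical segments; the genome length is a toy value.
segments = [
    ("7", 0, 50_000_000, 0.25),             # above +0.1: gain
    ("13", 20_000_000, 30_000_000, 0.05),   # within +/-0.1: neutral
    ("18", 40_000_000, 70_000_000, -0.30),  # below -0.1: loss
]
print(fraction_altered(segments, genome_length=1_000_000_000))  # 0.08
```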
Chromosomes 1-11, 13-18 and 20-21 showed 102 minimal common regions (MCRs) in Dukes A, B, C and D tumors; 78% represented gained and 22% lost regions (not shown). These aberrations corresponded to 30% of the entire genome (X and Y chromosomes excluded). Of the aberrant bases covered by MCRs, 14% were altered in at least 3 of the 4 Dukes groups when analyzed in iterated combinations (ABCD, ABC, ACD or BCD). These alterations were mainly located on chromosomes 7, 13, 18 and 20. Chromosomes 13 (1 Mb) and 20 (41 Mb) showed gains in all Dukes A-D tumors. Overall, 55% of MCRs were found in Dukes A and B tumors and may be considered most relevant for carcinogenesis and early tumor progression, whereas 75% of MCRs were found in Dukes C and D tumors (not shown).
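A simple sweep over interval boundaries can illustrate how bases altered in at least 3 of the 4 Dukes groups are counted. This minimal sketch is not the cghMCR algorithm used in the study. It only counts the bases on one chromosome where altered segments from at least k groups overlap. It assumes half-open intervals that do not overlap within a group, and the intervals below are hypothetical.

```python
def bases_in_at_least(group_intervals, k):
    """Count bases covered by altered intervals in at least k groups.

    group_intervals maps group name -> list of (start, end) half-open intervals
    on one chromosome; intervals within a group are assumed not to overlap.
    """
    events = []
    for intervals in group_intervals.values():
        for start, end in intervals:
            events.append((start, 1))   # an interval opens: depth rises
            events.append((end, -1))    # an interval closes: depth falls
    events.sort()
    covered, depth, prev = 0, 0, None
    for pos, delta in events:
        # The stretch [prev, pos) had constant coverage depth.
        if prev is not None and depth >= k:
            covered += pos - prev
        depth += delta
        prev = pos
    return covered


# Hypothetical gained intervals per Dukes group on one chromosome.
groups = {
    "A": [(0, 100)],
    "B": [(50, 150)],
    "C": [(80, 120)],
    "D": [(200, 300)],
}
print(bases_in_at_least(groups, 3))  # 20 (positions 80-99)
```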

mRNA expression
Tumor tissue vs. normal colon tissue
The distribution of genes with altered expression among Dukes A-D tumors is summarized in Figure 3b and Table 3. There was no significant relationship between the number of differentially expressed genes and tumor progression (Fig. 3b). Six, 8, 8 and 6 percent of all genes showed sig- […] of genes with increased expression in Dukes D tumors compared with Dukes A-C tumors (p < 0.01).

MicroRNA expression
Tumor tissue vs. normal colon tissue
There was no relationship between tumor stage and the number of differentially expressed microRNAs (Fig. 3c). Dukes A, B, C and D tumors showed altered expression (absolute log fold-change > 0.5, FDR < 0.05) of 17%, 21%, 18% and 15% of microRNAs, respectively, compared with normal colon tissue (Table 4). In total, 173 microRNAs showed significantly altered expression in one or several combinations of Dukes stages, and 55 microRNAs, located on chromosomes 1-9, 11, 13, 17-20 and 22, were altered in all Dukes groups. Six microRNAs showed significant changes in expression between Dukes A plus B and Dukes C plus D tumors (Table 5).

Combined statistical analyses of DNA and RNA alterations
Genome-wide interactions
Each Dukes tumor stage showed some genome-wide statistical interactions between structural and transcriptional alterations (Fig. 3b), but only Dukes C and D tumors showed interactions involving DNA alterations that discriminated significantly between early (A plus B) and late (C plus D) tumors (Table 6). Altogether, 29% (6498/22094) of all genes had significant copy number changes or significantly altered expression in one or several combinations of Dukes A, B, C and D tumors. Of these genes, 1231 (19%, 1231/6498) showed chromosomal alterations in all four Dukes A-D stages, and 406 (6%, 406/6498) showed combined interactions in the same direction (i.e. gain and upregulation or loss and downregulation).
[Table legend: Gained or lost bases (kb) per chromosome among Dukes tumor stages were detected by the DNAcopy segmentation algorithm. Significance thresholds were specified by the 80th and 20th percentiles, respectively.]
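The same-direction criterion (gain with upregulation, or loss with downregulation) can be sketched as a simple classification. The sketch also recomputes the percentages reported above from the stated counts. The gene calls in the example are hypothetical, and the input format (gene mapped to a copy number call and an expression call) is an assumption made for illustration.

```python
def direction_concordance(gene_calls):
    """Split genes into concordant and discordant DNA/RNA changes.

    gene_calls maps gene -> (copy_number_call, expression_call), where the
    copy number call is 'gain' or 'loss' and the expression call is 'up' or 'down'.
    """
    concordant_pairs = {("gain", "up"), ("loss", "down")}
    concordant, discordant = [], []
    for gene, (cn, expr) in sorted(gene_calls.items()):
        if (cn, expr) in concordant_pairs:
            concordant.append(gene)
        else:
            discordant.append(gene)  # e.g. gain with downregulation
    return concordant, discordant


# Hypothetical genes, not study data.
calls = {"G1": ("gain", "up"), "G2": ("loss", "down"), "G3": ("gain", "down")}
print(direction_concordance(calls))  # (['G1', 'G2'], ['G3'])

# Proportions reported in the text, recomputed from the stated counts.
for label, part, whole in [("altered genes", 6498, 22094),
                           ("altered in all stages", 1231, 6498),
                           ("same-direction interactions", 406, 6498)]:
    print(f"{label}: {100 * part / whole:.0f}%")  # 29%, 19%, 6%
```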

Segmental interactions
The number of significant segmental interactions increased with tumor progression, as illustrated for chromosome 8 (Fig. 2). Segments with interactions between DNA and RNA numbered 3 in Dukes A (66 Mb), 3 in Dukes B (23 Mb), 5 in Dukes C (358 Mb) and 7 in Dukes D (244 Mb). Three segments on chromosomes 8p and 18q showed interactions between copy number loss and downregulated expression. Eight regions on chromosomes 7p/q, 8q, 13q and 20p/q showed interactions between copy number gain and upregulated expression.
Genes assumed important for carcinogenesis and tumor progression
Sixteen genes with significant mathematical interaction and upregulation were found in all Dukes tumors, all of them located on chromosome 20. The DNA segment covered 40 Mb on chromosome 20p11.21-20q13.33. These genes represented 0.2% of all structurally altered genes across all chromosomes and may be relevant for the appearance of malignancy. Genome-wide DNA segment alterations with mathematical interaction to gene expression contained altogether 41 genes with significantly altered expression that statistically discriminated between early (Dukes A plus B) and late (Dukes C plus D) tumors (not shown); 28 of these genes retained interactions in Dukes C plus D tumors and 17 in Dukes D tumors, and they may thus be relevant for tumor progression (Table 6). Ten of these genes (WDR67, RFXAP, RP11-50D16.3, CAB39L, THSD1, SPRY2, TGDS, CLDN10, SLC10A2, CD33L3) have been reported as altered in tumor tissues, but only 2 (RP11-50D16.3, SLC10A2) have been reported as altered in colorectal cancer.

Discussion
Technological progress in cancer research has been extraordinary, generating enormous amounts of information, particularly on genomic and epigenetic alterations. Consequently, it appears unlikely that isolated and well-defined causes of malignant transformation or cancer progression can be described. Combined alterations in gene structure, expression and processing of genetic information, together with epigenetic control of regulatory elements, may represent a practically unlimited number of alterations to rank as critical events related to clinical outcome. Therefore, in the present study we used a surrogate marker for outcome, the well-established Dukes stage classification of colon carcinoma, in a deliberately small group of individuals selected by chance, as applied by others, 18 since the relationship between Dukes stage and survival is well established worldwide. We combined DNA, RNA and microRNA arrays to identify tumor-specific DNA copy number changes in relation to early (Dukes A plus B) and late (Dukes C plus D) tumors. Tumor material and normal mucosa were all taken from the same individuals, and genomic DNA and total RNA were processed from the same tissue specimens. Statistical interaction analyses were based on DNA segments defined as aberrant by the DNAcopy algorithm, with subsequent determination of correlations to defined genes or transcripts with significantly altered expression or tissue content of mRNA or microRNA. Pooled patient material was intentionally used to reduce inter-specimen variation, which enhances specificity but limits sensitivity in testing.
DNA sequence alterations in general, and in early and late tumor stages, agreed with our previous findings, where we used tiling BAC arrays to subclassify DNA sequence alterations in patients selected according to long- and short-term survival. 6 Frequent early-stage DNA changes included gains on chromosome 20 and parts of chromosomes 7p and 13q and loss in parts of chromosome 18q, while late tumor stages included gains of 7p, 7q, 8q and 13q and losses of 8p, 18p and 21q, suggesting great complexity within specific chromosomes, as reported by others. 1 Structural DNA and RNA alterations that interacted statistically significantly increased from early to late tumor stages at the chromosomal, sub-chromosomal and gene levels. Likewise, interactions between DNA and microRNA increased significantly at the gene level across Dukes A-D tumors. Chromosome 20 showed interaction between DNA and RNA in all Dukes A, B, C and D stages, with MCRs across all tumor stages. Thus, 40% of the aberrant bases in 3 out of 4 Dukes groups were located on chromosome 20, which makes this chromosome likely to be related to carcinogenesis and early invasiveness. 19
[Two per-chromosome count tables not recoverable from extraction. Surviving legend: Transcription was considered significantly altered with log fold change > 1 and adjusted p-value (FDR) < 0.05 in total RNA from tumor tissue versus normal colon tissue.]
DNA alterations on chromosome 20 have been reported by others, indicating correlations between gains and the transition from colon adenoma to carcinoma. [20][21][22][23] Among the altered genes on chromosome 20 in the present study were AURKA and CSE1L, which others have also related to colorectal cancer. 23,24 Thus, our results and conclusions agreed with findings reported by others based on genomic and transcriptomic information from different sources and patients 18 when our computations were performed on specific chromosomes. However, a different pattern appeared when early versus late tumor stages were used as covariates: chromosomes 13 and 18 then appeared most important for transcriptional alterations due to changed DNA. Copy number changes in DNA may reflect a natural adaptation of DNA to altered environmental conditions. This phenomenon may represent selection during evolution based on genetic recombination. Thus, cellular DNA that contains polymorphic regions may or may not represent future blueprints for improved functions. Given this, it is difficult to judge what altered DNA sequences really imply in cells that override contact inhibition and normal growth control and show attenuated apoptosis. Such altered DNA structures may either represent suitable adaptations to withstand hypoxia and other challenges, or may simply result from chance events leading to further compromised cell function and growth control. A third explanation may be the appearance of significant alterations without any impact at all on cell function; i.e. cells can continue to accumulate aberrant DNA as long as it does not compromise cell survival. However, DNA alterations important for carcinogenesis should be present in all subsequent tumor stages or tumor cell clones as long as the malignant cell remains.
DNA alterations appearing late may thus either indicate changes that determine tumor progression or simply changes that do not destabilize the genome too much. Therefore, our simplistic modeling approach was to discriminate early from late DNA copy number changes and correlate them with statistically significant alterations in gene expression. This approach should exclude most structural DNA changes that are not translated into functional dynamics. Consequently, candidate DNA regions with interactions should mainly contain copy number changes that could potentially influence defined cellular functions through splicing and either increased or decreased translation. However, this simplistic approach would not identify DNA alterations that are related to as yet unconfirmed changes in gene expression. From this perspective, it was also interesting to evaluate significant statistical interactions between microRNA and DNA copy number changes, which may identify important interactions involving more recently recognized dimensions of gene expression.
Genome-wide chromosomal copy number gain was the only structural change that alone predicted progressive malignancy. Three genes showed inverse relationships between DNA structure and expression (i.e. gain and downregulation or loss and upregulation), a combination of alterations that makes them less likely to be functional adaptations. We also observed that altered expression in early-stage tumors could disappear in later tumor stages, probably as a consequence of DNA loss. The majority of the 28 genes with altered expression in Dukes C and D tumors appeared to code for proteins involved in translational and transcriptional control, cellular transport, membrane protein interactions and posttranslational modifications, although some genes had more or less unknown functions. Differentially expressed genes in Dukes A and B tumors did not correlate with confirmed aberrant DNA copy numbers (Table 6). A large proportion of genes with significantly altered expression and DNA interactions mapped to chromosome 13 (17/28), but 35% of these genes had unknown function. Our results indicated a clear-cut relationship between an increasing number of combined genetic events (DNA and RNA or DNA and microRNA) and late Dukes stage when we used a relatively wide selection criterion for microRNA (absolute log fold-change > 0.5). However, only 6 microRNAs (including microR-602) were altered so as to discriminate between Dukes A plus B and Dukes C plus D tumors. Only 4 microRNA genes were altered in Dukes A and B but not in C and D, indicating few differences in microRNA between early and late tumor progression, although it has been reported that microR-602 and microR-373 may affect systemic tumor spread. Accordingly, microR-373 was recently suggested to promote metastasis in breast cancer cells, 25,26 and there is now a similar indication in colon cancer.
Upregulation of microR-21 was reported to correlate with poor outcome in colorectal cancer patients, 27 but we found no such association in our present analysis accounting for tumor stage (Dukes A to D). Indeed, much clinical and prognostic information in colon cancer appears to be confined to altered microRNAs, 28 but such alterations seemed less directly related to DNA copy number changes, since similar findings occurred in embryonic and transformed cells. 29 Such observations agree with the findings of our present modeling. Only one of the six discriminating microRNAs (microR-486) was found to have a predicted target gene (CLDN10, Table 6; TargetScanHuman, the microR-Ontology Database) when the search was performed among the top 100 predicted target genes.
Our observations on deregulated microRNAs agreed to 70%-80% with selected sets of microRNAs from published reports. [30][31][32][33]
In conclusion, our present and previous observations indicate, as expected, thousands of aberrant DNA copy numbers in genome-wide analyses of colon cancer. 6,7,34 These numerous altered segments with potential importance for tumor progression were filtered by means of mathematical interaction analysis to a final group of 17 candidate genes (Dukes D) with hypothetical relevance for tumor progression. Our modeling supports that colon cancer progression is related to genomic instability accompanied by altered gene expression. However, a new finding is that carcinogenesis and the early appearance of invasive tumor growth may be related more to functional genomic alterations than to DNA copy number changes. Our model may serve as a tool to accept or reject structural and functional genetic alterations in the appearance and progression of colorectal cancer in small groups of patients.