Comprehensive Analysis of lncRNA–miRNA– mRNA Network Ascertains Prognostic Factors in Patients with Colon Cancer

Background: Non-coding RNAs are competing endogenous RNAs in the occurrence and development of tumorigenesis; numerous microRNAs are aberrantly expressed in colon cancer tissues and play significant roles in oncogenesis development and metastasis. However, large clinical and RNA data are lacking to further confirm the exact role of these RNAs in tumors. This study aimed to ascertain differential RNA expression between colon cancer and normal colon tissues. Materials and Methods: RNA sequencing and clinical data of patients with colon cancer were procured from The Cancer Genome Atlas database; differentially expressed long non-coding RNA, differentially expressed messenger RNAs, and differentially expressed microRNAs were achieved using the limma package in edgeR to generate competing endogenous RNAs networks. Then, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analysis were conducted with ggplot2 package, the Kaplan-Meier survival method was used to predict survival in patients with colon cancer. Results: In total, 1174 differentially expressed long non-coding RNAs, 2068 differentially expressed messenger RNAs, and 239 differentially expressed microRNAs were generated between 480 colon cancer and 41 normal colon tissue samples. Three competing endogenous RNA networks were established. Gene Ontology analysis indicated that the genes of the up-regulated microRNA network were involved in negative regulation of transcription, DNA-template, and those of down-regulated microRNA network were involved in transforming growth factor β receptor signaling pathways, response to hypoxia, cell migration, while Kyoto Encyclopedia of Genes and Genomes analyses of these networks turned out to be negative. Three long non-coding RNAs (AP004609.1, ARHGEF26-AS1, and LINC00491), 3 microRNAs (miRNA-141, miRNA-216a, and miRNA-193b) and 3 RNAs (ULBP2, PHLPP2, and TPM2) were detected to be associated with prognosis by the Kaplan-Meier survival analysis. Additionally, univariate and multivariate Cox regression analyses showed that the microRNA-216a of the competing endogenous RNA might be an independent prognostic factor in colon cancer. Conclusions: This study constructed the non-coding RNA-related competing endogenous RNA networks in colon cancer and sheds lights on underlying biomarkers for colon cancer cohorts.


Introduction
Colorectal cancer has become one of the most common diseases over the past years. Approximately 1.2 million patients worldwide are diagnosed with colorectal cancer every year, while more than 600 000 patients directly or indirectly die from colorectal cancer. 1 Its morbidity and mortality are very high in China and worldwide. The occurrence of colon cancer (CC) is highly heterogeneous, mostly due to the development of sporadic adenomas. [2][3][4] To be precise, about 2.6% to 5.6% of adenomas eventually develop into malignant tumors. The existing treatment methods for colorectal cancer include surgery, radiotherapy, and chemotherapy, and the choice of treatment is related to the stage of the tumor. Postoperative adjuvant radiotherapy and chemotherapy have been the standard treatment for patients with stage III postoperative treatment, and many studies have confirmed that patients with stage II postoperative treatment are in risk of recurrence and metastasis. 5,6 Approximately half of the patients who require adjuvant therapy eventually die of metastases from colorectal cancer, and metastasis is a major cause of treatment failure.
The competing endogenous RNA (ceRNA) hypothesis came up as a new type of regulatory mechanism between non-coding RNA (ncRNA) and coding messenger RNAs (mRNAs). 7 Long non-coding RNAs (lncRNAs), over 200 nucleotides long, are reported to participate in transcriptional and posttranscriptional management. MicroRNAs (miRNAs), 8 with aberrant expression in various tumors, modulate the posttranscriptional RNAs. However, existing studies on the role of miRNAs in CC are controversial, which may be attributed to the small sample size used in these studies. The Cancer Genome Atlas (TCGA) database contains large-sample data, the results of which are true and reliable, and these results are open to all researchers. 9 This study aimed to generate the differentially expressed miRNAs (DEmiRNAs) using the miRNA data downloaded from the TCGA database. In addition, we determined the relationship between DEmiRNAs and prognosis of patients with CC.

Materials and Methods
RNA sequencing raw data were downloaded from the TCGA database (https://portal.gdc.cancer.gov/, version 10.1), and 521 individuals with CC were eligible for inclusion. The RNA data of 480 CC and 41 para-carcinoma tissues and clinical data were also acquired from the TCGA database. All data were acquired from the TCGA database. Herein, ethical committee assessment is not required for the study. The clinical information for patients with CC is listed in Table 1.

Data Processing and Differential Expression Analysis
The raw RNA sequencing (lncRNA, miRNA, and mRNA) data were corrected, normalized, and its expression calculated. The edgeR package in R (version 3.4.4) statistical software program, on Bioconductor (http://www.bioconductor.org/) was used to identify the differentially expressed mRNAs, lncRNAs and miRNAs between the CC and adjacent-normal colon tissues, and |log2FC| ! 2 and P < .01 were considered statistically significant. Volcano plots were visualized using the ggplot2 packages in R.

Construction of the ceRNA Network
The ceRNA network was constructed according to the hypothesis that lncRNAs modulate the activity of mRNAs through being miRNA sponges. On the basis of these theories, the miRcode online tool (http://www.mircode.org) was applied to detect the relation between lncRNAs and miRNAs. At the same time, the MiRDB (http://www.mirdb.org/), picTarBase (https:// pictar.mdc.org), and Targetscan (http://www.targetscan.org//) programs were used to predict the target mRNAs of these miRNAs. Finally, lncRNAs, the miRNAs regulated by the lncRNAs, as well as mRNAs, were included to establish the ceRNA network visualized by Cytoscape (version 3.5.2).

Functional Enrichment Analysis
Gene Ontology (GO) function analyses Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were therefore conducted to detect the function of the differentially expressed messenger RNAs (DEmRNAs) in the ceRNA networks with the R clusterProfiler package. Fisher test was used to generate the notable terms, and P < .01 was considered to be statistically significant.

Prediction Model of the CC Prognostic Factors
The Kaplan-Meier (KM) method and log-rank test were applied to detect the relationship among the DEmRNAs, differentially expressed long non-coding RNAs (DElncRNAs) and DEmiRNAs of the ceRNA network and the overall survival (OS) curves of patients with CC were composed meeting the following requirement (P < .05). Subsequently, the univariate Cox model and multivariate Cox hazards regression model were imposed to generate the independent prognostic factors among patients with CC.

Data Processing and Identification of DEmRNAs, DElncRNAs, and DEmiRNAs
DEmRNAs, DElncRNAs, and DEmiRNAs between CC and adjoining normal colon tissues were determined, with P < .01 and |logFC| > 2 as the cut-off criteria. In total, 2067 DEmRNAs containing 1125 upregulated and 942 downregulated, 1174 DElncRNAs containing 933 upregulated, and 241 downregulated, and 239 DEmiRNAs containing 168 upregulated and 71 downregulated were generated between CC and normal mucosa tissues. The volcanic diagrams of DElncRNA, DEmiRNA, and DEmRNA distribution were generated by R. The images are presented in Figure 1.

Competing Endogenous RNA Network Construction and Function Analysis
The DElncRNAs interacting with DEmiRNA, DEmRNA targeted by DEmiRNA screened using the online database, and DEmiRNA were used to construct the ceRNA network. The relationships between the 1174 DElncRNAs and the 239 DEmiRNAs were firstly assessed, and the functional analysis of these mRNAs was shown in Figure S1, these genes mainly participate in neuroactive ligand-receptor interaction, bile secretion, fat digestion and absorption, as listed in Table S3. The miRcode tool was used to detect the potential relationship, and 29 CC-specific DEmiRNAs that constructively target 131 CCspecific DElncRNAs were then generated. The MiRDB, miR-TarBase and Targetscan projects were then used to determine the relationship between the 239 DEmiRNAs and the 2067 DEmR-NAs, and predict the mRNA targets of miRNAs. The ceRNA network comprising results of 131 DElncRNAs, 22 CC-specific miRNAs and 54 CC-specific mRNAs was established ( Figure  2). On the basis of up-regulated miRNAs are accompanied by down-regulated lncRNAs and mRNAs, 10 the other 2 ceRNA networks containing upregulated 24 miRNAs, 22 downregulated lncRNAs and 6 mRNAs( Figure 3A) and 5 downregulated miR-NAs, 68 lncRNAs, 6 mRNAs were also established and visualized using Cytoscape 3.5.2 ( Figure 4A). Additionally, we performed GO analysis and KEGG pathway enrichment analysis on the target mRNAs of miRNA in the ceRNA networks and selected meaningful GO entries but none meaningful pathway terms. The results of GO analysis are shown in Figure 2B, Figure 3B and Figure 4B. The KEGG pathway analysis revealed that mRNAs of the 3 ceRNA networks were not significantly enriched. To further enhance the bioinformatics analysis reliability, the overlapping genes were analyzed by the Discovery bioinformatics tool (https://david.ncifcrf.gov/). Discovery bioinformatics tool is a web-based online bioinformatics resource that aims to provide a comprehensive set of functional annotation tools for the researchers to understand the biological mechanisms associated with large lists of genes/proteins. Gene Ontology analyses were then performed for the target genes. The P value <.01 and gene count !3 were set as the cut-off criteria. The genes of the up-regulated miRNA network participated in negative regulation of transcription, DNA-template, and those of downregulated ones were involved in transforming growth factor Figure 1. Volcano plots showing the differential expression of RNAs (lncRNAs, miRNAs, and mRNAs) in colon cancer (CC), which were drawn using the R packages ggplot2. X axis reveals log transformed false discovery rate (FDR) values and Y axis implies the mean expression differences of lncRNAs, miRNAs, and mRNAs between 2 groups (|log2FC| ! 2 and P < .01).
b receptor signaling pathways, response to hypoxia, cell migration and positive regulation of transcription from RNA polymerase II promoter, while KEGG pathway enrichment analyses of these networks turned out to be negative. Herein, these GO terms are associated with CC pathogenesis and prognosis.

Correlations Between CC-Specific Differentially Expressed RNAs and OS
The ceRNA networks are receiving increasing attention, and the function of the differentially expressed RNAs (DERNAs) in the network is detected by the GO analysis; thus, it is suggested  that these DERNAs play an important role in the invasion and metastasis of CC. On the basis of these observations, we analyzed the connections between DERNAs and OS of CC by KM analysis. In total, 3 DElncRNAs (ARHGEF26-AS1, AP004609.1, and LINC00491; Figure 5A-C), 3 DEmiRNAs (hsa-miR-141, hsa-miR-216a, and hsa-miR-193b; Figure 5D-F), and 3 DEmRNA (TPM2, FJX1, and ULBP2; Figure 5G-I) were found to be related to OS (P < .05).
To further determine the relationship between these DERNAs and OS, all the above DERNAs were then fitted into the multivariate Cox regression model, which indicated that only microRNA-216a had a significant prognostic value in CC (Tables  2-4), and then a prognostic model containing miR-216a was established. A risk score analysis of the RNA was performed for each patient, and based on the risk scores, the patients were assigned into the "low risk" and "high risk" groups. It turns out that the OS of the high-risk group is worse than that of the low-risk group ( Figure S2a). The 5-year survival correlation of the miR216a signature was analyzed using a receiver operating characteristic curve (ROC) curve ( Figure S2b); at the same time, the area under the curve of the miRNA-216a signature was 0.756. Taken together, the findings of this study indicate that miRNA-216a may be an independent prognostic marker in CC.

Discussion
Colon cancer is one of the common malignant tumors of the digestive tract, and its high prevalence rate has been a major problem in developed countries in Europe and the United States. 11,12 Increasing studies confirmed that ncRNAs have a definite regulatory role in the occurrence of CC5. These ncRNAs are tightly involved in the occurrence, metastasis of tumors, and drug resistance. 13 Moreover, many studies have demonstrated that miRNAs play an important role in post-transcriptional modification of some mRNAs, which then regulate the function of key molecules, thereby affecting cell behaviors. 14,15 Along with the development of science and technology, tens of thousands of genes have been reported to associate with the invasion and metastasis of tumors.
However, these genes cannot be the key target molecules for tumor treatment, because they do not exist independently but are present in a common environment where there must be interactions between them. [16][17][18][19] This study aimed to discover the RNAs participating in the occurrence and development of CC, further affecting the prognosis of patients with CC by analyzing the ceRNA networks, and then further evaluate the prognosis of RNAs by a Cox regression analysis in order to determine exactly the molecules that would act as a guide for clinical treatment. 20,21 In the present study, we analyzed high-throughput data, and identified that 2 upregulated miRNAs (miR-141, miR-216a, and miR-193b), 3 lncRNAs (ARHGEF26-AS1, AP004609.1, and LINC00491), and 3 mRNAs (TPM2, FJX1, and ULBP2) were associated with the clinical outcome of patients with CC. Among these prognostic genes, ULBP2 is a cell-surface glycoprotein located on human chromosome 6. 22 The structure of this molecule is different from class major histocompatibility complex (MHC)class I molecules, which does not contain domain a3, but only a1,2, and is combined with the cell membrane by glycophosphatidylinositol (GPI). Moreover, it can also serve as a ligand of NKG2D, which can activate the human immune system to kill tumors and avoid damaging the normal tissues of the body. 23 PHLPP2, a kind of phosphatases rich in leucines, is a member of protein phosphatase PHLPP and an important regulator of protein kinase B (AKT), which can dephosphorylate AKT and thus inhibit tumor development. Additionally, 24 the heterozygous loss of PHLPP2 has been reported to be associated with reversing the development of breast cancer, liver cancer, and ovarian cancer. 25,26 TPM2, one of the tropomyosin proteins, participates in the regulation of muscle contraction by forming complexes with actin and troponin, and is involved in cellular biological activities (cell invasion, etc). 27 In addition, more than 40 subtypes of this type of tropomyosin are derived from the 4 categories (TPM1-4). Tm1 subtype is one of the 2 mutants of TPM2 gene, and studies have confirmed that Tm1 is low expressed in prostate cancer, 28 yet the exact molecular function of the genes remains unknown. Existing studies have confirmed that the occurrence, development, and metastasis of CC are closely related to the PI3K/AKT and Wnt signaling pathways, among others. DERNA GO analysis in the ceRNA networks of this study indicated that these DEmRNAs are mainly involved in transcription factor regulation, angiogenesis, and epithelial mesenchymal transformation, and these biological activities are closely related to the invasion and metastasis of CC, which provides a new perspective for clinical treatment of CC.
LncRNA-miRNA-mRNA ceRNA networks have been reported in various tumors 29 , such as renal clear cell carcinoma, 30 prostate cancer, 31 endometrial cancer 32 and so on. In some of these studies, the RNA-sequence data were obtained from in vitro models. Other studies constructed ceRNA network using TCGA database, 30 then confirm key markers through KEGG pathway enrichment and GO analysis, while no further survival analysis was performed in most of the researches. 31 We not only established the ceRNA networks, but found prognostic factors based on it, which were further verified by univariate and multivariate regression models.
A multivariate Cox regression model confirmed miR-216a to be the independent prognostic factor among all differential RNAs, and risk values were established through multivariate analysis to divide clinical patients into high-risk groups and low-risk groups. A further ROC curve analysis also suggested that mir-216a was of great significance for the prognosis of CC.
In conclusion, we constructed 3 ceRNA networks and identified various RNAs as potential prognostic predictors for patients with CC. Nevertheless, further in vitro and in vivo experiments are necessary to validate our findings, and further investigation into the functions is also required to explore the molecular mechanism of markers in CC progression.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material
Supplemental material for this article is available online.